
Measuring Tokens in LLMs

By Xavier Collantes


What Are Tokens in LLMs?

Tokens are the fundamental units that large language models (LLMs) use to process text. When you input text to an LLM, the model first breaks your text down into "tokens". These tokens are not always whole words; they can be:
  • Full words ("hello")
  • Parts of words ("un" + "likely")
  • Characters ("a", "!")
  • Spaces and punctuation
For English text, a rough estimate is that 1 token equals about 4 characters, or 3/4 of a word. This means a typical page of text (about 500 words) is approximately 650-700 tokens.
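To see this splitting in action, here is a minimal sketch using OpenAI's tiktoken package (other models' tokenizers will split the same text differently):

```python
import tiktoken  # OpenAI's tokenizer library: pip install tiktoken

# cl100k_base is the encoding used by GPT-4 and GPT-3.5-turbo.
encoder = tiktoken.get_encoding("cl100k_base")

text = "Tokenization is unlikely to split on word boundaries!"
token_ids = encoder.encode(text)

# Decode each token ID individually to see the actual text pieces.
pieces = [encoder.decode([token_id]) for token_id in token_ids]
print(f"{len(token_ids)} tokens: {pieces}")
```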

Why Tokens Matter

  • Cost calculation: Most API-based LLM services charge based on token usage (see the cost sketch after this list).
  • Context window limits: Every model has a maximum number of tokens it can process at once (its "context window").
  • Performance impact: More tokens generally mean longer processing time and higher memory and compute costs.
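To make the cost point concrete, the sketch below estimates a request's price from token counts. The per-1,000-token rates are hypothetical placeholders, so substitute your provider's published pricing:

```python
# Hypothetical rates in USD per 1,000 tokens; real rates vary by
# provider and model, so check the pricing page.
PRICE_PER_1K_INPUT = 0.01
PRICE_PER_1K_OUTPUT = 0.03

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request from its token counts."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
        + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Example: a 1,200-token prompt that returns a 400-token response.
print(f"${estimate_cost(1200, 400):.4f}")  # -> $0.0240
```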

The Ambiguity Problem of Tokens

One of the most confusing aspects of working with different LLMs is that tokens are not standardized across models. Different LLMs have different tokenization algorithms, which means the same text can be split into different numbers of tokens depending on which model processes it.
  • Different tokenization algorithms: GPT models use tiktoken, Claude uses its own tokenizer, Llama uses SentencePiece, etc.
  • Internationalization (i18n) differences: Some models tokenize certain languages more efficiently than others.
  • Special tokens: Models handle special tokens (like those for code, formatting, or system instructions) differently.
For example, the phrase "I love machine learning" might be:
  • 4 tokens in GPT-4
  • 5 tokens in Claude
  • 6 tokens in a different model
This inconsistency creates practical challenges (a comparison sketch follows this list):
  • Cost comparisons become difficult
  • Context window utilization varies by model
  • Performance benchmarks can be misleading if not accounting for tokenization differences
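You can observe this variance directly by running the same string through different tokenizers. The sketch below compares three OpenAI encodings via tiktoken; comparing against Claude or Llama would require their own tokenizer libraries:

```python
import tiktoken

text = "I love machine learning"

# Three generations of OpenAI encodings; older encodings often
# produce more tokens for the same text.
for encoding_name in ["gpt2", "p50k_base", "cl100k_base"]:
    encoder = tiktoken.get_encoding(encoding_name)
    print(f"{encoding_name}: {len(encoder.encode(text))} tokens")
```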

Measuring Tokens in Practice

Token counts are important when choosing the best LLM for your use case. Several tools can help you measure tokens for your specific model:
  • GPT for Work Tokenizer: Get stats for your tokens.
  • Claude Tokenizer: For Claude models.
  • OpenAI Tokenizer: For OpenAI models.
  • SentencePiece: For Llama models.

Using tiktoken to Count Tokens

tiktoken is OpenAI's tokenizer, available as a Python package.
Example of how to count tokens for different OpenAI models:
```python
import tiktoken

def count_tokens(text, model="gpt-4"):
    """Count the number of tokens in a text string."""
    encoder = tiktoken.encoding_for_model(model)
    tokens = encoder.encode(text)
    return len(tokens)

# Example usage.
sample_text = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration."

print(f"GPT-4: {count_tokens(sample_text, 'gpt-4')} tokens")
print(f"GPT-3.5: {count_tokens(sample_text, 'gpt-3.5-turbo')} tokens")
print(f"davinci: {count_tokens(sample_text, 'text-davinci-003')} tokens")
```
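Since GPT-4 and GPT-3.5-turbo share the same cl100k_base encoding, the first two counts will match; text-davinci-003 uses the older p50k_base encoding, so its count can differ.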

Practical Tips for Working with Tokens

  • Always measure before sending: Count tokens before sending requests to avoid errors or unexpected costs (a pre-flight check is sketched after this list).
  • Be aware of hidden tokens: System prompts, formatting, and special characters all count toward your token limits.
  • Consider token efficiency: Some prompts can be rewritten to use fewer tokens while conveying the same information.
  • Different models, different strategies: Adapt your prompt strategy based on the specific tokenization of your chosen model.
  • Monitor token usage: Keep track of token consumption to optimize costs and performance. Usage figures are usually available in the API response or the provider's dashboard.
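As an example of measuring before sending, here is a sketch of a pre-flight check. The 8,192-token context window and 1,000-token output reservation are illustrative values; use your model's documented limits:

```python
import tiktoken

CONTEXT_WINDOW = 8192       # Illustrative limit; use your model's actual value.
RESERVED_FOR_OUTPUT = 1000  # Leave room for the model's response.

def fits_in_context(prompt: str, model: str = "gpt-4") -> bool:
    """Check whether a prompt leaves enough room for the response."""
    encoder = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoder.encode(prompt))
    return prompt_tokens + RESERVED_FOR_OUTPUT <= CONTEXT_WINDOW

prompt = "Summarize the following report: ..."
if fits_in_context(prompt):
    print("Safe to send.")
else:
    print("Prompt too long: trim the input before sending.")
```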

