
Measuring Tokens in LLMs

By Xavier Collantes


What Are Tokens in LLMs?

Tokens are the fundamental units that large language models (LLMs) use to process text. When you input text to an LLM, the model first breaks your text down into "tokens". These tokens are not always whole words; they can be:
  • Full words ("hello")
  • Parts of words ("un" + "likely")
  • Characters ("a", "!")
  • Spaces and punctuation
For English text, a rough estimate is that 1 token equals about 4 characters, or 3/4 of a word. This means a typical page of text (about 500 words) is approximately 650-700 tokens.
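To see this splitting in action, here is a minimal sketch using OpenAI's tiktoken package (other models' tokenizers will split the same text differently):

```python
import tiktoken  # OpenAI's tokenizer library: pip install tiktoken

# cl100k_base is the encoding used by GPT-4 and GPT-3.5-turbo.
encoder = tiktoken.get_encoding("cl100k_base")

text = "Tokenization is unlikely to split on word boundaries!"
token_ids = encoder.encode(text)

# Decode each token ID individually to see the actual text pieces.
pieces = [encoder.decode([token_id]) for token_id in token_ids]
print(f"{len(token_ids)} tokens: {pieces}")
```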

Why Tokens Matter

  • Cost calculation: Most API-based LLM services charge based on token usage (see the cost sketch after this list).
  • Context window limits: Every model has a maximum number of tokens it can process at once (its "context window").
  • Performance impact: More tokens generally mean longer processing time and higher memory and compute costs.
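To make the cost point concrete, the sketch below estimates a request's price from token counts. The per-1,000-token rates are hypothetical placeholders, so substitute your provider's published pricing:

```python
# Hypothetical rates in USD per 1,000 tokens; real rates vary by
# provider and model, so check the pricing page.
PRICE_PER_1K_INPUT = 0.01
PRICE_PER_1K_OUTPUT = 0.03

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request from its token counts."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
        + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Example: a 1,200-token prompt that returns a 400-token response.
print(f"${estimate_cost(1200, 400):.4f}")  # -> $0.0240
```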

The Ambiguity Problem of Tokens

One of the most confusing aspects of working with different LLMs is that tokens are not standardized across models. Different LLMs have different tokenization algorithms, which means the same text can be split into different numbers of tokens depending on which model processes it.
  • Different tokenization algorithms: GPT models use tiktoken, Claude uses its own tokenizer, Llama uses SentencePiece, etc.
  • Internationalization (i18n) differences: Some models tokenize certain languages more efficiently than others.
  • Special tokens: Models handle special tokens (like those for code, formatting, or system instructions) differently.
For example, the phrase "I love machine learning" might be:
  • 4 tokens in GPT-4
  • 5 tokens in Claude
  • 6 tokens in a different model
This inconsistency creates practical challenges (a comparison sketch follows this list):
  • Cost comparisons become difficult
  • Context window utilization varies by model
  • Performance benchmarks can be misleading if not accounting for tokenization differences
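You can observe this variance directly by running the same string through different tokenizers. The sketch below compares three OpenAI encodings via tiktoken; comparing against Claude or Llama would require their own tokenizer libraries:

```python
import tiktoken

text = "I love machine learning"

# Three generations of OpenAI encodings; older encodings often
# produce more tokens for the same text.
for encoding_name in ["gpt2", "p50k_base", "cl100k_base"]:
    encoder = tiktoken.get_encoding(encoding_name)
    print(f"{encoding_name}: {len(encoder.encode(text))} tokens")
```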

Measuring Tokens in Practice

Token counts are important when choosing the best LLM for your use case. Several tools can help you measure tokens for your specific model:
  • GPT for Work Tokenizer: Get stats for your tokens.
  • Claude Tokenizer: For Claude models.
  • OpenAI Tokenizer: For OpenAI models.
  • SentencePiece: For Llama models.

Using tiktoken to Count Tokens

tiktoken is OpenAI's tokenizer, available as a Python package.
Example of how to count tokens for different OpenAI models:
```python
import tiktoken

def count_tokens(text, model="gpt-4"):
    """Count the number of tokens in a text string."""
    encoder = tiktoken.encoding_for_model(model)
    tokens = encoder.encode(text)
    return len(tokens)

# Example usage.
sample_text = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration."

print(f"GPT-4: {count_tokens(sample_text, 'gpt-4')} tokens")
print(f"GPT-3.5: {count_tokens(sample_text, 'gpt-3.5-turbo')} tokens")
print(f"davinci: {count_tokens(sample_text, 'text-davinci-003')} tokens")
```
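Since GPT-4 and GPT-3.5-turbo share the same cl100k_base encoding, the first two counts will match; text-davinci-003 uses the older p50k_base encoding, so its count can differ.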

Practical Tips for Working with Tokens

  • Always measure before sending: Count tokens before sending requests to avoid errors or unexpected costs (a pre-flight check is sketched after this list).
  • Be aware of hidden tokens: System prompts, formatting, and special characters all count toward your token limits.
  • Consider token efficiency: Some prompts can be rewritten to use fewer tokens while conveying the same information.
  • Different models, different strategies: Adapt your prompt strategy based on the specific tokenization of your chosen model.
  • Monitor token usage: Keep track of token consumption to optimize costs and performance. Usage figures are usually available in the API response or the provider's dashboard.
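As an example of measuring before sending, here is a sketch of a pre-flight check. The 8,192-token context window and 1,000-token output reservation are illustrative values; use your model's documented limits:

```python
import tiktoken

CONTEXT_WINDOW = 8192       # Illustrative limit; use your model's actual value.
RESERVED_FOR_OUTPUT = 1000  # Leave room for the model's response.

def fits_in_context(prompt: str, model: str = "gpt-4") -> bool:
    """Check whether a prompt leaves enough room for the response."""
    encoder = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoder.encode(prompt))
    return prompt_tokens + RESERVED_FOR_OUTPUT <= CONTEXT_WINDOW

prompt = "Summarize the following report: ..."
if fits_in_context(prompt):
    print("Safe to send.")
else:
    print("Prompt too long: trim the input before sending.")
```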

