Claude Token Counter, now with model comparisons
TL;DR Highlight
Anthropic’s Claude Opus 4.7 consumes up to 46% more tokens than its predecessor on the same input due to a tokenizer change, effectively raising costs.
Who Should Read
Developers operating services with the Claude API, particularly backend/AI developers considering or already using Opus 4.7 and needing precise cost impact analysis.
Core Mechanics
- Simon Willison’s Claude Token Counter now compares token counts across models, simultaneously supporting Opus 4.7, Opus 4.6, Sonnet 4.6, and Haiku 4.5.
- Claude Opus 4.7 is Anthropic’s first model to undergo a tokenizer change, potentially converting the same input into 1.0 to 1.35 times as many tokens.
- Testing with a system prompt revealed Opus 4.7 generating 1.46 times as many tokens as Opus 4.6, exceeding Anthropic’s stated maximum of 1.35x.
- Although pricing is unchanged from Opus 4.6 ($5 per million input tokens, $25 per million output tokens), the higher token count amounts to a real cost increase of over 40%.
- Testing with a high-resolution image (3456x2234 pixels, 3.7MB PNG) showed Opus 4.7 generating 3.01 times as many tokens as Opus 4.6, due to enhanced Vision capabilities supporting images up to 2,576 pixels.
- Conversely, smaller images (682x318) showed negligible token differences between Opus 4.7 (314 tokens) and 4.6 (310 tokens), indicating the increase stems from high-resolution support, not the tokenizer itself.
- A 15MB, 30-page text-centric PDF produced 60,934 tokens on Opus 4.7 versus 56,482 on 4.6, a 1.08x increase, notably smaller than the increase observed with images.
- The token counting API requires a Claude API key and allows pre-checking expected token counts for each model by specifying the model ID.
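The pre-check described above can be sketched with only the standard library. This is a minimal sketch, assuming the shape of Anthropic’s token-counting endpoint (`/v1/messages/count_tokens` with `x-api-key` and `anthropic-version` headers); verify the exact path and headers against current documentation before relying on it. The cost helper uses the article’s stated input pricing of $5 per million tokens.

```python
import json
import urllib.request


def build_count_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build a POST request against the token-counting endpoint (path assumed)."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages/count_tokens",
        data=body,
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def input_cost_usd(input_tokens: int, price_per_mtok: float = 5.0) -> float:
    """Input cost at the article's stated Opus pricing ($5 per million input tokens)."""
    return input_tokens / 1_000_000 * price_per_mtok


# Example: the article's 30-page PDF test counted 60,934 tokens on Opus 4.7,
# which is roughly $0.30 of input at the stated price.
```

Running the same request body against each model ID in turn (e.g. `claude-opus-4-7` vs. the 4.6 ID) gives a per-model count before committing to a migration.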
Evidence
- Critics labeled the tokenizer change a “money grab,” citing Anthropic’s lack of transparency regarding the reasons or methodology behind the alteration.
- Technical counterarguments suggest the change could be a deliberate design choice for performance, potentially improving inference quality by breaking text into more meaningful units. Speculation also arose about replacing the tokenizer with a small learned model, similar to the Byte Latent Transformer.
- Data from tokens.billchambers.me/leaderboard shows large-scale comparisons between 4.6 and 4.7, with one user reporting a 40% increase in tokens for their prompts.
- Practical experience shows token costs escalating in agent systems: on a timeout, the entire context (including previous tool call results) is re-transmitted, so a single failed API call can consume three times the tokens.
- Developers are responding by keeping the default model in Claude CLI at 4.6 and using the `--model claude-opus-4-7` flag only when necessary, and by downsampling high-resolution images before upload.
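The retry cost described above is simple arithmetic: if an agent re-sends its full accumulated context on every attempt, a call that times out twice before succeeding pays for the context three times, and more if tool results keep accumulating between attempts. A minimal sketch (the token figures are illustrative, not from the article):

```python
def retry_token_cost(context_tokens: int, attempts: int) -> int:
    """Tokens billed when the full context is re-sent on every attempt.

    Assumes the context does not grow between retries.
    """
    return context_tokens * attempts


def retry_token_cost_growing(base: int, per_attempt_growth: int, attempts: int) -> int:
    """Variant where each retry appends prior results, so every attempt costs more."""
    return sum(base + i * per_attempt_growth for i in range(attempts))


# Two timeouts before a success: the same 50k-token context is billed three times.
assert retry_token_cost(50_000, 3) == 150_000
```

This is why the How to Apply section recommends tracking whether side effects actually executed: a retry that was never needed still bills the whole context.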
How to Apply
- If considering migrating to Opus 4.7, pre-measure the token cost increase for your existing system prompts and representative inputs using Simon Willison’s Claude Token Counter (https://tools.simonwillison.net/claude-token-counter).
- If upgrading image-processing pipelines to Opus 4.7, pre-resize images to 682x318 when high resolution isn’t essential, to keep token costs comparable to Opus 4.6.
- When using the Claude CLI or API, split models by task complexity to manage costs: use Sonnet 4.6 or Haiku 4.5 as defaults and specify `--model claude-opus-4-7` only for complex tasks.
- For agent systems, monitor at both the token and action levels; track whether side effects actually executed to reduce unnecessary re-attempts and minimize token waste.
Terminology
tokenizer: A tool LLMs use to break text into numerical chunks (tokens) before processing. The number of tokens varies with how the text is split, and API costs are based on this token count.
BPE: Byte Pair Encoding, the algorithm behind most LLM tokenizers; it merges frequently occurring character combinations into single tokens.
token inflation: The phenomenon where the same text is split into more tokens after a tokenizer change, resulting in higher costs despite unchanged pricing.
Byte Latent Transformer: An experimental architecture that processes text at the byte level without a traditional tokenizer, conceptually replacing the tokenizer with a small learned model.
context window: The maximum number of tokens an LLM can process at once. In agent systems, repeated failures can sharply increase costs as previous results accumulate within this window.