Claude Code bugs can silently 10–20x API costs
TL;DR Highlight
A warning post about two cache-related bugs in Claude Code that can silently spike API costs by up to 10–20x. Users on the $200/month plan are reportedly burning through their limits far faster than expected.
Who Should Read
Developers using Claude Code (Anthropic's AI coding tool) on an API-cost basis, especially those running automated pipelines on the Max plan or via direct API integration.
Core Mechanics
- Claude Code has two cache-related bugs that prevent prompt caching (the feature that reuses previously processed tokens to reduce costs) from working correctly, potentially causing API costs to spike by up to 10–20x.
- The problem occurs silently. Users believe they are doing the same work as usual, but the cache is being invalidated and the full context is reprocessed from scratch on every request, so billed tokens balloon far beyond what the visible work would suggest.
- One commenter reported using the $200/month Max 5x plan: two days of heavy work analyzing thousands of files across 20 simultaneous sessions consumed only 50% of the limit, but a few hours of simple refactoring and bug fixes wiped out the remaining 50%.
- That same user hit 100% of their limit after just a few small bug-fix sessions (four sessions of ~20 minutes each, roughly 45 minutes of actual work) and had to wait two days for the rollover. That volume of work would normally account for only a few percent of the limit.
- A separate comment described a case where Claude Opus 4 hallucinated a non-existent API and kept looping to make tests pass, consuming roughly $12 in 30 minutes, with thinking tokens suspected as the primary cost driver.
- A similar looping issue was also reported with Gemini, illustrating that runaway cost explosions from infinite loops are a structural risk across AI coding tools in general.
- The community raised questions about whether these charges are for 'unverifiable work,' pointing out that users currently have no practical way to independently audit cache hit status or actual token consumption.
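The cache-miss arithmetic behind the 10–20x claim can be sketched with illustrative numbers. This is a minimal Python sketch, not Anthropic's billing code: the multipliers follow Anthropic's published prompt-caching pricing at the time of writing (cache reads at roughly 0.1x the base input price, cache writes at roughly 1.25x), and the base price is a Sonnet-class placeholder; verify both against current pricing.

```python
# Illustrative sketch: why a broken prompt cache can roughly 10x per-request cost.
# All prices are assumptions for illustration, not authoritative figures.

BASE_INPUT = 3.00 / 1_000_000    # $ per input token (placeholder, Sonnet-class)
CACHE_READ = 0.1 * BASE_INPUT    # cached tokens re-read at a steep discount
CACHE_WRITE = 1.25 * BASE_INPUT  # writing tokens into the cache costs a premium

def turn_cost(context_tokens: int, new_tokens: int, cache_works: bool) -> float:
    """Cost of one request that resends `context_tokens` of prior context
    plus `new_tokens` of fresh input."""
    if cache_works:
        # Prior context is served from cache; only the new tokens pay the
        # (premium) full price as they are written into the cache.
        return context_tokens * CACHE_READ + new_tokens * CACHE_WRITE
    # Cache invalidated: the entire context is reprocessed at full price.
    return (context_tokens + new_tokens) * BASE_INPUT

# A 150k-token session context with a 2k-token new message per turn:
cached = turn_cost(150_000, 2_000, cache_works=True)
broken = turn_cost(150_000, 2_000, cache_works=False)
print(f"cached: ${cached:.4f}  broken: ${broken:.4f}  ratio: {broken / cached:.1f}x")
```

With these assumed prices the broken-cache turn comes out nearly 9x more expensive, and the gap widens as the context grows, which is consistent with the 10–20x range reported above.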
Evidence
- A Max 5x plan ($200/month) user shared concrete numbers: an extreme imbalance between 2 days of heavy work processing thousands of files across 20 simultaneous sessions (50% of the limit consumed) and a few hours of light refactoring (the remaining 50% consumed). The user criticized the situation sharply: "I'm not sure if this is a bug or a silent limit reduction, but this is unacceptable for $200/month."
- The Opus 4 incident is also noteworthy: the model fabricated a non-existent API and looped trying to make tests pass, burning $12 in 30 minutes, with thinking tokens suspected as the primary cause.
- Cynical comments like "This is a feature, not a bug" and "Some PM just hit their 1000% revenue growth KPI" suggest the community views this not just as a bug but as a business-incentive problem.
- Users currently have no way to independently verify whether cache hits occurred or how many tokens were actually consumed; validating charges essentially requires reverse engineering.
- A similar looping issue reported with Gemini further demonstrates that this is a structural risk across AI coding tools broadly, particularly for agent-based tasks that run autonomously, where an infinite loop translates directly into a cost explosion.
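The verification gap called out above can at least be narrowed per request. The sketch below assumes the Messages API `usage` object exposes `cache_read_input_tokens` and `cache_creation_input_tokens` alongside `input_tokens`, as described in Anthropic's prompt-caching documentation; verify the field names against your SDK version. The thresholds are arbitrary starting points, not Anthropic guidance.

```python
# Sketch of a per-request cache audit over the API response's usage fields.
# Field names assumed from Anthropic's prompt-caching docs; thresholds are
# illustrative defaults, tune them for your workload.

def audit_usage(usage: dict, context_estimate: int) -> str:
    """Flag requests whose cache behavior looks wrong.

    `context_estimate` is your rough count of tokens that *should* have
    been served from cache (system prompt, prior conversation, etc.).
    """
    read = usage.get("cache_read_input_tokens", 0)
    uncached = usage.get("input_tokens", 0)
    if read == 0 and uncached > context_estimate:
        # Everything was billed at full price: the cache was missed entirely.
        return "SUSPECTED CACHE MISS"
    if read < context_estimate // 2:
        # Far fewer cached tokens than expected: the cache was partly invalidated.
        return "PARTIAL CACHE HIT"
    return "ok"

# Example: a follow-up turn where ~100k tokens of prior context should have
# been cached, but the response reports zero cache reads.
suspicious = {"input_tokens": 104_000, "cache_read_input_tokens": 0,
              "cache_creation_input_tokens": 0, "output_tokens": 800}
print(audit_usage(suspicious, context_estimate=100_000))  # SUSPECTED CACHE MISS
```

Logging this verdict for every request gives a running audit trail, which is the closest a user can currently get to independently checking the charges discussed above.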
How to Apply
- If you use Claude Code for automated pipelines or long-running agent tasks, check the usage dashboard in the Anthropic console before and after each job to watch for abnormal token consumption. A large usage spike after a short task is a signal to suspect the cache bug.
- For tasks where an agent might loop, such as automated test fixes or iterative generate-and-validate cycles, always set a maximum iteration count or a total cost cap. Since Claude Code's loop detection and auto-stop functionality is currently incomplete, monitor sessions manually or split work into short segments for safety.
- Exercise extra caution when using Opus 4-series models with thinking enabled. Thinking tokens are far more expensive than regular tokens, and a hallucination-triggered loop adds that cost on every iteration. For cost-sensitive tasks, disable the thinking feature or test first with a cheaper model (Haiku or Sonnet series).
- Until the cache bugs are fixed, prefer short independent sessions over long sessions that reuse the same context. The longer a session runs, the larger the context that gets reprocessed from scratch whenever the cache is invalidated.
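The iteration and cost caps recommended above can be sketched as a small guard wrapped around whatever function drives one agent step. `run_step` and its per-step cost reporting are hypothetical stand-ins for your own pipeline, not Claude Code APIs:

```python
# Minimal sketch of an iteration cap plus cost cap for an agent loop.
# `run_step` is a hypothetical callable returning (finished?, step_cost_usd).

class BudgetExceeded(Exception):
    """Raised when the loop hits its iteration or cost cap."""

def guarded_loop(run_step, max_iterations: int = 10, max_cost_usd: float = 5.0):
    """Run an agent step repeatedly, stopping on success or at either cap."""
    spent = 0.0
    for i in range(max_iterations):
        done, step_cost = run_step()
        spent += step_cost
        if done:
            return {"iterations": i + 1, "cost": spent}
        if spent >= max_cost_usd:
            raise BudgetExceeded(f"spent ${spent:.2f} after {i + 1} steps")
    raise BudgetExceeded(f"no success after {max_iterations} steps (${spent:.2f})")

# Demo: a step that never succeeds (simulating a hallucination loop) at
# $0.40 per attempt trips the cost cap instead of running forever.
def stuck_step():
    return False, 0.40

try:
    guarded_loop(stuck_step, max_iterations=50, max_cost_usd=5.0)
except BudgetExceeded as e:
    print("stopped:", e)
```

The same shape works for the "split work into short segments" advice: treat each `guarded_loop` call as one segment and start the next one with a fresh budget.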
Terminology
- Prompt Caching: A feature that reuses previously processed long prompts (system prompts, prior conversation history, etc.) instead of reprocessing them. When working correctly, it can significantly reduce costs for repetitive tasks; the core issue here is that the bugs prevent it from working.
- Thinking Tokens: Tokens representing Claude's internal reasoning output before the final answer. They are designed to improve accuracy but cost far more than regular tokens and can blow up costs when combined with loops.
- Hallucination: A phenomenon where an AI model confidently outputs information that is not factually accurate. In this case, the model fabricated a non-existent API and repeatedly attempted to use it.
- Max Plan: One of Anthropic's Claude subscription plans, providing a fixed monthly usage allowance rather than pay-as-you-go API billing. "Max 5x" refers to the tier offering 5x the base usage limit at $200/month.
- Context: The entire range of text the AI can reference in the current conversation. As conversations grow longer, the context grows larger and more expensive to process. Without working caching, the full context must be reprocessed from scratch on every request.
- Agent: A mode in which the AI autonomously plans and executes multiple steps of a task without the user giving instructions one at a time. Convenient, but if it enters a loop or goes in the wrong direction, costs keep accumulating until a human intervenes.