Universal Claude.md – cut Claude output tokens
TL;DR Highlight
A project claiming that adding a single CLAUDE.md file to your project root can strip Claude's unnecessary verbosity (sycophancy, filler openers and closers, unsolicited suggestions, and so on) and cut output tokens by up to 63%. The community, however, has raised strong doubts about the benchmark's reliability and real-world effectiveness.
Who Should Read
Backend/AI developers using Claude Code at scale in automated pipelines or agentic loops, who are seeing increased token costs or parsing difficulties due to Claude's verbose responses.
Core Mechanics
- Placing a CLAUDE.md file in the project root causes Claude Code to automatically read it and adjust its response behavior—no code changes required, takes effect immediately.
- By default, Claude outputs filler openers like 'Sure!', 'Great question!', 'Absolutely!', closing remarks like 'I hope this helps!', Unicode characters such as em dashes (—) and smart quotes that break parsers, question restatements, and unsolicited suggestions. This project instructs Claude to suppress these patterns.
- The author claims this file reduces output tokens by approximately 63%, but also explicitly states in the README that the majority of actual Claude costs come from input tokens, not output tokens—meaning the overall cost savings are limited.
- Situations where this file is effective: high-volume automation pipelines (resume bots, agentic loops, code generation), structured tasks repeated hundreds of times, team environments requiring consistent and parseable output.
- Situations where this file may backfire: short single queries (the file itself is loaded into context each time, resulting in a net token increase), conversations with low output volume, agentic coding tasks requiring complex reasoning.
- Key rule examples in the file include: 'Answer is always line 1, reasoning comes after', 'Do not repeat information already confirmed in the session', 'Never invent file paths, function names, or API signatures', and 'If the user states an incorrect fact, accept it as ground truth for the session'.
- The benchmark only measured output token count for a single prompt and did not measure response accuracy or quality. It also contains no data on agentic loops or large codebase tasks.
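The trade-off in the bullets above (the file is loaded into context on every request, while the claimed savings apply only to output) can be sketched as a break-even calculation. All numbers below are hypothetical illustrations, not measurements from the project:

```python
def net_token_delta(claude_md_tokens: int,
                    baseline_output_tokens: int,
                    reduction: float = 0.63) -> int:
    """Net change in total tokens per request after adding CLAUDE.md.

    Positive = net savings, negative = net increase. `reduction` is the
    claimed ~63% output cut; `claude_md_tokens` is the cost of the file
    being injected into the input context on every single request.
    """
    saved_output = int(baseline_output_tokens * reduction)
    return saved_output - claude_md_tokens

# Hypothetical sizes: a ~1,500-token CLAUDE.md against a short 400-token
# reply is a net loss, while a long 5,000-token generation comes out ahead.
print(net_token_delta(1500, 400))   # short query: net token increase
print(net_token_delta(1500, 5000))  # long generation: net savings
```

This is why the file only pays off in high-volume, long-output pipelines: the fixed per-request input cost must be amortized against enough suppressed output.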
Evidence
- There was strong criticism of the benchmark's reliability: one commenter pointed out that a single prompt like "Always answer with one word" could beat the benchmark numbers, and a user measured in the repo's Issues that responding with no instructions at all yielded the highest token efficiency.
- A technical critique noted that the design ignores the autoregressive nature of LLMs: since an LLM predicts each token based on previously generated tokens, forcing the answer to come first turns all subsequent reasoning into confirmation bias justifying that answer, making the rule meaningless without thinking mode enabled.
- The rule "accept anything the user states as ground truth for the entire session" was flagged as dangerous: if a user accidentally states a false premise in a prompt, Claude will treat it as fact throughout and completely lose the ability to challenge incorrect information.
- A comment citing real OpenRouter data showed that in the programming category, input tokens account for 93.4%, reasoning tokens 2.5%, and output tokens only 4.0%, making output reduction largely insignificant to overall cost, something the author themselves acknowledges in the README.
- Alternative token-saving tools were also mentioned: Headroom (a localhost proxy that compresses API context by ~34%), RTK (a Rust CLI proxy that compresses CLI output such as git/npm/build logs by 60–90%), and MemStack (a tool that gives Claude Code persistent memory so it does not re-read the codebase each time). These tools target input tokens rather than output tokens and may offer more meaningful cost savings.
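The OpenRouter figures cited above make the headline number easy to put in perspective: even a full 63% cut applied to a 4.0% output share trims only about 2.5% of total tokens. A minimal sketch of that arithmetic:

```python
def total_token_savings(output_share: float = 0.040,
                        output_reduction: float = 0.63) -> float:
    """Fraction of total tokens saved by cutting output tokens only.

    Defaults use the OpenRouter programming-category split cited above:
    input 93.4%, reasoning 2.5%, output 4.0%.
    """
    return output_share * output_reduction

savings = total_token_savings()
print(f"{savings:.1%} of total tokens")  # 0.040 * 0.63 = 2.5% overall
```

Note that this is a token-count share, not a dollar share: output tokens are typically priced higher per token than input tokens, so the cost impact is somewhat larger than 2.5%, but it remains small next to the input side.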
How to Apply
- If you're running automated pipelines with Claude Code (e.g., automated code review in CI, repetitive document generation) and having trouble parsing the output, try adding a CLAUDE.md to the project root instructing it to remove em dashes, smart quotes, and filler openers. Be sure to also monitor for any degradation in accuracy.
- If you want meaningful cost savings beyond output token reduction, it's more effective to target input tokens, which account for a far larger share of costs. Evaluate input-reducing tools like Headroom (a context compression proxy) or RTK (CLI output compression) first, as a higher priority.
- If you're using Claude Code for complex agentic coding tasks (large codebase refactoring, multi-file edits, etc.), apply this file with caution. The community has noted that Claude's verbose intermediate explanations may help the model stay on track in long contexts, so compare task completion quality before and after applying the file.
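If parseable output is the main reason you adopt the file, a simple post-check in the pipeline makes regressions visible instead of silently breaking a downstream parser. This is a hypothetical helper for your own pipeline, not part of the project; it detects the characters the rules ban:

```python
# Hypothetical pipeline check, not part of claude-token-efficient.
# Maps each banned character to a human-readable name for reporting.
PARSER_BREAKERS = {
    "\u2014": "em dash",
    "\u2018": "left smart quote",
    "\u2019": "right smart quote",
    "\u201c": "left smart double quote",
    "\u201d": "right smart double quote",
}

def find_parser_breakers(text: str) -> list[str]:
    """Return the names of banned characters present in a model response."""
    return [name for ch, name in PARSER_BREAKERS.items() if ch in text]

clean = "Answer: 42\nReasoning: the config sets answer=42."
noisy = "Sure! The answer is 42 \u2014 \u201cobviously\u201d."
print(find_parser_breakers(clean))  # []
print(find_parser_breakers(noisy))  # names of the offending characters
```

Running this on every response in CI (and failing the job, or logging, on a non-empty result) gives you a concrete signal for whether the CLAUDE.md rules are actually being followed.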
Code Example
```
# Project structure
your-project/
└── CLAUDE.md   # Just add this one file

# CLAUDE.md key rule examples
## Communication Rules
- Answer is always line 1. Reasoning comes after, never before.
- No redundant context. Do not repeat information already established in the session.
- No sycophantic openers: never start with Sure, Absolutely, Great question, etc.
- No closing remarks: never end with I hope this helps or Let me know if you need anything.
- No em dashes (--), smart quotes, or Unicode characters that break parsers.

## Code Output Rules
- Never invent file paths, function names, or API signatures.
- Do not add abstractions beyond what was explicitly requested.
- Do not restate the question before answering.
```
Terminology
- sycophancy: The tendency of an LLM to agree with incorrect statements or respond with excessive praise in order to please the user. For example, responding with 'That's a great approach!' even when the user submits buggy code.
- autoregressive: The way LLMs generate text by predicting the next word (token) based on all previously generated words. Because each output influences subsequent outputs, locking in an answer first causes all following reasoning to be subordinate to that answer.
- agentic loop: A repetitive cycle in which an LLM uses tools, evaluates results, and decides on next actions without human intervention. A typical example is Claude Code reading files, modifying them, and running tests on its own.
- context window: The maximum amount of text an LLM can reference in a single pass. Since the CLAUDE.md file is loaded into this context every time, it can actually increase token usage in short conversations.
- out of distribution: When a model receives input that follows patterns it has not seen during training. Instructing a model to deviate too far from its default behavior can cause it to act in unexpected ways.
- ground truth: The reference data treated as the 'correct answer' in training or evaluation. In this file's rules, it means treating whatever the user has specified as the absolute reference standard for the session.
Related Resources
- https://github.com/drona23/claude-token-efficient
- https://github.com/drona23/claude-token-efficient/blob/main/BENCHMARK.md
- https://github.com/drona23/claude-token-efficient/issues/1
- https://aifoc.us/the-token-salary/
- https://github.com/chopratejas/headroom
- https://github.com/rtk-ai/rtk
- https://github.com/cwinvestments/memstack
- https://github.com/thedotmack/claude-mem
- https://github.com/ory/lumen