Universal Claude.md – cut Claude output tokens
TL;DR Highlight
A project claiming that simply adding a single CLAUDE.md file to your project root can reduce unnecessary verbosity (sycophancy, filler openers/closers, unsolicited suggestions, etc.) from Claude and cut output tokens by up to 63%—though the community has raised strong doubts about benchmark reliability and real-world effectiveness.
Who Should Read
Backend/AI developers using Claude Code at scale in automated pipelines or agentic loops, who are seeing increased token costs or parsing difficulties due to Claude's verbose responses.
Core Mechanics
- Placing a CLAUDE.md file in the project root causes Claude Code to automatically read it and adjust its response behavior—no code changes required, takes effect immediately.
- By default, Claude outputs filler openers like 'Sure!', 'Great question!', 'Absolutely!', closing remarks like 'I hope this helps!', Unicode characters such as em dashes (—) and smart quotes that break parsers, question restatements, and unsolicited suggestions. This project instructs Claude to suppress these patterns.
- The author claims this file reduces output tokens by approximately 63%, but also explicitly states in the README that the majority of actual Claude costs come from input tokens, not output tokens—meaning the overall cost savings are limited.
- Situations where this file is effective: high-volume automation pipelines (resume bots, agentic loops, code generation), structured tasks repeated hundreds of times, team environments requiring consistent and parseable output.
- Situations where this file may backfire: short single queries (the file itself is loaded into context each time, resulting in a net token increase), conversations with low output volume, agentic coding tasks requiring complex reasoning.
- Key rule examples in the file include: 'Answer is always line 1, reasoning comes after', 'Do not repeat information already confirmed in the session', 'Never invent file paths, function names, or API signatures', and 'If the user states an incorrect fact, accept it as ground truth for the session'.
- The benchmark only measured output token count for a single prompt and did not measure response accuracy or quality. It also contains no data on agentic loops or large codebase tasks.
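The break-even point implied by the list above (the file adds input tokens to every request, while the savings apply only to output) can be sketched as follows. This is a minimal illustration, not part of the project: the 400-token file size is a hypothetical figure, the 63% reduction is the author's claimed upper bound, and the calculation counts tokens only, ignoring any price difference between input and output tokens.

```python
def net_token_change(file_tokens: int, baseline_output_tokens: int,
                     output_reduction: float) -> float:
    """Net change in total tokens per request after adding the instruction file.

    Positive: the file costs more tokens than it saves.
    Negative: the request uses fewer tokens overall.
    """
    saved_output = baseline_output_tokens * output_reduction
    return file_tokens - saved_output


def break_even_output_tokens(file_tokens: int, output_reduction: float) -> float:
    """Baseline output length at which the file's overhead is exactly repaid."""
    if not 0 < output_reduction <= 1:
        raise ValueError("output_reduction must be in (0, 1]")
    return file_tokens / output_reduction


if __name__ == "__main__":
    FILE_TOKENS = 400   # hypothetical size of CLAUDE.md in tokens
    REDUCTION = 0.63    # claimed upper bound from the project

    # Short single query: the overhead outweighs the savings.
    print(net_token_change(FILE_TOKENS, 100, REDUCTION))
    # Long generated output: the savings outweigh the overhead.
    print(net_token_change(FILE_TOKENS, 2000, REDUCTION))
    print(break_even_output_tokens(FILE_TOKENS, REDUCTION))
```

Under these assumed numbers, a response would need a baseline of roughly 635 output tokens before the file pays for itself, which is consistent with the warning about short single queries.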
Evidence
- The benchmark's reliability drew strong criticism: someone pointed out that a single prompt like 'Always answer with one word' could beat the benchmark numbers, and a user measured in the repo's Issues that responding without any instructions yielded the highest token efficiency.
- A technical critique noted the design ignores the autoregressive nature of LLMs: since LLMs predict the next token based on previously generated tokens, forcing the answer to come first turns all subsequent reasoning into confirmation bias justifying that answer, making the rule meaningless without thinking mode enabled.
- The rule 'accept anything the user states as ground truth for the entire session' was flagged as dangerous: if a user accidentally states a false premise in a prompt, Claude will treat it as fact throughout and lose the ability to challenge incorrect information.
- A comment citing real OpenRouter data showed that for the programming category, input tokens account for 93.4%, reasoning tokens 2.5%, and output tokens only 4.0%, making output reduction largely insignificant to overall cost. The author acknowledges this in the README.
- Alternative token-saving tools were also mentioned: Headroom (a localhost proxy that compresses API context by ~34%), RTK (a Rust CLI proxy that compresses CLI output like git/npm/build logs by 60–90%), and MemStack (a tool that gives Claude Code persistent memory so it doesn't re-read the codebase each time). These tools target input tokens rather than output tokens and may offer more meaningful cost savings.
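The arithmetic behind the OpenRouter point can be sketched in Python. The token shares (93.4% input, 2.5% reasoning, 4.0% output) and the 63% reduction come from the discussion above; the price ratio between output and input tokens is an assumption introduced here for illustration, as is billing reasoning tokens at the output rate.

```python
def overall_cost_reduction(input_share: float, reasoning_share: float,
                           output_share: float, output_reduction: float,
                           output_price_ratio: float = 1.0) -> float:
    """Fraction of total cost removed when visible output shrinks.

    Shares are token shares (any consistent unit, e.g. percent).
    output_price_ratio is the assumed price of one output or reasoning
    token relative to one input token.
    """
    baseline = input_share + (reasoning_share + output_share) * output_price_ratio
    saved = output_share * output_price_ratio * output_reduction
    return saved / baseline


if __name__ == "__main__":
    shares = dict(input_share=93.4, reasoning_share=2.5, output_share=4.0)

    # Equal pricing: the saving is a share of raw token volume.
    print(f"{overall_cost_reduction(**shares, output_reduction=0.63):.1%}")  # about 2.5%

    # Assumed 5x output price: the saving grows but stays modest.
    print(f"{overall_cost_reduction(**shares, output_reduction=0.63, output_price_ratio=5.0):.1%}")  # about 10.0%
```

Even at the claimed best-case reduction and with output priced well above input, the overall saving under these assumptions stays around a tenth of the bill, which is why the commenters point toward input-side tools.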
How to Apply
- If you're running automated pipelines with Claude Code (e.g., automated code review in CI, repetitive document generation) and having trouble parsing the output, try adding a CLAUDE.md to the project root instructing it to remove em dashes, smart quotes, and filler openers. Be sure to also monitor for any degradation in accuracy.
- If you want meaningful cost savings beyond output token reduction, it's more effective to target input tokens, which account for a far larger share of costs. Evaluate tools that reduce input, such as Headroom (a context compression proxy) or RTK (CLI output compression), first.
- If you're using Claude Code for complex agentic coding tasks (large codebase refactoring, multi-file edits, etc.), apply this file with caution. The community has noted that Claude's verbose intermediate explanations may help the model stay on track in long contexts, so it's worth comparing task completion quality before and after applying the file.
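Since prompt instructions are not guaranteed to be followed, a pipeline that depends on parseable output may also want a deterministic cleanup step. The following Python sketch is not part of the project; the function name, the character map, and the list of openers are illustrative assumptions, and the opener pattern is a rough heuristic that can strip legitimate text such as a sentence beginning with "Sure thing".

```python
import re

# Typographic characters mapped to ASCII stand-ins (illustrative, not exhaustive).
ASCII_REPLACEMENTS = {
    "\u2014": "-",   # em dash
    "\u2013": "-",   # en dash
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
}
_TRANSLATION = str.maketrans(ASCII_REPLACEMENTS)

# Filler openers named in the article; matched only at the start of the text.
_FILLER_OPENER = re.compile(
    r"^\s*(?:sure|absolutely|great question)\b[!,.:]*\s*",
    re.IGNORECASE,
)


def sanitize(text: str) -> str:
    """Replace parser-unfriendly punctuation and drop one leading filler opener."""
    text = text.translate(_TRANSLATION)
    return _FILLER_OPENER.sub("", text, count=1)


if __name__ == "__main__":
    raw = "Sure! Use \u201cgit status\u201d \u2014 it works."
    print(sanitize(raw))  # Use "git status" - it works.
```

Running the cleanup after generation costs no tokens, so it can be combined with the instruction file or used in its place when only the formatting matters.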
Code Example
# Project structure
your-project/
└── CLAUDE.md # Just add this one file
# CLAUDE.md key rule examples
## Communication Rules
- Answer is always line 1. Reasoning comes after, never before.
- No redundant context. Do not repeat information already established in the session.
- No sycophantic openers: never start with Sure, Absolutely, Great question, etc.
- No closing remarks: never end with I hope this helps or Let me know if you need anything.
- No em dashes (--), smart quotes, or Unicode characters that break parsers.
## Code Output Rules
- Never invent file paths, function names, or API signatures.
- Do not add abstractions beyond what was explicitly requested.
- Do not restate the question before answering.
Related Resources
- https://github.com/drona23/claude-token-efficient
- https://github.com/drona23/claude-token-efficient/blob/main/BENCHMARK.md
- https://github.com/drona23/claude-token-efficient/issues/1
- https://aifoc.us/the-token-salary/
- https://github.com/chopratejas/headroom
- https://github.com/rtk-ai/rtk
- https://github.com/cwinvestments/memstack
- https://github.com/thedotmack/claude-mem
- https://github.com/ory/lumen