Saying 'hey' cost me 22% of my usage limits
TL;DR Highlight
A post reporting that opening a conversation with a short greeting like 'hey' can consume a significant share of your total Claude usage limit, and a reminder to build prompt-writing habits that conserve tokens.
Who Should Read
Developers and general users who want to use the Claude API or Claude.ai efficiently within usage limits — especially relevant for those on free or restricted plans.
Core Mechanics
- A case was reported where simply sending a short greeting like 'hey', 'hi', or 'hello' to Claude consumed 22% of the total usage limit.
- Claude tends to generate fairly lengthy responses even to simple greetings, meaning tokens can be heavily consumed without any actual question or task.
- Usage limits are calculated based on tokens (input + output), not just message count — so even a short greeting that triggers a long response can rapidly drain your quota.
- Users on Claude.ai's free or limited plans are much better off skipping unnecessary warm-up messages and getting straight to the point to manage their usage effectively.
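The input-plus-output accounting above can be sketched with toy numbers. All token counts below are assumptions chosen for illustration, not measurements from the post:

```python
# Illustrative arithmetic: why a one-word greeting can be expensive.
# Every token count here is an assumption, not a measured value.

def turn_cost(input_tokens: int, output_tokens: int) -> int:
    """Usage limits count both what you send and what the model returns."""
    return input_tokens + output_tokens

# A bare "hey" is only a few input tokens, but a chatty reply
# might run a couple hundred output tokens.
greeting_turn = turn_cost(input_tokens=3, output_tokens=250)

# A direct task costs slightly more input but does real work.
task_turn = turn_cost(input_tokens=40, output_tokens=260)

print(greeting_turn, task_turn)
```

The warm-up turn accomplishes nothing yet costs nearly as much as the productive one, which is the mechanic the bullets above describe.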
Evidence
- The post author reported that a single one-word message, 'hey', consumed 22% of their total usage limit. Because access to the original post was restricted, community reactions and reproduced cases in the comments could not be verified.
How to Apply
- Build the habit of stating your question or task in the first message, with no greetings or filler like 'hey', 'hi', or 'hold on'. For example, instead of "Hi, can you review the following code?", start with "Please review the following code."
- If using the Claude API, add an instruction such as "Reply directly, without unnecessary preambles or greetings" to the system prompt to cut response token count.
- Periodically check how much of your usage limit remains, and open a new conversation before starting a long session so that accumulated context does not inflate the token cost of every subsequent message.
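A minimal sketch of the API-side advice, using the Anthropic Python SDK's `messages.create` parameters. The model id and system-prompt wording are illustrative assumptions, and the actual network call is left commented out since it requires an API key:

```python
# Sketch: suppressing preambles via the system prompt (Anthropic Messages API).
# The model id and prompt wording below are illustrative assumptions.

def build_request(task: str) -> dict:
    """Build keyword arguments for client.messages.create()."""
    return {
        "model": "claude-3-5-sonnet-latest",  # placeholder model id
        "max_tokens": 1024,                   # hard cap on output tokens
        "system": "Reply directly. Skip greetings, preambles, and sign-offs.",
        # Lead with the task itself -- no warm-up turn.
        "messages": [{"role": "user", "content": task}],
    }

payload = build_request("Please review the following code: ...")

# import anthropic
# client = anthropic.Anthropic()          # reads ANTHROPIC_API_KEY from env
# response = client.messages.create(**payload)
```

Putting the instruction in the `system` field rather than the user message means it applies to every turn of the conversation without being resent as user content.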
Terminology
Related Papers
Using Claude Code: The unreasonable effectiveness of HTML
An article summarizing why the Claude Code team began preferring HTML over Markdown as an LLM output format and its practical advantages; it directly affects workflows for building documents, specs, and dashboards with AI.
When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling
Disagreement-guided routing boosts LLM accuracy on math and code by 3-7% with adaptive problem solving.
Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application
Five failure modes and eight practical solutions emerged after five days of running on-device SLMs (Gemma 4 E2B, Qwen3 0.6B) with Wordle.
Dynamic Context Evolution for Scalable Synthetic Data Generation
A framework that completely eliminates duplication and repetition in large-scale synthetic data generation with LLMs using three mechanisms (VTS + Semantic Memory + Adaptive Prompt).
90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
A workflow-sharing post on how pre-compiling a codebase into a wiki lets Claude read that summary instead of exploring the files cold each time, cutting token usage per session by more than 90%.