This new technique saves 60% of my token expenses

TL;DR Highlight

You can reduce LLM response tokens by 60% by using a telegraphic style that only keeps nouns and verbs, excluding articles, conjunctions, and auxiliary verbs.

Who Should Read

Backend developers who are concerned about API costs and token optimization. Especially those using GPT-4 level models for simple tasks such as summarization, classification, and data extraction.

Core Mechanics

When a typical response is hundreds of tokens, forcing a 'caveman' style compresses it to around 40 tokens. It's possible to convey the same meaning with significantly fewer tokens.
Key prompt pattern: 'Drop articles, conjunctions, filler words, copulas. Keep nouns, verbs, key modifiers only.' — Explicitly instruct to remove articles (a, the), conjunctions (and, but), and unnecessary verbs (is, are).
This approach is similar to the structure of American Sign Language (ASL) or telegrams. It's a strategy to increase meaning density and remove padding words.
However, this technique is only valid for pipelines where 'readable responses' are not required. It's not suitable for responses exposed to end-users.
It's also pointed out that 80% of prompts can be handled without expensive models (GPT-4, Claude Opus). Model downgrading (routing) may be a more fundamental cost reduction than compression style.
Synergy can be achieved by combining a routing strategy to smaller models (GPT-4o mini, Haiku, etc.) with a compression style.

Evidence

"Reported a 60% reduction in token count compared to normal responses. Presented a case where a hundreds-of-tokens response was compressed to around 40 tokens. Since costs are calculated based on the sum of input and output tokens, reducing output tokens by 60% proportionally reduces API costs. The effect is greater when the output proportion is large."

How to Apply

If the response is not directly read by humans in internal pipelines (classification, extraction, summarization, etc.), add a telegraphic style instruction to the system prompt. Example: 'Respond in compressed telegraphic style. Drop articles, conjunctions, filler words, copulas. Keep nouns, verbs, key modifiers only.'
Create a router that first determines the complexity of the task, sending simple classification/summarization to GPT-4o mini or Claude Haiku, and sending only complex reasoning to expensive models. Adding a compression style on top of this can provide double savings.
If response parsing is required, use JSON mode or structured output along with the telegraphic style to structure the response, reducing tokens without parsing errors.

Code Example

snippet

system_prompt = """
Respond in compressed telegraphic style.
Drop articles, conjunctions, filler words, copulas.
Keep nouns, verbs, key modifiers only.
Meaning density over readability.
Write like a telegram costs per word.
"""

# Example input
user_message = "What are the main causes of climate change?"

# Normal response example (~80 tokens)
# "Climate change is primarily caused by the burning of fossil fuels, which releases greenhouse gases..."

# Telegraphic response example (~20 tokens)
# "Fossil fuel burning → CO2 rise → heat trap. Also: deforestation, agriculture, industry emissions."

Terminology

토큰(Token)The smallest unit of text processed by an LLM. Approximately 0.75 English words. API costs are billed based on this token count.

전보체(Telegraphic style)A compressed writing style that only retains essential words, like a telegram. A writing style from the past when telegrams were charged per character.

라우팅(Routing)A strategy for determining the complexity of a request and sending it to either an expensive or inexpensive model. Like a parcel sorting, it divides tasks to appropriate handlers.

CopulaEnglish 'is', 'are', 'was' like connecting verbs. Often not essential for conveying meaning and the first priority for removal during compression.

ASL(American Sign Language)American Sign Language. A structure that conveys concepts without grammatical padding, similar to the telegraphic style in that it conveys a lot of meaning with few signals.

출력 토큰(Output token)Tokens generated by the LLM as a response. Often more expensive than input, so reducing the response length can significantly reduce costs.