Sketch-of-Thought: 적응형 인지 스케치로 효율적인 LLM 추론

Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Mar 7, 2025•Simon A. Aytes, Jinheon Baek, Sung Ju Hwang•View PDF

TL;DR Highlight

시스템 프롬프트만 바꿔서 CoT 대비 토큰 84% 줄이면서 정확도는 거의 유지하는 프롬프팅 기법

Who Should Read

LLM API 비용이나 응답 레이턴시를 줄이고 싶은 AI/백엔드 개발자. 특히 GPT-4o나 Claude에 CoT 프롬프팅을 쓰면서 토큰 비용이 부담되는 분들.

Core Mechanics

추론 방식을 3가지 패러다임으로 나눔: 개념 연결(Conceptual Chaining), 기호 압축(Chunked Symbolism), 전문가 약어(Expert Lexicons) — 각각 상식/수학/의학 류 문제에 특화
DistilBERT 기반 67M짜리 경량 라우터가 질문을 보고 어떤 패러다임을 쓸지 자동 선택 (96.4% 정확도, 추론 속도 0.012초)
18개 데이터셋 전체 평균 74% 토큰 절감, 정확도 손실은 평균 0.83%로 통계적으로 유의미하지 않음
수학 문제에선 오히려 정확도가 올라감 — Qwen-2.5-32B 기준 CoT 84.17% → SoT 86.94%, 토큰은 222 → 88로 감소
Self-Consistency, Self-Refine, Multi-Agent Debate 같은 앙상블 파이프라인에서 CoT를 SoT로 교체해도 성능이 유지되거나 향상됨
한국어/이탈리아어/독일어 다국어 실험에서도 80~84% 토큰 절감, 정확도 최대 1.33% 하락

Evidence

GPT-4o: CoT 84.64% vs SoT 84.55% — 정확도 0.09% 차이로 토큰 76.2% 절감
Claude Sonnet 3.5: CoT 85.01% vs SoT 84.50% — 정확도 0.51% 차이로 토큰 68.99% 절감
Multi-Agent Debate에서 SoT가 CoT 대비 정확도 0.57% 향상, 토큰 68.9% 절감
한국어 MMMLU: CoT 74.20% vs SoT 73.40% — 정확도 0.80% 하락, 토큰은 308 → 49로 84.09% 절감

How to Apply

질문 유형을 보고 패러다임을 골라서 해당 시스템 프롬프트로 교체: 수식/계산 문제면 Chunked Symbolism, 상식/멀티홉이면 Conceptual Chaining, 의학/법률/공학이면 Expert Lexicons
HuggingFace의 공개 라우터 모델(saytes/SoT_DistilBERT)을 붙이면 패러다임 선택을 자동화할 수 있음 — 분류 호출 한 번으로 어떤 프롬프트를 쓸지 결정
기존에 CoT 기반으로 짠 Self-Consistency나 멀티에이전트 파이프라인에서 시스템 프롬프트만 SoT용으로 교체하면 바로 적용 가능

Code Example

snippet

# Conceptual Chaining 패러다임 시스템 프롬프트 (상식/멀티홉 추론용)
SYSTEM_PROMPT_CC = """
You are a reasoning expert specializing in structured concept linking.
Extract key terms and present reasoning as stepwise chains using arrows (→).
Do NOT use full sentences. Keep each step minimal.

Format:
<think>
#concept_A → #concept_B → answer
</think>
\\boxed{Final Answer}
"""

# Chunked Symbolism 패러다임 시스템 프롬프트 (수학/계산 문제용)
SYSTEM_PROMPT_CS = """
You are a reasoning expert using Chunked Symbolism.
Define variables, write equations, solve step-by-step with minimal text.

Format:
<think>
var1 = value, var2 = value
result = var1 op var2 = number
</think>
\\boxed{Final Answer}
"""

# Expert Lexicons 패러다임 시스템 프롬프트 (의학/공학/법률 전문 분야용)
SYSTEM_PROMPT_EL = """
You are a reasoning expert using Expert Lexicons.
Use domain-specific abbreviations, symbols, and jargon to compress reasoning.
No full sentences. Maximize information density.

Format:
<think>
TERM → definition, ACRONYM ∈ {components}, ∴ conclusion
</think>
\\boxed{Final Answer}
"""

# 라우터 모델로 자동 패러다임 선택 (HuggingFace 공개 모델 사용)
from transformers import pipeline

router = pipeline("text-classification", model="saytes/SoT_DistilBERT")

def get_sot_prompt(question: str) -> str:
    result = router(question)[0]["label"]
    mapping = {
        "conceptual_chaining": SYSTEM_PROMPT_CC,
        "chunked_symbolism": SYSTEM_PROMPT_CS,
        "expert_lexicons": SYSTEM_PROMPT_EL,
    }
    return mapping.get(result, SYSTEM_PROMPT_CC)

# 사용 예시
question = "A car accelerates at 2.5 m/s² for 10 seconds from 15 m/s. Final velocity?"
system_prompt = get_sot_prompt(question)
print(f"선택된 패러다임 프롬프트: {system_prompt[:50]}...")

Terminology

CoTChain-of-Thought의 약자. '단계별로 생각해봐'라고 프롬프트에 넣으면 LLM이 풀이 과정을 길게 쓰게 되는 기법. 정확도는 올라가지만 토큰을 많이 씀.

DistilBERTBERT를 40% 가볍게 만든 소형 언어 모델. 분류 같은 단순 NLP 작업에 빠르고 저렴하게 쓸 수 있음. 라우터처럼 경량 판단이 필요한 곳에 자주 씀.

패러다임 라우터질문을 보고 어떤 추론 방식을 쓸지 자동으로 결정해주는 분류기. 공항 수하물 컨베이어처럼 짐(질문)을 맞는 레인(패러다임)으로 보내는 역할.

Conceptual Chaining개념들을 화살표(→)로 연결해서 최소한의 단어로 추론을 표현하는 방식. '서울 → 한국 → 원화'처럼 생각의 흐름을 단어 체인으로 압축.

Chunked Symbolism수학/물리 문제를 변수와 수식으로 바꿔 압축 표현하는 방식. 'v = 15, a = 2.5, t = 10, vf = 40'처럼 풀이를 기호로 쪼개서 표현.

Expert Lexicons의학·법률·공학 같은 전문 분야의 약어와 표기법으로 추론을 압축하는 방식. 의사가 '심근경색(STEMI), MONA 요법'처럼 동료에게 짧게 말하는 것과 같은 원리.

Self-Consistency같은 질문을 여러 번 추론해서 가장 많이 나온 답을 최종 답으로 채택하는 앙상블 기법. 다수결 투표처럼 작동해서 안정성을 높임.

Related Resources

Original Abstract (Expand)

Recent advances in large language models (LLMs) have enabled strong reasoning capabilities through Chain-of-Thought (CoT) prompting, which elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs, leading to increased computational overhead. We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints to reduce token usage while preserving reasoning accuracy. SoT is designed as a flexible, modular approach and is instantiated with three paradigms--Conceptual Chaining, Chunked Symbolism, and Expert Lexicons--each tailored to distinct reasoning tasks and selected dynamically at test-time by a lightweight routing model. Across 18 reasoning datasets spanning multiple domains, languages, and modalities, SoT achieves token reductions of up to 84% with minimal accuracy loss. In tasks such as mathematical and multi-hop reasoning, it even improves accuracy while shortening outputs.