Hypothesis-Conditioned Query Rewriting: Decision-Useful RAG Retrieval via Hypothesis-Based Query Rewriting

Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval

Mar 19, 2026 • Hangeol Chang, Changsun Lee, Seungjoon Rho +2

TL;DR Highlight

A training-free technique for RAG that replaces plain topic search with a "hypothesis → three targeted queries" flow, retrieving documents that actually help select the answer.

Who Should Read

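Backend and ML developers whose RAG pipelines retrieve documents that do not actually improve answer quality. It is especially relevant for developers building multiple-choice QA systems in domains such as medicine and law, where the correct answer must be picked from several candidates.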

Core Mechanics

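Simple RAG (using the original question verbatim as the search query) scores lower than answering with CoT and no retrieval in every model configuration; irrelevant documents get in the way rather than help.
HCQR first forms a working hypothesis and then generates three queries from it: (1) evidence supporting the hypothesis, (2) evidence distinguishing it from competing candidates, and (3) verification of the key clues in the question.
The hypothesis is never shown to the final generation model and is used only to generate search queries, which prevents anchoring on a premature answer.
A new evaluation metric, Decision-Useful Rate, measures the share of retrieved documents that are useful for the decision, using three labels: ENTAILED (fully grounds the answer), USEFUL (partially grounds it), and NOT USEFUL (irrelevant).
Pre-retrieval query design is more effective than post-retrieval methods (reranking, filtering): writing a good query up front matters more than filtering results afterward.
Because it is training-free, it can be implemented with prompts alone and no model fine-tuning, and it improves performance consistently across every model tested, from Llama3.2-3B to Qwen3-30B.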
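
The Decision-Useful Rate described above can be sketched as a simple tally over judge labels. This is a minimal sketch, assuming a retrieved context counts as decision-useful when its label is ENTAILED or USEFUL; that reading is implied by the label descriptions but not stated outright. The function name `decision_useful_rate` and the toy labels are illustrative and not taken from the paper.

```python
from collections import Counter

LABELS = ("ENTAILED", "USEFUL", "NOT USEFUL")


def decision_useful_rate(labels):
    """Return (per-label shares, DUR) for a list of judge labels.

    Assumption: ENTAILED and USEFUL both count as decision-useful.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    total = len(labels)
    shares = {label: counts[label] / total for label in LABELS}
    dur = shares["ENTAILED"] + shares["USEFUL"]
    return shares, dur


# Toy labels for five retrieved contexts (made-up data, not paper results).
toy_labels = ["ENTAILED", "USEFUL", "NOT USEFUL", "USEFUL", "NOT USEFUL"]
shares, dur = decision_useful_rate(toy_labels)
print(shares, dur)
```

Reporting the three shares alongside the single rate keeps the ENTAILED and USEFUL cases distinguishable, which matters because the two are associated with different downstream accuracy.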

Evidence

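On MedQA, average accuracy improves by 5.9 points over Simple RAG (69.1% → 75.0%); on MMLU-Med, it improves by 3.6 points (77.5% → 81.1%).
Decision-Useful Rate: on MedQA, Simple RAG reaches 30.0% while HCQR reaches 82.1%, a 2.7x higher share of decision-useful documents.
NOT USEFUL share: 60.7% for Simple RAG vs. 23.8% for HCQR, so useless documents drop to less than half; the ENTAILED share rises 5.8x, from 6.0% to 34.7%.
When ENTAILED documents are included, accuracy is 94.4%, 8.2 points above CoT; when only NOT USEFUL documents are present, accuracy is 57.6%, 5.8 points below CoT. Document quality translates directly into performance.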
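
The last breakdown above, accuracy conditioned on the quality of the retrieved evidence, can be reproduced on your own evaluation logs with a small grouping step. This is a minimal sketch, assuming each record holds a judge label and whether the final answer was correct; the record layout and the function name `accuracy_by_label` are hypothetical, and the toy records are not paper data.

```python
from collections import defaultdict


def accuracy_by_label(records):
    """Group (label, is_correct) records by label and return accuracy per label."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for label, is_correct in records:
        totals[label] += 1
        hits[label] += int(is_correct)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up evaluation records: (judge label, was the final answer correct?)
records = [
    ("ENTAILED", True), ("ENTAILED", True),
    ("USEFUL", True), ("USEFUL", False),
    ("NOT USEFUL", False), ("NOT USEFUL", True),
    ("NOT USEFUL", False), ("NOT USEFUL", False),
]
acc = accuracy_by_label(records)
print(acc)
```

Comparing each bucket against a no-retrieval CoT baseline on the same questions shows whether poor retrieval is merely unhelpful or actively lowering accuracy.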

How to Apply

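In a multiple-choice QA system, replace the existing single query with a three-stage flow: question + options → generate a hypothesis with an LLM (best_guess, discriminating_features, confirming_evidence) → generate three targeted queries → retrieve the top 5 for each query and merge the results.
When rewriting search queries, explicitly separate the three roles: supporting the hypothesis, distinguishing alternatives, and verifying key clues. Only the prompt templates change, so it fits into an existing RAG pipeline with no training cost, though the flow does add two LLM calls and three retrievals per question.
When evaluating retrieved documents, use an ENTAILED/USEFUL/NOT USEFUL LLM judge instead of raw similarity scores to see where the RAG bottleneck lies (retrieval quality vs. generation quality).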
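
The LLM judge from the last step can be sketched as a prompt plus a small label parser. This is only an illustration: the prompt wording is not the paper's, giving the judge the correct answer is an assumption, and `llm` stands for any callable that takes a prompt string and returns a string.

```python
JUDGE_PROMPT = """You are grading retrieved evidence for a multiple-choice question.

Question: {question}
Options: {options}
Correct answer: {answer}
Retrieved documents:
{documents}

Reply with exactly one label:
ENTAILED - the documents alone are enough to derive the correct answer
USEFUL - the documents partially support choosing the correct answer
NOT USEFUL - the documents do not help choose between the options
"""


def parse_label(reply):
    """Map a free-text judge reply to one of the three labels."""
    text = reply.upper()
    # Check NOT USEFUL first because it contains the substring USEFUL.
    if "NOT USEFUL" in text or "NOT_USEFUL" in text:
        return "NOT USEFUL"
    if "ENTAILED" in text:
        return "ENTAILED"
    if "USEFUL" in text:
        return "USEFUL"
    return "NOT USEFUL"  # conservative fallback for unparseable replies


def judge_context(llm, question, options, answer, documents):
    """Ask the judge (hypothetical `llm` callable) to label one retrieved context."""
    prompt = JUDGE_PROMPT.format(
        question=question,
        options=options,
        answer=answer,
        documents="\n".join(f"- {d}" for d in documents),
    )
    return parse_label(llm(prompt))


# Usage with a stub judge that always answers "Label: not useful".
label = judge_context(
    lambda prompt: "Label: not useful",
    "Which vitamin deficiency causes scurvy?",
    "A. Vitamin A / B. Vitamin C",
    "B",
    ["Vitamin A is important for vision."],
)
print(label)
```

If most contexts come back NOT USEFUL, retrieval is the likely bottleneck; if contexts are ENTAILED yet answers are still wrong, the generator deserves a closer look.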

Code Example

# Core HCQR prompt flow (can be applied directly)

# Stage 1: Hypothesis Formulator
# Literal JSON braces are doubled ({{ }}) because the template is filled with str.format().
hypothesis_prompt = """
Question: {question}
Options: {options}

Analyze this question carefully. Think step-by-step about each option.
After your analysis, provide your final assessment in JSON:
{{
  "discriminating_features": ["2-3 features that distinguish between options"],
  "reasoning": "brief explanation why this is the best answer",
  "confirming_evidence": ["1-3 specific facts that would confirm this answer"],
  "best_guess": "A/B/C/D",
  "best_guess_text": "copy the chosen option text verbatim"
}}
"""

# Stage 2: Query Rewriter
query_rewrite_prompt = """
Generate 3 highly targeted search queries to find evidence for this question.

Question: {question}
Best Guess Answer: {best_guess_text}
Reasoning: {reasoning}
Evidence Needed: {confirming_evidence}
Key Features: {discriminating_features}

Generate 3 SPECIFIC queries:
Query 1: Find evidence SUPPORTING {best_guess_text} - focus on the main reasoning
Query 2: Find DISTINGUISHING criteria between the top candidate answers
Query 3: Find specific KEY FEATURES or facts

Format:
Query 1: [query]
Query 2: [query]
Query 3: [query]
"""

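# Stage 3: Retrieve & Fuse
import json
import re


def parse_hypothesis(text):
    # The model reasons before the JSON, so take the span from the first "{" to the last "}".
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in the hypothesis output")
    return json.loads(match.group(0))


def parse_queries(text):
    # Pull the text that follows each "Query N:" label.
    return [q.strip() for q in re.findall(r"^\s*Query\s*\d+:\s*(.+)$", text, re.MULTILINE)]


def deduplicate(docs):
    # Drop repeated documents, keeping first-seen order (assumes str(doc) identifies a document).
    seen, unique = set(), []
    for doc in docs:
        key = str(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def hcqr_retrieve(question, options, retriever, llm, top_k=5, budget=15):
    # Assumed interfaces: llm(prompt) -> str, retriever.search(query, top_k=...) -> list of docs.
    # Step 1: Generate hypothesis
    hypothesis = parse_hypothesis(llm(hypothesis_prompt.format(
        question=question, options=options
    )))

    # Step 2: Generate 3 queries
    queries = parse_queries(llm(query_rewrite_prompt.format(
        question=question,
        best_guess_text=hypothesis['best_guess_text'],
        reasoning=hypothesis['reasoning'],
        confirming_evidence=hypothesis['confirming_evidence'],
        discriminating_features=hypothesis['discriminating_features']
    )))

    # Step 3: Retrieve & deduplicate (hypothesis NOT passed to generator)
    all_docs = []
    for q in queries[:3]:
        docs = retriever.search(q, top_k=top_k)
        all_docs.extend(docs)

    # Deduplicate and limit to budget
    unique_docs = deduplicate(all_docs)[:budget]
    return unique_docs  # returns documents only; the hypothesis is not included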
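
To complete the flow above, the retrieved documents go to the answer generator together with the question and options, but without the working hypothesis. The following is a minimal sketch of that final step; the prompt wording, the dict-shaped `options`, and the function name `build_generator_prompt` are assumptions for illustration, not the paper's template.

```python
def build_generator_prompt(question, options, docs):
    """Build the final answer prompt from question, options, and retrieved docs only.

    The working hypothesis is deliberately not a parameter, so it cannot leak
    into the generator and anchor the answer.
    """
    context = "\n\n".join(f"[Doc {i}] {doc}" for i, doc in enumerate(docs, start=1))
    option_lines = "\n".join(f"{key}. {text}" for key, text in options.items())
    return (
        "Use the documents below to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        f"Options:\n{option_lines}\n\n"
        "Think step by step, then give the final answer as a single letter."
    )


# Toy example (made-up question and document).
prompt = build_generator_prompt(
    "Which vitamin deficiency causes scurvy?",
    {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D", "D": "Vitamin K"},
    ["Scurvy results from a lack of vitamin C (ascorbic acid)."],
)
print(prompt)
```

Keeping the hypothesis out of the function signature, rather than filtering it out later, makes the separation hard to break by accident.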

Terminology

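RAG (Retrieval-Augmented Generation): A technique in which an LLM retrieves information it does not know from external documents and uses it in its answer. It is like an open-book exam, where you look things up while answering.

working hypothesis: A provisional guess along the lines of "this is probably the answer", not the final answer. It acts as a compass for steering retrieval and can change later depending on the retrieved evidence.

Decision-Useful Rate (DUR): A metric that measures whether retrieved documents actually helped in choosing the correct answer. It asks not merely whether a document is on topic but whether it actually contributed to choosing A over B.

pre-retrieval: Improvements made before documents are retrieved, that is, designing better search queries. By contrast, post-retrieval methods filter or rerank results after retrieval.

HyDE: A technique that first has an LLM generate a hypothetical answer document and then retrieves real documents using that document's embedding. It can match answer-style documents better than the question itself does.

CoT (Chain-of-Thought): Reaching an answer through step-by-step reasoning, working through intermediate thoughts such as "1+1=2, and 2+2=4, so..." before arriving at the final answer.

ENTAILED: The state in which the correct answer can be fully derived from the retrieved documents alone. It is the ideal retrieval result, where the answer comes straight from the documents with no additional knowledge.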

Related Resources

HCQR code (Anonymous GitHub): https://anonymous.4open.science/r/HCQR-1C2E

Original Abstract

Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by grounding generation in external, non-parametric knowledge. However, when a task requires choosing among competing options, simply grounding generation in broadly relevant context is often insufficient to drive the final decision. Existing RAG methods typically rely on a single initial query, which often favors topical relevance over decision-relevant evidence, and therefore retrieves background information that can fail to discriminate among answer options. To address this issue, here we propose Hypothesis-Conditioned Query Rewriting (HCQR), a training-free pre-retrieval framework that reorients RAG from topic-oriented retrieval to evidence-oriented retrieval. HCQR first derives a lightweight working hypothesis from the input question and candidate options, and then rewrites retrieval into three targeted queries that seek evidence to: (1) support the hypothesis, (2) distinguish it from competing alternatives, and (3) verify salient clues in the question. This approach enables context retrieval that is more directly aligned with answer selection, allowing the generator to confirm or overturn the initial hypothesis based on the retrieved evidence. Experiments on MedQA and MMLU-Med show that HCQR consistently outperforms single-query RAG and re-rank/filter baselines, improving average accuracy over Simple RAG by 5.9 and 3.6 points, respectively. Code is available at https://anonymous.4open.science/r/HCQR-1C2E.