Decide Then Retrieve: A Training-Free RAG Framework Using Uncertainty-Based Selective Retrieval and Dual-Path Retrieval

Decide Then Retrieve: A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval

Jan 7, 2026 • Wang Chen, Guanqiang Qi, Weikang Li +3

TL;DR Highlight

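Instead of always retrieving, this RAG framework first uses the LLM's uncertainty to decide whether retrieval is needed, then retrieves along two paths (the query and a generated pseudo-document) to reduce noise.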

Who Should Read

Backend and AI developers whose RAG pipelines lose answer quality to noise from unnecessary retrieval, especially teams struggling with poor retrieval quality on short or ambiguous queries.

Core Mechanics

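Conventional RAG triggers retrieval for every query, so noisy documents can spoil answers the LLM already knows. DTR skips retrieval entirely when the uncertainty (negative log-likelihood) of the LLM's generated answer does not exceed a threshold.
To address the poor retrieval quality of short, sparse queries, DTR retrieves separately with the original query and with an LLM-generated pseudo-answer (pseudo-context), then pools the results. This recovers more ground-truth documents than a single path.
Candidates from both paths are re-ranked with a geometric score, cos(θ1+θ2), which rewards documents that are similar to both the query and the pseudo-answer; the top k are kept.
Because the framework is training-free, it plugs into any LLM, such as Qwen2.5-7B or 72B, and shows the same benefit with other retrievers such as bge and e5.
The LLM Judge alternative is less stable: a 7B judge almost never triggers retrieval (0.1%), which makes it equivalent to no retrieval, while a 72B judge triggers too often and introduces noise. Uncertainty-based triggering is more stable.
Experiments show that forcing retrieval on low-uncertainty queries (those the model is confident about) can actually lower accuracy.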
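
The two quantities above can be written out explicitly. This is a sketch under stated assumptions: the per-token factorization of P(a|q) is the usual chain-rule decomposition (the summary only gives the averaged form), and θ1 and θ2 are taken to be the angles between the document embedding and the query and pseudo-context embeddings, each in [0, π], so that both sines are non-negative.

```latex
\begin{align*}
u(q) &= -\frac{1}{T}\log P(a \mid q)
      = -\frac{1}{T}\sum_{t=1}^{T}\log p(a_t \mid q, a_{<t}) \\
s_1 &= \cos\theta_1, \qquad s_2 = \cos\theta_2 \\
s(d) &= \cos(\theta_1 + \theta_2)
      = s_1 s_2 - \sqrt{1 - s_1^2}\,\sqrt{1 - s_2^2}
\end{align*}
```

For example, s1 = s2 = 0.8 gives 0.64 - 0.36 = 0.28, while s1 = 0.8 and s2 = 0.6 gives 0.48 - 0.48 = 0. As long as θ1 + θ2 stays below π, the score falls as the total angle grows, so a document scores well only when it is close to both the query and the pseudo-context.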

Evidence

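Qwen2.5-7B: average EM/F1 improves from 35.81/45.81 (Standard RAG) to 37.87/48.08 (DTR), averaged over five QA benchmarks.
Qwen2.5-72B: average EM/F1 improves from 38.83/50.73 (Standard RAG) to 40.46/52.14 (DTR).
HotpotQA retrieval accuracy (Recall@3): 61.9% (Standard RAG) → 62.7% (DPR, i.e. Dual-Path Retrieval) with bge, and 59.3% → 62.6% with e5. HyDE, Q2D, and CoT all fall below Standard RAG, at 49.9-55.2%.
Experiments confirm that handling only about 20-30% of the low-uncertainty queries without retrieval still preserves most of the maximum achievable accuracy.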
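
To make the size of the reported gains concrete, the short sketch below computes the absolute EM/F1 improvements from the averages quoted above. The dictionary layout and the `gains` helper are illustrative names, not part of the paper's code.

```python
# Average EM/F1 figures as quoted in the Evidence list above.
reported = {
    "Qwen2.5-7B": {"standard_rag": (35.81, 45.81), "dtr": (37.87, 48.08)},
    "Qwen2.5-72B": {"standard_rag": (38.83, 50.73), "dtr": (40.46, 52.14)},
}


def gains(scores: dict) -> dict:
    """Return absolute (EM, F1) improvement of DTR over Standard RAG per model."""
    result = {}
    for model, row in scores.items():
        base_em, base_f1 = row["standard_rag"]
        dtr_em, dtr_f1 = row["dtr"]
        # Round to two decimals to match the precision of the reported numbers.
        result[model] = (round(dtr_em - base_em, 2), round(dtr_f1 - base_f1, 2))
    return result


improvements = gains(reported)
for model, (em_gain, f1_gain) in improvements.items():
    print(f"{model}: EM +{em_gain:.2f}, F1 +{f1_gain:.2f}")
```

On these numbers the gain is about two points for the 7B model and about one and a half points for the 72B model.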

How to Apply

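When the LLM generates an answer, collect the per-token log probabilities and compute u = -(1/T) * log P(a|q), where T is the number of answer tokens. If u does not exceed the threshold (e.g., 0.005), use that answer directly without retrieval. Tune the threshold to balance accuracy against retrieval cost.
For queries that need retrieval, run two top-n searches in parallel: one with the original query and one with a pseudo-document generated by prompting the LLM with 'Write a passage to answer this question'. Then re-select the top k from the 2n candidates using s(d) = cos(θ1+θ2).
Add a UGT (uncertainty-guided triggering) layer in front of an existing RAG system and replace the retrieval module with DPR-AIS (dual-path retrieval with adaptive information selection); no fine-tuning is required. This is particularly effective for customer support and internal Q&A systems that see many short keyword searches or ambiguous questions.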
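
One way to tune the threshold is to sweep candidate values over a small held-out set and look at the retrieval rate and accuracy each one yields. The sketch below assumes you have already recorded, for every calibration query, the uncertainty of the no-retrieval answer and whether the answer was correct with and without retrieval. `CalibrationExample`, `sweep_thresholds`, and the toy data are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CalibrationExample:
    """One held-out query with both answers already scored (hypothetical record)."""
    uncertainty: float      # u of the no-retrieval answer
    correct_direct: bool    # was the no-retrieval answer correct?
    correct_with_rag: bool  # was the retrieval-augmented answer correct?


def sweep_thresholds(examples, thresholds):
    """Return (threshold, retrieval_rate, accuracy) for each candidate threshold."""
    if not examples:
        raise ValueError("need at least one calibration example")
    rows = []
    for tau in thresholds:
        retrieved = 0
        correct = 0
        for ex in examples:
            if ex.uncertainty > tau:  # same rule as the trigger: retrieve if u > tau
                retrieved += 1
                correct += ex.correct_with_rag
            else:                     # confident: keep the direct answer
                correct += ex.correct_direct
        rows.append((tau, retrieved / len(examples), correct / len(examples)))
    return rows


# Toy calibration data, invented purely for illustration.
examples = [
    CalibrationExample(0.001, True, False),
    CalibrationExample(0.003, True, True),
    CalibrationExample(0.020, False, True),
    CalibrationExample(0.500, False, True),
    CalibrationExample(0.900, False, False),
]
rows = sweep_thresholds(examples, [0.0, 0.005, 1.0])
for tau, rate, acc in rows:
    print(f"threshold={tau}: retrieval rate={rate:.0%}, accuracy={acc:.0%}")
```

In this toy set, a threshold of 0.0 corresponds to always retrieving and 1.0 to never retrieving; the intermediate value retrieves for fewer queries while scoring higher than either extreme, which is the trade-off the threshold is meant to expose.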

Code Example

import numpy as np

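def compute_uncertainty(log_probs: list[float]) -> float:
    """Uncertainty from per-token log probabilities: u = -(1/T) * sum(log_probs)."""
    if not log_probs:
        raise ValueError("log_probs must contain at least one token")
    return -sum(log_probs) / len(log_probs)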

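def should_retrieve(query: str, llm, threshold: float = 0.005) -> tuple[bool, str]:
    """UGT: decide whether to trigger retrieval from the draft answer's uncertainty."""
    # `llm.generate_with_logprobs` is a placeholder for whatever client call returns
    # the generated text together with its per-token log probabilities.
    result = llm.generate_with_logprobs(query + "\nAnswer the question using a single word or phrase.")
    uncertainty = compute_uncertainty(result.log_probs)
    # Retrieve only when the model is not confident in its own answer.
    return uncertainty > threshold, result.text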

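def dual_path_retrieval(query: str, pseudo_context: str, retriever, n: int = 5, k: int = 3) -> list:
    """DPR-AIS: retrieve via the query and the pseudo-context, then re-score the pooled candidates."""
    # `retriever.encode` and `retriever.search` are placeholder interfaces;
    # `search` is assumed to return hashable documents (e.g. strings).
    def unit(v):
        # L2-normalize so that dot products below are cosine similarities.
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v)

    q_emb = unit(retriever.encode(query))
    p_emb = unit(retriever.encode(pseudo_context))

    docs_q = retriever.search(q_emb, top_k=n)  # query path
    docs_p = retriever.search(p_emb, top_k=n)  # pseudo-context path
    # Order-preserving, de-duplicated union: at most 2n candidates.
    candidates = list(dict.fromkeys(list(docs_q) + list(docs_p)))

    # AIS: cos(θ1 + θ2) = s1*s2 - sqrt(1 - s1²) * sqrt(1 - s2²)
    scored = []
    for doc in candidates:
        d_emb = unit(retriever.encode(doc))
        s1 = float(np.clip(np.dot(q_emb, d_emb), -1.0, 1.0))  # query-document similarity
        s2 = float(np.clip(np.dot(p_emb, d_emb), -1.0, 1.0))  # pseudo-context-document similarity
        joint_score = s1 * s2 - np.sqrt(1 - s1**2) * np.sqrt(1 - s2**2)
        scored.append((doc, joint_score))

    # Keep the k highest-scoring documents (the original hard-coded 3).
    return [doc for doc, _ in sorted(scored, key=lambda x: -x[1])[:k]]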

# Usage example
def dtr_answer(query: str, llm, retriever) -> str:
    needs_retrieval, initial_answer = should_retrieve(query, llm, threshold=0.005)

    if not needs_retrieval:
        return initial_answer  # confident enough: return the draft answer directly

    # Generate a pseudo-document, then run dual-path retrieval
    pseudo = llm.generate(query + "\nWrite a passage to answer this question.")
    docs = dual_path_retrieval(query, pseudo, retriever)

    context = "\n".join(docs)
    return llm.generate(f"{query}\n{context}\nAnswer using a single word or phrase.")

Terminology

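RAG: A technique in which the LLM retrieves content it does not know from external documents and uses it in its answer. A typical setup connects internal company documents to GPT for Q&A.

Uncertainty: An inverse measure of how confident the LLM is in the tokens it generates. It is the average negative log of the per-token probabilities; the higher the value, the less sure the model is.

Sparse Query: A short search query that carries too little information. With a context-free query such as 'Python error', the search engine has difficulty working out what the user wants.

Pseudo-context: A hypothetical document the LLM writes itself after reading the question, guessing what a relevant document might say. It is not the actual answer, but it serves as an auxiliary search query.

Dual-Path Retrieval: Running retrieval along two paths, once with the original query and once with the LLM-generated pseudo-document, to obtain a broader candidate pool.

EM (Exact Match): The fraction of predictions that match the reference answer exactly. A stricter metric than F1.

Training-free: An approach that can be applied to an existing model without additional training. It attaches like a plug-in, with no fine-tuning cost.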
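
The EM and F1 metrics mentioned above can be sketched as follows. The normalization (lowercasing, stripping punctuation and English articles) follows the common SQuAD-style convention; this summary does not state which normalization the paper uses, so treat that choice and the function names as assumptions.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, and split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized prediction equals the normalized reference, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(exact_match("Eiffel Tower in Paris", "Eiffel Tower"))
print(token_f1("Eiffel Tower in Paris", "Eiffel Tower"))
```

The second prediction contains the right answer plus extra words, so it fails EM but still earns partial F1 credit, which is why EM is the stricter of the two.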

Related Resources

https://github.com/ChenWangHKU/DTR

Original Abstract

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, but existing approaches indiscriminately trigger retrieval and rely on single-path evidence construction, often introducing noise and limiting performance gains. In this work, we propose Decide Then Retrieve (DTR), a training-free framework that adaptively determines when retrieval is necessary and how external information should be selected. DTR leverages generation uncertainty to guide retrieval triggering and introduces a dual-path retrieval mechanism with adaptive information selection to better handle sparse and ambiguous queries. Extensive experiments across five open-domain QA benchmarks, multiple model scales, and different retrievers demonstrate that DTR consistently improves EM and F1 over standard RAG and strong retrieval-enhanced baselines, while reducing unnecessary retrievals. The code and data used in this paper are available at https://github.com/ChenWangHKU/DTR.