CoE: Collaborative Entropy for Measuring Uncertainty in Multi-LLM Agent Systems

CoE: Collaborative Entropy for Uncertainty Quantification in Agentic Multi-LLM Systems

Mar 30, 2026 • Kangkang Sun, Jun Wu, Jianhua Li +3

TL;DR Highlight

A new uncertainty metric for collaborating LLMs that measures two things at once: how confident each model is, and how much the models disagree with one another.

Who Should Read

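Developers building agent systems that combine multiple LLMs, and in particular ML engineers who need to assess the reliability of multi-model ensembles in domains such as medicine or law, where wrong answers are dangerous.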

Core Mechanics

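Existing uncertainty measures such as Semantic Entropy look only at a single model's internal confidence, so they miss the case where several models are each confident but give different answers.
CoE decomposes uncertainty into two components, intra-model uncertainty (UA) and inter-model disagreement (UE), which makes it possible to tell why the system is uncertain.
When UA is high, the remedy is better prompts or more diverse sampling; when UE is high, the models need to be aligned with one another. Collapsing the two into a single number erases this distinction.
Experiments use three models: LLaMA-3.1-8B-Instruct, Qwen-2.5-7B-Instruct, and Mistral-7B-Instruct. CoE's advantage grows as more models are added (AUROC 0.683 → 0.772).
A CoE-guided, training-free post-hoc weight-adjustment heuristic improves accuracy by up to +39.0% (when using asymmetric KL divergence).
Asymmetric KL divergence is far more effective than symmetric alternatives such as JS, Wasserstein, and Hellinger, because it measures directionally how far each model deviates from the ensemble mean.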
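
To make the UA/UE split concrete, here is a minimal sketch in pure Python. It assumes equal model weights and the "average entropy plus average KL to the ensemble mean" form described in this summary; the paper's exact definition may differ, and the helper names (`entropy`, `kl`, `decompose`) are illustrative only.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return sum(x * math.log(1.0 / x) for x in p if x > 0)

def kl(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def decompose(dists):
    """Return (UA, UE) for equally weighted models over shared clusters."""
    k = len(dists)
    mean = [sum(col) / k for col in zip(*dists)]   # ensemble mean distribution
    ua = sum(entropy(p) for p in dists) / k        # intra-model uncertainty
    ue = sum(kl(p, mean) for p in dists) / k       # inter-model disagreement
    return ua, ue

# Case A: every model is individually unsure, but all models agree.
unsure_but_agree = [[0.5, 0.5], [0.5, 0.5]]
# Case B: every model is individually certain, but they contradict each other.
sure_but_disagree = [[1.0, 0.0], [0.0, 1.0]]

ua_a, ue_a = decompose(unsure_but_agree)
ua_b, ue_b = decompose(sure_but_disagree)
print(f"A: UA={ua_a:.3f} UE={ue_a:.3f} total={ua_a + ue_a:.3f}")
print(f"B: UA={ua_b:.3f} UE={ue_b:.3f} total={ua_b + ue_b:.3f}")
```

Both cases give the same total (ln 2, about 0.693 nats), yet in case A all of it is UA and in case B all of it is UE. A single combined score cannot separate these two situations, which is why the components are reported separately.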

Evidence

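In the TriviaQA three-model setting, CoE reaches an AUROC of 0.772, exceeding both the strongest prior baseline, UE (0.716), and Semantic Entropy (0.687).
In the SQuAD three-model setting, CoE reaches an AUROC of 0.878; with six models it still reaches 0.811, an improvement of up to 20% over the baselines.
The CoE-guided coordination heuristic improves accuracy by +39.0% with KL, versus +27.5% with JS, +23.0% with Hellinger, and +20.5% with Wasserstein, so asymmetric KL leads by a wide margin.
Increasing the number of samples from 4 to 8 raises ensemble accuracy from 81.9% to 96.0%, and ensemble accuracy stays stable at 92-95% for temperatures in the 0.8-1.0 range.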

How to Apply

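In a multi-LLM pipeline, sample responses from each model, cluster semantically equivalent answers using bidirectional entailment, then compute UA + UE and use it as a filter that screens out low-confidence queries.
Implement branching logic for high CoE values: if UA is high, lower each model's temperature or strengthen the few-shot prompt; if UE is high, re-weight the models or insert an additional verification step.
Because CoE is applied as post-processing to outputs that have already been generated, no retraining is needed. It can be added to an existing multi-LLM ensemble as a plug-in, and it is directly usable for building a selective prediction system that routes only high-uncertainty cases to human review.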
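
The first step above (sample, cluster by bidirectional entailment, estimate per-model cluster distributions) could be sketched as follows. Everything here is an assumption for illustration: `entails` is a hypothetical stand-in for a real NLI entailment model, the greedy comparison against one representative per cluster is one common way to do this kind of clustering, and cluster probabilities are estimated as simple sample frequencies.

```python
from collections import Counter

def entails(a: str, b: str) -> bool:
    """Hypothetical stand-in for an NLI entailment check.

    A real pipeline would call an entailment model here; this toy version
    only compares normalized strings so that the sketch runs on its own.
    """
    def norm(s):
        return s.strip().lower().rstrip(".")
    return norm(a) == norm(b)

def cluster_answers(answers):
    """Greedy clustering: an answer joins a cluster if entailment holds both ways."""
    representatives = []   # one representative answer per cluster
    labels = []            # cluster id assigned to each input answer
    for ans in answers:
        for cid, rep in enumerate(representatives):
            if entails(ans, rep) and entails(rep, ans):
                labels.append(cid)
                break
        else:
            representatives.append(ans)
            labels.append(len(representatives) - 1)
    return labels, len(representatives)

def per_model_distributions(samples_by_model):
    """Map each model's sampled answers to a distribution over shared clusters."""
    all_answers = [a for samples in samples_by_model.values() for a in samples]
    labels, n_clusters = cluster_answers(all_answers)
    dists, start = {}, 0
    for model, samples in samples_by_model.items():
        counts = Counter(labels[start:start + len(samples)])
        dists[model] = [counts.get(c, 0) / len(samples) for c in range(n_clusters)]
        start += len(samples)
    return dists

# Toy samples; the model names are placeholders.
samples = {
    "model_a": ["Paris", "paris.", "Paris", "Lyon"],
    "model_b": ["Lyon", "Lyon", "lyon", "Lyon"],
}
dists = per_model_distributions(samples)
print(dists)
```

Clustering all models' answers together, rather than per model, is what gives every model a distribution over the same cluster space, which the inter-model divergence term needs.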

Code Example

import numpy as np
from scipy.special import rel_entr

def collaborative_entropy(cluster_probs_list, weights=None):
    """
    cluster_probs_list: list of per-model probability distributions over
      the shared semantic clusters,
      e.g. [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
    weights: per-model weights (uniform if None)
    """
    K = len(cluster_probs_list)
    if weights is None:
        weights = [1.0 / K] * K

    # Smooth to avoid zeros, then renormalize each distribution
    probs = [np.array(p, dtype=float) + 1e-10 for p in cluster_probs_list]
    probs = [p / p.sum() for p in probs]

    # Ensemble mean distribution
    ensemble_mean = sum(w * p for w, p in zip(weights, probs))

    # UA: weighted average of each model's semantic entropy
    def shannon_entropy(p):
        # p is strictly positive after smoothing, so log(p) is safe
        return -np.sum(p * np.log(p))

    # Use the same weights as UE (identical to a plain mean when uniform)
    UA = sum(w * shannon_entropy(p) for w, p in zip(weights, probs))

    # UE: weighted sum of KL(model distribution || ensemble mean)
    UE = sum(w * np.sum(rel_entr(p, ensemble_mean))
             for w, p in zip(weights, probs))

    CoE = UA + UE
    return {"CoE": CoE, "UA": UA, "UE": UE}

# Example: three models' probability distributions over three semantic clusters
model_outputs = [
    [0.8, 0.1, 0.1],   # LLaMA: confident in cluster 0
    [0.1, 0.8, 0.1],   # Qwen: confident in cluster 1 (inter-model disagreement!)
    [0.7, 0.2, 0.1],   # Mistral: confident in cluster 0
]

result = collaborative_entropy(model_outputs)
print(f"CoE: {result['CoE']:.4f}")
print(f"UA (intra-model): {result['UA']:.4f}")
print(f"UE (inter-model): {result['UE']:.4f}")
# Low UA with high UE -> each model is confident on its own, but the models disagree
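
Given UA and UE values like the ones returned above, the branching described under How to Apply might look like the sketch below. The function name, the action labels, and both thresholds are hypothetical; suitable thresholds would have to be tuned on held-out data, and entropy values here are in nats.

```python
def route_query(ua: float, ue: float,
                accept_below: float = 0.3,
                review_above: float = 0.9) -> str:
    """Pick a follow-up action from the UA/UE split (thresholds are illustrative)."""
    total = ua + ue
    if total < accept_below:
        return "accept"                               # system is confident enough
    if total > review_above:
        return "human_review"                         # too uncertain to fix automatically
    if ua >= ue:
        return "lower_temperature_or_add_few_shot"    # models are individually unsure
    return "reweight_models_or_add_verification"      # models disagree with each other

print(route_query(0.1, 0.1))
print(route_query(0.5, 0.2))
print(route_query(0.2, 0.5))
print(route_query(0.6, 0.5))
```

Comparing UA against UE directly is the simplest possible rule; a production system could instead use separate thresholds for each component.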

Terminology

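Semantic Entropy: An uncertainty measure of how varied the semantically distinct answers are when an LLM answers the same question multiple times. If it always answers "Seoul" to "What is the capital of South Korea?", the entropy is 0; if its answers vary among Seoul, Busan, and Daejeon, the entropy is high.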

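AUROC: A metric for how well an uncertainty score separates wrong answers from correct ones. 1.0 is perfect and 0.5 is equivalent to random guessing. The higher it is, the more accurate the signal that an answer is risky and should be filtered out.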
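
As a rough illustration of what AUROC measures in this setting, the sketch below computes it as the fraction of (wrong, correct) answer pairs in which the wrong answer received the higher uncertainty score, counting ties as half. The function name and the toy data are made up for the example.

```python
def auroc(uncertainty, is_wrong):
    """Chance that a wrong answer scores higher uncertainty than a correct one."""
    wrong_scores = [u for u, w in zip(uncertainty, is_wrong) if w]
    right_scores = [u for u, w in zip(uncertainty, is_wrong) if not w]
    if not wrong_scores or not right_scores:
        raise ValueError("need at least one wrong and one correct answer")
    # Count pairs ranked correctly; a tie contributes half a win.
    wins = sum((w > r) + 0.5 * (w == r)
               for w in wrong_scores for r in right_scores)
    return wins / (len(wrong_scores) * len(right_scores))

scores = [0.9, 0.8, 0.3, 0.2, 0.6]           # toy uncertainty scores
flags = [True, False, False, False, True]    # True = the answer was wrong
print(f"AUROC = {auroc(scores, flags):.3f}")
```

This pairwise form is quadratic in the number of answers, which is fine for a sanity check; evaluation code would normally use a library routine instead.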

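KL Divergence: A measure of how much two probability distributions differ. It is asymmetric, so the divergence from A to B differs from that from B to A. CoE uses it to measure directionally how far each model deviates from the ensemble mean.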
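
The asymmetry is easy to check numerically. The sketch below uses made-up two-cluster distributions and a hand-written `kl` helper (an illustration, not code from the paper).

```python
import math

def kl(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

confident_model = [0.9, 0.1]   # a model that strongly favors cluster 0
ensemble_mean = [0.5, 0.5]     # a hypothetical ensemble average

forward = kl(confident_model, ensemble_mean)    # model measured against the mean
backward = kl(ensemble_mean, confident_model)   # mean measured against the model
print(f"KL(model || mean) = {forward:.4f}")
print(f"KL(mean || model) = {backward:.4f}")
```

The two directions give different values (about 0.368 and 0.511 nats here), so the choice of direction is part of the metric's definition rather than an implementation detail.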

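Bidirectional Entailment: A way to decide whether two sentences mean the same thing by checking that each logically entails the other. "Obama was a U.S. president" and "A former U.S. president is Obama" satisfy bidirectional entailment, so they fall into the same cluster.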

Aleatoric Uncertainty: Uncertainty that comes from ambiguity inherent in the data. Like a coin flip, it is fundamentally random, so collecting more data does not reduce it.

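Epistemic Uncertainty: Uncertainty that comes from a lack of knowledge. It can be reduced with more information or a better model. In a multi-LLM setting, it appears when models trained on different knowledge bases disagree.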

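AURAC: A metric for how well the accuracy of the remaining predictions rises as uncertain predictions are rejected in order. The higher it is, the better selective prediction works, that is, discarding uncertain answers makes the rest more accurate.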
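
One simple way to approximate this quantity is to average the accuracy of the retained answers over every retention level, as in the sketch below. The exact set of rejection fractions used in the paper is not stated in this summary, so averaging over all prefix sizes is an assumption, as are the function name and the toy data.

```python
def aurac(uncertainty, is_correct):
    """Mean accuracy over all retention levels (keep the k least-uncertain answers)."""
    if not uncertainty:
        raise ValueError("need at least one prediction")
    # Most confident answers first, so each prefix is what survives rejection.
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i])
    correct_so_far, accuracies = 0, []
    for k, idx in enumerate(order, start=1):
        correct_so_far += bool(is_correct[idx])
        accuracies.append(correct_so_far / k)
    return sum(accuracies) / len(accuracies)

correct = [True, True, False, False]
informative = aurac([0.1, 0.2, 0.7, 0.9], correct)   # wrong answers get high scores
misleading = aurac([0.9, 0.7, 0.2, 0.1], correct)    # wrong answers get low scores
print(f"informative score: {informative:.3f}")
print(f"misleading score:  {misleading:.3f}")
```

An uncertainty score that ranks wrong answers as most uncertain yields a much higher value than one that ranks them as most confident, which is the behavior the metric is meant to reward.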

Original Abstract

Uncertainty estimation in multi-LLM systems remains largely single-model-centric: existing methods quantify uncertainty within each model but do not adequately capture semantic disagreement across models. To address this gap, we propose Collaborative Entropy (CoE), a unified information-theoretic metric for semantic uncertainty in multi-LLM collaboration. CoE is defined on a shared semantic cluster space and combines two components: intra-model semantic entropy and inter-model divergence to the ensemble mean. CoE is not a weighted ensemble predictor; it is a system-level uncertainty measure that characterizes collaborative confidence and disagreement. We analyze several core properties of CoE, including non-negativity, zero-value certainty under perfect semantic consensus, and the behavior of CoE when individual models collapse to delta distributions. These results clarify when reducing per-model uncertainty is sufficient and when residual inter-model disagreement remains. We also present a simple CoE-guided, training-free post-hoc coordination heuristic as a practical application of the metric. Experiments on TriviaQA and SQuAD with LLaMA-3.1-8B-Instruct, Qwen-2.5-7B-Instruct, and Mistral-7B-Instruct show that CoE provides stronger uncertainty estimation than standard entropy- and divergence-based baselines, with gains becoming larger as additional heterogeneous models are introduced. Overall, CoE offers a useful uncertainty-aware perspective on multi-LLM collaboration.