Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents
TL;DR Highlight
A position paper arguing that the traditional aleatoric/epistemic uncertainty split does not fit conversational LLM agents, and proposing three new research directions in its place.
Who Should Read
AI engineers building chatbots or conversational LLM agents who need to communicate when the model is uncertain. Product teams designing hallucination detection or confidence display features.
Core Mechanics
- The traditional split of uncertainty into aleatoric (irreducible) and epistemic (reducible through learning) rests on conflicting definitions across schools of thought, making it theoretically inconsistent
- Empirically, aleatoric and epistemic estimates show rank correlations of 0.8-0.999, meaning they carry essentially the same information; separating them adds little in practice
- Three proposed new directions: (1) underspecification uncertainty from ambiguous questions, (2) communicating uncertainty naturally in dialogue, (3) uncertainty-aware agent decision making
Evidence
- Aleatoric/epistemic estimate rank correlations of 0.8-0.999 — theoretically separate metrics contain the same information in practice (Mucsanyi et al., 2024, ImageNet-1k deep ensemble experiments)
- GPT-3.5-Turbo-16k accuracy at detecting ambiguous questions: 57% (vs. 50% random baseline) — barely better than chance
- Human evaluators also struggle to assess the quality of follow-up questions, so evaluating interactive clarification is itself an open problem
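The rank-correlation finding above is easy to reproduce in miniature: if two uncertainty estimators order the same inputs almost identically, their Spearman correlation approaches 1 and the second estimate adds little information. A minimal sketch with toy numbers (not the paper's data; values chosen only to illustrate the calculation):

```python
def ranks(xs):
    # Assign ranks 1..n by value (no tie handling needed for this toy data)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    # Spearman rho for distinct values: 1 - 6 * sum(d^2) / (n(n^2 - 1)),
    # where d is the per-item difference between the two rankings
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Toy aleatoric/epistemic estimates that almost agree on the ordering
aleatoric = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
epistemic = [0.05, 0.20, 0.50, 0.45, 0.75, 0.90]
print(f"Spearman rank correlation: {spearman(aleatoric, epistemic):.3f}")  # 0.943
```

A correlation this high means that ranking inputs by either estimate flags essentially the same cases, which is the paper's empirical argument against maintaining two separate numbers.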
How to Apply
- Design chatbot prompts to ask clarifying questions before answering uncertain queries — e.g., 'This question is missing country info, which could change the answer. Which country are you asking about?'
- Instead of showing 'probability 0.6', express uncertainty in language: 'There are two possibilities: if A, the answer is X; if B, it is Y. Let me know which situation applies and I can give a more precise answer.'
- For agent systems, integrate uncertainty estimates into action selection — high uncertainty should trigger information-gathering actions rather than committing to potentially wrong answers.
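The last point can be sketched as a simple routing policy. This is a hypothetical illustration, not the paper's method: the function names, thresholds, and the `underspecified` flag (e.g. the output of an ambiguity classifier) are all assumptions.

```python
from dataclasses import dataclass

# Illustrative action labels for an uncertainty-aware agent
ANSWER, CLARIFY, GATHER = "answer", "ask_clarifying_question", "gather_information"

@dataclass
class AgentStep:
    action: str
    rationale: str

def select_action(confidence: float, underspecified: bool) -> AgentStep:
    """Route one agent step based on an uncertainty estimate.

    confidence: model's estimated probability its answer is correct (0..1).
    underspecified: whether the query is missing key information.
    The 0.5 threshold is illustrative, not from the paper.
    """
    if underspecified:
        # Underspecification uncertainty: only the user holds the missing
        # information, so ask before answering.
        return AgentStep(CLARIFY, "query is missing information only the user can supply")
    if confidence < 0.5:
        # High model uncertainty: gather evidence (search, tool call)
        # rather than committing to a potentially wrong answer.
        return AgentStep(GATHER, "confidence below threshold; collect more evidence")
    return AgentStep(ANSWER, "confident enough to answer directly")

print(select_action(0.9, underspecified=False).action)  # answer
print(select_action(0.3, underspecified=False).action)  # gather_information
print(select_action(0.9, underspecified=True).action)   # ask_clarifying_question
```

The key design choice is that underspecification is checked first: no amount of model confidence helps if the question itself is ambiguous.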
Code Example
# Example system prompt for expressing uncertainty in text rather than numbers
SYSTEM_PROMPT = """
You are a helpful assistant. When you are uncertain about an answer:
1. DO NOT just say a confidence score like '70% confident'.
2. Instead, explain WHY you are uncertain.
3. List the competing possibilities and what distinguishes them.
4. Ask ONE clarifying question if missing information is the key issue.
Example of good uncertainty expression:
"There are two likely answers depending on context:
- If you're asking about the US release: November 2001
- If you're asking about the UK release: November 4, 2001
Could you clarify which country you're asking about?"
Example of bad uncertainty expression:
"I'm about 70% confident the answer is November 2001."
"""Terminology
Related Resources
Original Abstract
Large-language models (LLMs) and chatbot agents are known to provide wrong outputs at times, and it was recently found that this can never be fully prevented. Hence, uncertainty quantification plays a crucial role, aiming to quantify the level of ambiguity in either one overall number or two numbers for aleatoric and epistemic uncertainty. This position paper argues that this traditional dichotomy of uncertainties is too limited for the open and interactive setup that LLM agents operate in when communicating with a user, and that we need to research avenues that enrich uncertainties in this novel scenario. We review the literature and find that popular definitions of aleatoric and epistemic uncertainties directly contradict each other and lose their meaning in interactive LLM agent settings. Hence, we propose three novel research directions that focus on uncertainties in such human-computer interactions: Underspecification uncertainties, for when users do not provide all information or define the exact task at the first go, interactive learning, to ask follow-up questions and reduce the uncertainty about the current context, and output uncertainties, to utilize the rich language and speech space to express uncertainties as more than mere numbers. We expect that these new ways of dealing with and communicating uncertainties will lead to LLM agent interactions that are more transparent, trustworthy, and intuitive.