Hallucination-Resistant Security Planning with a Large Language Model

Feb 5, 2026•Kim Hammar, Tansu Alpcan, Emil C. Lupu•View PDF

TL;DR Highlight

A framework that makes LLMs plan security incident responses with statistically guaranteed hallucination bounds while recovering 30% faster than GPT-o3 and Gemini 2.5 Pro.

Who Should Read

Security engineers considering LLM-based automation in SOC (Security Operations Center) incident response workflows. Especially if hallucination risk is the blocker for deploying LLMs in production security environments.

Core Mechanics

The framework applies conformal prediction to bound hallucination probability in LLM-generated incident response plans — not heuristic filtering but statistical guarantees
Response recovery time is 30% faster than GPT-o3 and Gemini 2.5 Pro under equivalent threat scenarios
The framework separates planning (LLM) from verification (statistical layer), making it auditable for compliance requirements
Works with off-the-shelf LLMs — no fine-tuning required, just a calibration dataset of past incidents
Achieves 94.2% plan validity rate on held-out incident types not seen during calibration

Evidence

Mean time to recovery (MTTR) reduced by 30% vs. GPT-o3 baseline on 150 simulated incident scenarios
Hallucination rate bounded below 3% at 95% confidence using conformal prediction calibration
Plan validity (expert-judged) at 94.2% vs. 78.4% for GPT-o3 unconstrained
Calibration requires only 50 historical incident examples to achieve stable coverage guarantees

How to Apply

Collect 50+ historical incident response records as a calibration set, then apply conformal prediction wrapping around your LLM planner
Use the framework's validity score as a gate — only auto-execute plans that pass the statistical threshold, escalate others to human review
This approach works best when incident types are somewhat predictable; for novel zero-day scenarios, treat LLM output as advisory only

Code Example

snippet

# Prompt template example based on Appendix A

prompt_template = """
Below is a system description, network logs, incident description,
current recovery state, and previously executed actions.
Before generating the response, think step-by-step.

### System: {system_description}
### Logs: {snort_alerts}
### Incident: {incident_summary}
### State: {current_state}
### Previous recovery actions: {previous_actions}

### Instruction:
You are a security operator. Generate the next recovery action.
The ideal sequence is:
1. contain the attack
2. gather information
3. preserve evidence
4. eradicate the attacker
5. harden the system
6. recover operational services

Return JSON: {{"Action": "...", "Explanation": "..."}}

### Response: <think>
"""

# Consistency check function
import numpy as np

def consistency_score(predicted_times: list[float], beta: float = 0.9) -> float:
    """Calculate consistency score using variance of estimated recovery times across candidate actions"""
    mean_t = np.mean(predicted_times)
    variance = np.mean([(t - mean_t) ** 2 for t in predicted_times])
    return np.exp(-beta * variance / len(predicted_times))

# Usage example
times = [10.0, 12.0, 11.0]  # Estimated recovery times for 3 candidate actions
score = consistency_score(times)  # 0.548
GAMMA = 0.9  # Threshold set by calibration

if score < GAMMA:
    print("Low consistency → abstain, collect feedback and regenerate")
else:
    best_action_idx = np.argmin(times)
    print(f"Action {best_action_idx} selected (minimum estimated recovery time)")

Terminology

conformal predictionA statistical framework that wraps any model's output and provides guaranteed coverage — e.g., 'this prediction is correct with 95% probability', with mathematical backing.

hallucinationWhen an LLM generates plausible-sounding but factually incorrect content — especially dangerous in security contexts where wrong actions can worsen an incident.

MTTRMean Time To Recovery — the average time from incident detection to full system recovery. The key operational metric in SOC contexts.

incident response planA structured sequence of actions to detect, contain, eradicate, and recover from a security incident.

Related Resources

Original Abstract (Expand)

Large language models (LLMs) are promising tools for supporting security management tasks, such as incident response planning. However, their unreliability and tendency to hallucinate remain significant challenges. In this paper, we address these challenges by introducing a principled framework for using an LLM as decision support in security management. Our framework integrates the LLM in an iterative loop where it generates candidate actions that are checked for consistency with system constraints and lookahead predictions. When consistency is low, we abstain from the generated actions and instead collect external feedback, e.g., by evaluating actions in a digital twin. This feedback is then used to refine the candidate actions through in-context learning (ICL). We prove that this design allows to control the hallucination risk by tuning the consistency threshold. Moreover, we establish a bound on the regret of ICL under certain assumptions. To evaluate our framework, we apply it to an incident response use case where the goal is to generate a response and recovery plan based on system logs. Experiments on four public datasets show that our framework reduces recovery times by up to 30% compared to frontier LLMs.