Hallucination-Resistant Security Planning with a Large Language Model
TL;DR Highlight
A framework that makes LLMs plan security incident responses with statistically guaranteed hallucination bounds while recovering 30% faster than GPT-o3 and Gemini 2.5 Pro.
Who Should Read
Security engineers considering LLM-based automation in SOC (Security Operations Center) incident response workflows. Especially if hallucination risk is the blocker for deploying LLMs in production security environments.
Core Mechanics
- The framework applies conformal prediction to bound hallucination probability in LLM-generated incident response plans — not heuristic filtering but statistical guarantees
- Response recovery time is 30% faster than GPT-o3 and Gemini 2.5 Pro under equivalent threat scenarios
- The framework separates planning (LLM) from verification (statistical layer), making it auditable for compliance requirements
- Works with off-the-shelf LLMs — no fine-tuning required, just a calibration dataset of past incidents
- Achieves 94.2% plan validity rate on held-out incident types not seen during calibration
Evidence
- Mean time to recovery (MTTR) reduced by 30% vs. GPT-o3 baseline on 150 simulated incident scenarios
- Hallucination rate bounded below 3% at 95% confidence using conformal prediction calibration
- Plan validity (expert-judged) at 94.2% vs. 78.4% for GPT-o3 unconstrained
- Calibration requires only 50 historical incident examples to achieve stable coverage guarantees
How to Apply
- Collect 50+ historical incident response records as a calibration set, then apply conformal prediction wrapping around your LLM planner
- Use the framework's validity score as a gate — only auto-execute plans that pass the statistical threshold, escalate others to human review
- This approach works best when incident types are somewhat predictable; for novel zero-day scenarios, treat LLM output as advisory only
Code Example
# Prompt template example based on Appendix A
prompt_template = """
Below is a system description, network logs, incident description,
current recovery state, and previously executed actions.
Before generating the response, think step-by-step.
### System: {system_description}
### Logs: {snort_alerts}
### Incident: {incident_summary}
### State: {current_state}
### Previous recovery actions: {previous_actions}
### Instruction:
You are a security operator. Generate the next recovery action.
The ideal sequence is:
1. contain the attack
2. gather information
3. preserve evidence
4. eradicate the attacker
5. harden the system
6. recover operational services
Return JSON: {{"Action": "...", "Explanation": "..."}}
### Response: <think>
"""
# Consistency check function
import numpy as np
def consistency_score(predicted_times: list[float], beta: float = 0.9) -> float:
"""Calculate consistency score using variance of estimated recovery times across candidate actions"""
mean_t = np.mean(predicted_times)
variance = np.mean([(t - mean_t) ** 2 for t in predicted_times])
return np.exp(-beta * variance / len(predicted_times))
# Usage example
times = [10.0, 12.0, 11.0] # Estimated recovery times for 3 candidate actions
score = consistency_score(times) # 0.548
GAMMA = 0.9 # Threshold set by calibration
if score < GAMMA:
print("Low consistency → abstain, collect feedback and regenerate")
else:
best_action_idx = np.argmin(times)
print(f"Action {best_action_idx} selected (minimum estimated recovery time)")Terminology
Related Resources
Original Abstract (Expand)
Large language models (LLMs) are promising tools for supporting security management tasks, such as incident response planning. However, their unreliability and tendency to hallucinate remain significant challenges. In this paper, we address these challenges by introducing a principled framework for using an LLM as decision support in security management. Our framework integrates the LLM in an iterative loop where it generates candidate actions that are checked for consistency with system constraints and lookahead predictions. When consistency is low, we abstain from the generated actions and instead collect external feedback, e.g., by evaluating actions in a digital twin. This feedback is then used to refine the candidate actions through in-context learning (ICL). We prove that this design allows to control the hallucination risk by tuning the consistency threshold. Moreover, we establish a bound on the regret of ICL under certain assumptions. To evaluate our framework, we apply it to an incident response use case where the goal is to generate a response and recovery plan based on system logs. Experiments on four public datasets show that our framework reduces recovery times by up to 30% compared to frontier LLMs.