An LLM-Based Security Incident Response Planning Framework with Mathematically Controlled Hallucination

Hallucination-Resistant Security Planning with a Large Language Model

Feb 5, 2026 • Kim Hammar, Tansu Alpcan, Emil C. Lupu

TL;DR Highlight

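A framework in which an LLM plans security incident response with a statistically bounded hallucination probability, while recovering up to 30% faster than frontier LLMs such as OpenAI o3 and Gemini 2.5 Pro.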

Who Should Read

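Security engineers in a SOC (security operations center) who are considering automating incident response. AI and security teams that want to deploy an LLM in a production environment but are worried about hallucination.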

Core Mechanics

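The LLM generates N candidate actions at once (3 by default) and computes a consistency score from the variance of their predicted recovery times. If the score is low, the framework abstains and enters a feedback-collection loop.
When consistency is low, the action is tested in a digital twin (a virtual replica of the real system) and the result is added to the prompt context before regenerating. The improvement comes from ICL (in-context learning) alone, with no parameter updates.
A fine-tuned DeepSeek-R1-14B (14B parameters) outperforms the 671B DeepSeek-R1, Gemini 2.5 Pro, and OpenAI o3 on every dataset, which suggests that the verification loop matters more than model size.
Setting the consistency threshold γ with a calibration dataset (about 100 hallucination samples) gives a mathematical upper bound on the hallucination probability (e.g., γ=0.9 → hallucination ≤ 5%).
Ablation results: removing lookahead raises the average number of recovery actions from 12 to 21 (a 75% increase), and removing abstention triples the hallucination probability from 0.02 to 0.06. Every component is needed.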
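The generate, check, abstain, and refine loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `generate`, `predict_time`, and `evaluate` are hypothetical callables standing in for the LLM, its lookahead prediction, and the digital twin, and the exponential form of the consistency score and the choice of which candidate to test are assumptions.

```python
import math
from statistics import pvariance
from typing import Callable, Optional


def plan_next_action(
    generate: Callable[[str], str],             # hypothetical: context -> candidate action
    predict_time: Callable[[str, str], float],  # hypothetical: (context, action) -> predicted recovery time
    evaluate: Callable[[str], str],             # hypothetical: action -> feedback text (e.g., digital twin)
    context: str,
    n_candidates: int = 3,
    gamma: float = 0.9,
    beta: float = 0.9,
    max_rounds: int = 5,
) -> Optional[str]:
    """Return an action once the candidates are consistent, or None if every round abstains."""
    for _ in range(max_rounds):
        candidates = [generate(context) for _ in range(n_candidates)]
        times = [predict_time(context, action) for action in candidates]

        # Assumed score: 1.0 when all predictions agree, approaching 0 as variance grows.
        score = math.exp(-beta * pvariance(times))
        best = candidates[times.index(min(times))]

        if score >= gamma:
            return best  # consistent enough: pick the lowest predicted recovery time

        # Abstain: collect external feedback and refine via in-context learning only.
        # Assumption: the candidate with the lowest predicted time is the one tested.
        context += f"\n### Feedback on '{best}': {evaluate(best)}"
    return None
```

Returning `None` after `max_rounds` leaves the final decision to a human operator; that cap is a design choice of this sketch rather than something stated in the summary.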

Evidence

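Across 25 incidents in 4 datasets, the average recovery time is 12.02 (the proposed framework) vs 16.21 (the runner-up, Gemini 2.5 Pro), about 26% shorter.
Recovery time is reduced by up to 30% (CSLE-IDS-2024 dataset: 10.82 vs 19.19).
Hallucination probability: 0.02 with abstention and 0.06 without, a 3x difference.
ICL regret converges after about 8 iterations, with an average of 2.24 feedback requests.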

How to Apply

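When generating incident response actions with an LLM, sample 3 candidates from the same prompt and have the LLM predict the recovery time for each. If the variance is high (low consistency), do not adopt the action; instead, test it in a sandbox/staging environment, add the result to the context, and regenerate. Implement this as a loop.
Collect about 100 cases where hallucination occurred and use them as a calibration dataset to set the consistency threshold γ. This statistically guarantees a chosen upper bound on the hallucination probability (e.g., κ=0.05). The higher the level of automation, the lower κ should be.
To start without a digital twin, replace the external verification with human expert review; the same framework still applies. Implement it by appending the feedback text to the context.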
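One way to turn the calibration step above into code is a standard split-conformal quantile rule, sketched below. This is an assumption for illustration: the paper's exact calibration procedure may differ, `calibrate_threshold` is a hypothetical helper name, and the guarantee holds only if the calibration scores and future hallucinated samples are exchangeable.

```python
import math


def calibrate_threshold(hallucination_scores: list[float], kappa: float = 0.05) -> float:
    """Pick gamma from the consistency scores of known hallucinations.

    If an action is accepted only when its score is strictly greater than
    gamma, a new exchangeable hallucinated sample is accepted with
    probability at most kappa.
    """
    n = len(hallucination_scores)
    rank = math.ceil((n + 1) * (1 - kappa))  # 1-based conformal rank
    if rank > n:
        raise ValueError("calibration set too small for the requested kappa")
    return sorted(hallucination_scores)[rank - 1]
```

With 100 calibration samples and κ=0.05, the rank is 96, so γ is the 96th smallest consistency score observed among the hallucinated samples. A smaller κ pushes γ higher, which means more abstentions and more feedback requests.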

Code Example


# Prompt template example based on Appendix A

prompt_template = """
Below is a system description, network logs, incident description,
current recovery state, and previously executed actions.
Before generating the response, think step-by-step.

### System: {system_description}
### Logs: {snort_alerts}
### Incident: {incident_summary}
### State: {current_state}
### Previous recovery actions: {previous_actions}

### Instruction:
You are a security operator. Generate the next recovery action.
The ideal sequence is:
1. contain the attack
2. gather information
3. preserve evidence
4. eradicate the attacker
5. harden the system
6. recover operational services

Return JSON: {{"Action": "...", "Explanation": "..."}}

### Response: <think>
"""

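# Consistency check function
import numpy as np

def consistency_score(predicted_times: list[float], beta: float = 0.9) -> float:
    """Consistency score from the variance of the candidates' predicted recovery times."""
    # Population variance; np.var already normalizes by the number of candidates,
    # so it must not be divided by len(predicted_times) a second time.
    variance = np.var(predicted_times)
    # Variance 0 maps to a score of 1; larger variance pushes the score toward 0.
    return float(np.exp(-beta * variance))

# Usage example
times = [10.0, 12.0, 11.0]  # predicted recovery times of the 3 candidate actions
score = consistency_score(times)  # ≈ 0.549
GAMMA = 0.9  # threshold set via calibration

if score < GAMMA:
    print("Low consistency -> abstain, collect feedback, then regenerate")
else:
    best_action_idx = int(np.argmin(times))
    print(f"Select action {best_action_idx} (lowest predicted recovery time)")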
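
The prompt template above ends with `<think>` and asks for a JSON object, so the raw model output needs to be parsed before the action can be scored. The sketch below shows one way to do that. It assumes the model closes its reasoning with a `</think>` tag and then emits the JSON; `parse_action` is a hypothetical helper, not something from the paper.

```python
import json
import re

REQUIRED_KEYS = ("Action", "Explanation")


def parse_action(response: str) -> dict:
    """Extract the {"Action": ..., "Explanation": ...} object from a model response."""
    # Keep only the text after the last closing think tag.
    # If the tag is absent, rpartition leaves the whole response in `answer`.
    _, _, answer = response.rpartition("</think>")

    # Grab the span from the first "{" to the last "}" in the answer.
    match = re.search(r"\{.*\}", answer, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model response")

    parsed = json.loads(match.group(0))  # raises a ValueError subclass on bad JSON
    missing = [key for key in REQUIRED_KEYS if key not in parsed]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return parsed
```

Treating a malformed response as an error rather than guessing at the intended action lets the caller count it as an abstention and regenerate.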

Terminology

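Hallucination: A phenomenon in which an LLM confidently outputs a wrong answer. In a security context, it means plausibly generating actions that do not help actual recovery, such as an instruction to patch a vulnerability that does not exist.

ICL (In-Context Learning): A way to steer the model's next output by adding examples or feedback to the prompt, without retraining model parameters. It resembles telling a chat model "you made this mistake before, try again".

Conformal Abstention: A technique in which the model declines to answer when it is uncertain. Through statistical calibration, it can mathematically guarantee statements such as "with this setting, the error probability is at most 5%".

Digital Twin: A virtual replica of the real network system. Recovery actions can be tested there first without affecting real servers, much like testing on staging before a production deployment.

Lookahead: A method of predicting the future state that would result from executing the current action and making decisions based on that prediction. It is the same idea as thinking several moves ahead in chess.

Consistency Threshold (γ): The cutoff for how closely multiple candidate answers must agree to be considered trustworthy. The higher it is set, the more often the framework abstains and requests feedback.

MITRE ATT&CK: A knowledge base that systematically classifies attack techniques used by real-world attackers. It serves as a common language for classifying the type of attack during incident response.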

Related Resources

Original Abstract

Large language models (LLMs) are promising tools for supporting security management tasks, such as incident response planning. However, their unreliability and tendency to hallucinate remain significant challenges. In this paper, we address these challenges by introducing a principled framework for using an LLM as decision support in security management. Our framework integrates the LLM in an iterative loop where it generates candidate actions that are checked for consistency with system constraints and lookahead predictions. When consistency is low, we abstain from the generated actions and instead collect external feedback, e.g., by evaluating actions in a digital twin. This feedback is then used to refine the candidate actions through in-context learning (ICL). We prove that this design allows to control the hallucination risk by tuning the consistency threshold. Moreover, we establish a bound on the regret of ICL under certain assumptions. To evaluate our framework, we apply it to an incident response use case where the goal is to generate a response and recovery plan based on system logs. Experiments on four public datasets show that our framework reduces recovery times by up to 30% compared to frontier LLMs.