Reasoning on Graphs: Faithful and Interpretable LLM Reasoning (RoG)

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

Oct 2, 2023 • Linhao Luo, Yuan-Fang Li, Gholamreza Haffari +1

TL;DR Highlight

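A method in which an LLM uses relation paths from a Knowledge Graph in a planning-retrieval-reasoning pipeline to produce accurate, KG-grounded answers with far less hallucination.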

Who Should Read

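Backend and AI developers trying to fix hallucination and missing up-to-date knowledge in LLM-based QA systems, especially teams that own, or can connect to, a Knowledge Graph and want to build a trustworthy reasoning pipeline on top of it.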

Core Mechanics

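Proposes a three-stage pipeline (Planning → Retrieval → Reasoning): the LLM first generates "relation paths" over the Knowledge Graph (KG) as a plan, then uses those paths to retrieve actual facts, and finally reasons over them.
Unlike existing RAG approaches that fetch isolated triples and ignore the KG's structural information (the order of relations between entities), RoG uses relation paths as reasoning plans, which makes it strong on multi-hop questions.
RoG instruction-fine-tunes LLaMA2-Chat-7B as its backbone; its Planning module can then be plugged into any LLM, such as ChatGPT or Alpaca, without retraining (plug-and-play).
Instead of merely putting KG facts into the context, RoG includes the reasoning paths themselves in the output, giving a step-by-step explanation of why an answer was produced (interpretability).
When a model trained on Freebase is transferred to the Wiki-Movies KG, retraining time drops from 38 hours to 2 hours, so RoG can be ported to other KGs with little effort.
Ablation confirms that planning which paths to explore is itself the key ingredient: replacing the planning module with random paths sharply degrades performance.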
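The three stages above can be sketched as one small pipeline in which the planner and the reasoner are interchangeable callables, which is what makes the plug-and-play use described above possible. This is a minimal sketch, not the authors' implementation: the toy KG, `retrieve`, `rog_pipeline`, and the two stub functions are hypothetical stand-ins for a fine-tuned planning LLM, a real KG store, and an arbitrary answering LLM.

```python
from collections import deque
from typing import Callable

Triple = tuple[str, str, str]

# Hypothetical toy KG stored as (subject, relation, object) triples
TOY_KG: list[Triple] = [
    ("Justin Bieber", "people.person.parents", "Jeremy Bieber"),
    ("Jeremy Bieber", "people.person.children", "Jaxon Bieber"),
    ("Jeremy Bieber", "people.person.children", "Justin Bieber"),
]

def retrieve(kg: list[Triple], start: str, relations: list[str]) -> list[list[Triple]]:
    """Retrieval: BFS that only follows edges matching the planned relation sequence."""
    results: list[list[Triple]] = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == len(relations):
            results.append(path)  # the whole plan has been matched
            continue
        for s, r, t in kg:
            if s == node and r == relations[len(path)]:
                queue.append((t, path + [(s, r, t)]))
    return results

def rog_pipeline(question: str, start: str,
                 planner: Callable[[str], list[str]],
                 reasoner: Callable[[str, list[list[Triple]]], list[str]],
                 kg: list[Triple]) -> list[str]:
    relations = planner(question)           # Planning: a relation path as the plan
    paths = retrieve(kg, start, relations)  # Retrieval: KG-grounded reasoning paths
    return reasoner(question, paths)        # Reasoning: any LLM can be plugged in here

# Hypothetical stub standing in for the fine-tuned planning model
def stub_planner(question: str) -> list[str]:
    return ["people.person.parents", "people.person.children"]

# Hypothetical stub standing in for ChatGPT, Claude, etc.: it returns the tail
# entity of each retrieved path, skipping entities already named in the question.
def stub_reasoner(question: str, paths: list[list[Triple]]) -> list[str]:
    return sorted({p[-1][2] for p in paths if p and p[-1][2] not in question})

answers = rog_pipeline("Who are Justin Bieber's siblings?", "Justin Bieber",
                       stub_planner, stub_reasoner, TOY_KG)
print(answers)
```

Swapping `stub_reasoner` for a call to another LLM changes only the last stage; the plan and the KG-grounded paths stay the same, which is the design choice behind the plug-and-play claim.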

Evidence

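WebQSP: Hits@1 of 85.7%, up 3.6 percentage points (about 4.4% relative) over the previous SOTA, DECAF (82.1%).
CWQ (complex multi-hop questions): Hits@1 of 62.6% and F1 of 56.2%, versus the previous SOTA UniKGQA at Hits@1 51.2% and F1 49.1%, i.e. gains of 11.4 and 7.1 percentage points (about 22.3% and 14.4% relative).
With the RoG Planning module attached, ChatGPT improves from 66.8% to 81.5% Hits@1 and Flan-T5-xl from 31.0% to 67.9% (about a 119% relative gain for Flan-T5).
Ablation: removing Planning drops WebQSP F1 from 70.8 to 49.7, and using random paths drops it further to 35.2.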
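The results above are reported as Hits@1 and F1. A rough sketch of how these per-question scores are commonly computed and then averaged over a dataset follows; it assumes exact string matching after lowercasing, and the helper names and toy data are hypothetical. The paper's own evaluation script may normalize answers differently.

```python
def normalize(answer: str) -> str:
    # Assumed normalization: lowercase and strip surrounding whitespace
    return answer.strip().lower()

def hits_at_1(predictions: list[str], gold: set[str]) -> float:
    """1.0 if the top-ranked prediction is a gold answer, else 0.0."""
    if not predictions:
        return 0.0
    gold_norm = {normalize(g) for g in gold}
    return 1.0 if normalize(predictions[0]) in gold_norm else 0.0

def answer_f1(predictions: list[str], gold: set[str]) -> float:
    """Set-level F1 between the predicted answers and the gold answers."""
    pred = {normalize(p) for p in predictions}
    gold_norm = {normalize(g) for g in gold}
    tp = len(pred & gold_norm)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold_norm)
    return 2 * precision * recall / (precision + recall)

# Toy data: one partially correct question and one wrong question
preds = [["A", "B"], ["Paris"]]
golds = [{"A", "B", "C"}, {"Lyon"}]
mean_hits = sum(hits_at_1(p, g) for p, g in zip(preds, golds)) / len(preds)
mean_f1 = sum(answer_f1(p, g) for p, g in zip(preds, golds)) / len(preds)
print(mean_hits, mean_f1)
```

Hits@1 only looks at the first answer, while F1 rewards returning the full answer set, which is why the paper's reasoning prompt asks the LLM to "return all the possible answers as a list".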

How to Apply

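If you have an in-house Knowledge Graph (Neo4j, Freebase, etc.), first have the LLM generate, as a relation path, the sequence of relations to traverse to answer the incoming question; then run BFS over the KG along that path to retrieve concrete reasoning paths and give them to the LLM as context. Grounding the answer in retrieved paths this way can substantially reduce hallucination.
In an existing RAG pipeline, replacing plain triple retrieval with relation-path-based retrieval should improve performance on multi-hop questions (e.g. "What is the population of the birthplace of A's father?"). The paper's Planning Prompt Template and Reasoning Prompt Template can be reused as prompts directly.
A hybrid setup is also possible: fine-tune LLaMA2-Chat-7B for Planning only and delegate Reasoning to an LLM you already use, such as GPT-4 or Claude (plug-and-play).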
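For a KG stored in Neo4j, the Retrieval step can alternatively be pushed into the database by turning the planned relation path into one Cypher pattern rather than running BFS in application code. A minimal sketch, assuming entities are nodes with a `name` property and each planned relation is stored as a relationship type of the same name; `relation_path_to_cypher` is a hypothetical helper, and the query is only built here, not executed.

```python
def relation_path_to_cypher(relation_path: list[str]) -> str:
    """Build a Cypher query that follows the planned relations in order from a named start entity."""
    if not relation_path:
        raise ValueError("relation_path must contain at least one relation")
    parts = ["MATCH p = (e0 {name: $entity})"]
    for i, rel in enumerate(relation_path, start=1):
        # Quote the relationship type with backticks; a literal backtick is escaped by doubling it
        safe = rel.replace("`", "``")
        parts.append(f"-[:`{safe}`]->(e{i})")
    last = len(relation_path)
    return "".join(parts) + f" RETURN p, e{last}.name AS answer"

query = relation_path_to_cypher([
    "location.administrative_division.first_level_division_of",
    "government.form_of_government.countries",
])
print(query)
```

Relation names are backtick-quoted because Freebase-style names contain dots. The query would be run with the start entity bound to the `$entity` parameter, for example through the official Neo4j driver; each returned `answer` is a candidate, and `p` is the reasoning path to show to the LLM.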

Code Example

```python
# RoG Planning + Reasoning prompt example
from collections import deque

# Step 1: Planning - generate a KG relation path
planning_prompt = """
Please generate a valid relation path that can be helpful for answering the following question:
{question}
"""
# Example of the format the LLM returns:
# <PATH> location.administrative_division.first_level_division_of <SEP> government.form_of_government.countries </PATH>

# Step 2: Retrieval - BFS over the KG for concrete paths that follow relation_path
# relation_path: list of relation names parsed from the <PATH> output
# kg: assumed to expose get_triples(node), returning the (subject, relation, object) triples leaving `node`
def retrieve_reasoning_paths(question_entities, relation_path, kg):
    results = []
    queue = deque([(e, []) for e in question_entities])
    while queue:
        node, path = queue.popleft()
        if len(path) == len(relation_path):
            results.append(path)  # every planned relation matched: a complete reasoning path
            continue
        next_relation = relation_path[len(path)]
        for (s, r, t) in kg.get_triples(node):
            if r == next_relation:  # follow only edges that match the plan
                queue.append((t, path + [(s, r, t)]))
    return results

# Step 3: Reasoning - generate the final answer from the retrieved paths
reasoning_prompt = """
Based on the reasoning paths, please answer the given question.
Please keep the answer as simple as possible and return all the possible answers as a list.

Reasoning Paths:
{reasoning_paths}

Question:
{question}
"""
```

Terminology

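Knowledge Graph (KG): A structured knowledge base that stores entities (people, places, concepts) and the relations between them as a graph. It is made up of (subject, relation, object) triples, e.g. (Justin Bieber → child_of → Jeremy Bieber).

Relation Path: An ordered sequence of relations connecting entities in a KG, e.g. "child_of → has_son". It serves as a plan describing which relations to follow to reach the answer.

Hallucination: An LLM producing plausible-sounding but false content, confidently stating facts that do not exist.

KGQA: Short for Knowledge Graph Question Answering, the task of answering natural-language questions using knowledge stored in a KG.

Hits@1: The fraction of questions for which the model's top-ranked answer is correct. Higher means the first prediction is right more often.

Instruction Fine-tuning: Training an LLM on a large set of examples of the form "given this instruction, respond like this", adapting a general-purpose language model to a specific task.

BFS (Breadth-First Search): A graph traversal algorithm that visits nodes in order of their distance from the start node. RoG uses it to follow a relation path through the KG to the answer entities.

Plug-and-play: Connecting an existing module to another system and using it directly, with no additional training. Here it means RoG's Planning module can be attached to ChatGPT or other LLMs as-is.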
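Instruction fine-tuning the Planning module needs relation paths as training targets. A natural source, and reportedly the weak supervision the paper uses, is the set of shortest relation paths from the question entity to the answer entities in the KG. The sketch below builds planning examples that way; the toy triples, `shortest_relation_paths`, `to_planning_example`, and the `instruction`/`output` record layout are assumptions for illustration.

```python
from collections import deque

Triple = tuple[str, str, str]

PLANNING_INSTRUCTION = ("Please generate a valid relation path that can be helpful "
                        "for answering the following question: ")

def shortest_relation_paths(kg: list[Triple], start: str, answers: set[str]) -> list[list[str]]:
    """Return every shortest relation sequence leading from `start` to any answer entity."""
    found: list[list[str]] = []
    best_len = None
    queue = deque([(start, [], {start})])  # (node, relations so far, entities on this path)
    while queue:
        node, rels, visited = queue.popleft()
        if best_len is not None and len(rels) > best_len:
            break  # BFS order: everything left is longer than the shortest path found
        if node in answers and rels:
            best_len = len(rels)
            if rels not in found:
                found.append(rels)
            continue
        for s, r, t in kg:
            if s == node and t not in visited:  # avoid cycles within a single path
                queue.append((t, rels + [r], visited | {t}))
    return found

def to_planning_example(question: str, relation_path: list[str]) -> dict[str, str]:
    # Target string uses the <PATH> ... <SEP> ... </PATH> planning output format
    return {
        "instruction": PLANNING_INSTRUCTION + question,
        "output": "<PATH> " + " <SEP> ".join(relation_path) + " </PATH>",
    }

kg = [
    ("Justin Bieber", "people.person.parents", "Jeremy Bieber"),
    ("Jeremy Bieber", "people.person.children", "Jaxon Bieber"),
    ("Jeremy Bieber", "people.person.children", "Justin Bieber"),
]
question = "Who is Justin Bieber's brother?"
examples = [to_planning_example(question, p)
            for p in shortest_relation_paths(kg, "Justin Bieber", {"Jaxon Bieber"})]
print(examples[0]["output"])
```

Shortest paths are a cheap proxy for "useful" plans; they may occasionally pick a spurious connection, which is a known trade-off of this kind of weak supervision.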

Original Abstract

Large language models (LLMs) have demonstrated impressive reasoning abilities in complex tasks. However, they lack up-to-date knowledge and experience hallucinations during reasoning, which can lead to incorrect reasoning processes and diminish their performance and trustworthiness. Knowledge graphs (KGs), which capture vast amounts of facts in a structured format, offer a reliable source of knowledge for reasoning. Nevertheless, existing KG-based LLM reasoning methods only treat KGs as factual knowledge bases and overlook the importance of their structural information for reasoning. In this paper, we propose a novel method called reasoning on graphs (RoG) that synergizes LLMs with KGs to enable faithful and interpretable reasoning. Specifically, we present a planning-retrieval-reasoning framework, where RoG first generates relation paths grounded by KGs as faithful plans. These plans are then used to retrieve valid reasoning paths from the KGs for LLMs to conduct faithful reasoning. Furthermore, RoG not only distills knowledge from KGs to improve the reasoning ability of LLMs through training but also allows seamless integration with any arbitrary LLMs during inference. Extensive experiments on two benchmark KGQA datasets demonstrate that RoG achieves state-of-the-art performance on KG reasoning tasks and generates faithful and interpretable reasoning results.