Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph
TL;DR Highlight
ToG is a framework in which the LLM directly traverses a Knowledge Graph via beam search to find reasoning paths, achieving SOTA on 6 of 9 datasets without any additional training.
Who Should Read
AI backend developers who want to reduce LLM hallucinations, especially those connecting Knowledge Graphs or other structured external knowledge to LLM reasoning.
Core Mechanics
- Proposes the 'LLM ⊗ KG' paradigm, in which the LLM directly runs beam search on the KG to explore reasoning paths, rather than acting as a mere translator as in existing 'LLM ⊕ KG' approaches
- Each step: explore relations → prune → explore entities → prune, with the LLM autonomously deciding 'do I have enough information?' for early termination
- Can supplement with LLM internal knowledge for info not in KG — can still find answers even when the KG is incomplete
- Explicit reasoning paths remain for knowledge traceability and automatic KG correction
- Llama-2-70B + ToG combination outperforms standalone GPT-4 CoT in some cases — operable without expensive large models
- Plug-and-play structure works with various LLMs (ChatGPT, GPT-4, Llama-2) and KGs (Freebase, Wikidata) without training
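The explore-prune loop above can be sketched with a toy in-memory KG and a word-overlap scorer standing in for the LLM prune prompts. Everything here (the triples, `llm_score`, `explore_step`) is an illustrative assumption, not the paper's code:

```python
# Toy KG as (head, relation, tail) triples -- an illustrative stand-in for Freebase/Wikidata.
KG = {
    ("Canberra", "capital_of", "Australia"),
    ("Australia", "continent", "Oceania"),
    ("Australia", "head_of_state", "Charles III"),
}

def relations_of(entity):
    return sorted({r for h, r, t in KG if h == entity})

def tails_of(entity, relation):
    return sorted({t for h, r, t in KG if h == entity and r == relation})

def llm_score(question, candidate):
    """Stub for the LLM prune prompts: score a candidate by word overlap with the question."""
    words = set(question.lower().split())
    return len(words & set(candidate.lower().replace("_", " ").split()))

def explore_step(question, paths, width=3):
    """One ToG iteration: expand each beam path by (relation, entity), keep the top `width`."""
    candidates = []
    for path in paths:
        entity = path[-1]
        for rel in relations_of(entity):        # explore relations leaving the frontier entity
            for tail in tails_of(entity, rel):  # explore entities reachable via each relation
                score = llm_score(question, rel) + llm_score(question, tail)
                candidates.append((score, path + [rel, tail]))
    candidates.sort(key=lambda c: -c[0])        # prune back down to the beam width
    return [p for _, p in candidates[:width]]

paths = explore_step("What continent is Australia's capital on?", [["Canberra"]])
```

In the real system each prune is an LLM call scoring candidates against the question; the loop repeats until the LLM judges the collected paths sufficient or the depth limit is hit.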
Evidence
- SOTA on 6 of 9 datasets (WebQSP 82.6%, GrailQA 81.4%, Zero-Shot RE 88.3%, CREAK 95.6%) — most previous SOTAs were fine-tuned models
- Versus CoT prompting (ChatGPT baseline): +51.8% on GrailQA and +42.9% on Zero-Shot RE
- Llama-2-70B + ToG on CWQ: 53.6% > GPT-4 standalone CoT 46.0% — cheaper model beats more expensive model
- GPT-4 + ToG on CWQ: 69.5% vs fine-tuning SOTA 70.4% — within 0.9 percentage points without any additional training
How to Apply
- When replacing vector search with KG in an existing RAG pipeline: use the paper's relation prune prompt (E.3.1) and entity prune prompt (E.3.2) as-is, starting with depth=3, width=3
- To reduce costs, use the ToG-R variant: replacing the entity prune step with random sampling instead of LLM calls roughly halves the number of LLM calls, and offloading pruning to BM25/SentenceBERT cuts it to only D+1 calls
- When KG quality improvement is needed: exposing ToG reasoning paths to users helps surface outdated or incorrect triples, and having the LLM auto-generate correction suggestions enables gradual KG refinement
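The cost trade-off in the ToG-R bullet can be made concrete with a back-of-envelope call counter and a random-sampling prune. This is my own rough accounting under stated assumptions (two prune calls per beam path per depth for vanilla ToG), not the paper's exact budget:

```python
import random

def llm_call_count(depth, width, strategy):
    """Rough per-question LLM-call budget (illustrative accounting, not from the paper)."""
    if strategy == "tog":
        # Relation prune + entity prune via LLM, per beam path, per depth.
        return 2 * width * depth
    if strategy == "tog_r":
        # Entity prune replaced by random sampling: half the prune calls remain.
        return width * depth
    if strategy == "bm25":
        # Both prunes offloaded to BM25/SentenceBERT: D reasoning checks + 1 answer call.
        return depth + 1
    raise ValueError(f"unknown strategy: {strategy}")

def random_prune(candidates, width, seed=0):
    """ToG-R style prune: sample `width` candidates uniformly instead of asking the LLM."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(width, len(candidates)))
```

With the paper's default depth=3, width=3 this gives 18 calls for vanilla ToG, 9 for ToG-R, and 4 for BM25/SentenceBERT pruning, matching the "halve" and "D+1" claims above.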
Code Example
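A minimal end-to-end sketch of the ToG loop over a toy KG, with stub functions in place of the LLM prompts. Every name, triple, and heuristic here is an illustrative assumption; in the real system `prune` and `sufficient` are the paper's prune and sufficiency prompts:

```python
# Minimal ToG-style loop: explore -> prune -> check sufficiency -> answer.
TRIPLES = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
]

def expand(entity):
    """Return the (relation, tail) edges leaving `entity`."""
    return [(r, t) for h, r, t in TRIPLES if h == entity]

def prune(question, candidates, width):
    """Stand-in for the paper's prune prompts (E.3.1/E.3.2): rank by word overlap."""
    q = set(question.lower().split())
    def overlap(path):
        return len(q & set(" ".join(path).lower().replace("_", " ").split()))
    return sorted(candidates, key=overlap, reverse=True)[:width]

def sufficient(question, paths):
    """Stand-in for the sufficiency prompt: here, stop once any path is 2 hops deep."""
    return any(len(p) >= 5 for p in paths)  # entity, rel, entity, rel, entity

def think_on_graph(question, topic_entity, depth=3, width=3):
    paths = [[topic_entity]]
    for _ in range(depth):
        grown = [p + [r, t] for p in paths for r, t in expand(p[-1])]
        if not grown:
            break                                # dead end: no further edges to explore
        paths = prune(question, grown, width)    # beam prune
        if sufficient(question, paths):
            break                                # early termination: enough information
    return paths[0][-1]  # stub answer prompt: tail entity of the best path

print(think_on_graph("Where was the director of Inception born?", "Inception"))  # prints: London
```

Swapping `expand` for a real KG backend (e.g. SPARQL against Wikidata) and the three stubs for LLM calls recovers the full plug-and-play structure described above.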
Original Abstract
Although large language models (LLMs) have achieved significant success in various tasks, they often struggle with hallucination problems, especially in scenarios requiring deep and responsible reasoning. These issues could be partially addressed by introducing external knowledge graphs (KG) in LLM reasoning. In this paper, we propose a new LLM-KG integrating paradigm ``$\hbox{LLM}\otimes\hbox{KG}$'' which treats the LLM as an agent to interactively explore related entities and relations on KGs and perform reasoning based on the retrieved knowledge. We further implement this paradigm by introducing a new approach called Think-on-Graph (ToG), in which the LLM agent iteratively executes beam search on KG, discovers the most promising reasoning paths, and returns the most likely reasoning results. We use a number of well-designed experiments to examine and illustrate the following advantages of ToG: 1) compared with LLMs, ToG has better deep reasoning power; 2) ToG has the ability of knowledge traceability and knowledge correctability by leveraging LLMs reasoning and expert feedback; 3) ToG provides a flexible plug-and-play framework for different LLMs, KGs and prompting strategies without any additional training cost; 4) the performance of ToG with small LLM models could exceed large LLM such as GPT-4 in certain scenarios and this reduces the cost of LLM deployment and application. As a training-free method with lower computational cost and better generality, ToG achieves overall SOTA in 6 out of 9 datasets where most previous SOTAs rely on additional training.