Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph
TL;DR Highlight
ToG is a framework in which the LLM directly traverses a Knowledge Graph via beam search to find reasoning paths, achieving SOTA on 6 of 9 datasets without any additional training.
Who Should Read
AI backend developers who want to reduce LLM hallucinations, especially those connecting Knowledge Graphs or other structured external knowledge to LLM reasoning.
Core Mechanics
- Proposes the 'LLM ⊗ KG' paradigm, in which the LLM directly runs beam search on the KG to explore reasoning paths, rather than acting as a mere translator as in existing 'LLM ⊕ KG' approaches
- Each step: explore relations → prune → explore entities → prune, with the LLM autonomously deciding 'do I have enough information?' for early termination
- Can supplement with LLM internal knowledge for info not in KG — can still find answers even when the KG is incomplete
- Explicit reasoning paths remain for knowledge traceability and automatic KG correction
- Llama-2-70B + ToG combination outperforms standalone GPT-4 CoT in some cases — operable without expensive large models
- Plug-and-play structure works with various LLMs (ChatGPT, GPT-4, Llama-2) and KGs (Freebase, Wikidata) without training
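The explore-prune loop above can be sketched with a toy in-memory KG and a word-overlap scorer standing in for the LLM prune prompts. Everything here (the triples, `llm_score`, `explore_step`) is an illustrative assumption, not the paper's code:

```python
# Toy KG as (head, relation, tail) triples -- an illustrative stand-in for Freebase/Wikidata.
KG = {
    ("Canberra", "capital_of", "Australia"),
    ("Australia", "continent", "Oceania"),
    ("Australia", "head_of_state", "Charles III"),
}

def relations_of(entity):
    return sorted({r for h, r, t in KG if h == entity})

def tails_of(entity, relation):
    return sorted({t for h, r, t in KG if h == entity and r == relation})

def llm_score(question, candidate):
    """Stub for the LLM prune prompts: score a candidate by word overlap with the question."""
    words = set(question.lower().split())
    return len(words & set(candidate.lower().replace("_", " ").split()))

def explore_step(question, paths, width=3):
    """One ToG iteration: expand each beam path by (relation, entity), keep the top `width`."""
    candidates = []
    for path in paths:
        entity = path[-1]
        for rel in relations_of(entity):        # explore relations leaving the frontier entity
            for tail in tails_of(entity, rel):  # explore entities reachable via each relation
                score = llm_score(question, rel) + llm_score(question, tail)
                candidates.append((score, path + [rel, tail]))
    candidates.sort(key=lambda c: -c[0])        # prune back down to the beam width
    return [p for _, p in candidates[:width]]

paths = explore_step("What continent is Australia's capital on?", [["Canberra"]])
```

In the real system each prune is an LLM call scoring candidates against the question; the loop repeats until the LLM judges the collected paths sufficient or the depth limit is hit.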
Evidence
- SOTA on 6 of 9 datasets (WebQSP 82.6%, GrailQA 81.4%, Zero-Shot RE 88.3%, CREAK 95.6%) — most previous SOTAs were fine-tuned models
- Versus CoT prompting (ChatGPT baseline): +51.8% on GrailQA and +42.9% on Zero-Shot RE
- Llama-2-70B + ToG on CWQ: 53.6% > GPT-4 standalone CoT 46.0% — cheaper model beats more expensive model
- GPT-4 + ToG on CWQ: 69.5% vs fine-tuning SOTA 70.4% — within 0.9 percentage points without any additional training
How to Apply
- When replacing vector search with KG in an existing RAG pipeline: use the paper's relation prune prompt (E.3.1) and entity prune prompt (E.3.2) as-is, starting with depth=3, width=3
- To reduce costs, use the ToG-R variant: replacing the entity prune step with random sampling instead of LLM calls roughly halves the number of LLM calls, and offloading pruning to BM25/SentenceBERT cuts it to only D+1 calls
- When KG quality improvement is needed: exposing ToG reasoning paths to users helps surface outdated or incorrect triples, and having the LLM auto-generate correction suggestions enables gradual KG refinement
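The cost trade-off in the ToG-R bullet can be made concrete with a back-of-envelope call counter and a random-sampling prune. This is my own rough accounting under stated assumptions (two prune calls per beam path per depth for vanilla ToG), not the paper's exact budget:

```python
import random

def llm_call_count(depth, width, strategy):
    """Rough per-question LLM-call budget (illustrative accounting, not from the paper)."""
    if strategy == "tog":
        # Relation prune + entity prune via LLM, per beam path, per depth.
        return 2 * width * depth
    if strategy == "tog_r":
        # Entity prune replaced by random sampling: half the prune calls remain.
        return width * depth
    if strategy == "bm25":
        # Both prunes offloaded to BM25/SentenceBERT: D reasoning checks + 1 answer call.
        return depth + 1
    raise ValueError(f"unknown strategy: {strategy}")

def random_prune(candidates, width, seed=0):
    """ToG-R style prune: sample `width` candidates uniformly instead of asking the LLM."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(width, len(candidates)))
```

With the paper's default depth=3, width=3 this gives 18 calls for vanilla ToG, 9 for ToG-R, and 4 for BM25/SentenceBERT pruning, matching the "halve" and "D+1" claims above.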
Code Example
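A minimal end-to-end sketch of the ToG loop over a toy KG, with stub functions in place of the LLM prompts. Every name, triple, and heuristic here is an illustrative assumption; in the real system `prune` and `sufficient` are the paper's prune and sufficiency prompts:

```python
# Minimal ToG-style loop: explore -> prune -> check sufficiency -> answer.
TRIPLES = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
    ("London", "capital_of", "United Kingdom"),
]

def expand(entity):
    """Return the (relation, tail) edges leaving `entity`."""
    return [(r, t) for h, r, t in TRIPLES if h == entity]

def prune(question, candidates, width):
    """Stand-in for the paper's prune prompts (E.3.1/E.3.2): rank by word overlap."""
    q = set(question.lower().split())
    def overlap(path):
        return len(q & set(" ".join(path).lower().replace("_", " ").split()))
    return sorted(candidates, key=overlap, reverse=True)[:width]

def sufficient(question, paths):
    """Stand-in for the sufficiency prompt: here, stop once any path is 2 hops deep."""
    return any(len(p) >= 5 for p in paths)  # entity, rel, entity, rel, entity

def think_on_graph(question, topic_entity, depth=3, width=3):
    paths = [[topic_entity]]
    for _ in range(depth):
        grown = [p + [r, t] for p in paths for r, t in expand(p[-1])]
        if not grown:
            break                                # dead end: no further edges to explore
        paths = prune(question, grown, width)    # beam prune
        if sufficient(question, paths):
            break                                # early termination: enough information
    return paths[0][-1]  # stub answer prompt: tail entity of the best path

print(think_on_graph("Where was the director of Inception born?", "Inception"))  # prints: London
```

Swapping `expand` for a real KG backend (e.g. SPARQL against Wikidata) and the three stubs for LLM calls recovers the full plug-and-play structure described above.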
Original Abstract
Although large language models (LLMs) have achieved significant success in various tasks, they often struggle with hallucination problems, especially in scenarios requiring deep and responsible reasoning. These issues could be partially addressed by introducing external knowledge graphs (KG) in LLM reasoning. In this paper, we propose a new LLM-KG integrating paradigm ``$\hbox{LLM}\otimes\hbox{KG}$'' which treats the LLM as an agent to interactively explore related entities and relations on KGs and perform reasoning based on the retrieved knowledge. We further implement this paradigm by introducing a new approach called Think-on-Graph (ToG), in which the LLM agent iteratively executes beam search on KG, discovers the most promising reasoning paths, and returns the most likely reasoning results. We use a number of well-designed experiments to examine and illustrate the following advantages of ToG: 1) compared with LLMs, ToG has better deep reasoning power; 2) ToG has the ability of knowledge traceability and knowledge correctability by leveraging LLMs reasoning and expert feedback; 3) ToG provides a flexible plug-and-play framework for different LLMs, KGs and prompting strategies without any additional training cost; 4) the performance of ToG with small LLM models could exceed large LLM such as GPT-4 in certain scenarios and this reduces the cost of LLM deployment and application. As a training-free method with lower computational cost and better generality, ToG achieves overall SOTA in 6 out of 9 datasets where most previous SOTAs rely on additional training.