RAG
The latest 31 papers and posts on RAG.
Show HN: Atomic – Local-first, AI-augmented personal knowledge base
Atomic is a self-hosted, open-source personal knowledge graph app that automatically embeds, tags, and links notes, web clips, and RSS feeds, supporting semantic search, LLM-powered wiki synthesis, and MCP integration.
We replaced RAG with a virtual filesystem for our AI documentation assistant
Explains how Mintlify overcame RAG chunking limitations by building a virtual filesystem (ChromaFs) on top of Chroma DB that mimics UNIX commands, reducing session boot time from 46 seconds to 100ms.
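The ChromaFs internals are not public, but the core idea above can be sketched: instead of retrieving chunks, the agent navigates whole documents through UNIX-like `ls`/`cat` commands. This in-memory toy (class and paths are hypothetical) only illustrates the interface shape.

```python
# Hedged sketch of a virtual filesystem over a doc corpus. A real system
# would back this with a vector DB; a dict stands in here.

class VirtualDocFs:
    """Expose a documentation corpus as a navigable tree with ls/cat."""

    def __init__(self, docs: dict[str, str]):
        # docs maps slash-separated paths ("guides/auth.md") to full page text
        self.docs = docs

    def ls(self, prefix: str = "") -> list[str]:
        """List immediate children under a directory prefix."""
        prefix = prefix.rstrip("/")
        entries = set()
        for path in self.docs:
            if not prefix or path.startswith(prefix + "/"):
                rest = path[len(prefix) + 1:] if prefix else path
                entries.add(rest.split("/", 1)[0])
        return sorted(entries)

    def cat(self, path: str) -> str:
        """Return the whole document, not a chunk, so the model sees full context."""
        return self.docs[path]

fs = VirtualDocFs({
    "guides/auth.md": "# Auth\nUse API keys.",
    "guides/quickstart.md": "# Quickstart\nInstall the CLI.",
    "reference/cli.md": "# CLI\nCommands: init, deploy.",
})
print(fs.ls())
print(fs.ls("guides"))
print(fs.cat("reference/cli.md").splitlines()[0])
```

Because the model lists and opens whole files on demand, there is nothing to chunk at session start, which is one plausible reading of the 46s-to-100ms boot-time claim.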
Chroma Context-1: Training a Self-Editing Search Agent
Chroma's newly released 20B parameter agentic search model claims frontier-LLM-level retrieval performance at 1/10 the cost and 10x the speed — though a significant controversy over failure to cite prior work has emerged in the community.
Show HN: Gemini can now natively embed video, so I built sub-second video search
Google's Gemini Embedding model can now embed video directly into vectors without text transcription, enabling natural language search over dashcam footage — describe 'red truck running a stop sign' and get the clip back.
From zero to a RAG system: successes and failures
A hands-on account of building a local LLM-based RAG system from scratch on 1TB of internal technical documentation, honestly sharing the trial and error encountered from data preprocessing to vector indexing.
I built an AI receptionist for a mechanic shop
A dev built an AI receptionist for their brother's auto shop — combining a RAG pipeline with Vapi's voice platform to actually answer phone calls — because missed calls were costing thousands per month.
Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents
An LLM memory system that compresses conversations into semantic triples, cutting tokens by 95% while maintaining top-tier accuracy.
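The compression step above can be illustrated with a toy. A real system would use an LLM to extract the triples; the "extracted" triples below are hand-written assumptions, and the sketch only shows the storage shape and the token-budget arithmetic.

```python
# Hedged toy of triple-based conversation compression (not Memori's code).

conversation = (
    "User: My name is Dana and I work at Acme as a data engineer. "
    "I prefer Python over Scala, and my current project is a RAG pipeline "
    "for internal docs. Please remember all of that for next time."
)

# What an LLM extractor might emit for the turn above (assumed output).
triples = [
    ("Dana", "works_at", "Acme"),
    ("Dana", "role", "data engineer"),
    ("Dana", "prefers", "Python"),
    ("Dana", "project", "RAG pipeline"),
]

def token_count(text: str) -> int:
    # Crude whitespace tokenizer stands in for a real BPE tokenizer.
    return len(text.split())

stored = " ".join(f"{s} {p} {o}" for s, p, o in triples)
ratio = 1 - token_count(stored) / token_count(conversation)
print(f"raw tokens:    {token_count(conversation)}")
print(f"triple tokens: {token_count(stored)}")
print(f"saved:         {ratio:.0%}")
```

On this tiny example the savings are far below the paper's 95% figure; the claimed ratio presumably comes from much longer, more redundant dialogues.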
BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection
Structuring long documents page-by-page and compressing without truncation achieves 26.4x faster compression than LongLLMLingua.
[R] Doc-to-LoRA: Learning to Instantly Internalize Contexts from Sakana AI
Sakana AI's D2L uses a hypernetwork to generate a LoRA adapter from a document in a single forward pass, with sub-second latency, extending the effective context window 5x beyond the base model's capacity.
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
Multilingual embeddings covering 200 languages without English bias, outperforming Qwen3-Embedding at smaller model sizes.
Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval
Instead of simple topic search in RAG, using a 'hypothesis → 3 targeted queries' approach retrieves documents that actually help select the right answer.
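The fan-out described above can be sketched as a template. The paper's actual prompts are not reproduced here; `make_queries` is a hypothetical function showing how one hypothesized answer expands into three targeted retrieval queries.

```python
# Hedged sketch of hypothesis-conditioned query rewriting.

def make_queries(question: str, hypothesis: str) -> list[str]:
    """Turn a candidate answer into queries that could confirm or refute it."""
    return [
        f"evidence that {hypothesis}",                    # supporting evidence
        f"evidence against the claim that {hypothesis}",  # refuting evidence
        f"{question} {hypothesis}",                       # direct lookup
    ]

queries = make_queries(
    "Which database added vector search first?",
    "PostgreSQL added vector search via the pgvector extension",
)
for q in queries:
    print(q)
```

Retrieving for and against the hypothesis, rather than for the topic alone, is what makes the returned documents decision-useful: they discriminate between candidate answers instead of merely being on-topic.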
Launch HN: Captain (YC W26) – Automated RAG for Files
YC W26 startup Captain auto-builds your entire RAG pipeline from a file upload — no configuration required.
Structured Distillation for Personalized Agent Memory: 11x Token Reduction with Retrieval Preservation
Compresses AI coding agent conversation histories 11x into searchable memory — with almost no quality loss on vector search.
Tiny-Critic RAG: Empowering Agentic Fallback with Parameter-Efficient Small Language Models
Implementing GPT-4o-mini-level RAG noise filtering with a 1.7B small model — 98% cost reduction, 94.6% latency reduction.
Data Distribution Matters: A Data-Centric Perspective on Context Compression for Large Language Model
LLM context compression quality is determined by data distribution, not model architecture — and the decoder's training data dominates over the encoder's.
Large Language Models for Assisting American College Applications
A practical LLM system architecture paper for US college application assistance, built on RAG + Human-in-the-loop design.
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
When incorrect retrieval results slip into RAG, LLMs become confidently wrong; this paper fixes that with fine-tuning on just 2K examples.
Over-Searching in Search-Augmented Large Language Models
A systematic study on how LLMs equipped with search tools wastefully repeat searches even for unanswerable questions, driving up costs and error rates.
Decide Then Retrieve: A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval
A framework that reduces RAG noise by first judging whether retrieval is needed based on LLM uncertainty (instead of always retrieving), then searching via two parallel paths — the original query and a pseudo-document.
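The control flow described above can be sketched as follows. The uncertainty signal here is a stand-in (in practice it would come from the model's token probabilities), and `generate` and `search` are hypothetical helpers stubbed out to exercise both branches.

```python
# Hedged sketch of an uncertainty-gated, dual-path retrieval loop.
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy of the model's answer distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def decide_then_retrieve(question, answer_probs, generate, search, threshold=1.0):
    # If the model is already confident, skip retrieval entirely.
    if entropy(answer_probs) < threshold:
        return generate(question, context=None)
    # Path 1: retrieve with the raw question.
    docs = search(question)
    # Path 2: retrieve with a pseudo-document (a drafted answer used as query).
    pseudo = generate(question, context=None)
    docs += search(pseudo)
    return generate(question, context=docs)

# Toy stubs to exercise the control flow.
def generate(q, context=None):
    return f"answer({q})" if context is None else f"grounded({len(context)} docs)"

def search(q):
    return [f"doc for {q}"]

confident = decide_then_retrieve("2+2?", [0.97, 0.03], generate, search)
uncertain = decide_then_retrieve("obscure fact?", [0.5, 0.5], generate, search)
print(confident)
print(uncertain)
```

Gating on uncertainty avoids injecting retrieval noise into questions the model can already answer, while the pseudo-document path catches cases where the raw question is a poor search query.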
Empowering farmers with artificial intelligence: a retrieval-augmented generation based large language model advisory framework
In a RAG-based crop cultivation/pest/fertilizer AI advisor for farmers, Mistral and Qwen2.5 outperform larger models like GPT-4.
Synthesizing scientific literature with retrieval-augmented language models
A RAG-based scientific literature synthesis model that searches 45 million open-access papers and attaches citation sources.
The maths you need to start understanding LLMs
High school-level vector and matrix math is all you need to start understanding how LLMs work; this post walks through it step by step.
Show HN: Building a web search engine from scratch with 3B neural embeddings
One developer built a web search engine with 3 billion SBERT embeddings and 280 million indexed pages in 2 months — showing the real architecture and cost structure of a vector search-based system at scale.
DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning
A 3B model using RL to self-optimize queries achieved more than 2x better retrieval performance than GPT-4o and Claude-3.5-Sonnet.
GNN-RAG: Graph Neural Retrieval for Efficient Large Language Model Reasoning on Knowledge Graphs
Using a GNN to pre-filter relevant paths from a Knowledge Graph before passing them to an LLM improves both KGQA accuracy and speed.
Leveraging long context in retrieval augmented language models for medical question answering
A map-reduce strategy that stops key information buried in the middle of long medical documents from being ignored during RAG.
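The map-reduce pass can be sketched in a few lines: every window of the document is summarized independently (map), then the partial summaries are joined (reduce), so mid-document content gets the same attention as the edges. `summarize` stands in for an LLM call; the first-sentence stub and the sample record are illustrative assumptions.

```python
# Hedged sketch of map-reduce over a long document (not the paper's code).

def chunk(text: str, size: int) -> list[str]:
    """Split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def summarize(window: str) -> str:
    # Stand-in for an LLM summarization call: keep the first sentence.
    return window.split(".")[0] + "."

def map_reduce_answer(document: str, size: int = 12) -> str:
    partials = [summarize(w) for w in chunk(document, size)]  # map
    return " ".join(partials)                                 # reduce

doc = (
    "The patient presented with fever. History was unremarkable. "
    "Crucially, a drug allergy to penicillin was noted mid-record. "
    "Discharge was uneventful. Follow-up was scheduled."
)
summary = map_reduce_answer(doc)
print(summary)
```

Even this crude stub surfaces the mid-record allergy, whereas a single truncated prompt over the whole record could drop it.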
Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning
How LLMs use Knowledge Graph relation paths via a Plan-Retrieve-Reason pipeline to give accurate answers without hallucination.
Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph
The ToG framework where LLMs directly traverse Knowledge Graphs using beam search to find reasoning paths — achieving SOTA on 6 of 9 datasets without additional training.
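The beam-search traversal above can be sketched on a toy graph. In ToG the LLM itself scores candidate relations at each hop; here a fixed relevance table stands in for the LLM so the pruning loop is runnable, and the graph and scores are invented for illustration.

```python
# Hedged toy of ToG-style beam search over a knowledge graph.

graph = {  # head -> list of (relation, tail)
    "Marie Curie": [("field", "physics"), ("spouse", "Pierre Curie"),
                    ("award", "Nobel Prize")],
    "Pierre Curie": [("field", "physics"), ("award", "Nobel Prize")],
    "Nobel Prize": [("awarded_in", "Stockholm")],
}

def score(path):
    # Stand-in for LLM relevance scoring of a reasoning path.
    relevance = {"award": 2.0, "awarded_in": 2.0, "spouse": 0.5, "field": 0.2}
    return sum(relevance[rel] for _, rel, _ in path)

def beam_search(start: str, depth: int = 2, width: int = 2):
    beams = [([], start)]  # (path of (head, rel, tail) hops, current entity)
    for _ in range(depth):
        candidates = []
        for path, node in beams:
            for rel, tail in graph.get(node, []):
                candidates.append((path + [(node, rel, tail)], tail))
        if not candidates:
            break
        # Keep only the `width` highest-scoring paths at each hop.
        candidates.sort(key=lambda c: score(c[0]), reverse=True)
        beams = candidates[:width]
    return beams[0][0]  # best reasoning path found

best = beam_search("Marie Curie")
print(best)
```

The returned path doubles as an explanation of the answer, which is where the framework's interpretability claim comes from.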
Think-on-Graph: Deep and Responsible Reasoning of Large Language Model with Knowledge Graph
LLMs traverse Knowledge Graphs step-by-step to reduce hallucinations and improve accuracy.
Rethinking with Retrieval: Faithful Large Language Model Inference
A post-processing technique that searches external knowledge at each CoT reasoning step and selects the answer most faithful to facts.
A Survey on Open Information Extraction from Rule-based Model to Large Language Model
A survey covering the evolution of OpenIE — extracting relation triples from unstructured text — from 2007 to 2024.