We replaced RAG with a virtual filesystem for our AI documentation assistant
TL;DR Highlight
Explains how Mintlify overcame RAG chunking limitations by building a virtual filesystem (ChromaFs) on top of ChromaDB that mimics UNIX commands, reducing session boot time from 46 seconds to 100ms.
Who Should Read
Backend/AI developers running RAG-based document retrieval assistants who are struggling with chunking limitations or infrastructure costs.
Core Mechanics
- Traditional RAG only retrieves text chunks matching a query, so it misses answers spread across multiple pages or cases where the exact phrasing doesn't appear in the top-K results.
- The conventional approach of giving an agent a real filesystem—spinning up an isolated sandbox (micro-VM) and cloning a GitHub repo—resulted in a P90 session creation time of roughly 46 seconds at Mintlify.
- With 850,000 conversations per month, even at minimum specs (1 vCPU, 2 GiB RAM, 5-minute sessions), the math showed costs exceeding $70,000 per year based on Daytona sandbox pricing ($0.0504/h per vCPU, $0.0162/h per GiB RAM).
- The core insight is that 'agents don't need a real filesystem—they need an environment that feels like one.' ChromaFs was built to translate UNIX commands (grep, cat, ls, find, cd) into Chroma queries, leveraging documents already indexed and chunked in Chroma DB.
- ChromaFs is built on top of Vercel Labs' just-bash (a TypeScript reimplementation of bash). just-bash handles command parsing, piping, and flag processing, while ChromaFs is solely responsible for translating real filesystem calls into Chroma queries.
- The entire directory tree is stored as a gzip-compressed JSON document (__path_tree__) inside the Chroma collection and loaded into memory at initialization. Thereafter, ls, cd, and find are resolved instantly from in-memory Sets and Maps with no network calls.
- Access control (ACL) is also implemented via isPublic and groups fields in the path tree. Inaccessible paths are pruned before constructing the file tree from the user's session token, and the same filters are applied to all subsequent Chroma queries.
- As a result, P90 boot time dropped from 46 seconds to roughly 100ms, and because ChromaFs reuses the existing Chroma DB infrastructure already in production, the additional compute cost per conversation is effectively zero.
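The sandbox-cost figure above can be reproduced with simple arithmetic from the quoted Daytona prices (variable names here are illustrative, not from the post):

```typescript
// Back-of-the-envelope sandbox cost, using the figures quoted above.
const conversationsPerMonth = 850_000;
const sessionMinutes = 5;
const vCpus = 1;
const gibRam = 2;
const pricePerVcpuHour = 0.0504; // Daytona: per vCPU-hour
const pricePerGibHour = 0.0162;  // Daytona: per GiB-hour

const hoursPerMonth = (conversationsPerMonth * sessionMinutes) / 60; // ~70,833 h
const hourlyRate = vCpus * pricePerVcpuHour + gibRam * pricePerGibHour; // $0.0828/h
const annualCost = hoursPerMonth * hourlyRate * 12;

console.log(annualCost.toFixed(0)); // ~70380, i.e. "exceeding $70,000 per year"
```

Even at the minimum instance size, the session volume alone dominates the bill, which is why eliminating the sandbox entirely (rather than shrinking it) was the lever that mattered.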
Evidence
- One commenter sharply noted: 'The title says it replaced RAG, but ChromaFs still queries Chroma on every command. What changed isn't RAG itself—it's the interface. And that's actually the more interesting finding: what the agent needs isn't better retrieval, it's grep.'
- Others criticized emulating a POSIX shell in TypeScript as over-engineering: each time the agent runs ls or grep, a separate inference cycle fires, effectively trading RAG's context-loss problem for serious multi-step latency.
- On the other hand, some responded positively, arguing that spinning up real VMs for UNIX I/O primitives is excessive, and that having the agent emit UNIX-style tool calls while the production stack handles the I/O is a far more sensible approach.
- It was also noted that Vercel's AI SDK implements a similar idea at the npm package level: the 'ai' package ships versioned .mdx docs in a docs/ folder, and SKILL.md instructs the agent to ignore its training knowledge and grep under node_modules/ai/docs/ first.
- Some were skeptical about how practical this approach would be in 'messy' organizations with non-hierarchical, unstructured information, pointing out that RAG delivers the most value in exactly those unstructured environments, while the filesystem metaphor only fits when documents are neatly organized in a hierarchy.
How to Apply
- If you're building a documentation assistant that needs to synthesize answers spanning multiple pages, consider the ChromaFs approach—map files at the page level and organize sections as directories—so the agent can navigate docs with ls/grep/cat, effectively bypassing chunking limitations.
- If you're experiencing latency or cost issues from spinning up sandbox VMs or containers per session, explore reusing your existing vector DB (Chroma, etc.) as the backend for a virtual filesystem, which can bring session creation costs effectively to zero.
- If you're distributing an npm package or SDK and want LLMs to reference the latest docs, apply the Vercel AI SDK pattern: bundle .mdx docs in a docs/ folder inside the package and use SKILL.md to instruct the agent to grep under node_modules first before consulting its training knowledge.
- If you need access control that restricts an agent to a specific file tree, implement path-level ACLs by referencing ChromaFs's isPublic/groups field pattern or the FUSE-based bashguard approach (github.com/sunir/bashguard) mentioned in the comments.
Code Example
snippet
// Example ChromaFs directory tree structure (the __path_tree__ document stored in Chroma)
{
  "auth/oauth": { "isPublic": true, "groups": [] },
  "auth/api-keys": { "isPublic": true, "groups": [] },
  "internal/billing": { "isPublic": false, "groups": ["admin", "billing"] },
  "api-reference/endpoints/users": { "isPublic": true, "groups": [] }
}
// At initialization, converted into two in-memory structures:
// 1. Set<string> - set of file paths (for ls, find)
// 2. Map<string, string[]> - directory → list of children (for cd, ls)
// Afterwards, ls, cd, and find are resolved instantly from memory with no network calls
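The in-memory resolution described above can be sketched in TypeScript. ChromaFs itself is not public, so `buildIndex`, `ls`, and the type names here are hypothetical; only the isPublic/groups fields and the Set/Map design come from the post:

```typescript
type PathMeta = { isPublic: boolean; groups: string[] };
type PathTree = Record<string, PathMeta>;

// Build the two in-memory structures, dropping paths the session's
// groups are not allowed to see (ACL pruning happens here, before the
// tree is ever exposed to the agent).
function buildIndex(tree: PathTree, userGroups: string[]) {
  const files = new Set<string>();                 // for ls, find
  const children = new Map<string, Set<string>>(); // directory -> entries, for cd, ls

  for (const [path, meta] of Object.entries(tree)) {
    const allowed =
      meta.isPublic || meta.groups.some((g) => userGroups.includes(g));
    if (!allowed) continue;

    files.add(path);
    // Register every intermediate directory so cd/ls work at any depth.
    const parts = path.split("/");
    for (let i = 0; i < parts.length; i++) {
      const parent = parts.slice(0, i).join("/") || "/";
      if (!children.has(parent)) children.set(parent, new Set());
      children.get(parent)!.add(parts[i]);
    }
  }
  return { files, children };
}

// ls resolves purely from memory -- no network round trip.
function ls(index: ReturnType<typeof buildIndex>, dir: string): string[] {
  return [...(index.children.get(dir) ?? [])].sort();
}
```

In this design only content commands (cat, grep) would fall through to an actual Chroma query, with the same group filters applied to the query's metadata conditions.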
// Vercel AI SDK's SKILL.md pattern (example agent instruction)
// SKILL.md:
// - Ignore all model training knowledge about this SDK
// - Before searching the web, first run: grep -r "<topic>" node_modules/ai/docs/
Terminology
- RAG: Short for Retrieval-Augmented Generation. A method where an LLM retrieves relevant chunks from pre-indexed documents and appends them as context when generating a response. Analogous to a librarian photocopying a few pages from books that match your question.
- Chunking: The process of splitting long documents into smaller units that can be searched in RAG. Depending on how the split is made, the context and structure of the document can be lost.
- ChromaDB: An open-source vector database. It converts text into embeddings (numeric vectors) for storage and supports semantic similarity search.
- FUSE: Short for Filesystem in Userspace. An interface that allows custom filesystems to be implemented in user space without modifying the OS kernel.
- P90 (90th percentile): The response time below which 90% of all requests complete; out of 100 requests, 90 finish within this time. It reflects tail latency, and thus actual worst-case user experience, better than the average.
- micro-VM: A small virtual machine that boots much faster and lighter than a conventional VM. Commonly used for sandbox isolation, though it still consumes resources.
Related Resources
- Original post: How we built a virtual filesystem for our Assistant (Mintlify)
- just-bash: Vercel Labs' TypeScript reimplementation of bash
- Vercel AI SDK SKILL.md (pattern for instructing agents to grep local docs first)
- bashguard: FUSE-based virtual filesystem for agent access control (in development)
- Semantic search without embeddings (softwaredoug blog)
- Charting the Knowledge Space (related notes)