We replaced RAG with a virtual filesystem for our AI documentation assistant
TL;DR Highlight
Explains how Mintlify overcame RAG chunking limitations by building a virtual filesystem (ChromaFs) on top of Chroma DB that mimics UNIX commands, reducing session boot time from 46 seconds to 100ms.
Who Should Read
Backend/AI developers running RAG-based document retrieval assistants who are struggling with chunking limitations or infrastructure costs.
Core Mechanics
- Traditional RAG retrieves only the text chunks that match a query, so it misses answers that are spread across multiple pages, as well as exact phrasings that fall outside the top-K results.
- The conventional approach of giving an agent a real filesystem—spinning up an isolated sandbox (micro-VM) and cloning a GitHub repo—resulted in a P90 session creation time of roughly 46 seconds at Mintlify.
- With 850,000 conversations per month, even at minimum specs (1 vCPU, 2 GiB RAM, 5-minute sessions), the math showed costs exceeding $70,000 per year based on Daytona sandbox pricing ($0.0504/h per vCPU, $0.0162/h per GiB RAM).
- The core insight is that 'agents don't need a real filesystem—they need an environment that feels like one.' ChromaFs was built to translate UNIX commands (grep, cat, ls, find, cd) into Chroma queries, leveraging documents already indexed and chunked in Chroma DB.
- ChromaFs is built on top of Vercel Labs' just-bash (a TypeScript reimplementation of bash). just-bash handles command parsing, piping, and flag processing, while ChromaFs is solely responsible for translating real filesystem calls into Chroma queries.
- The entire directory tree is stored as a gzip-compressed JSON document (__path_tree__) inside the Chroma collection and loaded into memory at initialization. Thereafter, ls, cd, and find are resolved instantly from in-memory Sets and Maps with no network calls.
- Access control (ACL) is implemented via the isPublic and groups fields in the path tree. Based on the user's session token, inaccessible paths are pruned before the file tree is constructed, and the same filters are applied to all subsequent Chroma queries.
- As a result, P90 boot time dropped from 46 seconds to roughly 100ms, and because ChromaFs reuses the existing Chroma DB infrastructure already in production, the additional compute cost per conversation is effectively zero.
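The sandbox cost estimate above can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch that uses only the figures quoted in the list; it assumes one sandbox per conversation, each billed for the full 5 minutes, and ignores storage, networking, and idle time.

```typescript
// Figures quoted above (Daytona sandbox pricing, minimum specs).
const VCPU_RATE_PER_HOUR = 0.0504;    // USD per vCPU-hour
const RAM_RATE_PER_GIB_HOUR = 0.0162; // USD per GiB-hour
const VCPUS = 1;
const RAM_GIB = 2;
const SESSION_MINUTES = 5;
const CONVERSATIONS_PER_MONTH = 850_000;

// Assumption: one sandbox per conversation, billed for the whole session.
const hourlyRate = VCPUS * VCPU_RATE_PER_HOUR + RAM_GIB * RAM_RATE_PER_GIB_HOUR;
const costPerSession = hourlyRate * (SESSION_MINUTES / 60);
const annualCost = costPerSession * CONVERSATIONS_PER_MONTH * 12;

console.log(`hourly rate per sandbox: $${hourlyRate.toFixed(4)}`); // $0.0828
console.log(`cost per session: $${costPerSession.toFixed(4)}`);    // $0.0069
console.log(`cost per year: $${Math.round(annualCost)}`);          // $70380
```

At roughly $0.0069 per session the per-conversation cost looks negligible; it is the 10.2 million sessions per year that push the total past $70,000.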
Evidence
- One commenter sharply noted: 'The title says it replaced RAG, but ChromaFs still queries Chroma on every command. What changed isn't RAG itself, it's the interface. And that's actually the more interesting finding: what the agent needs isn't better retrieval, it's grep.'
- There was also criticism that emulating a POSIX shell in TypeScript is over-engineering: each time the agent runs ls or grep, a separate inference cycle fires, effectively trading RAG's context-loss problem for serious multi-step latency.
- On the other hand, some responded positively, saying that spinning up real VMs for UNIX I/O primitives is excessive, and that having the agent emit UNIX-style tool calls while the production stack handles I/O is a far more sensible approach.
- It was also noted that Vercel's AI SDK implements a similar idea at the npm package level: the 'ai' package includes versioned .mdx docs in a docs/ folder, and SKILL.md instructs the agent to ignore its training knowledge and grep under node_modules/ai/docs/ first.
- Some expressed skepticism about how practical this approach would be in 'messy' organizations with non-hierarchical, unstructured information, pointing out that RAG delivers the most value in exactly those unstructured environments, while the filesystem metaphor only fits well when documents are neatly organized in a hierarchy.
How to Apply
- If you're building a documentation assistant that needs to synthesize answers spanning multiple pages, consider the ChromaFs approach: map files at the page level and organize sections as directories, so the agent can navigate docs with ls/grep/cat, effectively bypassing chunking limitations.
- If you're experiencing latency or cost issues from spinning up sandbox VMs or containers per session, explore reusing your existing vector DB (Chroma, etc.) as the backend for a virtual filesystem, which can bring session creation costs effectively to zero.
- If you're distributing an npm package or SDK and want LLMs to reference the latest docs, you can apply the Vercel AI SDK pattern: bundle .mdx docs in a docs/ folder inside the package and use SKILL.md to instruct the agent to grep under node_modules first before consulting its training knowledge.
- If you need access control that restricts an agent to a specific file tree, you can implement path-level ACLs by referencing ChromaFs's isPublic/groups field pattern or the FUSE-based bashguard approach (github.com/sunir/bashguard) mentioned in the comments.
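As an illustration of the page-as-file mapping, here is a minimal sketch of how cat and a fixed-string grep could be served from chunked pages. Everything in it is hypothetical: the Chunk shape, the ChunkStore interface, and the in-memory memoryStore stand-in are assumptions for illustration, not ChromaFs's actual design or the Chroma client API. In a real deployment, an adapter over the vector DB collection would implement ChunkStore.

```typescript
// Hypothetical chunk record: one page is stored as several ordered chunks.
interface Chunk {
  page: string;       // page-level "file" path, e.g. "auth/oauth"
  chunkIndex: number; // position of the chunk within its page
  text: string;
}

// Hypothetical store interface; a vector DB adapter would implement it.
interface ChunkStore {
  byPage(page: string): Promise<Chunk[]>;
  containing(substring: string): Promise<Chunk[]>;
}

// In-memory stand-in so the sketch runs without a database.
function memoryStore(chunks: Chunk[]): ChunkStore {
  return {
    byPage: async (page) => chunks.filter((c) => c.page === page),
    containing: async (s) => chunks.filter((c) => c.text.includes(s)),
  };
}

// cat: reassemble a page by joining its chunks in order.
// Assumption: chunk boundaries fall on line breaks.
async function cat(store: ChunkStore, page: string): Promise<string> {
  const chunks = await store.byPage(page);
  return [...chunks]
    .sort((a, b) => a.chunkIndex - b.chunkIndex)
    .map((c) => c.text)
    .join("\n");
}

// grep (fixed string): ask the store for candidate pages, then match
// line by line on the reassembled page to produce path:line:text output.
async function grep(store: ChunkStore, needle: string): Promise<string[]> {
  const hits = await store.containing(needle);
  const pages = Array.from(new Set(hits.map((c) => c.page))).sort();
  const out: string[] = [];
  for (const page of pages) {
    const lines = (await cat(store, page)).split("\n");
    lines.forEach((line, i) => {
      if (line.includes(needle)) out.push(`${page}:${i + 1}:${line}`);
    });
  }
  return out;
}

async function demo(): Promise<void> {
  const store = memoryStore([
    { page: "auth/oauth", chunkIndex: 1, text: "Exchange the code for an access token." },
    { page: "auth/oauth", chunkIndex: 0, text: "OAuth setup\nRegister a redirect URI." },
    { page: "auth/api-keys", chunkIndex: 0, text: "Create an API key.\nSend the key as a bearer token." },
  ]);
  console.log(await grep(store, "token"));
}

demo();
```

The two-stage design keeps the database query coarse (which pages might match) and does the exact line matching in memory. One limitation of this sketch: a match that straddles a chunk boundary would be missed by the coarse filter.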
Code Example
// Example ChromaFs directory tree structure (the __path_tree__ document stored in Chroma)
{
"auth/oauth": { "isPublic": true, "groups": [] },
"auth/api-keys": { "isPublic": true, "groups": [] },
"internal/billing": { "isPublic": false, "groups": ["admin", "billing"] },
"api-reference/endpoints/users": { "isPublic": true, "groups": [] }
}
// At initialization, converted into two in-memory structures:
// 1. Set<string> - set of file paths (for ls, find)
// 2. Map<string, string[]> - directory → list of children (for cd, ls)
// Afterwards, ls, cd, and find are resolved instantly from memory with no network calls
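The conversion described in the comments above could look roughly like the following sketch. It assumes the path tree has already been fetched and decompressed, and it uses a hypothetical access rule (a path is visible if it is public or shares at least one group with the user); the names buildIndex, canAccess, and ls are illustrative and not taken from ChromaFs.

```typescript
// Shape of one path-tree entry, matching the example above.
interface PathEntry {
  isPublic: boolean;
  groups: string[];
}
type PathTree = Record<string, PathEntry>;

interface FsIndex {
  files: Set<string>;              // every visible file path (for ls, find)
  children: Map<string, string[]>; // directory -> direct child names (for cd, ls)
}

// Hypothetical rule: visible if public or the user shares a group with the entry.
function canAccess(entry: PathEntry, userGroups: string[]): boolean {
  return entry.isPublic || entry.groups.some((g) => userGroups.includes(g));
}

function buildIndex(tree: PathTree, userGroups: string[]): FsIndex {
  const files = new Set<string>();
  const children = new Map<string, string[]>();

  const addChild = (dir: string, name: string): void => {
    const list = children.get(dir) ?? [];
    if (!list.includes(name)) list.push(name);
    children.set(dir, list);
  };

  for (const [path, entry] of Object.entries(tree)) {
    if (!canAccess(entry, userGroups)) continue; // prune before indexing
    files.add(path);
    const parts = path.split("/");
    // Register each segment under its parent; "" stands for the root directory.
    for (let i = 0; i < parts.length; i++) {
      addChild(parts.slice(0, i).join("/"), parts[i]);
    }
  }
  return { files, children };
}

// ls becomes a pure in-memory lookup with no network call.
function ls(index: FsIndex, dir: string): string[] {
  return [...(index.children.get(dir) ?? [])].sort();
}

const tree: PathTree = {
  "auth/oauth": { isPublic: true, groups: [] },
  "auth/api-keys": { isPublic: true, groups: [] },
  "internal/billing": { isPublic: false, groups: ["admin", "billing"] },
  "api-reference/endpoints/users": { isPublic: true, groups: [] },
};

const anonymousIndex = buildIndex(tree, []);
console.log(ls(anonymousIndex, ""));     // [ 'api-reference', 'auth' ]
console.log(ls(anonymousIndex, "auth")); // [ 'api-keys', 'oauth' ]

const billingIndex = buildIndex(tree, ["billing"]);
console.log(ls(billingIndex, ""));       // [ 'api-reference', 'auth', 'internal' ]
```

Because pruning happens before any path is indexed, a directory that contains only inaccessible files (here internal/ for an anonymous user) never appears in a listing at all.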
// Vercel AI SDK's SKILL.md pattern (example agent instruction)
// SKILL.md:
// - Ignore all model training knowledge about this SDK
// - Before searching the web, first run: grep -r "<topic>" node_modules/ai/docs/
Related Papers
Show HN: CLI tool for detecting non-exact code duplication with embedding models
A CLI tool that uses embeddings to find code duplication that is 'semantically similar' rather than copy-pasted. Paired with AI coding agents, it can help remove hidden duplication in large codebases.
Show HN: Bible as RAG Database
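A web service that indexes the entire Bible as a RAG (retrieval-augmented generation) database so that relevant verses can be searched semantically by topic or keyword. It is a practical example of applying RAG to religious texts and a useful reference for developers building similar projects.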
Haystack: Open-Source AI Framework for Production Ready Agents, RAG
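An open-source AI orchestration framework built by deepset. It is gaining attention as an alternative to LangChain, and its modular pipeline approach lets you build RAG, agent, and multimodal apps all the way to production.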
We built a persistent agent memory layer on Elasticsearch with 0.89 recall
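Presents the architecture of a multi-tenant long-term memory system built on Elasticsearch so that AI agents can remember user information after a session ends. It covers the concrete implementation that achieved an R@10 of 0.89 on 168 questions with zero cross-tenant data leaks.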
TAHOE: Text-to-SQL with Automated Hint Optimization from Experience
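A system that accumulates hints learned from the LLM's SQL generation failures into a reusable Hint Bank, substantially improving accuracy on Snowflake-dialect SQL without retraining the model.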
Inside FAISS: Billion-Scale Similarity Search
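A visual explanation of IVF (partitioning) and Product Quantization (compression), the core algorithms that let FAISS search billions of vectors quickly. It helps developers building RAG or vector search systems understand how these work internally.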
Related Resources
- Original post: How we built a virtual filesystem for our Assistant (Mintlify)
- just-bash: Vercel Labs' TypeScript reimplementation of bash
- Vercel AI SDK SKILL.md (pattern for instructing agents to grep local docs first)
- bashguard: FUSE-based virtual filesystem for agent access control (in development)
- Semantic search without embeddings (Software Doug blog)
- Charting the Knowledge Space (related notes)