I built an AI receptionist for a mechanic shop
TL;DR Highlight
A developer built an AI receptionist for their brother's auto shop, combining a RAG pipeline with Vapi's voice platform to answer real phone calls, because missed calls were costing the shop thousands of dollars per month.
Who Should Read
Developers looking to build AI voice agents or phone automation for small businesses, or backend devs learning how to connect RAG pipelines to production-grade apps for the first time.
Core Mechanics
- The problem was simple: the brother spends all day under cars and can't answer the phone, so customers hang up and call another shop. With brake jobs at $450 and engine repairs at $2,000, every missed call is revenue walking out the door.
- A raw LLM with no grounding is risky here. When a customer asks 'How much for brakes?' the actual price is $450, but the model might guess $200, which destroys customer trust. RAG (Retrieval-Augmented Generation, which answers from actual knowledge documents) was introduced to prevent this.
- Knowledge base construction started by scraping the brother's website and converting it to markdown files covering service types, pricing, duration, business hours, payment methods, cancellation policy, warranty info, loaner availability, and supported vehicle types: 21+ documents in total.
- Each document was embedded as a 1024-dimensional vector using Voyage AI's voyage-3-large model and stored in MongoDB Atlas. With an Atlas Vector Search index, each customer question is embedded with the same model and the 3 most semantically similar documents are retrieved. Even a query like 'How much for brake work?' finds the relevant docs without an exact keyword match.
- The top 3 retrieved documents are passed as context to Anthropic Claude (claude-sonnet-4-6), with a system prompt constraining it to 'only answer from the knowledge base, say you don't know if unsure, and collect callback info.' By the end of Part 1, queries run from the terminal returned accurate answers. Example: 'How much for an oil change?' → 'Conventional oil $45, synthetic $75. Includes oil filter replacement, fluid top-off, tire pressure check, takes about 30 minutes.'
- Vapi was chosen for voice infrastructure. It handles phone number purchase, speech recognition (Deepgram), text-to-speech (ElevenLabs), and real-time function calling all in one. Developers just need to build the webhook server that Vapi calls.
- The server was built with FastAPI. When a customer asks a question, Vapi sends a tool-calls request to the /webhook endpoint, the server extracts an answer via the RAG pipeline, and Vapi reads it aloud. During development, Ngrok exposed local port 8000 to connect with Vapi.
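The retrieve, prompt, and respond flow described above can be sketched in plain Python. This is a minimal illustration, not the author's code: an in-memory cosine search stands in for Atlas Vector Search, `toy_embed` is a hypothetical keyword counter standing in for voyage-3-large, `llm` is any callable standing in for the Claude call, and the `toolCallList` / `results` payload shape and the `question` argument name are assumptions that should be checked against Vapi's documentation.

```python
import math

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: counts topic keywords."""
    t = text.lower()
    return [
        float(t.count("brake") + t.count("engine")),
        float(t.count("oil")),
        float(t.count("pay")),
    ]

# Tiny knowledge base. Prices come from the article; the last entry is a placeholder.
DOCS = [
    "Brake job: $450.",
    "Engine repair: $2,000.",
    "Oil change: conventional $45, synthetic $75, about 30 minutes.",
    "Payment methods: (placeholder, details not given in the source)",
]
KNOWLEDGE_BASE = [(doc, toy_embed(doc)) for doc in DOCS]

SYSTEM_PROMPT = (
    "Only answer from the knowledge base. If unsure, say you don't know "
    "and collect callback info."
)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, k=3):
    """Return the k documents most similar to the query vector."""
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, docs):
    """Combine the system prompt, retrieved documents, and the caller's question."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"{SYSTEM_PROMPT}\n\nKnowledge base:\n{context}\n\nCaller question: {question}"

def answer(question, llm):
    """Retrieve the top 3 documents and ask the model to answer from them."""
    docs = retrieve(toy_embed(question), k=3)
    return llm(build_prompt(question, docs))

def handle_tool_calls(payload, llm):
    """Turn an assumed Vapi-style tool-calls payload into a results response."""
    calls = payload.get("message", {}).get("toolCallList", [])
    results = []
    for call in calls:
        question = call.get("arguments", {}).get("question", "")
        results.append({"toolCallId": call.get("id"), "result": answer(question, llm)})
    return {"results": results}
```

In a FastAPI app, a `/webhook` route would parse the request JSON, pass it to a function like `handle_tool_calls`, and return the resulting dict. Keeping the handler free of framework code also makes it easy to test with a fake `llm` before any phone call is placed.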
Evidence
- A former service advisor raised serious practicality concerns. Parts prices change in real time and inventory varies daily, making it nearly impossible to give accurate estimates upfront. In some states, inaccurate estimates can lead to legal issues. The system's real utility might be limited to one-way notifications like 'Your car is ready for pickup.'
- Commenters questioned whether RAG is even necessary. Price lists and business hours are small enough to fit entirely in a modern LLM context window, so vector search may add nothing. Unless you're ingesting full service manuals, the RAG overhead might be unnecessary.
- Some suggested outsourced reception services might be more practical. A $500/month phone answering service could work just as well, and the ROI of building and maintaining a custom AI system should be compared.
- User reactions to AI phone answering were mixed. One person had a great experience with Mint Mobile's AI agent resolving their issue in under a minute with no wait, while another felt 'uncanny valley' vibes from a local HVAC company's AI, leading to distrust and hunting for a human. Several comments said they'd just hang up if they realized it was a robot.
- Amid negative comments, a defense emerged: the key value isn't 'is this useful for this specific case' but 'how can I use these techniques in my own projects.' Practical tips like TTS text formatting were noted. Meanwhile, someone actually called the number and found the chatbot wasn't even deployed yet, calling it 'the worst tech demo on HN.'
How to Apply
- For building AI phone agents for small businesses (restaurants, auto shops, salons, etc.), this stack (Vapi + FastAPI + MongoDB Atlas Vector Search + Voyage AI embeddings + Claude) serves as a solid reference for rapid prototyping. However, as commenters noted, services with dynamic pricing need separate real-time price lookup API integration.
- When connecting a RAG pipeline to a voice interface, Vapi's tool-calls webhook approach lets you reuse an existing HTTP server with a low barrier to entry. During local development, ngrok provides instant external exposure, so pasting the ngrok URL into the Vapi dashboard is enough to enable real phone testing.
- If the knowledge base is small (under a few dozen documents) and doesn't change frequently, consider putting everything directly in the system prompt instead of building RAG. You'll save on vector DB setup and embedding costs, and latency might actually decrease.
- Fallback design for unknown questions is essential. This project used a saveCallback tool to collect the caller's name and contact info, which is much more practical than just saying 'sorry, I don't know.' Similar projects can copy this pattern directly.
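Two of the suggestions above, inlining a small knowledge base into the system prompt and capturing callback details as a fallback, can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: only the `saveCallback` tool name comes from the source, while the `name` and `phone` argument names, the 10-digit US-style phone check, the `max_chars` limit, and the in-memory `store` list are all hypothetical.

```python
import re
from datetime import datetime, timezone

def build_system_prompt(docs, max_chars=20000):
    """Inline every document into the system prompt instead of using vector search."""
    body = "\n\n".join(docs)
    if len(body) > max_chars:
        # Assumed cutoff: past this size, retrieval may be the better fit.
        raise ValueError("knowledge base too large to inline; consider retrieval")
    return (
        "Answer only from the knowledge base below. If the answer is not there, "
        "say you don't know and call saveCallback with the caller's name and "
        "phone number.\n\n" + body
    )

def normalize_phone(raw):
    """Keep digits only; accept 10 digits, or 11 digits with a leading 1 (assumed US-style)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None

def save_callback(arguments, store):
    """Validate callback details and append them to a store (a plain list here)."""
    name = str(arguments.get("name", "")).strip()
    phone = normalize_phone(str(arguments.get("phone", "")))
    if not name or phone is None:
        # Ask again rather than saving an unusable record.
        return "I didn't catch that. Could you repeat your name and phone number?"
    store.append({
        "name": name,
        "phone": phone,
        "requested_at": datetime.now(timezone.utc).isoformat(),
    })
    return f"Thanks {name}, someone from the shop will call you back."
```

The string returned by `save_callback` is what the voice agent would read back to the caller, so it is phrased as speech rather than as a status code. In a real deployment the list would be replaced by a database write or a notification to the shop.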
Related Papers
Show HN: Bible as RAG Database
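A web service that indexes the entire Bible as a RAG (Retrieval-Augmented Generation) database so that relevant verses can be searched semantically by topic or keyword. It is a practical example of applying RAG to religious texts and a useful reference for developers who want to build similar projects.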
Haystack: Open-Source AI Framework for Production Ready Agents, RAG
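An open-source AI orchestration framework built by deepset. It is gaining attention as an alternative to LangChain, and its modular pipeline approach lets you build RAG, agent, and multimodal apps all the way to production.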
We built a persistent agent memory layer on Elasticsearch with 0.89 recall
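A write-up of the architecture of a multi-tenant long-term memory system built on Elasticsearch so that AI agents can remember user information after a session ends. It covers the concrete implementation that achieved an R@10 of 0.89 on 168 questions with zero cross-tenant data leaks.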
TAHOE: Text-to-SQL with Automated Hint Optimization from Experience
A system that accumulates hints learned from an LLM's SQL generation failures into a reusable Hint Bank, substantially improving Snowflake-dialect SQL accuracy without retraining the model.
Inside FAISS: Billion-Scale Similarity Search
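A visual explanation of IVF (partitioning) and Product Quantization (compression), the core algorithms that let FAISS search billions of vectors quickly. It helps developers building RAG or vector search systems understand how the internals work.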
Show HN: Airbyte Agents – context for agents across multiple data sources
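Airbyte has launched a Context Store that pre-indexes data from multiple SaaS systems such as Slack, Salesforce, and Linear, so agents don't have to dig through each API one by one. It reports token reductions of up to 90% compared with the existing MCP approach.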