I built an AI receptionist for a mechanic shop
TL;DR Highlight
A dev built an AI receptionist for their brother's auto shop — combining a RAG pipeline with Vapi's voice platform to actually answer phone calls — because missed calls were costing thousands per month.
Who Should Read
Developers looking to build AI voice agents or phone automation for small businesses, or backend devs learning how to connect RAG pipelines to production-grade apps for the first time.
Core Mechanics
- The problem was simple: the brother spends all day under cars and can't answer the phone. Customers hang up and call somewhere else. Brake jobs at $450, engine repairs at $2,000 — all just walking out the door.
- Using raw LLMs is dangerous. When a customer asks 'How much for brakes?' the actual price is $450, but the model might guess $200, destroying customer trust. RAG (Retrieval-Augmented Generation — answering based on actual knowledge documents) was introduced to prevent this.
- Knowledge base construction started by scraping the brother's website and converting it to markdown files. Service types, pricing, duration, business hours, payment methods, cancellation policy, warranty info, loaner availability, vehicle types supported — 21+ documents total.
- Each document was converted to 1024-dimensional vectors using Voyage AI's voyage-3-large model and stored in MongoDB Atlas. With Atlas Vector Search indexing, customer questions get vectorized with the same model and the top 3 semantically similar documents are retrieved. Even queries like 'How much for brake work?' find relevant docs without exact keyword matches.
- The top 3 retrieved documents are passed as context to Anthropic Claude (claude-sonnet-4-6), with a system prompt constraining it to 'only answer from the knowledge base, say you don't know if unsure, and collect callback info.' By Part 1 completion, terminal queries returned accurate answers. Example: 'How much for an oil change?' → 'Conventional oil $45, synthetic $75. Includes oil filter replacement, fluid top-off, tire pressure check, takes about 30 minutes.'
- Vapi was chosen for voice infrastructure. It handles phone number purchase, speech recognition (Deepgram), text-to-speech (ElevenLabs), and real-time function calling all in one. Developers just need to build the webhook server that Vapi calls.
- The server was built with FastAPI. When a customer asks a question, Vapi sends a tool-calls request to the /webhook endpoint, the server extracts an answer via the RAG pipeline, and Vapi reads it aloud. During development, Ngrok exposed local port 8000 to connect with Vapi.
Evidence
- A former service advisor raised serious practicality concerns. Parts prices change in real-time and inventory varies daily, making it nearly impossible to give accurate estimates upfront. In some states, inaccurate estimates can lead to legal issues. The system's real utility might be limited to one-way notifications like 'Your car is ready for pickup.'
- Whether RAG is even necessary was questioned. Price lists and business hours are small enough to fit entirely in modern LLM context windows — do you really need vector search for that? Unless you're ingesting full service manuals, RAG overhead might be unnecessary.
- Some suggested outsourced reception services might be more practical. A $500/month phone answering service could work just as well, and the ROI of building and maintaining a custom AI system should be compared.
- User reactions to AI phone answering were mixed. One person had a great experience with Mint Mobile's AI agent resolving their issue in under a minute with no wait, while another felt 'uncanny valley' vibes from a local HVAC company's AI, leading to distrust and hunting for a human. Several comments said they'd just hang up if they realized it was a robot.
- Amid negative comments, a defense emerged: the key value isn't 'is this useful for this specific case' but 'how can I use these techniques in my own projects.' Practical tips like TTS text formatting were noted. Meanwhile, someone actually called the number and found the chatbot wasn't even deployed yet, calling it 'the worst tech demo on HN.'
How to Apply
- For building AI phone agents for small businesses (restaurants, auto shops, salons, etc.), this stack (Vapi + FastAPI + MongoDB Atlas Vector Search + Voyage AI embeddings + Claude) serves as a solid reference for rapid prototyping. However, as commenters noted, services with dynamic pricing need separate real-time price lookup API integration.
- When connecting a RAG pipeline to a voice interface, Vapi's tool-calls webhook approach lets you reuse existing HTTP servers with low barrier to entry. During local development, Ngrok provides instant external exposure, so just pasting the ngrok URL into the Vapi dashboard enables real phone testing.
- If the knowledge base is small (under a few dozen documents) and doesn't change frequently, consider putting everything directly in the system prompt instead of building RAG. You'll save on vector DB setup and embedding costs, and latency might actually decrease.
- Fallback design for unknown questions is essential. This project used a saveCallback tool to collect name and contact info — much more practical than just saying 'sorry, I don't know.' Similar projects can copy this pattern directly.
Terminology
RAGInstead of the LLM pulling answers from 'memory,' it searches a pre-built document database and answers based on that content. Effective at reducing hallucinations.
Vector EmbeddingText converted into arrays of hundreds to thousands of numbers. Sentences with similar meanings have similar arrays, enabling semantic document matching even without exact keyword overlap.
VapiA service providing phone infrastructure (number purchase, speech recognition, TTS, real-time function calling) as an API. Developers only need to build a webhook server with business logic.
NgrokA tunneling tool that temporarily exposes your local dev server to the internet. One command — `ngrok http 8000` — instantly creates a public HTTPS URL.
Tool CallingA feature where the LLM directly invokes external functions or APIs mid-response. In this project, Vapi calls functions like answerQuestion and saveCallback in real-time as customer questions come in.
Uncanny ValleyThe unsettling, distrustful feeling when an AI or robot is almost human-like but slightly off. Describes user rejection when AI voice agents sound too human yet still give themselves away.