Show HN: Building a web search engine from scratch with 3B neural embeddings
TL;DR Highlight
One developer built a web search engine over 3 billion SBERT embeddings and 280 million indexed pages in two months, and documented the real architecture and cost structure of a vector-search system at that scale.
Who Should Read
Backend/infrastructure engineers building vector search or embedding-based retrieval systems, or developers curious about the real-world architecture of large-scale crawling and indexing pipelines.
Core Mechanics
- Instead of keyword matching, it uses SBERT embeddings (multi-qa-mpnet-base-dot-v1, 768-dim) for intent-based search, e.g., a vague query like 'why isn't CORS working' finds exact eventual consistency answers in S3 docs.
- Generated 3 billion embeddings at 100K/sec using 250 Runpod RTX 4090 GPUs — total compute cost under $50K.
- Used Hetzner auction servers (42x cheaper than AWS for high-memory machines) and Runpod GPUs (4.3x cheaper) to cut costs sharply compared with mainstream cloud providers.
- Built a custom HNSW + RocksDB stack instead of using managed vector DB services, for cost control at scale.
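The figures above can be sanity-checked with some quick arithmetic. This is a rough sketch, not the author's accounting: the counts and rates come from the bullets above, while storing each dimension as a 4-byte f32 is an assumption (the summary does not state the storage precision), and all constant and function names are illustrative.

```rust
// Back-of-the-envelope sizing from the figures quoted in this summary.
const TOTAL_EMBEDDINGS: u64 = 3_000_000_000; // from the summary
const DIMENSIONS: u64 = 768; // from the summary
const EMBEDDINGS_PER_SEC: u64 = 100_000; // from the summary
const GPU_COUNT: u64 = 250; // from the summary
const BYTES_PER_DIMENSION: u64 = 4; // assumption: uncompressed f32

/// Average embeddings per second handled by each GPU.
pub fn per_gpu_throughput() -> u64 {
    EMBEDDINGS_PER_SEC / GPU_COUNT
}

/// Hours needed to embed everything if the quoted rate were sustained.
pub fn wall_clock_hours() -> f64 {
    TOTAL_EMBEDDINGS as f64 / EMBEDDINGS_PER_SEC as f64 / 3600.0
}

/// Raw size of all vectors in bytes, before any index overhead or compression.
pub fn raw_vector_bytes() -> u64 {
    TOTAL_EMBEDDINGS * DIMENSIONS * BYTES_PER_DIMENSION
}
```

Under these assumptions each GPU averages 400 embeddings per second, the full run takes about 8.3 hours at the quoted rate, and the raw vectors occupy roughly 9.2 TB. If 100K/sec was a peak rather than a sustained rate, the real wall-clock time would be longer.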
Evidence
- Community reactions ranged from 'this is a 10x engineer building Google in spare time' to offers of seed investment based on the $50K operating cost.
- Pure vector search limitations noted — searching 'garbanzo bean stew' returned wrong bean recipes, highlighting the need for hybrid approaches combining keyword and semantic search.
- Architecture choices (Hetzner over AWS, custom HNSW over managed vector DB) were validated as pragmatic cost optimizations.
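One common way to build the hybrid approach mentioned above is reciprocal rank fusion (RRF), which merges a keyword ranking and a semantic ranking without having to normalize their scores. The sketch below is a generic illustration, not something the original project is known to use; the function name and the document ids in the usage note are hypothetical, and both input rankings are assumed to be computed elsewhere.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: every ranking contributes 1 / (k + rank) to a
/// document's score, where rank is 1-based. Larger k flattens the advantage
/// of top positions; 60 is a commonly used default.
pub fn reciprocal_rank_fusion(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (index, doc_id) in ranking.iter().enumerate() {
            let rank = (index + 1) as f64;
            *scores.entry(doc_id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by id so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For example, fusing a keyword ranking of `["chickpea_stew", "garbanzo_salad", "black_bean_stew"]` with a semantic ranking of `["black_bean_stew", "chickpea_stew", "lentil_soup"]` at `k = 60.0` puts `chickpea_stew` first, because it ranks well in both lists, while a document that only one retriever likes falls lower.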
How to Apply
- For embedding-based search systems: consider Hetzner auction servers (cited as 42x cheaper than AWS for high-memory machines) and Runpod GPUs (cited as 4.3x cheaper) to cut costs sharply. At large scale, also weigh a custom HNSW + RocksDB stack against managed vector DB services like Turbopuffer.
- To improve chunking quality in RAG pipelines: go beyond simple token-count splitting. Use semantic boundaries (paragraph breaks, section headers) and overlap chunks for better retrieval.
- Pure vector search has limitations — consider hybrid search combining BM25 keyword matching with semantic embedding similarity for production systems.
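The chunking advice above can be sketched as a paragraph-aware splitter with overlap. This is a minimal illustration under stated assumptions, not the original project's chunker: blank lines are treated as the only semantic boundary, whitespace-separated words stand in for tokens, and the function and parameter names are hypothetical.

```rust
/// Packs whole paragraphs into chunks of at most `max_words` words where
/// possible, repeating the last `overlap_paragraphs` paragraphs of each chunk
/// at the start of the next one. A single paragraph longer than `max_words`
/// is kept intact, so that chunk exceeds the limit.
pub fn chunk_paragraphs(text: &str, max_words: usize, overlap_paragraphs: usize) -> Vec<String> {
    let paragraphs: Vec<&str> = text
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .collect();

    let mut chunks = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut current_words = 0usize;

    for para in paragraphs {
        let words = para.split_whitespace().count();
        if !current.is_empty() && current_words + words > max_words {
            chunks.push(current.join("\n\n"));
            // Carry trailing paragraphs forward, but never the whole chunk.
            let keep = overlap_paragraphs.min(current.len() - 1);
            current = current[current.len() - keep..].to_vec();
            current_words = current.iter().map(|p| p.split_whitespace().count()).sum();
        }
        current.push(para);
        current_words += words;
    }
    if !current.is_empty() {
        chunks.push(current.join("\n\n"));
    }
    chunks
}
```

With three three-word paragraphs, `max_words = 6`, and `overlap_paragraphs = 1`, this yields two chunks, and the second chunk begins with the middle paragraph, so context that straddles the boundary is retrievable from either chunk. A production version would likely also split on section headers and count tokens with the embedding model's tokenizer.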
Code Example
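// RocksDB optimization settings (Rust; uses the rocksdb and num_cpus crates)
use rocksdb::Options;

// Helper name is illustrative; the original shows only the option calls.
fn tuned_options() -> Options {
    let mut opt = Options::default();
    opt.set_max_background_jobs(num_cpus::get() as i32 * 2); // two background jobs per core
    opt.set_bytes_per_sync(1024 * 1024 * 4); // incrementally sync every 4 MiB written
    opt.set_enable_blob_files(true);
    opt.set_min_blob_size(1024); // store values of 1 KiB or more in blob files (BlobDB)
    opt // Block cache: 32GB, write buffer: 256MB (not configured in this snippet)
}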