Show HN: Building a web search engine from scratch with 3B neural embeddings
TL;DR Highlight
One developer built a web search engine with 3 billion SBERT embeddings and 280 million indexed pages in 2 months — showing the real architecture and cost structure of a vector search-based system at scale.
Who Should Read
Backend/infrastructure engineers building vector search or embedding-based retrieval systems, or developers curious about the real-world architecture of large-scale crawling and indexing pipelines.
Core Mechanics
- Instead of keyword matching, uses SBERT (multi-qa-mpnet-base-dot-v1, 768-dim) embeddings for intent-based search — vague, natural-language queries (e.g., 'why isn't CORS working') can surface precise documentation passages, such as eventual-consistency details in the S3 docs.
- Generated 3 billion embeddings at 100K/sec using 250 Runpod RTX 4090 GPUs — total compute cost under $50K.
- Used Hetzner auction servers (42x cheaper than AWS for high-memory instances) and Runpod GPUs (4.3x cheaper) to cut costs dramatically versus mainstream clouds.
- Built custom HNSW + RocksDB instead of managed vector DB services for cost control at scale.
Evidence
- Community reactions ranged from 'this is a 10x engineer building Google in spare time' to offers of seed investment based on the $50K operating cost.
- Pure vector search limitations noted — searching 'garbanzo bean stew' returned wrong bean recipes, highlighting the need for hybrid approaches combining keyword and semantic search.
- Architecture choices (Hetzner over AWS, custom HNSW over managed vector DB) were validated as pragmatic cost optimizations.
How to Apply
- For embedding-based search systems: combine Hetzner auction servers (42x cheaper high-memory) and Runpod GPUs (4.3x cheaper) instead of AWS to dramatically cut costs. Consider custom HNSW + RocksDB over managed vector DB services like Turbopuffer.
- To improve chunking quality in RAG pipelines: go beyond simple token-count splitting. Use semantic boundaries (paragraph breaks, section headers) and overlap chunks for better retrieval.
- Pure vector search has limitations — consider hybrid search combining BM25 keyword matching with semantic embedding similarity for production systems.
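One simple hybrid scheme is to min-max normalize each score list and blend with a weight. A sketch under stated assumptions — the alpha weight and min-max normalization are illustrative choices, not details from the post:

```rust
// Blend keyword (BM25) and semantic (vector) scores per document.
// Min-max normalization and alpha = 0.5 are illustrative, not from the post.
fn min_max(scores: &[f32]) -> Vec<f32> {
    let lo = scores.iter().cloned().fold(f32::INFINITY, f32::min);
    let hi = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    if (hi - lo).abs() < f32::EPSILON {
        return vec![0.0; scores.len()]; // all scores equal: no signal
    }
    scores.iter().map(|s| (s - lo) / (hi - lo)).collect()
}

fn hybrid(bm25: &[f32], vector: &[f32], alpha: f32) -> Vec<f32> {
    let (b, v) = (min_max(bm25), min_max(vector));
    b.iter().zip(&v).map(|(b, v)| alpha * b + (1.0 - alpha) * v).collect()
}

fn main() {
    // Doc 0 wins on keywords, doc 1 on semantics; alpha balances the two.
    let scores = hybrid(&[12.0, 3.0, 0.5], &[0.2, 0.9, 0.4], 0.5);
    println!("{:?}", scores);
}
```

For the 'garbanzo bean stew' failure above, the BM25 term would reward exact token overlap ("garbanzo") that the embedding alone missed.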
Code Example
// RocksDB tuning for the page store (Rust, rust-rocksdb crate).
// Buffer/cache sizes follow the post's stated 32 GB block cache and
// 256 MB write buffer.
let mut opt = rocksdb::Options::default();
opt.set_max_background_jobs(num_cpus::get() as i32 * 2);
opt.set_bytes_per_sync(4 * 1024 * 1024);
opt.set_enable_blob_files(true);
opt.set_min_blob_size(1024); // separate values over 1 KB into BlobDB
opt.set_write_buffer_size(256 * 1024 * 1024); // 256 MB write buffer
let mut block_opts = rocksdb::BlockBasedOptions::default();
block_opts.set_block_cache(&rocksdb::Cache::new_lru_cache(32 * 1024 * 1024 * 1024)); // 32 GB
opt.set_block_based_table_factory(&block_opts);
Terminology
Related Papers
Show HN: Airbyte Agents – context for agents across multiple data sources
Airbyte released a Context Store that pre-indexes data from multiple SaaS systems such as Slack, Salesforce, and Linear, so agents no longer have to dig through each API individually. Compared with the existing MCP approach, it cut token usage by up to 90%.
A polynomial autoencoder beats PCA on transformer embeddings
A technique that attaches a quadratic polynomial decoder to a PCA encoder to substantially improve embedding-compression quality in closed form; it can be implemented with numpy alone, no SGD required.
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Storing memory as schema-defined structured records, instead of RAG-style text retrieval, yields far higher accuracy on exact fact lookup, state tracking, and aggregate queries.
Show HN: Atomic – Local-first, AI-augmented personal knowledge base
Atomic builds a self-hosted, open-source personal knowledge graph app that automatically embeds, tags, and links notes, web clips, and RSS feeds—supporting semantic search, LLM-powered wiki synthesis, and MCP integration.
We replaced RAG with a virtual filesystem for our AI documentation assistant
Explains how Mintlify overcame RAG chunking limitations by building a virtual filesystem (ChromaFs) on top of Chroma DB that mimics UNIX commands, reducing session boot time from 46 seconds to 100ms.
Chroma Context-1: Training a Self-Editing Search Agent