Show HN: Gemini can now natively embed video, so I built sub-second video search
TL;DR Highlight
Google's Gemini Embedding model can now embed video directly into vectors without text transcription, enabling natural language search over dashcam footage — describe 'red truck running a stop sign' and get the clip back.
Who Should Read
Backend developers working with video analysis, security cameras, or dashcam footage, and developers looking to build multimodal search systems using the Gemini API.
Core Mechanics
- Gemini's new video embedding capability generates vectors directly from video frames, capturing visual semantics without requiring a transcription or OCR step.
- The embeddings support cross-modal retrieval: you can query with text and retrieve semantically matching video segments, or query with a video clip and find similar scenes.
- The dashcam search demo indexes hours of footage and returns the specific timestamp and clip matching a natural language description — working even for events that occur without audio cues.
- Embedding quality degrades for fast-motion scenes and low-light footage, which is a known limitation of frame-based visual embedding approaches.
- The API returns embeddings per video segment (configurable window size), making it suitable for long-form video indexing at scale.
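The retrieval flow the mechanics above describe can be sketched in a few lines. This is a toy illustration, not the project's actual code: the segment vectors and the query vector are hard-coded stand-ins for what the Gemini video- and text-embedding endpoints would return, and the index layout (file, segment-start-second) is an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical index: one vector per video segment, keyed by
# (filename, segment start in seconds). Real vectors would come
# from the video-embedding endpoint; these are toy 3-d values.
index = {
    ("dashcam_01.mp4", 0):  [0.9, 0.1, 0.0],
    ("dashcam_01.mp4", 30): [0.2, 0.8, 0.1],
    ("dashcam_02.mp4", 0):  [0.1, 0.2, 0.9],
}

def search(query_vec, index, top_k=1):
    """Rank segments by similarity to the embedded text query."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return ranked[:top_k]

# Toy stand-in for the text embedding of "red truck running a stop sign".
query = [0.85, 0.15, 0.05]
best = search(query, index)[0]
print(best[0])  # → ('dashcam_01.mp4', 0)
```

Cross-modal retrieval works because the text query and the video segments are embedded into the same vector space, so a plain nearest-neighbor search suffices.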
Evidence
- The demo showing 'red truck running a stop sign' retrieval from hours of dashcam footage generated excitement in the CV/ML community — the latency and accuracy shown were impressive.
- Commenters noted this makes real-time video search practical for applications that were previously too expensive (requiring full transcription or frame-by-frame human review).
- Some questioned the privacy implications: if you can search video this easily, surveillance footage becomes much more powerful — a concern for civil liberties advocates.
- Developers who tried the API in beta reported it works well for well-lit, reasonably paced footage but struggles with fast action, night footage, or very subtle events.
How to Apply
- For security camera applications, pre-index all footage with Gemini video embeddings and store the vectors in a vector database; this enables natural language search later without any real-time video processing at query time.
- Set the embedding window size based on your event duration — for traffic incidents (2-5 seconds), use shorter windows; for activity patterns (minutes), use longer windows.
- Combine video embeddings with text embeddings of metadata (timestamps, location, camera ID) for richer search — a hybrid search approach works better than pure video similarity.
- Test your use case at small scale first: embed a representative sample of your footage and verify retrieval quality before committing to full indexing.
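The hybrid-search suggestion above (blending video similarity with metadata similarity) can be sketched as a weighted score. The weighting scheme and the `alpha` value are assumptions for illustration, not something the post specifies; in practice you would tune `alpha` against labeled queries.

```python
def hybrid_score(video_sim, meta_sim, alpha=0.7):
    """Blend video-embedding similarity with metadata (text-embedding)
    similarity. alpha weights the video signal; 0.7 is an assumed default."""
    return alpha * video_sim + (1 - alpha) * meta_sim

# Toy candidates: precomputed similarities against a user query.
candidates = [
    {"clip": "a.mp4@00:10", "video_sim": 0.82, "meta_sim": 0.40},
    {"clip": "b.mp4@03:25", "video_sim": 0.75, "meta_sim": 0.90},
]
best = max(candidates,
           key=lambda c: hybrid_score(c["video_sim"], c["meta_sim"]))
print(best["clip"])  # → b.mp4@03:25
```

Note that a clip with slightly weaker visual similarity can win when its metadata (time, location, camera) matches the query well, which is the point of the hybrid approach.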
Code Example
# Installation and basic usage
git clone https://github.com/ssrajadh/sentrysearch.git
cd sentrysearch
python -m venv venv && source venv/bin/activate
pip install -e .
# Set Gemini API key (obtain from aistudio.google.com/apikey)
sentrysearch init
# Index video directory (default: 30-second chunks, 5-second overlap)
sentrysearch index /path/to/dashcam/footage
# Custom chunk settings example
sentrysearch index /path/to/footage --chunk-duration 30 --overlap 5
# Search using natural language
sentrysearch search "red truck running a stop sign"
# → The matching segment is extracted from the original footage and saved as a clip file
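The chunking scheme the commands above use (30-second windows with 5-second overlap) can be expressed as a simple stride over the video timeline. This is one plausible interpretation of how such windows are laid out, not the tool's actual implementation: overlapping windows reduce the chance that an event straddling a chunk boundary is split across two embeddings.

```python
def chunk_starts(total_seconds, duration=30, overlap=5):
    """Start times of overlapping windows covering a video.
    Consecutive windows advance by (duration - overlap) seconds."""
    step = duration - overlap
    return list(range(0, max(1, total_seconds - overlap), step))

# A 90-second clip with 30 s windows and 5 s overlap:
starts = chunk_starts(90)
print(starts)  # → [0, 25, 50, 75]
```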
Related Papers
Show HN: Airbyte Agents – context for agents across multiple data sources
Airbyte has launched a Context Store that pre-indexes data from multiple SaaS systems such as Slack, Salesforce, and Linear, so agents no longer need to crawl each API individually. It reportedly cuts token usage by up to 90% compared to the existing MCP approach.
A polynomial autoencoder beats PCA on transformer embeddings
A technique that attaches a second-order polynomial decoder to a PCA encoder to substantially improve embedding compression quality in closed form; it requires no SGD and can be implemented with numpy alone.
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Storing memory as schema-defined structured records, instead of RAG-style text retrieval, yields dramatically higher accuracy on exact fact lookup, state tracking, and aggregate queries.
Show HN: Atomic – Local-first, AI-augmented personal knowledge base
Atomic builds a self-hosted, open-source personal knowledge graph app that automatically embeds, tags, and links notes, web clips, and RSS feeds—supporting semantic search, LLM-powered wiki synthesis, and MCP integration.
We replaced RAG with a virtual filesystem for our AI documentation assistant
Explains how Mintlify overcame RAG chunking limitations by building a virtual filesystem (ChromaFs) on top of Chroma DB that mimics UNIX commands, reducing session boot time from 46 seconds to 100ms.
Chroma Context-1: Training a Self-Editing Search Agent