Show HN: Gemini can now natively embed video, so I built sub-second video search
TL;DR Highlight
Google's Gemini Embedding model can now embed video directly into vectors without text transcription, enabling natural language search over dashcam footage — describe 'red truck running a stop sign' and get the clip back.
Who Should Read
Backend developers working with video analysis, security cameras, or dashcam footage, and developers looking to build multimodal search systems using the Gemini API.
Core Mechanics
- Gemini's new video embedding capability generates vectors directly from video frames, capturing visual semantics without requiring a transcription or OCR step.
- The embeddings support cross-modal retrieval: you can query with text and retrieve semantically matching video segments, or query with a video clip and find similar scenes.
- The dashcam search demo indexes hours of footage and returns the specific timestamp and clip matching a natural language description — working even for events that occur without audio cues.
- Embedding quality degrades for fast-motion scenes and low-light footage, which is a known limitation of frame-based visual embedding approaches.
- The API returns embeddings per video segment (configurable window size), making it suitable for long-form video indexing at scale.
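The text-to-video retrieval described above comes down to nearest-neighbor search over per-segment vectors. Here is a minimal sketch of that step, assuming the embeddings are already computed. The 3-dimensional vectors, file names, and the `search_segments` helper are hypothetical stand-ins for illustration, not the Gemini API or the demo's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search_segments(query_vector, segment_index, top_k=3):
    """Rank indexed segments by similarity to the query vector."""
    scored = [
        (cosine_similarity(query_vector, seg["vector"]), seg)
        for seg in segment_index
    ]
    # Sort on the score only, so the segment dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {
            "file": seg["file"],
            "start_s": seg["start_s"],
            "end_s": seg["end_s"],
            "score": score,
        }
        for score, seg in scored[:top_k]
    ]


# Toy vectors standing in for real video-segment embeddings (assumption).
segment_index = [
    {"file": "drive_01.mp4", "start_s": 0, "end_s": 30, "vector": [0.9, 0.1, 0.0]},
    {"file": "drive_01.mp4", "start_s": 25, "end_s": 55, "vector": [0.1, 0.9, 0.2]},
    {"file": "drive_02.mp4", "start_s": 0, "end_s": 30, "vector": [0.0, 0.2, 0.9]},
]

# Pretend this is the embedding of a text query in the same vector space.
query_vector = [0.2, 0.8, 0.1]

results = search_segments(query_vector, segment_index, top_k=2)
for hit in results:
    print(hit["file"], hit["start_s"], hit["end_s"], round(hit["score"], 3))
```

Because each vector is stored with its file name and time range, the best match maps straight back to a timestamp and clip. For large archives, a vector database or an approximate nearest-neighbor index would replace the linear scan.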
Evidence
- The demo retrieving 'red truck running a stop sign' from hours of dashcam footage drew excitement in the CV/ML community, with commenters singling out the latency and accuracy it showed.
- Commenters noted this makes real-time video search practical for applications that were previously too expensive (requiring full transcription or frame-by-frame human review).
- Some questioned the privacy implications: if you can search video this easily, surveillance footage becomes much more powerful — a concern for civil liberties advocates.
- Developers who tried the API in beta reported that it handles well-lit, reasonably paced footage but struggles with fast action, night footage, and very subtle events.
How to Apply
- For security camera applications, pre-index all footage using Gemini video embeddings and store the vectors in a vector database. Searches then run against the stored vectors, so a natural language query needs no video processing at query time.
- Set the embedding window size based on your event duration — for traffic incidents (2-5 seconds), use shorter windows; for activity patterns (minutes), use longer windows.
- Combine video embeddings with text embeddings of metadata (timestamps, location, camera ID) for richer search — a hybrid search approach works better than pure video similarity.
- Test your use case at small scale first: embed a representative sample of your footage and verify retrieval quality before committing to full indexing.
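The small-scale check in the last point can be made concrete with a hit-rate measurement: label a handful of queries with the segment you expect, run the search, and count how often the expected segment lands in the top k. The sketch below assumes that setup. The segment IDs, queries, and `hit_rate_at_k` helper are all hypothetical, and the ranked lists are hard-coded where real search output would go.

```python
def hit_rate_at_k(ranked_results, expected, k=5):
    """Fraction of queries whose expected segment appears in the top k results."""
    if not expected:
        return 0.0
    hits = 0
    for query, wanted_segment in expected.items():
        top_k = ranked_results.get(query, [])[:k]
        if wanted_segment in top_k:
            hits += 1
    return hits / len(expected)


# Hypothetical labeled sample: query -> segment ID a human marked as correct.
expected = {
    "red truck running a stop sign": "cam1_0045",
    "pedestrian crossing at night": "cam2_0112",
    "car merging without signaling": "cam1_0301",
}

# Hypothetical search output: query -> segment IDs, best match first.
ranked_results = {
    "red truck running a stop sign": ["cam1_0045", "cam1_0046", "cam3_0009"],
    "pedestrian crossing at night": ["cam2_0090", "cam2_0301", "cam2_0112"],
    "car merging without signaling": ["cam3_0200", "cam1_0299", "cam2_0001"],
}

print("hit rate @1:", hit_rate_at_k(ranked_results, expected, k=1))
print("hit rate @3:", hit_rate_at_k(ranked_results, expected, k=3))
```

Running the same labeled queries against different window sizes, or separately on night and daytime footage, gives a rough way to compare configurations before indexing everything.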
Code Example
# Installation and basic usage
git clone https://github.com/ssrajadh/sentrysearch.git
cd sentrysearch
python -m venv venv && source venv/bin/activate
pip install -e .
# Set Gemini API key (obtain from aistudio.google.com/apikey)
sentrysearch init
# Index video directory (default: 30-second chunks, 5-second overlap)
sentrysearch index /path/to/dashcam/footage
# Custom chunk settings example
sentrysearch index /path/to/footage --chunk-duration 30 --overlap 5
# Search using natural language
sentrysearch search "red truck running a stop sign"
# → The matching segment is saved as a clip file extracted from the original video
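To show what 30-second chunks with a 5-second overlap mean in practice, the sketch below computes the time windows such a setting would produce for one video. It illustrates the general sliding-window idea only. The `chunk_windows` function is hypothetical, and the source does not confirm how sentrysearch handles the final partial chunk, so the shorter last window here is an assumption.

```python
def chunk_windows(video_duration_s, chunk_duration_s=30.0, overlap_s=5.0):
    """Return (start, end) times in seconds for overlapping chunks of a video."""
    if chunk_duration_s <= 0:
        raise ValueError("chunk_duration_s must be positive")
    if not 0 <= overlap_s < chunk_duration_s:
        raise ValueError("overlap_s must be >= 0 and smaller than chunk_duration_s")

    video_duration_s = float(video_duration_s)
    step = chunk_duration_s - overlap_s  # how far each new chunk advances
    windows = []
    start = 0.0
    while start < video_duration_s:
        # Assumption: the last chunk is simply cut short at the end of the video.
        end = min(start + chunk_duration_s, video_duration_s)
        windows.append((start, end))
        if end >= video_duration_s:
            break
        start += step
    return windows


for start, end in chunk_windows(70):
    print(f"{start:.0f}s to {end:.0f}s")
```

With these settings each chunk starts 25 seconds after the previous one, so an event that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.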
Related Papers
Show HN: Bible as RAG Database
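A web service that indexes the entire Bible as a RAG (retrieval-augmented generation) database so that relevant verses can be searched semantically by topic or keyword. It is a practical example of applying RAG to religious texts and a useful reference for developers building similar projects.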
Haystack: Open-Source AI Framework for Production Ready Agents, RAG
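An open-source AI orchestration framework built by deepset that is gaining attention as an alternative to LangChain. Its modular pipeline approach lets you build RAG, agent, and multimodal apps all the way to production.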
We built a persistent agent memory layer on Elasticsearch with 0.89 recall
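Describes the architecture of a multi-tenant long-term memory system built on Elasticsearch so that AI agents can remember user information after a session ends. It covers the concrete implementation that achieved R@10 of 0.89 on 168 questions with zero cross-tenant data leaks.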
TAHOE: Text-to-SQL with Automated Hint Optimization from Experience
A system that collects the hints an LLM learns from SQL generation failures into a reusable Hint Bank, substantially improving Snowflake-dialect SQL accuracy without retraining the model.
Inside FAISS: Billion-Scale Similarity Search
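A visual explanation of IVF (partitioning) and Product Quantization (compression), the core algorithms that let FAISS search billions of vectors quickly. It helps developers building RAG or vector search systems understand how the internals work.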
Show HN: Airbyte Agents – context for agents across multiple data sources
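Airbyte has released a Context Store that pre-indexes data from multiple SaaS systems such as Slack, Salesforce, and Linear, so agents do not have to dig through each API individually. The team reports token reductions of up to 90% compared with the existing MCP approach.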