Reverse engineering Gemini's SynthID detection
TL;DR Highlight
A newly released project claims to detect and remove SynthID, the invisible watermark Google Gemini inserts into AI-generated images, using only signal processing and spectral analysis. The release is controversial because it exposes weaknesses in the technology used to identify AI-generated images.
Who Should Read
Developers interested in AI-generated image detection or the security of digital watermarking, and researchers and security engineers who need to deploy content authentication technologies like SynthID or evaluate how they can be bypassed.
Core Mechanics
- This project reverse-engineered Google's SynthID watermarking system. Using pure signal processing and spectral analysis, without access to the proprietary encoder/decoder, it analyzed the invisible watermark embedded in all images generated by Gemini.
- The project claims to have successfully identified SynthID watermarks with 90% accuracy using a custom-built detector, but this figure is based on their own detector and has not been verified by Google's official SynthID detector.
- A key finding is that SynthID watermarks have a 'resolution-dependent carrier frequency structure'. That is, the watermark is embedded in different frequency bands depending on the image resolution.
- They developed a removal technique called 'V3 Multi-Resolution Spectral Bypass', claiming a 75% reduction in carrier energy, a 91% reduction in phase consistency, and a PSNR of 43 dB or higher. PSNR (peak signal-to-noise ratio) measures how closely the processed image matches the original; at 43 dB or higher, the difference is almost imperceptible to the human eye.
- It is distributed as a CLI tool installable via pip, with options like 'aggressive' and 'maximum' for setting the watermark removal strength, effectively making it a turnkey watermark removal tool.
- The community-driven collection method is unusual: contributors paste pure black (#000000) or white (#FFFFFF) images into Gemini (Nano Banana Pro) with the prompt 'reproduce it exactly', then use the outputs as reference images to extract the watermark pattern. Presumably, because the content is flat, whatever spectral structure remains can be attributed to the watermark.
- The README and code quality drew many criticisms. The repo displays a '90% detection rate' badge backed by only 88 verification images, with no CI and no test suite. The README's code example also mixes two import styles, which could raise an ImportError.
- According to one comment, simply downscaling and then upscaling the image is enough to remove the watermark. If SynthID is that fragile, its practicality is in question.
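The quantities mentioned above (carrier frequencies, phase consistency, PSNR) can be made concrete with a small sketch. This is not the repo's code: the function names `phase_coherence`, `strongest_bins`, and `psnr` are hypothetical, the input is a synthetic pattern rather than real Gemini output, and it assumes a fixed embedded pattern shows up as frequency bins whose phase repeats across many flat reference images.

```python
import numpy as np


def phase_coherence(images):
    """Per-frequency phase agreement across a stack of same-sized grayscale images.

    Values near 1 mean the phase at that bin repeats from image to image
    (a fixed embedded pattern); values near 0 mean it varies randomly.
    """
    stack = np.asarray(images, dtype=np.float64)
    stack = stack - stack.mean(axis=(1, 2), keepdims=True)  # drop each image's DC level
    spectrum = np.fft.fft2(stack, axes=(1, 2))
    unit = spectrum / np.maximum(np.abs(spectrum), 1e-12)   # keep phase only
    coherence = np.abs(unit.mean(axis=0))
    coherence[0, 0] = 0.0                                   # ignore the DC bin
    return coherence


def strongest_bins(score, k=2):
    """Return the k (row, col) frequency bins with the highest score."""
    order = np.argsort(score, axis=None)[::-1][:k]
    return [tuple(int(v) for v in np.unravel_index(i, score.shape)) for i in order]


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means the images are closer."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(test, dtype=np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


if __name__ == "__main__":
    # Synthetic stand-in for reference images: a faint fixed pattern plus noise.
    rng = np.random.default_rng(0)
    h = w = 64
    yy, xx = np.mgrid[0:h, 0:w]
    pattern = 2.0 * np.cos(2 * np.pi * (5 * yy / h + 9 * xx / w))
    refs = [pattern + rng.normal(0.0, 1.0, (h, w)) for _ in range(32)]

    # A real cosine occupies two conjugate bins, so two candidates are reported.
    print(strongest_bins(phase_coherence(refs)))
    # Fidelity of a flat gray image with the pattern added, versus without it.
    print(psnr(np.full((h, w), 128.0), 128.0 + pattern))
```

Averaging unit phasors rather than magnitudes is a deliberate choice in this sketch: image content and noise contribute energy with random phase and average out, while a pattern that is identical in every image does not.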
Evidence
- Google may actually operate two watermarks: a 'loose' version exposed as a public oracle and another held internally for law enforcement requests. It was also pointed out that Google is likely storing all generated images (or their neural hashes) linked to accounts.
- The project was criticized for testing only against its own detector rather than Google's official SynthID detection app, which means it has not shown that it can actually fool Google's detection system. One comment pointed out that ground truth could be obtained by reverse-engineering the network request to call SynthID detection directly, without a browser instance or Gemini access, but this was not done.
- The README was heavily criticized for showing traces of AI-generated text. V1 and V2 appear only in tables and diagrams without explanation, the same content is repeated three times across Overview, Architecture, and Technical Deep Dive, and roughly 1,600 words of padding make it look like a paper without the rigor of one. Ironically, one comment noted that the misaligned tables suggest it was written by Claude.
- Commenters expressed discomfort about shipping a pip-installable CLI with 'aggressive' and 'maximum' strength settings for watermark removal while claiming 'research purposes'. Ethical concerns were also raised about removing the only means of distinguishing AI-generated images from those created by humans.
- Several commenters reported that simply downscaling and upscaling an image removes the watermark. This led to reactions like 'if it's this easily removed, SynthID wasn't a good solution to begin with'. A similar bypass method was also described in a separate blog post (deepwalker.xyz).
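Separately from the ground-truth problem, the sample size behind the '90% detection rate' badge (88 verification images) leaves noticeable statistical uncertainty. The sketch below computes a Wilson score interval for a proportion. It assumes 79 correct detections out of 88 (the closest whole number to 90%; the source gives only the percentage and the image count) and that the images are independent samples.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # 79 of 88 is an assumed count, chosen to match the reported ~90%.
    low, high = wilson_interval(79, 88)
    print(f"95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval spans roughly 82% to 95%, so the headline number is compatible with a detector that is noticeably better or worse than 90%, even before asking whether it agrees with Google's own detector.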
How to Apply
- If your service relies on SynthID to detect AI-generated images, note that this project claims the watermark can be removed with simple spectral processing, so do not rely on watermark detection alone. Use a multi-layered authentication strategy instead (neural hash + metadata + generation history).
- If you conduct security audits or red-team exercises, you can apply this repo's methodology (extracting carrier frequency patterns from a set of flat, single-color reference images) to assess how vulnerable your organization's watermarking systems are to spectral analysis.
- If you are designing an AI-generated image verification pipeline, this case suggests prioritizing server-side verification, such as Google account-linked generation history or C2PA (Coalition for Content Provenance and Authenticity) metadata, over client-side watermark detection.
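One way to read the layered strategy above is as a decision rule in which the watermark is the weakest signal and its absence proves nothing. The sketch below is illustrative only: `Evidence`, `Verdict`, and `classify` are hypothetical names, and the three boolean inputs stand in for whatever hash lookup, metadata validator, and watermark detector a real pipeline would call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    AI_GENERATED = "ai_generated"   # backed by a server-side or signed source
    LIKELY_AI = "likely_ai"         # backed only by a removable signal
    UNKNOWN = "unknown"             # no evidence either way


@dataclass(frozen=True)
class Evidence:
    # True if a server-side generation record or neural-hash match exists.
    generation_record_found: bool
    # True/False if validated provenance metadata is present; None if absent.
    metadata_says_ai: Optional[bool]
    # True/False from a watermark detector; None if no detector was run.
    watermark_detected: Optional[bool]


def classify(evidence: Evidence) -> Verdict:
    """Combine signals, trusting server-side evidence over the watermark."""
    if evidence.generation_record_found:
        return Verdict.AI_GENERATED
    if evidence.metadata_says_ai:
        return Verdict.AI_GENERATED
    if evidence.watermark_detected:
        return Verdict.LIKELY_AI
    # A missing watermark is not proof of human origin: it may have been removed.
    return Verdict.UNKNOWN


if __name__ == "__main__":
    print(classify(Evidence(False, None, False)))
```

The point of the design is the final branch: the function never returns a "human-made" verdict, because none of the signals can establish that.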
Code Example
# Repo basic usage (based on README)
pip install huggingface_hub
# Download reference images
python generate_references.py
# Detect watermark
python src/detect.py --image path/to/image.png
# Remove watermark (V3 Multi-Resolution Spectral Bypass)
python src/remove.py --image path/to/image.png --strength aggressive
# strength option: normal | aggressive | maximum