Google Engineers Launch "Sashiko" for Agentic AI Code Review of the Linux Kernel
TL;DR Highlight
Google's Linux kernel team open-sourced 'Sashiko,' a Gemini-based AI code review agent that claims to detect 53% of bugs missed by human reviewers.
Who Should Read
Linux kernel contributors and maintainers of large open-source projects considering automated code review pipelines. Backend/systems developers curious about real-world applications of AI agents to code quality verification.
Core Mechanics
- Roman Gushchin of Google's Linux kernel team released Sashiko, a system used internally at Google for months that is now being extended to all Linux kernel mailing list patch submissions.
- Tested against 1,000 recent upstream Linux kernel bugs identified by 'Fixes:' tags, Sashiko detected 53% of them. The presenter emphasized that 'this 53% are issues that human reviewers missed 100% of the time.'
- It defaults to Gemini 2.5 Pro (listed as 'Gemini 3.1 Pro' in the original announcement) but is built to work with Claude and other LLMs. Notably, the system itself is written in Rust and was co-authored with Claude.
- Google is covering Sashiko's token costs and infrastructure, with project hosting planned to transfer to the Linux Foundation. Code is open-sourced on GitHub (github.com/sashiko-dev/sashiko).
- A web interface (sashiko.dev) shows patchsets currently under review and their results. Findings are tagged with severity levels such as 'Critical' and 'High.'
- A key design principle: Sashiko deliberately does not spam the mailing list with comments. Review results are viewable only on the separate web interface, a choice made to avoid disrupting the kernel community's existing workflow.
- Unlike simple static analysis, the review is agentic: the LLM understands the patch's context and judges how likely each issue is to be a real bug. One commenter called 'separating the model that writes code from the model that reviews code' the key insight; a minimal sketch of that separation follows this list.
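A minimal sketch of that writer/reviewer split, assuming a generic chat-completion helper in Python; the `complete` stub, prompt, and model arguments are illustrative placeholders, not Sashiko's actual code (Sashiko itself is written in Rust):

```python
# Hypothetical sketch: review a patch with a model other than the one
# that wrote it. `complete()` stands in for a real LLM client call.

REVIEW_PROMPT = """You are reviewing a patch you did not write.
List likely bugs, each with a severity (Critical/High/Low) and a
one-line rationale.

Patch:
{diff}"""

def complete(model: str, prompt: str) -> str:
    """Placeholder for a real chat-completion call (Gemini, Claude, etc.)."""
    raise NotImplementedError

def review_patch(diff: str, writer_model: str, reviewer_model: str) -> str:
    # The design principle: never review code with the model that wrote it,
    # for the same reason you don't self-review your own PRs.
    assert reviewer_model != writer_model, "reviewer must differ from writer"
    return complete(reviewer_model, REVIEW_PROMPT.format(diff=diff))
```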
Evidence
- The 53% detection rate was criticized for not disclosing the false positive rate: 'flag all code as buggy and you get a 100% detection rate.' Without precision alongside recall, actual usefulness is hard to judge, and human reviewers overwhelmed by AI false positives could lose trust in the entire system (see the worked example after this list).
- The claim that '100% of issues were missed by human reviewers' sparked debate over interpretation. One commenter noted that 'missed at the initial code review stage' does not mean developers never found these bugs later in development. Code gets reviewed continuously, so the framing was considered somewhat exaggerated.
- UX feedback on the web UI (sashiko.dev): the Status column shows internal pipeline states like 'Pending' and 'In Review,' while the findings that actually matter are buried on the far right. There is no filtering or highlighting for Critical/High severity findings, which reduces practical utility.
- Concerns about auto-submitting style/structural change patches: a commenter shared a link to an actual Sashiko review, noting that automated style changes applied at scale to the kernel codebase could burden the existing development flow. The worry was that the tool seemed to focus more on style cleanup than on bug detection.
- Positive reactions to separating the writing model from the reviewing model. One commenter shared: 'I use the same approach at small scale; for the same reason you don't self-review your own PRs, self-review misses things.' That the system is written in Rust and co-developed with Claude was also noted as interesting.
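To make the precision/recall objection concrete, a worked example with invented numbers (only the 53% recall figure comes from the talk; the finding count is hypothetical):

```python
# Hypothetical: 1,000 known bugs ('Fixes:' commits); suppose the agent
# emitted 10,000 findings overall, 530 of which match a real bug.
known_bugs = 1_000
findings = 10_000
true_positives = 530

recall = true_positives / known_bugs    # 0.53 -> the headline "53%"
precision = true_positives / findings   # 0.053 -> ~18 false alarms per real bug

print(f"recall={recall:.0%}  precision={precision:.1%}")
```

At 5.3% precision, reviewers would wade through roughly 18 false reports for every real bug, which is exactly the trust-eroding scenario the critics describe.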
How to Apply
- If you submit patches to the Linux kernel, check your patchset's review results on sashiko.dev before sending to the mailing list. Fixing Critical/High findings before submission can shorten review cycles.
- To build a similar AI code review pipeline for your own large codebase, start from Sashiko's open-source code (github.com/sashiko-dev/sashiko) and apply the 'writing model != reviewing model' principle. It is designed to swap in the Claude API as well, making adoption easy for teams already using Claude.
- When adopting an AI code review system, follow Sashiko's approach: surface review results in a separate dashboard rather than spamming existing communication channels (mailing lists, PR comments). When false positive rates are high, a separate UI preserves team trust better than direct notifications do.
- Rather than taking the 53% bug detection metric at face value, measure the false positive rate before adopting anything: pull 100-200 recent 'Fixes:' commits from your own codebase and compare them against the AI's review results to measure precision and recall yourself, as sketched after this list.
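A minimal sketch of that measurement harness in Python, assuming a local git checkout; the `ai_review_flags_bug` hook and the rubric for deciding a bug was 'caught' are placeholders you would wire to your own reviewer, not part of Sashiko:

```python
# Measure an AI reviewer's recall against your own 'Fixes:'-tagged history.
import re
import subprocess

def git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout

def recent_fixes(n: int = 200) -> list[tuple[str, str]]:
    """(fix_commit, fixed_commit) pairs from the last n 'Fixes:'-tagged commits."""
    pairs = []
    for fix in git("log", f"-{n}", "--grep=^Fixes:", "--format=%H").split():
        body = git("show", "-s", "--format=%B", fix)
        m = re.search(r"^Fixes:\s+([0-9a-f]{12,40})", body, re.MULTILINE)
        if m:
            pairs.append((fix, m.group(1)))
    return pairs

def ai_review_flags_bug(diff: str) -> bool:
    """Placeholder: send `diff` to your reviewer model and decide (manually
    or by a rubric) whether it reported the defect the later fix addressed."""
    raise NotImplementedError

def measure() -> None:
    pairs = recent_fixes()
    caught = sum(ai_review_flags_bug(git("show", fixed)) for _, fixed in pairs)
    # This gives recall over known bugs; measure precision separately by
    # counting findings on commits that never needed a fix.
    print(f"recall = {caught}/{len(pairs)} = {caught / len(pairs):.0%}")
```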
Related Posts
Show HN: adamsreview – better multi-agent PR reviews for Claude Code
An open-source plugin for Claude Code in which up to 7 parallel sub-agents each review a PR from a different perspective and even apply automatic fixes. It claims to catch more real bugs than the built-in /review or CodeRabbit, but the community voiced skepticism about its complexity and practical value.
How Fast Does Claude, Acting as a User Space IP Stack, Respond to Pings?
An experiment that had Claude Code parse IP packets directly and construct ICMP echo replies so that it actually responds to pings; an entertaining case that pushes the idea that 'Markdown is the code and the LLM is the processor' all the way down to the network stack.
Show HN: Git for AI Agents
A version control tool that automatically tracks every tool call made by AI coding agents (such as Claude Code) and can even blame which prompt wrote which line of code.
Principles for agent-native CLIs
A write-up of principles for designing CLI tools that AI agents can use well; as agents invoke CLIs as tools more and more often, this style of design is becoming practically important.
Agent-harness-kit scaffolding for multi-agent workflows (MCP, provider-agnostic)
A scaffolding tool that orchestrates multiple AI agents so they can split roles and collaborate, letting you assemble a multi-agent pipeline quickly and with no configuration, much like Vite.
Show HN: Tilde.run – Agent sandbox with a transactional, versioned filesystem
A tool that provides an isolated sandbox in which AI agents can touch real production data and still roll back, unifying GitHub/S3/Google Drive into a single version-controlled filesystem.