AI assistance when contributing to the Linux kernel
TL;DR Highlight
An AI coding tool usage policy has been added to the official Linux kernel documentation, stating that legal responsibility for AI-generated code lies entirely with humans and AI usage must be explicitly indicated with an 'Assisted-by' tag.
Who Should Read
Developers submitting patches to the Linux kernel or contributing to open-source projects using AI tools. Also, maintainers looking to adopt an AI usage policy for their own open-source projects.
Core Mechanics
- Guidelines for using AI coding assistants have been added to the official Linux kernel documentation (Documentation/process/coding-assistants.rst). AI tools are permitted, but existing kernel development processes (coding style, patch submission rules, etc.) must still be followed.
- AI-generated code must be compatible with GPL-2.0-only and must have the correct SPDX license identifier. The responsibility for verifying license compliance lies with the human contributor, not the AI.
- AI agents cannot add a 'Signed-off-by' tag. The DCO (Developer Certificate of Origin, a signature guaranteeing the origin and license of the code) can only be legally certified by humans.
- Human contributors must directly review all AI-generated code, verify license requirements, and add their own Signed-off-by tag to take full responsibility for the contribution.
- When contributing with AI tools, a tag in the format 'Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]' must be added to the commit message. For example, 'Assisted-by: Claude:claude-3-opus coccinelle sparse'.
- AGENT_NAME is the name of the AI tool or framework used, MODEL_VERSION is the specific model version, and specialized analysis tools like coccinelle (C code pattern matching tool), sparse (kernel static analysis tool), smatch, and clang-tidy can be optionally specified. Basic development tools like git, gcc, and make are not included.
- The purpose of these guidelines is to establish proper attribution so that the role of AI in the development process can be tracked. That record makes it possible to see how the role of AI in kernel development changes over time.
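The tag format described above can be checked mechanically. The following is a minimal sketch, not part of the kernel documentation: the function names `is_assisted_by` and `basic_tools_listed` are hypothetical, and the regular expression assumes that AGENT_NAME and MODEL_VERSION contain no spaces or colons, which the policy itself does not state.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the line looks like
#   Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
# Assumption: agent and model names contain no spaces or colons.
is_assisted_by() {
    printf '%s\n' "$1" | grep -Eq '^Assisted-by: [^ :]+:[^ :]+( [^ ]+)*$'
}

# Hypothetical helper: print any listed tools that the policy treats as
# basic development tools (git, gcc, make), which should not be listed.
basic_tools_listed() {
    # Fields 3 and later of the tag line are the optional tool names.
    for tool in $(printf '%s\n' "$1" | cut -d' ' -f3-); do
        case "$tool" in
            git|gcc|make) printf '%s\n' "$tool" ;;
        esac
    done
}
```

For instance, `is_assisted_by 'Assisted-by: Claude:claude-3-opus coccinelle sparse'` succeeds, while a line with no `:MODEL_VERSION` part is rejected.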
Evidence
- Most comments called the policy common sense and reasonable. The principle of "humans are responsible, adhere to licenses" was not particularly new and was seen as a natural direction. However, some commenters noted that they had not seen such a clearly documented policy in other well-known open-source repositories.
- Some were skeptical that license compliance is achievable. LLMs are trained on a vast amount of code under various licenses, including code collected without the copyright holders' consent. Commenters asked how contributors could guarantee the GPL compatibility of AI-generated code, and argued that guaranteeing GPL compliance is practically impossible given that LLMs can "regurgitate" training data with high fidelity.
- Concerns were raised about the increased burden of code review. With AI tools, anyone can easily create large patches, potentially overwhelming maintainers with the amount of code they need to review. Commenters worried that reviewers, overwhelmed by volume, would end up "trusting" and merging AI-generated code without sufficient review, creating a vicious cycle.
- One commenter argued that AI is most confidently wrong in the areas most critical to the kernel. Others shared experiences of models generating superficially clean code that contained bugs causing deadlocks weeks later under real load. It was suggested that filtering patches with AI on the maintainer side, like Greg KH's Sashiko, might be more effective than contributor policies.
- Some found it awkward to use the 'Assisted-by' tag for AI tools. The tag was originally used when another person helped with a commit, and using it for a completely different purpose, indicating AI tools, mixes the two meanings. Some comments suggested that a separate tag such as 'AI-assistant:' would have been better.
How to Apply
- When writing patches for the Linux kernel using AI tools (Claude, Copilot, etc.), be sure to add a tag like 'Assisted-by: Claude:claude-3-7-sonnet' to the commit message, and review the code yourself before adding your own Signed-off-by tag. If you have configured the AI to automatically add Signed-off-by, you should disable that feature.
- Before submitting AI-generated kernel code, it is a good idea to run it through kernel-specific static analysis tools such as coccinelle (pattern-based C code analysis), sparse (kernel-specific static analysis), and smatch. If you use these tools, you can include them in the tag, such as 'Assisted-by: Claude:claude-3-7-sonnet coccinelle sparse'.
- If you are a maintainer looking to adopt an AI usage policy for your own open-source project, you can use this Linux kernel document (coding-assistants.rst) as a template and simply adopt the 'humans are responsible + AI usage indication tag' structure. Its simplicity and clear legal responsibility make it a good model for immediate application to other projects.
- Real-world experience shared in the comments suggests that running AI-generated code through multiple review iterations with the same model can help find bugs. Concurrency bugs such as race conditions are particularly hard to detect and may only appear under specific load conditions, so stress testing AI-generated code is especially important.
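The first step above can be partly automated before a patch is sent. The following is a minimal sketch under stated assumptions: `check_commit_msg` is a hypothetical function name, and the check only confirms that both tags are present in a commit message file. It cannot confirm that a human reviewed the code or added the Signed-off-by personally.

```shell
#!/bin/sh
# Hypothetical helper: verify that a commit message file contains both an
# Assisted-by tag and a Signed-off-by tag. Presence only; this does not
# prove that the sign-off was added by a human reviewer.
check_commit_msg() {
    msg_file=$1
    status=0
    for tag in Assisted-by Signed-off-by; do
        if ! grep -q "^$tag: " "$msg_file"; then
            printf 'missing %s tag\n' "$tag" >&2
            status=1
        fi
    done
    return "$status"
}
```

One possible use is calling `check_commit_msg "$1"` from a local `commit-msg` Git hook, which receives the path of the message file as its first argument. This only makes sense in a repository where every commit is AI-assisted, since it rejects messages without an Assisted-by tag.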
Code Example
# Example of specifying AI usage in a commit message.
# Blank lines separate the subject, the body, and the tag block.
git commit -m "drivers/net: fix memory leak in foo_driver

Fix a memory leak in the error path of foo_probe() where
the allocated buffer was not freed on failure.

Assisted-by: Claude:claude-3-opus coccinelle sparse
Signed-off-by: Your Name <your@email.com>"