AI assistance when contributing to the Linux kernel
TL;DR Highlight
An AI coding tool usage policy has been added to the official Linux kernel documentation, stating that legal responsibility for AI-generated code lies entirely with humans and AI usage must be explicitly indicated with an 'Assisted-by' tag.
Who Should Read
Developers submitting patches to the Linux kernel or contributing to open-source projects using AI tools. Also, maintainers looking to adopt an AI usage policy for their own open-source projects.
Core Mechanics
- Guidelines for using AI coding assistants have been added to the official Linux kernel documentation (Documentation/process/coding-assistants.rst). AI tools are permitted, but the existing kernel development process (coding style, patch submission rules, etc.) still applies.
- AI-generated code must be compatible with GPL-2.0-only and must have the correct SPDX license identifier. The responsibility for verifying license compliance lies with the human contributor, not the AI.
- AI agents must not add a 'Signed-off-by' tag. Only a human can legally certify the DCO (Developer Certificate of Origin, the contributor's certification of the code's origin and license).
- Human contributors must directly review all AI-generated code, verify license requirements, and add their own Signed-off-by tag to take full responsibility for the contribution.
- When contributing with AI tools, a tag in the format 'Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]' must be added to the commit message. For example, 'Assisted-by: Claude:claude-3-opus coccinelle sparse'.
- AGENT_NAME is the name of the AI tool or framework used, MODEL_VERSION is the specific model version, and specialized analysis tools like coccinelle (C code pattern matching tool), sparse (kernel static analysis tool), smatch, and clang-tidy can be optionally specified. Basic development tools like git, gcc, and make are not included.
- The purpose of these guidelines is to establish proper attribution, so that the evolving role of AI in kernel development can be tracked over time.
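The tag format described above can be checked mechanically. The following is a minimal sketch, not part of the kernel tree: `is_valid_assisted_by` is a hypothetical helper name, and the regular expression is an assumption about what counts as a valid agent, model, or tool name, since the policy gives a format but no formal grammar.

```shell
#!/bin/sh
# Sketch: check one line against the documented shape
#   Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
# Assumption: names are runs of non-whitespace characters, and the
# agent name itself contains no colon.
is_valid_assisted_by() {
    printf '%s\n' "$1" | grep -Eq \
        '^Assisted-by: [^[:space:]:]+:[^[:space:]]+( [^[:space:]]+)*$'
}

# Usage: the example tag from the guidelines passes the check.
if is_valid_assisted_by 'Assisted-by: Claude:claude-3-opus coccinelle sparse'; then
    echo "tag looks well-formed"
fi
```

A check like this only confirms the shape of the tag. Whether the named model and tools were actually used remains the contributor's responsibility.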
Evidence
- Most commenters saw the policy as common sense and reasonable. The principle that humans are responsible and licenses must be respected was not considered new, but some noted they had not seen it documented this clearly in other well-known open-source repositories.
- Some were skeptical that license compliance can be guaranteed at all. LLMs are trained on large amounts of code under many different licenses, including code collected without the copyright holders' consent, and they can 'regurgitate' training data with high fidelity. Commenters asked how a contributor could vouch for the GPL compatibility of AI-generated code and argued that such a guarantee is practically impossible.
- Others worried about the review burden. AI tools let anyone produce large patches easily, which could overwhelm maintainers. The fear is a vicious cycle in which reviewers, swamped by volume, end up trusting and merging AI-generated code without sufficient review.
- One point was that AI is most confidently wrong in exactly the areas that matter most to the kernel. Commenters shared experiences of models producing superficially clean code with bugs that caused deadlocks weeks later under real load. It was suggested that filtering patches with AI on the maintainer side, as with Sashiko (attributed in the comments to Greg KH), might be more effective than contributor policies.
- Some found the choice of the 'Assisted-by' tag awkward. According to these comments, the tag was originally used to credit another person who helped with a commit, so reusing it for AI tools mixes two meanings. A separate tag such as 'AI-assistant:' was suggested instead.
How to Apply
- When writing patches for the Linux kernel using AI tools (Claude, Copilot, etc.), be sure to add a tag like 'Assisted-by: Claude:claude-3-7-sonnet' to the commit message, and review the code yourself before adding your own Signed-off-by tag. If you have configured the AI to automatically add Signed-off-by, you should disable that feature.
- Before submitting AI-generated kernel code, it is a good idea to run it through kernel-specific static analysis tools such as coccinelle (pattern-based C code analysis), sparse (kernel-specific static analysis), and smatch. If you use these tools, you can include them in the tag, such as 'Assisted-by: Claude:claude-3-7-sonnet coccinelle sparse'.
- If you are a maintainer looking to adopt an AI usage policy for your own open-source project, you can use this Linux kernel document (coding-assistants.rst) as a template and simply adopt the 'humans are responsible + AI usage indication tag' structure. Its simplicity and clear legal responsibility make it a good model for immediate application to other projects.
- Real-world experience shared in the comments suggests that running AI-generated code back through the same model over several iterations can help surface bugs. Concurrency bugs such as race conditions are hard to detect and may appear only under specific load conditions, so stress-testing AI-generated code matters all the more.
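The first step above (a human sign-off, plus an AI tag where applicable) can be sanity-checked before sending a patch. This is a minimal sketch under stated assumptions: `check_commit_message` is a hypothetical helper, not something shipped with git or the kernel, and it cannot tell whether an AI tool was actually used, so a missing 'Assisted-by' tag is reported but not treated as an error.

```shell
#!/bin/sh
# Sketch of a pre-submission check on a commit message read from stdin.
# Hypothetical helper; it inspects the message text only.
check_commit_message() {
    msg=$(cat)
    # A human sign-off is always required for kernel patches.
    if ! printf '%s\n' "$msg" | grep -q '^Signed-off-by: .* <.*@.*>$'; then
        echo "missing Signed-off-by"
        return 1
    fi
    # The Assisted-by tag applies only when an AI tool was involved,
    # so its absence is reported rather than rejected.
    if printf '%s\n' "$msg" | grep -q '^Assisted-by: '; then
        echo "ok: AI assistance declared"
    else
        echo "ok: no AI assistance declared"
    fi
}
```

Inside a repository, the most recent commit could be checked with `git log -1 --format=%B | check_commit_message`. The check does not replace reviewing the code yourself, which is the part of the policy that carries the responsibility.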
Code Example
# Example of declaring AI assistance in a commit message.
# The blank lines separate the subject, the body, and the trailer block.
git commit -m "drivers/net: fix memory leak in foo_driver

Fix a memory leak in the error path of foo_probe() where
the allocated buffer was not freed on failure.

Assisted-by: Claude:claude-3-opus coccinelle sparse
Signed-off-by: Your Name <your@email.com>"
Terminology
- DCO: Abbreviation for Developer Certificate of Origin. It is a procedure in which a contributor legally signs to confirm that 'I created this code or verified the license, and I have the right to contribute to this project.' The Signed-off-by tag represents this certification.
- Signed-off-by: A tag appended to the bottom of a commit message, indicating that the contributor has certified the DCO. Maintainers will not accept patches without this tag.
- SPDX: Abbreviation for Software Package Data Exchange. It is a standard, machine-readable format for representing software license information. It is written at the top of a file, for example as 'SPDX-License-Identifier: GPL-2.0-only'.
- coccinelle: A C code pattern matching and automatic transformation tool used in Linux kernel development. It is mainly used to find specific coding patterns and replace them with safer ones.
- sparse: A static analysis tool built for the kernel that flags problems such as address space misuse, incorrect type casts, and locking rule violations at build time, before the code runs.
- Race condition: A bug that occurs when multiple threads or processes access the same resource simultaneously and the result depends on the execution order. It is particularly difficult to detect in AI-generated code and often appears only under specific load conditions.
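As the SPDX entry above notes, the identifier sits at the top of a file. A minimal sketch of checking for it follows. `has_spdx_header` is a hypothetical helper name, and looking at the first two lines is an assumption made here to allow for a leading `#!` line in scripts. It checks only that an identifier is present, not that the license is GPL-2.0-compatible.

```shell
#!/bin/sh
# Sketch: report whether a file carries an SPDX identifier near the top.
# Assumption: the tag appears within the first two lines of the file.
has_spdx_header() {
    head -n 2 "$1" | grep -q 'SPDX-License-Identifier: '
}

# Usage: pass the files touched by a patch as arguments.
for f in "$@"; do
    has_spdx_header "$f" || echo "no SPDX identifier found: $f"
done
```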