If you don't opt out by Apr 24 GitHub will train on your private repos
TL;DR Highlight
Starting April 24, 2025, GitHub's policy change puts interaction data from Copilot users' private repos into AI model training by default. You need to know exactly where the opt-out toggle is and what data is actually in scope.
Who Should Read
Developers and teams who own private repos on GitHub or are currently using GitHub Copilot. Especially development team leads who need to manage code data security at the organizational level.
Core Mechanics
- Starting April 24, 2025, GitHub includes interaction data from Free, Pro, and Pro+ Copilot users in AI model training by default (opt-out basis). The headlines were somewhat exaggerated and caused confusion: the policy does not feed entire private repos into training, only the 'interaction data' generated while using Copilot.
- Business and Enterprise plan subscribers are not affected by this change. GitHub has officially stated that 'usage data from Business/Enterprise subscribers will not be used for training.'
- People who do not use Copilot at all are not directly affected by this change. However, if you plan to use Copilot in the future, opting out now will preserve that setting.
- The opt-out setting is located on the github.com/settings/copilot/features page, under the Privacy section at the bottom — toggle off 'Allow GitHub to use my data for AI model training.' It takes about 30 seconds to configure.
- How to bulk-disable this at the organization level is unclear, which worries org admins. The only confirmed setting is per individual account, and it remains ambiguous whether repo data could be included in training if even one team member fails to opt out.
- Users belonging to Enterprise accounts have reported that the opt-out option disappears from their personal Copilot Pro subscription settings. Enterprise policies override individual settings, causing confusion.
- GitHub stated that it had been continuously notifying users of this change via banners, but many users reported only becoming aware of it after seeing the HN post, indicating that very few actually read the banners.
- This policy change is interpreted as an extension of the industry trend that 'any data a company can freely read will eventually be used for AI training.' The view that ToS changes can enable this at any time — unless end-to-end encrypted — resonated widely in the community.
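The org-level gap described above can at least be worked around with a manual checklist. A minimal sketch, using the real REST endpoint `GET /orgs/{org}/members` (the org name, token, and injectable `fetch` below are illustrative): it lists member logins so a team lead can confirm each person's opt-out by hand, since no API for the Copilot training toggle itself appears to be documented.

```python
"""Enumerate org members so a lead can chase individual opt-outs by hand."""
import json
import urllib.request


def member_logins(org: str, token: str, fetch=None):
    """Yield member logins page by page.

    `fetch` is injectable so the pagination logic can be tested offline;
    by default it calls GitHub's REST API (GET /orgs/{org}/members).
    """
    if fetch is None:
        def fetch(url):
            req = urllib.request.Request(url, headers={
                "Authorization": f"Bearer {token}",
                "Accept": "application/vnd.github+json",
            })
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)

    page = 1
    while True:
        batch = fetch(
            f"https://api.github.com/orgs/{org}/members"
            f"?per_page=100&page={page}"
        )
        if not batch:          # empty page means we've seen everyone
            return
        for member in batch:
            yield member["login"]
        page += 1


# Usage (with a real token): for login in member_logins("my-org", TOKEN): ...
```

This only produces the checklist; each member still has to flip the toggle in their own account settings.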
Evidence
- A commenter believed to be a GitHub employee directly disputed the headline as inaccurate, clarifying — with a link to the official GitHub blog (github.blog) — that entire private repos are not used for training: only interaction data generated during Copilot usage is collected, and Business/Enterprise subscribers are not affected.
- An org admin expressed concern that a single team member failing to opt out could expose the entire repo's code via that member's Copilot usage. The lack of a clear official response heightened anxiety, and the inability to control this at the org level — with only per-account settings available — was flagged as a problem.
- A humorous comment — "my private repo is such a mess that training on it would hurt GitHub more than me" — got a lot of upvotes, while a parallel observation noted that messy, uncommented code could degrade training data quality.
- Some users said they didn't mind, noting their repos contain no client data or credentials and that they actually appreciate AI learning their code style.
- Others expressed distrust of GitHub/Microsoft, arguing that even with policies in place, accidents such as a private flag being ignored could happen.
- There was significant criticism of GitHub designing the policy as opt-out rather than opt-in. Concrete alternatives were proposed — such as switching to opt-in with participation incentives like increased token quotas to rebuild trust — and some users said this was a good reason to reduce their dependence on GitHub.
How to Apply
- If you use GitHub Copilot (Free/Pro/Pro+), go to github.com/settings/copilot/features right now and disable 'Allow GitHub to use my data for AI model training' under the Privacy section at the bottom. This must be done before April 24 to take effect.
- If you are a team lead responsible for code security at the organizational level, instruct all team members to opt out from their individual accounts, and consider upgrading to a GitHub Business/Enterprise plan, since the policy does not apply to those plans.
- Even if you are not currently using Copilot, it is worth opting out in advance if there is any chance you will use it later — GitHub states that the opt-out setting is preserved once Copilot is activated.
- When storing sensitive code on cloud services like GitHub/Microsoft, design on the assumption that the service's ToS can change at any time. Consider separating critical business logic and secrets into self-hosted Git solutions (such as Gitea or GitLab) or end-to-end encrypted storage.
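The self-hosted migration mentioned above boils down to a mirror clone and a mirror push. A runnable sketch: in real use the first path would be your GitHub repo URL and the second your self-hosted Gitea/GitLab remote; local directories stand in here so the steps can be executed anywhere.

```shell
set -e
# Stand-in for the GitHub original (in practice: git@github.com:org/repo.git).
git init -q upstream
git -C upstream -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "initial commit"

# --mirror copies ALL refs (branches, tags), so the copy is complete.
git clone -q --mirror upstream mirror.git

# Stand-in for the self-hosted Gitea/GitLab remote (normally a bare repo
# created on that server).
git init -q --bare selfhosted.git
git -C mirror.git push -q --mirror ../selfhosted.git

git -C selfhosted.git log --oneline   # the history now lives off GitHub
```

Because `--mirror` is a one-shot full sync, a cron job re-running the push step keeps the self-hosted copy current while you transition.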
Related Papers
Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library
PyTorch Lightning packages 2.6.2 and 2.6.3 delivered credential-stealing malware via a supply chain attack.
Alignment whack-a-mole: Finetuning activates recall of copyrighted books in LLMs
Fine-tuning even safety-aligned LLMs can bypass safeguards and reproduce copyrighted text verbatim, revealing prompt filtering alone isn't enough to prevent copyright infringement.
Show HN: MacMind – A transformer neural network in HyperCard on a 1989 Macintosh
This is an educational project implementing a single-layer Transformer with 1,216 parameters in the scripting language HyperTalk (1987) and training it on a real Macintosh SE/30. It demonstrates that the core mathematics of modern LLMs works the same on hardware from 30 years ago.
MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU
Introducing MegaTrain, a system that uses CPU memory as primary storage and the GPU solely as a compute engine, enabling full-precision training of 120B-parameter models on a single H200 GPU.
Show HN: I built a tiny LLM to demystify how language models work
This educational project allows you to build a mini LLM with 8.7 million parameters, trained on a Guppy fish character, from scratch in just 5 minutes using a single Colab notebook, focusing on demystifying the black box nature of LLMs.