If you don't opt out by Apr 24 GitHub will train on your private repos
TL;DR Highlight
Effective April 24, GitHub's policy uses Copilot users' private repo interaction data for AI training by default. You need to know exactly where the opt-out setting is and what data is actually in scope.
Who Should Read
Developers and teams who own private repos on GitHub or currently use GitHub Copilot, especially development team leads who need to manage code data security at the organizational level.
Core Mechanics
- Effective April 24, 2025, GitHub's policy includes interaction data from Free, Pro, and Pro+ Copilot users in AI model training by default (opt-out basis). Headlines overstated the change and caused confusion: the policy does not train on entire private repos, only on the 'interaction data' generated while using Copilot.
- Business and Enterprise plan subscribers are not affected by this change. GitHub has officially stated that 'usage data from Business/Enterprise subscribers will not be used for training.'
- People who do not use Copilot at all are not directly affected by this change. However, if you plan to use Copilot in the future, opting out now will preserve that setting.
- The opt-out setting is on the github.com/settings/copilot/features page, in the Privacy section at the bottom: toggle off 'Allow GitHub to use my data for AI model training.' It takes about 30 seconds to configure.
- It is unclear how to disable this in bulk at the organization level. The only confirmed setting is per individual account, and it remains ambiguous whether repo data could be included if even one team member fails to opt out.
- Users belonging to Enterprise accounts have reported that the opt-out option disappears from their personal Copilot Pro subscription settings. Enterprise policies override individual settings, causing confusion.
- GitHub stated that it had been continuously notifying users of this change via banners, but many users reported only becoming aware of it after seeing the HN post, indicating that very few actually read the banners.
- This policy change is interpreted as an extension of the industry trend that 'any data a company can freely read will eventually be used for AI training.' The view that ToS changes can enable this at any time — unless end-to-end encrypted — resonated widely in the community.
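The plan rules in the list above can be condensed into a small lookup. The sketch below is illustrative only: the function name `training_scope`, the plan strings, and the returned labels are hypothetical, and it encodes only what this summary states. It does not call or mirror any GitHub API.

```python
# Hypothetical helper: encodes only the plan rules summarized above.
# Plan names and return labels are illustrative, not GitHub API values.
AFFECTED_PLANS = {"free", "pro", "pro+"}
EXEMPT_PLANS = {"business", "enterprise"}


def training_scope(plan, opted_out=False):
    """Classify a Copilot account under the policy as summarized above.

    plan: Copilot plan name, or None if the account does not use Copilot.
    opted_out: whether the training toggle has been switched off.
    """
    if plan is None:
        return "not affected (no Copilot usage)"
    name = plan.strip().lower()
    if name in EXEMPT_PLANS:
        return "not affected (Business/Enterprise)"
    if name in AFFECTED_PLANS:
        if opted_out:
            return "opted out"
        return "interaction data used by default"
    raise ValueError(f"unknown plan: {plan!r}")


print(training_scope("Pro+"))                  # interaction data used by default
print(training_scope("free", opted_out=True))  # opted out
```

The sketch deliberately leaves out the reported case where an Enterprise membership hides the toggle on a personal Pro subscription, because the summary describes that behavior as unclear.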
Evidence
- A commenter believed to be a GitHub employee directly disputed the headline as inaccurate. Linking to the official GitHub blog (github.blog), they clarified that entire private repos are not used for training, that only interaction data generated during Copilot usage is collected, and that Business/Enterprise subscribers are not affected.
- An org admin asked whether one team member failing to opt out could expose the entire repo's code through that member's Copilot usage. The lack of a clear official response heightened anxiety, and the absence of org-level control, with only per-account settings available, was flagged as a problem.
- A humorous comment, 'my private repo is such a mess that training on it would hurt GitHub more than me', got a lot of upvotes. A parallel observation noted that messy, uncommented code could degrade training data quality.
- Some users said they didn't mind, noting that their repos contain no client data or credentials and that they actually appreciate AI learning their code style.
- Others expressed distrust of GitHub/Microsoft, arguing that even with policies in place, mistakes such as private flags being ignored could still happen.
- There was significant criticism of GitHub designing the policy as opt-out rather than opt-in. Concrete alternatives were proposed, such as switching to opt-in with participation incentives like increased token quotas to rebuild trust, and some users said this was a good reason to reduce their dependence on GitHub.
How to Apply
- If you are a developer using GitHub Copilot (Free/Pro/Pro+), go to github.com/settings/copilot/features right now and disable 'Allow GitHub to use my data for AI model training' under the Privacy section at the bottom. This must be done before April 24 to take effect.
- If you are a team lead responsible for code security at the organizational level, instruct all team members to opt out from their individual accounts, and consider upgrading to a GitHub Enterprise/Business plan, as this policy does not apply to those plans.
- Even if you are not currently using Copilot, it is worth opting out in advance if there is any chance you will use it in the future. GitHub states that opt-out settings are preserved even after Copilot is later activated.
- When storing sensitive code on cloud services like GitHub/Microsoft, design with the assumption that service ToS can change at any time. For critical business logic and secrets, consider moving them to self-hosted Git solutions (such as Gitea or GitLab) or end-to-end encrypted storage.
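Because the only confirmed setting is per account, a team lead has to track opt-outs by hand. The following is a minimal sketch of such a checklist, assuming members report their status themselves; the function `pending_opt_outs` and the sample usernames are hypothetical, and nothing here reads the setting from GitHub.

```python
def pending_opt_outs(members, confirmed):
    """Return usernames that have not yet confirmed opting out, sorted.

    Both inputs are plain lists of usernames collected manually
    (for example from a team chat poll); names are compared
    case-insensitively.
    """
    confirmed_names = {name.strip().lower() for name in confirmed}
    member_names = {name.strip().lower() for name in members}
    return sorted(member_names - confirmed_names)


# Hypothetical team roster and self-reported confirmations.
team = ["alice", "Bob", "carol"]
done = ["bob"]
print(pending_opt_outs(team, done))  # ['alice', 'carol']
```

An empty result means every listed member has reported the toggle as off; it is only as reliable as the self-reports behind it.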
Related Papers
Show HN: Neural Particle Automata
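An extension of Neural Cellular Automata that runs on moving particles instead of a fixed grid; it can learn self-organizing behavior across tasks such as morphogenesis, point cloud classification, and texture synthesis.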
When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks
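In a VLM self-training loop, if the verifier is a poor fit for a given task, performance gets worse the more the model trains, and this is hard to notice because the DPO loss keeps decreasing as normal.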
The Role of Feedback Alignment in Self-Distillation
When an LLM teaches itself, aligning feedback step by step with the model's reasoning flow improves math reasoning performance by more than 16 points over GRPO.
Tiny hackable CUDA language model implementation
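A minimal GPT (Generative Pretrained Transformer) implementation written in CUDA that can train on any byte stream, not just text, which makes it useful for developers who want to take apart the internals of an LLM themselves.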
CS336: Language Modeling from Scratch
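A Stanford course on implementing the full LLM pipeline, from the tokenizer through data collection, transformer implementation, distributed training, and RL-based alignment, all learned by writing the code yourself. Because it is implementation-focused rather than theoretical, it is one of the most systematic curricula for developers who want a deep understanding of how LLMs actually work.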
Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection
Backdoors can be hidden in LoRA adapters downloaded from HuggingFace, and there are methods to detect them.