Neural Networks: Zero to Hero
TL;DR Highlight
Andrej Karpathy teaches everything from backprop to GPT by building it in code — hands-on lectures for engineers who learn best by implementing.
Who Should Read
Software engineers and ML practitioners who want deep intuition for how neural networks and LLMs actually work, not just how to use APIs.
Core Mechanics
- Karpathy's lecture series (Neural Networks: Zero to Hero) covers the full stack from basic backpropagation through modern GPT architecture, all built from scratch in Python/PyTorch.
- The pedagogical approach: implement everything yourself rather than using libraries as black boxes — you understand autograd by building it, not by reading about it.
- The series covers: micrograd (backprop from scratch), makemore (bigram → MLP → attention), and nanoGPT (a minimal but complete GPT implementation); a sketch in the micrograd style appears after this list.
- The content targets engineers who know Python but have limited ML experience; it is accessible without a deep math background.
- nanoGPT became a widely used reference implementation because it's readable, not just functional.
- The lectures are freely available on YouTube and have become a standard self-study resource for engineers entering ML.
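To make the build-it-yourself approach concrete, here is a minimal sketch of a micrograd-style autograd engine: a scalar Value that records the operations applied to it and backpropagates gradients with the chain rule. This is an illustrative reduction written for this summary, not Karpathy's actual code; the real micrograd supports more operators and includes a small neural-net library on top.

```python
import math

class Value:
    """A scalar that tracks its gradient, in the spirit of micrograd (illustrative sketch)."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(out)/d(self) = 1
            other.grad += out.grad          # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # product rule
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad  # d tanh(x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the compute graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# A single "neuron": y = tanh(w*x + b); gradients flow back to w, x, and b.
x, w, b = Value(2.0), Value(-0.5), Value(0.1)
y = (w * x + b).tanh()
y.backward()
print(y.data, w.grad, x.grad, b.grad)
```

Once the graph-plus-chain-rule pattern is clear at this scale, PyTorch's autograd stops being a black box: it does the same bookkeeping over tensors instead of scalars.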
Evidence
- The series has millions of views and is frequently cited as the best free resource for engineers learning ML fundamentals.
- HN discussions of the series are consistently positive, with experienced ML engineers recommending it even to practitioners with existing backgrounds.
- nanoGPT's GitHub repo has tens of thousands of stars and is regularly forked for research experiments — evidence of practical utility beyond just education.
- Several professional ML engineers noted that working through the series filled gaps in their understanding that years of using high-level frameworks hadn't addressed.
How to Apply
- Work through the series sequentially; don't skip micrograd even if you already use autograd daily. The implementation details are what give you a mental model you can actually debug against.
- After each lecture, try to extend the implementation yourself before looking at solutions; the struggle is where the learning happens (see the gradient-checking sketch after this list for one way to verify your extensions).
- Use nanoGPT as a starting point for research experiments: it's small enough to fit in your head and modify confidently.
- After completing the series, you'll have the foundation to read ML papers directly rather than relying on blog post summaries.
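One concrete way to practice the extend-it-yourself habit, and to debug the mental model mentioned above, is to gradient-check anything you write: compare the analytic gradient your code produces against a centered finite-difference estimate. The snippet below is a generic illustration using PyTorch, not code from the lectures; when extending micrograd you would compare your own .grad fields instead of w.grad and b.grad.

```python
import torch

def f(w, x, b):
    # A tiny "neuron": tanh(w*x + b), the same shape of computation as the micrograd examples.
    return torch.tanh(w * x + b)

w = torch.tensor(-0.5, requires_grad=True)
x = torch.tensor(2.0)
b = torch.tensor(0.1, requires_grad=True)

y = f(w, x, b)
y.backward()  # analytic gradients via autograd land in w.grad and b.grad

eps = 1e-5
num_dw = (f(w + eps, x, b) - f(w - eps, x, b)) / (2 * eps)  # numerical dy/dw
num_db = (f(w, x, b + eps) - f(w, x, b - eps)) / (2 * eps)  # numerical dy/db

print(f"dw: analytic {w.grad.item():.6f}  numerical {num_dw.item():.6f}")
print(f"db: analytic {b.grad.item():.6f}  numerical {num_db.item():.6f}")
```

If the two columns disagree beyond a few decimal places, the bug is almost always in the hand-written backward pass, which is exactly the kind of error the lectures train you to find.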
Terminology
Related Papers
Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library
PyTorch Lightning package versions 2.6.2 and 2.6.3 delivered credential-stealing malware via a supply-chain attack.
Alignment whack-a-mole: Finetuning activates recall of copyrighted books in LLMs
Fine-tuning even safety-aligned LLMs can bypass their safeguards and cause them to reproduce copyrighted text verbatim, showing that prompt filtering alone isn't enough to prevent copyright infringement.
Show HN: MacMind – A transformer neural network in HyperCard on a 1989 Macintosh
This is an educational project implementing a single-layer Transformer with 1,216 parameters in the scripting language HyperTalk (1987) and training it on a real Macintosh SE/30. It demonstrates that the core mathematics of modern LLMs works the same on hardware from 30 years ago.
MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU
Introducing MegaTrain, a system that leverages CPU memory as the primary storage and utilizes the GPU solely as a compute engine, enabling full-precision training of 120B parameter models with just a single H200 GPU.
Show HN: I built a tiny LLM to demystify how language models work
This educational project lets you build a mini LLM with 8.7 million parameters, trained on a Guppy fish character, from scratch in just 5 minutes in a single Colab notebook, with the goal of demystifying the black-box nature of LLMs.