Nanocode: The best Claude Code that $200 can buy in pure JAX on TPUs
TL;DR Highlight
An open-source library that lets you train a 1.3B parameter coding agent model from scratch for about $200 (approximately 270,000 KRW) of TPU time, following Anthropic's Constitutional AI approach. It can serve as a hands-on reference for developers who want to understand the entire AI training pipeline first-hand.
Who Should Read
ML engineers who want to implement alignment techniques such as Constitutional AI or RLHF themselves, or developers who want an end-to-end understanding of how coding agents like Claude are built internally.
Core Mechanics
- nanocode is a library that demonstrates how anyone can train their own coding agent model from scratch, closely following the Constitutional AI approach used in Anthropic's Claude model training.
- The entire training infrastructure and philosophy are derived from Karpathy's nanochat project; anyone familiar with nanochat will find nanocode's structure immediately recognizable.
- The code is written entirely in JAX and optimized for training on Google TPUs. It also runs on NVIDIA GPUs, but keep in mind that the code is tuned for TPUs.
- The 1.3B parameter model, nanocode-d24, can be trained in about 9 hours on a TPU v6e-8, costing approximately $200 (approximately 270,000 KRW).
- The smaller 477M parameter model, nanocode-d20, can be trained in about 1.5 hours for $34 (approximately 45,000 KRW), making it good for quick experimentation.
- You can receive free TPU access for a month through Google's TRC (TPU Research Cloud) program, and new Google Cloud accounts can also receive a $300 credit, allowing you to get started without any cost.
- The training pipeline consists of SOUL.md creation (defining the model's value criteria) → agentic interface definition → synthetic data generation → preference optimisation.
- The author used TPUs for 3 months through the TRC program and was able to maintain the same pod for over a week, as spot instances were rarely interrupted.
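The preference-optimisation step at the end of the pipeline can be illustrated with a minimal sketch of a DPO-style loss, a common preference-optimisation objective. This is an illustrative assumption: the source does not specify which objective nanocode actually uses, and the function below is hypothetical plain Python (the real project is JAX).

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style preference loss for one (chosen, rejected) response pair.

    logp_* are summed token log-probabilities under the policy being
    trained; ref_logp_* are the same quantities under a frozen reference
    model. Lower loss means the policy favors the chosen response more
    strongly than the reference model does.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # loss = -log sigmoid(beta * margin), written in a numerically simple form
    return math.log(1.0 + math.exp(-beta * margin))

# A policy that already prefers the chosen response gets a lower loss
# than one that is indifferent between the two.
indifferent = dpo_loss(-10.0, -10.0, -10.0, -10.0)   # margin = 0
preferring  = dpo_loss(-8.0, -12.0, -10.0, -10.0)    # margin = +4
assert preferring < indifferent
```

Training then minimizes this loss over many preference pairs, nudging the model to produce responses like the preferred ones more often.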
Evidence
- In a demo video, nanocode answered the prompt 'remove falsey values from a list without creating a new list' with a list comprehension, a method that does create a new list, suggesting it didn't fully understand the requirement. Commenters shared correct in-place implementations using a reversed range plus pop.
- The question 'Why spend $200 to train it yourself when there are many free coding models?' was raised. It appears to miss the project's stated aim of understanding the training principles first-hand rather than merely using a model; no clear answer was given.
- A sharp observation was made, 'Claude Code is a harness (execution framework) for calling LLMs and executing tools, not something that can be trained itself. Are you using the term incorrectly?' It's important to understand that the project name is an homage to Claude Code, not a claim to train the actual Claude Code.
- A cautionary comment was made that there is another open-source project with a similar name, nanocoder (https://github.com/Nano-Collective/nanocoder), which could cause confusion.
- There was positive feedback that the content was well-written and easy to understand even for those with no ML experience, while some skeptical comments demanded verification, asking 'Does Anthropic actually use this method, and does it actually work?'
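For reference, the in-place fix described in the comments looks like the following: iterating over indices in reverse keeps earlier indices stable while later elements are popped, so no new list is allocated (unlike the model's list-comprehension answer). The function name is ours, not from the thread.

```python
def remove_falsey_in_place(items):
    """Remove falsey values (0, '', None, False, ...) from `items`
    without creating a new list.

    Walking the indices from the end toward the start means popping an
    element never shifts the positions we have yet to visit.
    """
    for i in range(len(items) - 1, -1, -1):
        if not items[i]:
            items.pop(i)
    return items

data = [0, 1, '', 'a', None, 2, False]
remove_falsey_in_place(data)
# data is now [1, 'a', 2] and is still the same list object
```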
How to Apply
- If you want to create an alignment-trained model using the Constitutional AI approach, apply to the Google TRC program for free TPU access and start by training nanocode-d20 (477M parameters). The entire pipeline runs for $34 in about 1.5 hours.
- If you want to create a coding agent that reflects your company or team's unique coding style, rules, and values, refer to the nanocode pipeline to create synthetic data based on your own SOUL.md and train it with preference optimisation.
- If you are new to JAX and TPU-based training infrastructure, reading the nanochat project first will significantly lower the learning curve, as nanocode uses almost the same commands and structure.
- If you want to experiment in an NVIDIA GPU environment, nanocode does run on GPUs, but keep in mind that it's based on TPU-optimized code, so there may be a performance difference. It's best to measure the cost/speed trade-off directly.
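A team-specific SOUL.md might look something like the sketch below. This is a hypothetical example of the kind of values file the pipeline description implies; the actual format nanocode expects should be checked against the repository.

```markdown
# SOUL.md (hypothetical example)

## Values
- Prefer clarity over cleverness; a junior engineer should be able to read the code.
- Never mutate shared state without an explicit comment explaining why.

## Coding style
- Python: type hints on all public functions; follow PEP 8.
- Every bug fix ships with a regression test.

## Judgment criteria
- When two solutions are equally correct, choose the one with fewer dependencies.
- Refuse to generate code that disables security checks.
```

In the Constitutional AI setup, a file like this drives synthetic data generation: the model critiques and revises candidate responses against these principles, and the resulting preference pairs feed the preference-optimisation step.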
Terminology
Constitutional AI: An alignment technique by Anthropic that defines principles (a constitution) for the model to follow in text and trains the model to self-evaluate and revise its outputs based on those principles.
preference optimisation: A learning method that creates preference data on which of several responses generated by a model is better, and adjusts the model to generate better responses more often. A core step in RLHF.
SOUL.md: A file in nanocode that defines the model's values and behavioral principles. It acts as the 'constitution' in Constitutional AI and determines the model's personality and judgment criteria.
agentic interface: An interface that defines how the model interacts with external tools such as code execution, file reading, and terminal command execution. It allows the model to perform actual tasks beyond simply generating text.
spot instance: A way to rent unused computing resources in the cloud at a low cost. It can be reclaimed at any time and is also called a 'preemptible instance'. The author was able to keep the same pod for over a week, as it was rarely interrupted.
synthetic data: Training data automatically generated by a model or program, not written by humans. It is used to reduce labeling costs and quickly secure large amounts of training data.