Show HN: MacMind – A transformer neural network in HyperCard on a 1989 Macintosh
TL;DR Highlight
This is an educational project implementing a single-layer Transformer with 1,216 parameters in the scripting language HyperTalk (1987) and training it on a real Macintosh SE/30. It demonstrates that the core mathematics of modern LLMs runs unchanged on hardware more than three decades old.
Who Should Read
Developers and AI beginners who want a deep, visual, code-level understanding of how Transformers and backpropagation work internally. Particularly suited to anyone who wants to dissect every formula with no black box in the way.
Core Mechanics
- This project implements a complete Transformer neural network in HyperTalk, the scripting language Apple created for HyperCard in 1987. Although HyperTalk was never designed for matrix operations, the model runs as pure scripts, with no compiled code and no external libraries.
- The model is a single-layer, single-head Transformer with 1,216 parameters. While vanishingly small next to GPT-4's reported roughly 1 trillion parameters, the training loop (forward pass → loss calculation → backward pass → weight update) is mathematically identical.
- The learning task is bit-reversal permutation, the first step of the Fast Fourier Transform (FFT), which rearranges the order by reversing the binary representation of each position index. For example, in a sequence of 8 elements, position 1 (001) moves to position 4 (100).
- The implementation includes token embedding (converting tokens to vectors), positional encoding (adding positional information), scaled dot-product self-attention (query-key-value based attention), cross-entropy loss, complete backpropagation, and stochastic gradient descent.
- You can see the actual formula code behind each button by Option-clicking it to open HyperCard's script editor. Learning-rate changes, task swaps, and model-size adjustments can all be made directly in the GUI, which makes its purpose as a learning tool clear.
- The stated purpose is to prove that 'AI is not magic, but mathematics'. The message: backpropagation and attention obey the same mathematics whether they run on a TPU cluster or on the 68030 of a 1989 Macintosh SE/30.
- It was actually trained on a 1989 Macintosh SE/30, and a Python validation script (validate.py) is provided alongside it.
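The learning task is small enough to state exactly in code. The sketch below mirrors the bit-reversal task itself, not the HyperTalk implementation; function names are my own:

```python
def bit_reverse(index, bits):
    """Reverse the binary representation of `index` over `bits` bits."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # shift in the lowest bit
        index >>= 1
    return result

def bit_reversal_permutation(n):
    """Target permutation for a sequence of length n (n a power of two)."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]

# For 8 elements, position 1 (001) maps to position 4 (100):
print(bit_reversal_permutation(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Indices whose binary form is a palindrome (0, 2, 5, 7 here) stay in place, which is why the printed permutation is its own inverse.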
Evidence
- One reaction: running modern AI ideas on old hardware (this project, or LLM inference on Windows 3.1) 'reminds us that progress is not just about bigger GPUs and more computing, but about smarter mathematics and algorithms'. Commenters saw it as closer to the spirit of early computing than the current 'throw hardware at the problem' trend.
- A commenter who first studied backpropagation in 1988, and fell in love with HyperCard programming at the same time, reminisced that 'this project evokes the elegant tools of that era'.
- It can be run in the HyperCard Simulator (hcsimulator.com); it works well enough there even without XCMDs (external commands). A direct import link was shared: https://hcsimulator.com/imports/MacMind---Trained-69E0132C.
- One comment asked where the source of the actual HyperCard stack (.img file) is, since the GitHub repository contains only the Python validation script. This reflects developers' interest in reading the HyperTalk code directly.
- A philosophical comment observed that 'modern concepts are modern simply because no one thought of them at the time', likening the project to delivering germ theory to ancient Greece. It frames technological advancement as progress in the means of implementation rather than the invention of fundamental concepts.
How to Apply
- If you want to study the mathematics of Transformer attention and backpropagation in actual working code rather than in theory, open MacMind in the HyperCard Simulator (hcsimulator.com) and Option-click each button to step through the formula implementations. Everything is written as explicit formulas with no external libraries, so the concepts stay fully visible.
- When explaining the Transformer learning process to junior developers or non-ML backend developers, you can use MacMind's learning task (bit-reversal permutation) and 1,216 parameter model as an example to intuitively explain the 'forward pass → loss → backward pass → weight update' loop.
- If you want to quickly experiment with the impact of hyperparameters such as learning rate and model size, you can modify the values in the HyperCard script editor and rerun. The experimental environment is completely transparent, making it easy to track the impact of each change on the results.
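The 'forward pass → loss → backward pass → weight update' loop described above can be sketched in a few lines of Python. This is not MacMind's HyperTalk code: it is a minimal stand-in that trains a single weight matrix on the same bit-reversal task (for n = 8) with softmax, cross-entropy loss, and stochastic gradient descent, using the standard softmax cross-entropy gradient `probs - one_hot(target)`. All names and hyperparameters here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                  # sequence length, as in MacMind's task
targets = [0, 4, 2, 6, 1, 5, 3, 7]    # bit-reversal permutation for n = 8

# One trainable matrix: row i holds the output logits for input position i.
W = rng.normal(0.0, 0.1, size=(n, n))
lr = 0.5

for step in range(500):
    i = int(rng.integers(n))           # "stochastic": one random example per step
    # forward pass: logits -> softmax probabilities
    logits = W[i]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # loss: cross-entropy against the correct output position
    loss = -np.log(probs[targets[i]])
    # backward pass: d(loss)/d(logits) = probs - one_hot(target)
    grad = probs.copy()
    grad[targets[i]] -= 1.0
    # weight update: plain stochastic gradient descent
    W[i] -= lr * grad

learned = [int(np.argmax(W[i])) for i in range(n)]
print(learned)  # after training, should match `targets`
```

The real model inserts embedding, positional encoding, and attention between input and logits, but the outer loop, the loss, and the update rule are exactly this shape.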
Terminology
HyperTalk: An English-like scripting language Apple created for HyperCard in 1987. It was designed for building databases and interactive card stacks, far removed from matrix operations.
bit-reversal permutation: A rearrangement that determines each element's new position by writing its index in binary and reversing the bits. Used as the first step of the FFT (Fast Fourier Transform).
self-attention: The core mechanism of the Transformer, which captures context by scoring each token's relevance to every other token in the input sequence. It learns 'which part of the sentence to reference to understand this word'.
backpropagation: The learning algorithm that, when a neural network predicts incorrectly, propagates the error backward to compute how much each parameter contributed to the mistake, then updates the weights. The concept has existed since the 1980s.
positional encoding: A method of adding position information to each vector fed into a Transformer, telling the order-blind attention mechanism 'which position this token occupies'.
stochastic gradient descent: An optimization method that updates parameters using the gradient computed from a randomly selected subset of the dataset, enabling learning at reduced computational cost.
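The self-attention and positional-encoding entries above can be made concrete in a few lines of numpy. This is an illustrative sketch using the standard sinusoidal encoding and single-head scaled dot-product attention; MacMind's exact dimensions, initialization, and encoding scheme are assumptions here, not taken from its code:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal positions: sin on even dims, cos on odd dims."""
    pos = np.arange(seq_len)[:, None]
    dim = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (dim // 2)) / d_model)
    return np.where(dim % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # token-to-token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of values

# Toy shapes in the spirit of a ~1,216-parameter model (assumed, not MacMind's).
seq_len, d = 8, 4
rng = np.random.default_rng(1)
X = rng.normal(size=(seq_len, d)) + positional_encoding(seq_len, d)
Wq, Wk, Wv = (rng.normal(0.0, 0.5, size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (8, 4)
```

Without the added positional encoding, permuting the rows of X would merely permute the rows of the output, which is why attention alone cannot learn a position-dependent task like bit reversal.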