Show HN: MacMind – A transformer neural network in HyperCard on a 1989 Macintosh
TL;DR Highlight
This is an educational project that implements a single-layer Transformer with 1,216 parameters in HyperTalk, HyperCard's 1987 scripting language, and trains it on a real Macintosh SE/30. It demonstrates that the core mathematics of modern LLMs works the same on hardware that is more than 30 years old.
Who Should Read
Developers or beginners in AI who want to deeply understand the internal workings of Transformers and backpropagation visually and at the code level. Particularly suitable for those who want to dissect all the formulas without a black box.
Core Mechanics
- This project implements a complete Transformer neural network in HyperTalk (the scripting language for HyperCard, created in 1987). HyperTalk was not designed for matrix operations, yet the model runs as pure script, with no compiled code or external libraries.
- The model is a single-layer, single-head Transformer with 1,216 parameters. That is tiny next to GPT-4, which is estimated at roughly 1 trillion parameters, but the mathematics of the training loop (forward pass → loss calculation → backward pass → weight update) is identical.
- The learning task is the bit-reversal permutation, the first step of the Fast Fourier Transform (FFT): each element moves to the position whose index is the binary reversal of its original position index. For example, in a sequence of 8 elements, position 1 (001) moves to position 4 (100).
- The implementation includes token embedding (converting tokens to vectors), positional encoding (adding positional information), scaled dot-product self-attention (query-key-value based attention), cross-entropy loss, complete backpropagation, and stochastic gradient descent.
- You can Option-click any button to read the actual formula code behind it in HyperCard's script editor. Learning rate changes, task replacements, and model size adjustments can all be made directly within the GUI, which underlines its purpose as a learning tool.
- The project sets out to show that 'AI is not magic, but mathematics'. Its message is that backpropagation and attention are the same mathematics whether they run on a TPU cluster or on the SE/30's 68030 processor from 1987.
- It was actually trained on a 1989 Macintosh SE/30, and a validation script written in Python (validate.py) is also provided.
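
The bit-reversal task described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from MacMind or its validate.py; the function names `bit_reverse` and `bit_reversal_permutation` are hypothetical, and it assumes the sequence length is a power of two.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Reverse the lowest `num_bits` bits of `index`."""
    result = 0
    for _ in range(num_bits):
        # Shift the result left and append the current lowest bit of index.
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversal_permutation(seq):
    """Move the element at position i to position bit_reverse(i)."""
    n = len(seq)
    num_bits = n.bit_length() - 1
    if n == 0 or (1 << num_bits) != n:
        raise ValueError("sequence length must be a power of two")
    out = [None] * n
    for i, item in enumerate(seq):
        out[bit_reverse(i, num_bits)] = item
    return out


if __name__ == "__main__":
    # Position 1 (001) lands at position 4 (100), as in the example above.
    print(bit_reverse(1, 3))                        # → 4
    print(bit_reversal_permutation(list(range(8))))  # → [0, 4, 2, 6, 1, 5, 3, 7]
```

Because reversing the bits twice returns the original index, applying the permutation a second time restores the original order.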
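
To make the scaled dot-product self-attention step concrete, here is a minimal single-head sketch in library-free Python, in the same spirit as the pure-script approach. It is not a transcription of the HyperTalk code: the helper names (`matmul`, `transpose`, `softmax`, `self_attention`) and the toy 2×2 matrices are assumptions chosen for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    # Toy input: two tokens with two-dimensional embeddings (illustrative values).
    x = [[1.0, 0.0], [0.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    output, weights = self_attention(x, identity, identity, identity)
    print(weights)  # each row sums to 1
    print(output)
```

Each row of `weights` says how much one position attends to every other position; the output row is the corresponding weighted average of the value vectors.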
Evidence
- One reaction was that running modern AI ideas on old hardware (this project, or LLM inference on Windows 3.1) 'reminds us that progress is not just about bigger GPUs and more computing, but about smarter mathematics and algorithms'. The commenter considered it closer to the spirit of early computing than the current trend of 'throwing hardware at the problem'.
- Someone who first studied backpropagation in 1988 and simultaneously fell in love with HyperCard programming reminisced, saying 'this project evokes the elegant tools of that era'.
- It was pointed out that the stack can be run in the HyperCard Simulator (hcsimulator.com). It works well enough in the simulator even without XCMDs (external commands), and a link to an already-imported copy (https://hcsimulator.com/imports/MacMind---Trained-69E0132C) was provided.
- There was a comment asking where the source code of the actual HyperCard stack (.img file) is, as only the Python validation script is in the GitHub repository. This reflects the interest of developers who want to see the HyperTalk code directly.
- There was a philosophical comment that 'modern concepts are modern simply because no one thought of them at the time', and that this project feels like delivering germ theory to ancient Greece. It provides context that technological advancement is the advancement of means of implementation rather than the invention of fundamental concepts.
How to Apply
- If you want to study the mathematics of Transformer attention and backpropagation not in theory but in actual working code, open MacMind directly in the HyperCard Simulator (hcsimulator.com) and step through the formula implementation code by Option-clicking each button. It is implemented with pure formulas without external libraries, allowing for a clear understanding of the concepts.
- When explaining the Transformer training process to junior developers or non-ML backend developers, you can use MacMind's training task (bit-reversal permutation) and its 1,216-parameter model as an example to explain the 'forward pass → loss → backward pass → weight update' loop intuitively.
- If you want to quickly experiment with the impact of hyperparameters such as learning rate and model size, you can modify the values in the HyperCard script editor and rerun. The experimental environment is completely transparent, making it easy to track the impact of each change on the results.
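
For readers who want to try the 'forward pass → loss → backward pass → weight update' loop and the effect of the learning rate outside HyperCard, the sketch below may help. It is deliberately not a Transformer: as a stand-in it trains a single softmax layer (one row of logits per input position) to predict the bit-reversed position for 8 positions. The function name `train`, the zero initialization, and the epoch count are all assumptions for illustration.

```python
import math

NUM_POSITIONS = 8
NUM_BITS = 3


def train(learning_rate, epochs=50):
    """Train a one-layer softmax model with SGD; return the mean loss of the last epoch."""
    # Target for position i is i with its 3-bit binary representation reversed.
    targets = [int(format(i, f"0{NUM_BITS}b")[::-1], 2) for i in range(NUM_POSITIONS)]
    weights = [[0.0] * NUM_POSITIONS for _ in range(NUM_POSITIONS)]
    mean_loss = 0.0
    for _ in range(epochs):
        total_loss = 0.0
        for position, target in enumerate(targets):
            # Forward pass: a one-hot input simply selects one row of logits.
            logits = weights[position]
            peak = max(logits)
            exps = [math.exp(z - peak) for z in logits]
            total = sum(exps)
            probs = [e / total for e in exps]
            # Loss: cross-entropy against the correct target position.
            total_loss += -math.log(probs[target])
            # Backward pass: d(loss)/d(logit_j) = prob_j - 1 if j is the target, else prob_j.
            grads = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
            # Weight update: one stochastic gradient descent step.
            weights[position] = [w - learning_rate * g for w, g in zip(logits, grads)]
        mean_loss = total_loss / NUM_POSITIONS
    return mean_loss


if __name__ == "__main__":
    for lr in (0.01, 0.1, 0.5):
        print(f"learning rate {lr}: final mean loss {train(lr):.4f}")
```

Running it with several learning rates shows the same trade-off you can explore in the stack: with a fixed number of epochs, a larger step size drives the loss down faster on this simple problem, while a very small one barely moves it from its starting value of ln 8.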
Related Papers
CS336: Language Modeling from Scratch
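A Stanford course that implements the full LLM pipeline, from tokenizers through data collection, Transformer implementation, and distributed training to RL-based alignment, with students coding each part themselves. Because it centers on implementation rather than theory, it is one of the most systematic curricula for developers who want to understand deeply how LLMs actually work.
Token-Level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection
Backdoors can be hidden in LoRA adapters downloaded from HuggingFace, and there are methods for detecting them.
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
Identifies a structural vulnerability in which an LLM manipulates its own RLHF training process to amplify biases.
PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play
A study that addresses 'difficulty collapse', a chronic problem of single-model self-play, through the co-evolution of teacher-student LoRA populations, and outperforms the baseline on many math and code benchmarks.
Negation Neglect: When models fail to learn negations in training
Even when a document warns "this is fake" thousands of times, a model fine-tuned on that document ends up believing its content is true.
Conceptors for Semantic Steering
A method that inserts matrix-based 'conceptors' into an LLM's hidden states to precisely steer concepts such as emotion, political leaning, and depression without retraining.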