The maths you need to start understanding LLMs
TL;DR Highlight
You only need high-school-level vector and matrix math to start understanding how LLMs work; this post walks through it step by step.
Who Should Read
Backend and full-stack developers who want to understand how LLMs work from the ground up. Especially suited for those without a deep learning background who want to grasp AI internals.
Core Mechanics
- Understanding LLM inference only requires high school-level vectors and matrix operations. Deeper math is needed for research/training, but this level suffices for understanding 'how it works.'
- Vectors aren't just arrays of numbers — they represent direction and distance in n-dimensional space. In LLMs, vectors are the core tool for expressing semantics numerically; similar concepts sit close together in vector space.
- Token embedding converts words or subwords into vectors of hundreds to thousands of dimensions. These embeddings are the LLM's input; the transformer network's massive computations take over from there.
- Cosine similarity measures the angle between two vectors to judge semantic similarity. This is exactly what RAG uses to compare user queries with documents.
- LLM output is a logit vector: one raw, unnormalized score per vocabulary token indicating how likely it is to come next. Applying softmax exponentiates each value and normalizes so they sum to 1, turning the scores into a probability distribution.
- Softmax is based on the exponential function (exp) taught in high school. Its property of amplifying large values lets the LLM strongly prefer certain next-token candidates (see the numpy sketch after this list).
- LLM internals are essentially repeated addition and multiplication. The math itself is simple, but why networks with over a trillion parameters work so well is still not fully explained.
- Embeddings are just the input stage of the LLM. The actual 'intelligence' comes from transformer networks with 1.8T+ parameters, and what exactly happens inside remains a black box.
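To make the points above concrete, here is a minimal numpy sketch of the embedding-to-probability pipeline. The vocabulary, the 4-dimensional embedding table, and the logit values are all made up for illustration; real models use vocabularies of tens of thousands of tokens and embeddings with hundreds to thousands of dimensions.

```python
import numpy as np

# Toy embedding table: 5-token vocabulary, 4 dimensions per token.
# All numbers here are invented for illustration.
vocab = ["cat", "dog", "car", "truck", "the"]
E = np.array([
    [0.9, 0.8, 0.1, 0.0],   # cat
    [0.8, 0.9, 0.0, 0.1],   # dog
    [0.1, 0.0, 0.9, 0.8],   # car
    [0.0, 0.1, 0.8, 0.9],   # truck
    [0.5, 0.5, 0.5, 0.5],   # the
])

def cosine_similarity(a, b):
    # Angle-based similarity: 1.0 = same direction, 0 = orthogonal.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Semantically close tokens sit close together in vector space.
print(cosine_similarity(E[0], E[1]))  # cat vs dog -> high
print(cosine_similarity(E[0], E[2]))  # cat vs car -> low

def softmax(logits):
    # exp() amplifies large values; subtracting the max is a standard
    # numerical-stability trick that doesn't change the result.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

# Pretend the transformer produced these raw next-token scores.
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])
probs = softmax(logits)
print(dict(zip(vocab, probs.round(3))))  # probabilities sum to 1
```

With these toy numbers, cat/dog comes out around 0.99 and cat/car around 0.12, matching the intuition that similar concepts sit close together in vector space.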
Evidence
- A physics MSc-turned-developer noted that vectors, linear algebra, and entropy concepts unused throughout their career came alive while studying LLMs. Backprop is tensor calculus, and everything is matrix multiplication — perfectly aligned with a physics background.
- Multiple people shared that actively coding along with Karpathy's LLM video series (not just watching) was decisive for understanding. Some said knowing how CPUs work is enough for practical purposes.
- Comments criticized the title as misleading — the article explains 'how LLMs compute internally,' not 'a mathematical explanation of why LLMs work.' LLM explainability research covers that domain and remains incomplete.
- Because LLM output is a probability distribution over logits rather than a certain answer, uncertainty compounds with each call in a multi-agent chain. One developer reported complete collapse after three chained calls and recommended a single orchestrator plus human-in-the-loop (see the back-of-envelope sketch after this list).
- A commenter claiming to have developed early math in this space at Google generated buzz — they researched behavior prediction and UI decision prediction beyond language vectors but hit limits without attention layers, and the entire team was laid off under Sundar Pichai.
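A back-of-envelope illustration of that compounding effect, assuming, purely for illustration, that each call in a chain is independently "correct enough" 90% of the time:

```python
# Toy model: if each LLM call succeeds with independent probability p,
# a chain of n calls succeeds with probability p**n.
# p = 0.9 is an illustrative assumption, not a measured figure.
p = 0.9
for n in range(1, 6):
    print(f"{n} chained calls -> {p**n:.0%} end-to-end reliability")
# Three calls already lands around 73%; five calls around 59%.
```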
How to Apply
- If building a RAG system, try implementing the cosine similarity logic yourself based on this article. The basic pattern: embed document chunks, rank the top N by cosine similarity with the query vector, then apply reranking (a minimal sketch follows this list).
- When designing multi-LLM agent chain architectures, minimize call count and have a single orchestrator make all decisions. Uncertainty compounds per call, so carefully evaluate chains of 3+ calls.
- To study LLM math deeper, code along with Karpathy's YouTube series (don't just watch) and supplement with Sebastian Raschka's 'Build a Large Language Model (from Scratch).'
- For systematic linear algebra foundations, combine DeepLearning.AI's 'Mathematics for Machine Learning and Data Science Specialization' (Coursera, ~$50/month) with the book 'Math and Architectures of Deep Learning.'
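As a starting point for the first item, here is a minimal sketch of the embed-rank-rerank pattern. The random vectors stand in for real embeddings, and `top_n_chunks` is a hypothetical helper, not a library API; in practice the vectors would come from an embedding model and the top-N results would go to a reranker.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def top_n_chunks(query_vec, chunk_vecs, chunks, n=5):
    # Rank every document chunk by cosine similarity to the query
    # vector and return the n best candidates. A real system would
    # hand these to a reranker before prompting the LLM.
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    ranked = sorted(zip(scores, chunks), key=lambda x: x[0], reverse=True)
    return ranked[:n]

# Placeholder data: in practice these vectors come from an
# embedding model, not a random number generator.
rng = np.random.default_rng(0)
chunks = [f"chunk {i}" for i in range(100)]
chunk_vecs = [rng.standard_normal(768) for _ in chunks]
query_vec = rng.standard_normal(768)

for score, chunk in top_n_chunks(query_vec, chunk_vecs, chunks, n=3):
    print(f"{score:+.3f}  {chunk}")
```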
Related Papers
Show HN: Airbyte Agents – context for agents across multiple data sources
Airbyte released a Context Store that pre-indexes data from multiple SaaS systems such as Slack, Salesforce, and Linear, so agents no longer have to dig through each API individually. It has shown token savings of up to 90% compared with the existing MCP approach.
A polynomial autoencoder beats PCA on transformer embeddings
A technique that attaches a second-order polynomial decoder to a PCA encoder to substantially improve embedding compression quality in closed form; it can be implemented with numpy alone, no SGD required.
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Storing memory in schema-defined structured records instead of RAG-style text retrieval yields dramatically higher accuracy on exact fact lookup, state tracking, and aggregate queries.
Show HN: Atomic – Local-first, AI-augmented personal knowledge base
A self-hosted, open-source personal knowledge graph app that automatically embeds, tags, and links notes, web clips, and RSS feeds, with semantic search, LLM-powered wiki synthesis, and MCP integration.
We replaced RAG with a virtual filesystem for our AI documentation assistant
Explains how Mintlify overcame RAG chunking limitations by building a virtual filesystem (ChromaFs) on top of Chroma DB that mimics UNIX commands, reducing session boot time from 46 seconds to 100ms.
Chroma Context-1: Training a Self-Editing Search Agent
Related Resources
- https://www.gilesthomas.com/2025/09/maths-for-llms
- https://www.youtube.com/watch?v=7xTGNNLPyMI
- https://www.manning.com/books/build-a-large-language-model-from-scratch
- https://github.com/stared/thinking-in-tensors-writing-in-pytorch
- https://www.coursera.org/specializations/mathematics-for-machine-learning-and-data-science
- https://www.manning.com/books/math-and-architectures-of-deep-learning
- https://www.manning.com/books/deep-learning-with-python
- http://wordvec.colorado.edu/website_how_to.html