The maths you need to start understanding LLMs
TL;DR Highlight
You only need high school-level vector and matrix math to understand how LLM inference works; this post walks through it step by step.
Who Should Read
Backend and full-stack developers who want to understand how LLMs work from the ground up. Especially suited for those without a deep learning background who want to grasp AI internals.
Core Mechanics
- Understanding LLM inference only requires high school-level vectors and matrix operations. Deeper math is needed for research/training, but this level suffices for understanding 'how it works.'
- Vectors aren't just arrays of numbers — they represent direction and distance in n-dimensional space. In LLMs, vectors are the core tool for expressing semantics numerically; similar concepts sit close together in vector space.
- Token embedding converts words or subwords into vectors with hundreds to thousands of dimensions. These embeddings are the LLM's input; the transformer network then runs its massive computations on them.
- Cosine similarity measures the cosine of the angle between two vectors, ignoring their lengths, to judge semantic similarity. RAG systems commonly use it to compare user queries with documents.
- LLM output is a logit vector: one raw, unnormalized score per vocabulary token, indicating how likely that token is to come next. Softmax exponentiates each value and divides by the sum of the exponentials, so the results sum to 1 and form a probability distribution.
- Softmax is based on the exponential function (exp) taught in high school. Its property of amplifying large values lets the LLM strongly prefer certain next-token candidates.
- LLM internals are essentially repeated addition and multiplication. The math itself is simple, but why networks with over 1 trillion parameters work so well is still not fully explained.
- Embeddings are just the input stage of the LLM. The actual 'intelligence' comes from the transformer network, which in the largest models is reported to have more than a trillion parameters, and what exactly happens inside remains largely a black box.
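The cosine similarity and softmax steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular LLM implements them: the three-dimensional "embeddings" and the logit values are made up for the example (real embeddings have hundreds to thousands of dimensions), and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum does not change the result,
    # but keeps exp() from overflowing on large logits.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up toy embeddings (assumption: illustration only).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))  # close to 1: similar direction
print(cosine_similarity(cat, car))     # close to 0: nearly orthogonal

# Made-up logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(probs)  # roughly [0.659, 0.242, 0.099]
```

Note how the gap between the logits 2.0 and 1.0 becomes a much larger gap in probability (about 0.66 versus 0.24): that is the amplifying effect of the exponential.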
Evidence
- A physics MSc-turned-developer noted that vectors, linear algebra, and entropy concepts unused throughout their career came alive while studying LLMs. Backprop is tensor calculus, and everything is matrix multiplication — perfectly aligned with a physics background.
- Multiple people shared that actively coding along with Karpathy's LLM video series (not just watching) was decisive for understanding. Some said knowing how CPUs work is enough for practical purposes.
- Comments criticized the title as misleading — the article explains 'how LLMs compute internally,' not 'a mathematical explanation of why LLMs work.' LLM explainability research covers that domain and remains incomplete.
- The danger of multi-agent chains was discussed, given that LLMs output logits rather than certain answers. Uncertainty compounds with each call. One developer reported complete collapse after three chained calls and recommended a single orchestrator with a human in the loop.
- A commenter who said they had worked on early math in this space at Google drew attention. They researched behavior prediction and UI decision prediction beyond language vectors but hit limits without attention layers, and said the entire team was laid off under Sundar Pichai.
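The claim above that uncertainty compounds with each call can be made concrete with a little arithmetic. The sketch below rests on a strong simplifying assumption that is not from the discussion: each call in a chain succeeds independently with the same probability, so a chain of n calls succeeds with probability p to the power n. The 0.9 figure and the function name are hypothetical, chosen only for illustration.

```python
def chain_success_probability(per_call_success, num_calls):
    """Probability that every call in a chain succeeds.

    Assumption: calls succeed independently with equal probability.
    """
    if not 0.0 <= per_call_success <= 1.0:
        raise ValueError("per_call_success must be between 0 and 1")
    if num_calls < 0:
        raise ValueError("num_calls must be non-negative")
    return per_call_success ** num_calls


# Hypothetical per-call reliability of 90%.
for n in (1, 3, 5, 10):
    print(n, round(chain_success_probability(0.9, n), 3))
# 1 0.9
# 3 0.729
# 5 0.59
# 10 0.349
```

Under these assumptions, even a fairly reliable single call leaves a three-call chain failing more than a quarter of the time. Real failures are rarely independent, so treat this as intuition rather than a prediction.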
How to Apply
- If building a RAG system, try implementing cosine similarity logic yourself based on this article. The basic pattern: embed document chunks, rank top-N by cosine similarity with the query vector, then apply reranking.
- When designing multi-LLM agent chain architectures, minimize call count and have a single orchestrator make all decisions. Uncertainty compounds per call, so carefully evaluate chains of 3+ calls.
- To study LLM math deeper, code along with Karpathy's YouTube series (don't just watch) and supplement with Sebastian Raschka's 'Build a Large Language Model (from Scratch).'
- For systematic linear algebra foundations, combine DeepLearning.AI's 'Mathematics for Machine Learning and Data Science Specialization' (Coursera, ~$50/month) with the book 'Math and Architectures of Deep Learning.'
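The RAG retrieval pattern in the first item above (embed chunks, then rank the top N by cosine similarity with the query vector) might look roughly like the following. This is a sketch under stated assumptions: the chunk IDs and the three-dimensional vectors are made up, and in a real system both the chunks and the query would be embedded by an embedding model, which is not shown here. The reranking step is also left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_n_chunks(query_vec, chunk_vecs, n=2):
    """Return the n (chunk_id, score) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(chunk_id, score) for score, chunk_id in scored[:n]]


# Hypothetical precomputed chunk embeddings (assumption: toy values).
chunk_vecs = {
    "chunk-cats": [0.9, 0.1, 0.0],
    "chunk-dogs": [0.7, 0.3, 0.1],
    "chunk-cars": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of the user's query.
query_vec = [0.8, 0.2, 0.0]

results = top_n_chunks(query_vec, chunk_vecs, n=2)
for chunk_id, score in results:
    print(chunk_id, round(score, 3))
```

A linear scan like this is fine for a few thousand chunks. For much larger collections, an approximate nearest-neighbor index is the usual next step.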
Related Papers
Show HN: Bible as RAG Database
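A web service that indexes the entire Bible as a RAG (retrieval-augmented generation) database, so relevant verses can be semantically searched by topic or keyword. It is a practical example of applying RAG to religious texts and a useful reference for developers building similar projects.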
Haystack: Open-Source AI Framework for Production Ready Agents, RAG
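An open-source AI orchestration framework from deepset that has drawn attention as an alternative to LangChain. Its modular pipeline approach lets you build RAG, agent, and multimodal apps all the way to production.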
We built a persistent agent memory layer on Elasticsearch with 0.89 recall
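Describes the architecture of a multi-tenant long-term memory system built on Elasticsearch so that AI agents can remember user information after a session ends. It covers the concrete implementation that achieved an R@10 of 0.89 on 168 questions with zero cross-tenant data leaks.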
TAHOE: Text-to-SQL with Automated Hint Optimization from Experience
A system that collects hints the LLM learned from SQL generation failures into a reusable Hint Bank, substantially improving accuracy on Snowflake-dialect SQL without retraining the model.
Inside FAISS: Billion-Scale Similarity Search
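A visual explanation of IVF (partitioning) and Product Quantization (compression), the core algorithms FAISS uses to search billions of vectors quickly. It helps developers building RAG or vector search systems understand the internals.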
Show HN: Airbyte Agents – context for agents across multiple data sources
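Airbyte released a Context Store that pre-indexes data from multiple SaaS systems such as Slack, Salesforce, and Linear, so agents don't have to dig through each API one by one. It reports token reductions of up to 90% compared with the existing MCP approach.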
Related Resources
- https://www.gilesthomas.com/2025/09/maths-for-llms
- https://www.youtube.com/watch?v=7xTGNNLPyMI
- https://www.manning.com/books/build-a-large-language-model-from-scratch
- https://github.com/stared/thinking-in-tensors-writing-in-pytorch
- https://www.coursera.org/specializations/mathematics-for-machine-learning-and-data-science
- https://www.manning.com/books/math-and-architectures-of-deep-learning
- https://www.manning.com/books/deep-learning-with-python
- http://wordvec.colorado.edu/website_how_to.html