Different Language Models Learn Similar Number Representations
TL;DR Highlight
LLMs of very different architectures, from Transformers to LSTMs, consistently learn periodic patterns with periods T=2, 5, and 10 when representing numbers; the paper characterizes this mathematically, explaining a 'convergent evolution' of representations that goes beyond model architecture.
Who Should Read
ML researchers curious about how LLM internal representations are formed, and AI system developers aiming to better embed numerical reasoning capabilities into models.
Core Mechanics
- Language models trained on natural language text consistently learn dominant periodic features with periods T=2, 5, and 10 when internally representing numbers, naturally corresponding to the decimal system and even/odd distinctions.
- Researchers discovered a 'two-tiered hierarchy' within these periodic features: sparsity (spikes at specific frequencies) in the Fourier domain, and a 'mod-T geometrically separable' representation, though not all models exhibit the latter.
- Mathematically, sparsity in the Fourier domain is a necessary but not sufficient condition for mod-T geometric separability; periodic patterns alone do not guarantee a representation capable of linearly classifying numbers.
- Geometrically separable representations arise through two paths: learning from 'text-number co-occurrence' and 'number-number interactions' in general language data, or training on 'multi-token addition problems'.
- Structurally disparate models—Transformers, Linear RNNs, LSTMs, and even classical word embeddings—all learn similar numerical representations, a phenomenon the researchers liken to 'convergent evolution' in biology.
- The quality of numerical representations (geometric separability) is shaped by training data, architecture, optimizer, and tokenizer; no single factor determines it.
- This research informs attempts to connect external mathematical computation circuits to LLMs (neurosymbolic approaches): if different models share compatible numerical representations, that 'common representation' could serve as a shared interface.
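The dominant-period claim above can be sketched numerically. The snippet below synthesizes a toy embedding coordinate over the numbers 0..99 containing period-2, 5, and 10 components (a stand-in for a learned embedding dimension, since no trained model is at hand) and shows that those periods appear as sparse spikes in the Fourier spectrum. This is an illustration of the analysis, not the paper's code.

```python
import numpy as np

# Toy stand-in for a learned embedding coordinate over numbers 0..99:
# a mix of period-2, period-5, and period-10 components.
N = 100
n = np.arange(N)
signal = (np.cos(2 * np.pi * n / 2)      # period T=2 (parity)
          + np.cos(2 * np.pi * n / 5)    # period T=5
          + np.cos(2 * np.pi * n / 10))  # period T=10

# Fourier spectrum: frequency bin k corresponds to period N / k.
spectrum = np.abs(np.fft.rfft(signal))
top_bins = np.argsort(spectrum)[::-1][:3]  # the three dominant bins
periods = sorted(N / top_bins)
print(periods)  # → [2.0, 5.0, 10.0]: the spectrum is sparse at these periods
```

On real hidden states the spikes would be noisier, but the same FFT-and-rank procedure identifies the dominant periods.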
Evidence
- A comment on HN suggested these results support the 'Platonic Representation Hypothesis' (that models trained on the same data converge to a common representation of reality), noting that shared representations could simplify connecting mathematical circuits between models.
- Critical comments countered that the 'learning reality' phrasing is overstated, arguing that models only learn statistical regularities, and warned against misusing the hypothesis to justify claims of LLM objectivity.
- Observations of eigenvalue distributions resembling Benford's Law prompted questions about whether that pattern is expected in human-curated text corpora.
- Comments also debated whether the phenomenon stems from training data or model architecture, even though the paper notes that all four factors (data, architecture, optimizer, and tokenizer) play a role.
- Finally, a self-promotional comment introduced 'turnstyle', a library implementing neurosymbolic programming that leverages shared representations.
How to Apply
- To improve numerical accuracy in LLMs, fine-tuning with multi-token addition problems may be more effective than single-token addition for producing mod-T geometrically separable numerical representations.
- When designing systems that share or transfer numerical representations across models (e.g., shared math modules), consider whether a Fourier representation based on the T=2, 5, and 10 periods can serve as a common interface.
- Account for the impact of tokenizer design on numerical representation quality; for domain-specific models handling numerical data, experiment with digit-level versus number-level tokenization.
- When probing a model's internal representations, test for linear separability under mod-2, mod-5, and mod-10 criteria to quickly assess the quality of its numerical representation.
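The mod-2 probing step above can be sketched as follows. Since no real hidden states are available here, the "embeddings" are synthetic: one parity-carrying coordinate cos(πn), matching the period-2 feature the paper reports, plus small distractor dimensions. A plain perceptron serves as the linear probe; in practice you would extract X from the model under study and typically use logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for hidden states over the numbers 0..199:
# a period-2 feature plus small distractor dimensions.
n = np.arange(200)
X = np.column_stack([np.cos(np.pi * n),                 # parity feature
                     0.1 * rng.normal(size=(200, 8))])  # distractors
y = n % 2  # probe target: parity (mod-2 class)

# Linear probe: a plain perceptron, trained on the first half of the
# numbers and evaluated on the unseen second half.
train, test = slice(0, 100), slice(100, 200)
w = np.zeros(X.shape[1])
for _ in range(10):
    for xi, yi in zip(X[train], y[train]):
        pred = int(xi @ w > 0)
        w += (yi - pred) * xi  # update only on mistakes

acc = float(np.mean((X[test] @ w > 0) == y[test]))
print(acc)  # high accuracy means parity is linearly decodable
```

The same probe applies to mod-5 and mod-10 by swapping the target to `n % 5` or `n % 10` and using a multiclass linear classifier.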
Terminology
Fourier domain sparsity: A state where, when a signal is decomposed into its frequency components (via Fourier transform), only a few frequencies are strong and the rest are close to zero. The dominance of the T=2, 5, and 10 periods in numerical representations exemplifies this.
mod-T geometric separability: The property of a model's internal representation space where numbers are linearly separable by their remainder when divided by T (e.g., mod-10 corresponds to the last digit). Essentially, whether even and odd numbers are neatly separated in the internal representation.
convergent evolution: A biological term for the independent evolution of similar traits in unrelated organisms under similar environmental pressures (e.g., bat wings and bird wings). The paper draws a parallel to models with different architectures independently learning similar representations from the same training data.
Platonic Representation Hypothesis: The hypothesis that sufficiently large models converge to a common internal representation regardless of architecture when trained on the same data, akin to Plato's theory of Forms, the idea that a 'true' representation exists.
probing: A technique for checking whether specific information (e.g., part of speech, number parity) is encoded in a model's internal representation by attaching a simple linear classifier and testing its performance. A simple tool for dissecting models.
neurosymbolic programming: An approach combining the pattern recognition of neural networks with the precise reasoning of symbolic logic. For example, letting an LLM understand language while delegating arithmetic to a separate, exact calculator module.
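The division of labor in the last entry can be made concrete with a toy sketch. The "model" here is just a regex stand-in that locates the arithmetic sub-expression in the text; the point is that the exact answer comes from a symbolic calculator rather than being generated token by token.

```python
import re

def calculator(expr: str) -> int:
    """Exact symbolic arithmetic for simple 'a + b' / 'a * b' expressions."""
    a, op, b = re.fullmatch(r"(\d+)\s*([+*])\s*(\d+)", expr).groups()
    return int(a) + int(b) if op == "+" else int(a) * int(b)

def answer(question: str) -> str:
    # The "model" only finds the arithmetic sub-expression; the
    # calculator guarantees the numeric result is exact.
    m = re.search(r"\d+\s*[+*]\s*\d+", question)
    return question if m is None else f"The answer is {calculator(m.group())}."

print(answer("What is 123457 + 987654?"))  # → The answer is 1111111.
```

A real system would use an LLM for the parsing/routing step; the shared numerical representations discussed above are what would let such a calculator module plug into different models.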