History LLMs: Models trained exclusively on pre-1913 texts
TL;DR Highlight
A 4B-parameter LLM family trained from scratch on 80B tokens of text published before 1913 — it embodies the pre-WWI worldview and cannot know about anything later.
Who Should Read
ML researchers interested in domain-specific pretraining and historical NLP, and digital humanities scholars exploring AI for historical text analysis.
Core Mechanics
- The model family was trained exclusively on historical texts from before 1913 — newspapers, books, letters, government documents — containing no modern vocabulary, concepts, or references.
- The result is a model that genuinely 'thinks' in the idiom and worldview of the late 19th/early 20th century: it doesn't know about WWI, modern physics, or computing.
- This makes it useful for period-accurate text generation, historical document analysis, and studying how language and reasoning patterns have changed over time.
- The 80B token pretraining corpus is notable — assembling high-quality historical text at this scale required significant digitization and cleaning effort.
- The model family (4B parameters) is small enough to run locally, making it accessible for humanities research without GPU cluster requirements.
- Evaluation showed the model excels at historical text completion and period-appropriate prose generation, but obviously fails at any task requiring modern knowledge.
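The temporal isolation described above ultimately comes down to a hard date filter during corpus assembly. Here is a minimal sketch of that idea; the document schema (a `"year"` field on each record) and the helper name are assumptions for illustration, not the project's actual pipeline.

```python
# Hypothetical sketch of a temporal-isolation filter: keep only
# documents dated at or before the cutoff year. Documents with no
# known date are dropped, since an undated text might be modern.

CUTOFF_YEAR = 1913

def filter_pre_cutoff(documents, cutoff=CUTOFF_YEAR):
    """Return only documents whose publication year is known and <= cutoff."""
    return [
        doc for doc in documents
        if doc.get("year") is not None and doc["year"] <= cutoff
    ]

corpus = [
    {"year": 1897, "text": "A treatise on the electric telegraph."},
    {"year": 1923, "text": "Radio broadcasting schedules."},
    {"year": 1905, "text": "On the electrodynamics of moving bodies."},
]

kept = filter_pre_cutoff(corpus)
print(len(kept))  # 2 documents survive the 1913 cutoff
```

In practice the hard part is not the filter but the metadata: OCR'd historical texts often carry unreliable or missing dates, which is part of why the 80B-token cleaning effort is notable.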
Evidence
- Demo outputs shared in the HN thread showed convincingly period-accurate prose — no anachronisms, appropriate vocabulary register, and correct historical references.
- Digital humanities researchers in the comments expressed genuine excitement, noting this fills a gap in current NLP tools for pre-modern text analysis.
- Commenters debated the value of 'isolated' period models versus fine-tuning a modern model on historical data; the argument for isolation is that modern training data introduces anachronistic reasoning patterns.
- Historians noted potential use in transcribing and extending damaged historical documents where period-accurate language modeling is crucial.
How to Apply
- For digital humanities: use the model for historical document completion, transcription assistance, and period-accurate text generation without worrying about modern contamination.
- For NLP researchers: this is a useful probe model for studying how language and conceptual structure have changed — compare outputs on the same prompts to a modern model.
- If you're building historical education tools or games, this model provides a unique source of period-appropriate generated content.
- The corpus assembly methodology (80B tokens of pre-1913 text) is itself worth studying for researchers building other domain-specific or temporal pretraining datasets.
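The probe-model comparison suggested above (same prompts, period model vs. modern model) can be sketched with a small harness. The `generate` callables here are stand-ins (assumptions) for whatever inference API each model exposes, e.g. a Hugging Face `transformers` text-generation pipeline run locally.

```python
# Minimal harness for side-by-side prompt comparison between a
# period-isolated model and a modern model. The lambda "models"
# below are stubs; in real use they would wrap actual model calls.

def compare_models(prompts, period_generate, modern_generate):
    """Return (prompt, period_output, modern_output) triples."""
    return [(p, period_generate(p), modern_generate(p)) for p in prompts]

# Stub generators standing in for real model inference.
period_model = lambda p: f"[pre-1913 idiom] {p} ..."
modern_model = lambda p: f"[modern idiom] {p} ..."

rows = compare_models(["The nature of the atom is"], period_model, modern_model)
for prompt, old, new in rows:
    print(prompt, "|", old, "|", new)
```

Keeping the prompts identical and logging outputs pairwise makes it straightforward to look for conceptual divergences — e.g. where the period model reaches for ether physics while the modern model reaches for quantum mechanics.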
Terminology
Domain-specific pretraining: Training a language model from scratch on a curated corpus focused on a specific domain, time period, or genre rather than general web text.
Temporal isolation: A training methodology where the data cutoff is a deliberate design choice, preventing the model from learning about events or concepts after a specific date.
Digital humanities: Academic field applying computational methods to humanistic research — history, literature, linguistics — including NLP for historical text analysis.