2025: The Year in LLMs
TL;DR Highlight
Simon Willison's comprehensive 2025 LLM ecosystem retrospective covers reasoning models, agents, vibe coding, MCP, and everything else developers need to know.
Who Should Read
Developers who want a single well-synthesized summary of what changed in LLMs in 2025 and what it means for practitioners.
Core Mechanics
- Reasoning models (o3, Claude 3.5 Sonnet thinking, Gemini 2.0 Flash Thinking) became mainstream — the ability to 'think before answering' measurably improves accuracy on complex tasks.
- Agentic AI went from experimental to production — multi-step tool-using agents are now deployed in real workflows, with MCP (Model Context Protocol) emerging as a standardization layer.
- Vibe coding became a real phenomenon: a meaningful fraction of shipped code in 2025 was written primarily by AI with humans in a supervisory role.
- Context windows exploded — 1M+ token windows became available, changing what's possible for document processing and long-session agents.
- Open-source models closed the gap with closed frontier models significantly — running frontier-tier performance locally became possible for the first time.
- Multimodal capabilities (vision, audio, video) matured from toy features to practical tools in several product categories.
- The MCP ecosystem grew rapidly — dozens of server implementations enabling Claude and other models to connect to external tools and data sources.
Evidence
- Willison is a highly respected voice in the developer community (creator of Django, Datasette) — his annual reviews are widely read and trusted for their practicality.
- The post synthesizes his own experiments plus broader community evidence, with links to specific papers, announcements, and examples throughout.
- HN discussion validated most of his observations, with commenters adding specific experiences — particularly around agentic workflows and vibe coding adoption.
- Several readers noted the retrospective is unusually balanced — acknowledging both genuine progress and real limitations without being either dismissive or hype-driven.
How to Apply
- Use this as an orientation document for bringing teammates up to speed on the AI landscape — it's dense but well-structured.
- For technical leads: use the reasoning model section to evaluate whether your current model choices are still appropriate, given how much the reasoning tier has improved.
- The MCP section is particularly actionable — if you haven't evaluated the MCP ecosystem for your agent tooling needs, this is a good starting point.
- For PMs: the 'vibe coding' and 'agentic AI' sections have concrete examples of what organizations are actually shipping — useful for calibrating what's realistic to build.
Terminology
Related Papers
Show HN: OpenKnowledge – open source AI-first alternative to Obsidian/Notion
Git 기반 동기화와 Claude/Codex/Cursor 연동을 내장한 로컬 우선 마크다운 에디터로, AI 에이전트의 두 번째 뇌(LLM Wiki)로 활용할 수 있는 오픈소스 도구다.
The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems
AI 에이전트가 자신의 안전장치를 우회할 수 없도록, 에이전트 프로세스 바깥에 수학적으로 증명된 강제 통제 게이트를 배치하는 아키텍처
RubyLLM: A Ruby framework for all major AI providers
OpenAI, Claude, Gemini 등 주요 AI 프로바이더를 단일 인터페이스로 통합한 Ruby 프레임워크로, Rails 통합과 에이전트 기능까지 지원해 Ruby 개발자가 AI 기능을 빠르게 붙일 수 있다.
Qwen-AgentWorld: Language World Models for General Agents
Alibaba Qwen 팀이 AI 에이전트가 행동 결과를 미리 시뮬레이션할 수 있는 'Language World Model'을 공개했다. 에이전트 훈련과 실행 경로 검증에 새로운 패러다임을 제시하는 연구다.
SHERLOC: Structured Diagnostic Localization for Code Repair Agents
버그 위치만 알려주는 게 아니라 '왜, 어떻게 고쳐야 하는지'까지 진단 리포트를 생성해서 코드 수정 에이전트의 성능을 높이는 training-free 프레임워크
Show HN: peerd – AI agent harness that runs entirely in your browser
백엔드 서버 없이 Chrome/Firefox 확장 프로그램으로만 동작하는 AI 에이전트 실행 환경으로, 브라우저 탭을 직접 조작하고 WASM Linux VM까지 구동할 수 있어 프라이버시와 보안을 동시에 챙길 수 있다.