Claude Haiku 4.5
TL;DR Highlight
Anthropic launched Claude Haiku 4.5, a small model delivering Sonnet 4-level coding performance at 1/3 the price and 2x+ the speed. It is a cost-effective option for developers who need agentic coding and real-time responses.
Who Should Read
Backend and full-stack developers running coding agents or chatbots on the Claude API who want to reduce cost and latency, and AI engineers looking for sub-agent models in multi-agent architectures.
Core Mechanics
- Claude Haiku 4.5 matches the coding performance of Sonnet 4 (a frontier model from 5 months ago) at 1/3 the price ($1/M input, $5/M output) and 2x+ speed.
- Achieves 90% of Sonnet 4.5's performance on Augment's agentic coding benchmark, and even outperforms Sonnet 4 on computer-use tasks.
- Particularly useful in multi-agent setups. Anthropic directly proposed an orchestration pattern: Sonnet 4.5 decomposes complex problems and plans, while multiple Haiku 4.5 instances handle subtasks in parallel.
- Recorded statistically significant reductions in misaligned-behavior rates relative to Sonnet 4.5 and Opus 4.1 in safety evaluations, making it Anthropic's safest model to date. Released at the ASL-2 safety level.
- Available immediately in Claude Code and via API with model ID `claude-haiku-4-5`.
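The Sonnet-plans/Haiku-executes orchestration pattern above can be sketched as follows. This is a minimal sketch, not Anthropic's implementation: `call_model` stands in for whatever Messages API wrapper you use (SDK or raw HTTP), and the Sonnet model ID is an assumption, since only `claude-haiku-4-5` is confirmed above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

PLANNER = "claude-sonnet-4-5"  # assumed Sonnet 4.5 model ID (not stated in this post)
WORKER = "claude-haiku-4-5"    # Haiku 4.5 model ID from the announcement

def plan_and_fan_out(task: str, call_model: Callable[[str, str], str]) -> list[str]:
    """Sonnet decomposes the task into one-per-line subtasks;
    Haiku instances then run the subtasks in parallel."""
    plan = call_model(PLANNER, f"Break this into independent subtasks, one per line:\n{task}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    with ThreadPoolExecutor(max_workers=max(len(subtasks), 1)) as pool:
        # Each subtask becomes a cheap, fast Haiku call running concurrently.
        return list(pool.map(lambda sub: call_model(WORKER, sub), subtasks))
```

Because `call_model` is injected, the same skeleton works with the official SDK, a raw HTTP client, or a stub for testing.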
Evidence
- In early tests, Haiku 4.5 avoided touching unrelated code where GPT-5 did not, which could make actual usage costs lower than the token price difference suggests. However, the 4x price increase from Haiku 3.5 ($0.25→$1/M input) was noted as a concern.
- In a direct comparison, Haiku 4.5 hallucinated function outputs and gave wrong answers where Sonnet was accurate, showing that the small-model hallucination limitation persists.
- On NYT Connections benchmark, Haiku 4.5 scored 20.0 (2x Haiku 3.5's 10.0) but still trails Sonnet 4.0 (26.6) and Sonnet 4.5 (46.1).
- A freelancer said '3x faster responses outweigh slight quality loss for productivity' and planned to switch their daily driver from Sonnet 4.5 to Haiku 4.5.
How to Apply
- In multi-agent systems, use Sonnet 4.5 as the main agent for planning/decomposition and Haiku 4.5 as parallel sub-agents to significantly cut costs while boosting throughput.
- For quick prototyping or simple code fixes in Claude Code, switching to Haiku 4.5 cuts response wait time by half or more.
- For chatbots or customer-service agents where latency matters, using Haiku 4.5 instead of Sonnet cuts costs to 1/3 while improving responsiveness.
- However, for tasks requiring accurate fact lookup or code-documentation reference, stay on Sonnet due to the hallucination risk, and route only simple generation and transformation tasks to Haiku.
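The routing advice in the last point can be sketched as a simple dispatcher. This is a minimal sketch under stated assumptions: the `FACT_HINTS` keyword heuristic is purely illustrative (a production router might use a classifier model instead), and the Sonnet model ID is assumed, since only `claude-haiku-4-5` is confirmed above.

```python
SONNET = "claude-sonnet-4-5"  # assumed Sonnet 4.5 model ID (not stated in this post)
HAIKU = "claude-haiku-4-5"    # Haiku 4.5 model ID from the announcement

# Illustrative markers for fact-lookup / documentation-reference tasks,
# which the guidance above says should stay on Sonnet due to hallucination risk.
FACT_HINTS = ("according to the docs", "look up", "cite", "which version", "api reference")

def pick_model(task: str) -> str:
    """Route fact-sensitive tasks to Sonnet; send simple
    generation/transformation work to the cheaper, faster Haiku."""
    lowered = task.lower()
    if any(hint in lowered for hint in FACT_HINTS):
        return SONNET
    return HAIKU
```

For example, "Look up the default timeout in the API reference" would route to Sonnet, while "Convert this JSON to YAML" would go to Haiku.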
Terminology
Related Papers
Training an LLM in Swift, Part 1: Taking matrix mult from Gflop/s to Tflop/s
A detailed walkthrough of implementing matrix-multiplication kernels from scratch in Swift on Apple Silicon, optimizing step by step across CPU, SIMD, AMX, and GPU (Metal) to push performance from Gflop/s to Tflop/s. A rare resource for developers who want to implement the core computation of LLM training from the ground up without frameworks and get a feel for Apple Silicon's performance limits.
Removing fsync from our local storage engine
FractalBits shares the design of an SSD-only KV storage engine that runs without fsync, achieving roughly 65% higher write performance under identical conditions. The key is a structure that avoids fsync's metadata overhead by combining preallocation, O_DIRECT, and a journal aligned to the SSD's atomic write unit.
Google Chrome silently installs a 4 GB AI model on your device without consent
Google Chrome was found to automatically download the 4 GB Gemini Nano model file without user consent, re-downloading it even after deletion. A possible GDPR violation and the environmental cost of rolling this out to billions of devices are being raised as concerns.
How OpenAI delivers low-latency voice AI at scale
OpenAI redesigned its WebRTC stack to serve real-time voice AI to over 900 million users, detailing the design decisions and trade-offs of a relay + transceiver split architecture.
Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees
Deterministic Leaf Enumeration (DLE) cuts self-consistency’s redundant sampling by deterministically exploring a tree of possible sequences, simultaneously improving math/code reasoning performance and speed.