Show HN: CodeBurn – Analyze Claude Code token usage by task
TL;DR Highlight
An open-source tool that shows, in a terminal dashboard, where tokens are consumed in AI coding tools and how many. It reads only local session files, so no separate API keys or proxies are needed.
Who Should Read
Developers who use AI coding tools such as Claude Code, Cursor, and Codex daily and want to understand their costs and identify tasks that consume a lot of tokens.
Core Mechanics
- CodeBurn shows token usage for major AI coding tools like Claude Code, OpenAI Codex, Cursor, OpenCode, Pi, and GitHub Copilot, categorized by task type, tool, model, MCP server, and project.
- It needs no wrappers, proxies, or API keys: it directly analyzes the session files that each tool stores on disk. Claude Code keeps them under paths like ~/.claude/projects/, and Codex under ~/.codex/sessions/.
- It tracks the 'one-shot success rate' for each task type, allowing you to see which tasks are completed on the first attempt and which ones waste tokens with edit/test/fix retries.
- The interface is an interactive TUI (terminal UI) dashboard built on Ink (a React framework for terminals), with gradient charts, responsive panels, and keyboard navigation.
- It supports various time ranges such as today, 7 days, 30 days, monthly, and all time, and also features CSV/JSON export, a macOS SwiftBar menu bar widget, and auto-refresh functionality.
- Pricing data is pulled from LiteLLM and cached automatically, so costs can be calculated for all supported models without separate configuration.
- Installation is a single command, `npm install -g codeburn`, and if you have Node.js 20+ you can run it without installing via `npx codeburn`. For Cursor and OpenCode support, better-sqlite3 is installed automatically to read their SQLite files.
- The creator revealed that they were spending about $1,400 per week on Claude Code and wanted to see where the tokens were being consumed.
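The local-file approach described above can be sketched in a few lines of Python. This is a minimal illustration, not CodeBurn's actual implementation. It assumes session files are JSON Lines in which some entries carry a `message.model` string and a `message.usage` object with `input_tokens` and `output_tokens` fields, and it uses a made-up price table (`PRICES_PER_MTOK`) in place of the LiteLLM pricing data. The field names, helper names, model name, and prices are all assumptions for illustration.

```python
import json
from collections import defaultdict
from pathlib import Path

# Hypothetical prices in USD per million tokens: (input, output).
# A real tool would load these from a pricing source such as LiteLLM.
PRICES_PER_MTOK = {"example-model": (3.0, 15.0)}


def sum_usage(session_dir):
    """Sum input/output tokens per model across all *.jsonl files under session_dir."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for path in sorted(Path(session_dir).expanduser().rglob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip partially written or corrupt lines
                message = entry.get("message") if isinstance(entry, dict) else None
                if not isinstance(message, dict):
                    continue
                usage = message.get("usage")
                if not isinstance(usage, dict):
                    continue  # entry carries no token accounting (assumed shape)
                model = message.get("model", "unknown")
                totals[model]["input"] += int(usage.get("input_tokens", 0) or 0)
                totals[model]["output"] += int(usage.get("output_tokens", 0) or 0)
    return dict(totals)


def estimate_cost(totals, prices=PRICES_PER_MTOK):
    """Return estimated USD cost per model; models without a price are skipped."""
    costs = {}
    for model, tokens in totals.items():
        if model not in prices:
            continue
        in_price, out_price = prices[model]
        costs[model] = (tokens["input"] * in_price + tokens["output"] * out_price) / 1_000_000
    return costs
```

Calling `estimate_cost(sum_usage("~/.claude/projects"))` would then give a rough per-model figure under these assumptions. A fuller accounting would also price cache reads and writes, which this sketch ignores.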
Evidence
- In response to the creator's $1,400-per-week Claude Code spend, one commenter said that a $200/month plan was enough to run 5 agents simultaneously on a 300k LoC codebase without ever hitting the rate limit, suggesting that a flat-rate plan removes most of the cost concern that comes with pay-as-you-go billing.
- Claudoscope (github.com/cordwainersmith/Claudoscope) and ClaudeRank (clauderank.com) were mentioned in the comments as tools with similar purposes, and commenters expressed a preference for CodeBurn's approach.
- A compatibility issue with Cursor Agent was reported: the tool fails to recognize the data when Cursor stores it under the ~/.cursor path.
- One commenter pointed out that the terminal UI is built with Ink (React for terminals) and noted that 'Claude Code itself is also made with Ink'.
- A comment suggested adding a feature to detect cost inefficiencies and propose improvements, and the creator responded positively.
How to Apply
- If you use Claude Code or Cursor daily and your bill at the end of the month is higher than expected, you can immediately run `npx codeburn` to see which projects and task types are consuming the most tokens.
- By identifying task types with low one-shot success rates, you can improve the prompts or task decomposition methods to reduce token waste from retries.
- If you're deploying AI coding tools across a team and need to justify costs, you can extract data with `codeburn report --format json` to create team- and project-based cost reports.
- If you're on macOS and want to continuously monitor token usage, you can connect it to a SwiftBar menu bar widget to view the status without opening a separate dashboard.
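The one-shot success rate mentioned above reduces to a simple ratio per task type. The sketch below assumes a hypothetical record format, a list of `(task_type, attempts)` pairs where `attempts` counts edit/test/fix cycles. How CodeBurn itself classifies tasks and detects retries is not described here, so the function names and sample data are illustrative only.

```python
from collections import defaultdict


def one_shot_rates(tasks):
    """Map each task type to the fraction of its tasks finished in one attempt.

    `tasks` is an iterable of (task_type, attempts) pairs; this record shape
    is a hypothetical stand-in, not CodeBurn's data model.
    """
    counts = defaultdict(lambda: [0, 0])  # task_type -> [one_shot, total]
    for task_type, attempts in tasks:
        counts[task_type][1] += 1
        if attempts == 1:
            counts[task_type][0] += 1
    return {t: one / total for t, (one, total) in counts.items()}


def worst_first(rates):
    """Return task types ordered from lowest to highest one-shot rate."""
    return sorted(rates, key=lambda t: (rates[t], t))


# Made-up sample data for illustration.
sample = [("refactor", 1), ("refactor", 3), ("bugfix", 1), ("bugfix", 1), ("tests", 2)]
rates = one_shot_rates(sample)
print(worst_first(rates))  # ['tests', 'refactor', 'bugfix']
```

Task types at the front of that list are the ones where better prompts or smaller task decomposition are most likely to cut retry costs.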
Code Example
# Installation
npm install -g codeburn
# Run directly without installation
npx codeburn
# Basic interactive dashboard (last 7 days)
codeburn
# Today's usage
codeburn today
# This month's usage
codeburn month
# Recent 30-day rolling window
codeburn report -p 30days
# All time
codeburn report -p all
# Output in JSON format
codeburn report --format json
# Auto-refresh every 60 seconds
codeburn report --refresh 60
# One-line summary (today + this month)
codeburn status
# Export to CSV (today/7 days/30 days)
codeburn export
# Export to JSON
codeburn export -f json
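For scripted reporting, the JSON output can be captured from another program. The following Python sketch runs `codeburn report --format json` (the command listed above) and parses the result. It assumes the command writes a single JSON document to standard output, makes no assumption about the report's schema, and only lists the top-level keys. The `load_report` and `top_level_keys` helper names are illustrative, and the `command` parameter exists so the function can be pointed at a different executable.

```python
import json
import subprocess

# The report command as listed in this article.
DEFAULT_COMMAND = ["codeburn", "report", "--format", "json"]


def load_report(command=DEFAULT_COMMAND, timeout=120):
    """Run the command and parse its standard output as JSON.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    json.JSONDecodeError if the output is not valid JSON.
    """
    result = subprocess.run(
        command, capture_output=True, text=True, check=True, timeout=timeout
    )
    return json.loads(result.stdout)


def top_level_keys(report):
    """List the top-level keys of a parsed report, or [] if it is not an object."""
    return sorted(report) if isinstance(report, dict) else []
```

With CodeBurn installed, `top_level_keys(load_report())` is a quick way to inspect the report's structure before building team or project cost summaries on top of it.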