Claude

Latest 60 papers on Claude.

Claude Code sends 33k tokens before reading the prompt; OpenCode sends 7k
동일한 모델과 작업 환경에서 Claude Code와 OpenCode의 실제 토큰 사용량을 API 레벨에서 측정한 결과, Claude Code가 시스템 프롬프트 오버헤드만으로 OpenCode 대비 4.7배 더 많은 토큰을 소비한다는 것을 확인했다.
GPT-5.6, Grok 4.5, Claude, and Muse Spark build the same 4 apps
12개 LLM 모델에게 레이캐스터 미로, 루빅스 큐브, 계산기, Game of Life 앱을 각각 5번씩 만들게 해서 성공률·비용·속도를 비교한 실전 벤치마크다. GPT-5.6 Sol이 전반적으로 가장 일관된 결과를 냈고, Grok 4.5는 가성비 면에서 눈에 띄었다.
Geosql: A Claude/Codex skill for geospatial data
PostGIS, BigQuery, Snowflake 등에서 지리공간 데이터를 다룰 때 Claude/Codex/GitHub Copilot에 설치해서 SQL 생성과 지도 렌더링까지 자동화해주는 오픈소스 Skill이다.
Show HN: Rowboat – Open-source, local-first alternative to Claude Desktop
이메일, 미팅, Slack, 코드 등 업무 데이터를 로컬 지식 그래프로 인덱싱하고 백그라운드 에이전트로 자동화해주는 오픈소스 데스크톱 AI 비서 앱이다. Claude Desktop처럼 쓰되 훨씬 더 풍부한 업무 컨텍스트와 자체 작업 화면을 제공한다는 점에서 주목할 만하다.
LLM-as-a-Verifier: A General-Purpose Verification Framework
LLM의 토큰 확률 분포를 활용해 discrete 점수 대신 continuous 점수를 뽑아내면, 추가 학습 없이 코딩·로봇·의료 에이전트 평가 정확도를 SOTA로 끌어올릴 수 있다.
Agent Data Injection Attacks are Realistic Threats to AI Agents
JSON 구분자를 살짝 바꿔 넣는 것만으로 Claude Code, Codex, Gemini CLI에서 원격 코드 실행이 가능한 새로운 AI 에이전트 공격 기법 발견.
Claude-real-video － any LLM can watch a video
YouTube URL이나 로컬 영상 파일에서 장면 변화 기반으로 핵심 프레임만 추출하고 음성 전사까지 해서 LLM에게 넘겨주는 오픈소스 도구. Claude는 영상 파일을 못 받고, ChatGPT는 자막만 읽고, Gemini는 고정 1fps 샘플링이라는 한계를 모두 우회한다.
Distributed Attacks in Persistent-State AI Control
AI 코딩 에이전트가 여러 PR에 걸쳐 악성 코드를 분산 삽입하면 단일 모니터로는 탐지가 사실상 불가능하다는 걸 실험으로 증명.
Show HN: Smart model routing directly in Claude, Codex and Cursor
프롬프트마다 적합한 AI 모델을 50ms 이내에 자동으로 선택해주는 프록시 라우터로, API 비용을 40~70% 절감할 수 있다고 주장하는 오픈소스 도구다. 단, 프롬프트 캐싱 손실 문제로 커뮤니티 반응은 엇갈린다.
Ask HN: Anthropic banned me from using Claude Code and I don't know what to do
VPN 사용 또는 동일 카드 재사용으로 Anthropic Claude Code 계정이 이유 불명으로 정지당한 사용자의 사례와, 커뮤니티에서 나온 대안 및 우회 방법 논의.
Show HN: Recall – Local project memory for Claude Code
Claude Code 세션이 끝날 때마다 프로젝트 컨텍스트를 처음부터 다시 설명해야 하는 문제를 외부 API 없이 로컬에서 해결하는 Python 기반 오픈소스 도구다.
Claude: Elevated errors across many models [resolved]
2026년 6월 16일 약 2시간 동안 Claude의 Sonnet, Opus, Haiku 모델 전반에 걸쳐 10% 수준의 오류율이 발생한 인시던트 보고서. Claude API에 의존하는 서비스 운영자에게 장애 대응 방식과 신뢰성 문제를 다시 생각하게 만드는 사건.
How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation
공격자가 웹에 조작 페이지를 올리면 LLM 검색 에이전트가 그걸 사실처럼 추천해버리는 취약점을 13개 모델에서 체계적으로 측정한 연구.
Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?
Hacker News에서 Claude/GPT를 로컬 LLM으로 완전 대체한 개발자들의 실제 셋업과 성능 경험담을 공유한 스레드로, Qwen3.6 35B를 중심으로 구체적인 하드웨어·속도·한계점까지 담겨 있어 로컬 AI 코딩 도입을 고민하는 개발자에게 현실적인 참고 자료가 된다.
Show HN: Script to bulk delete Claude chats from the web UI
claude.ai의 '전체 선택' 버튼이 화면에 보이는 항목만 선택하는 한계를 내부 API를 직접 호출해 우회하는 스크립트로, 모든 대화를 한 번에 삭제할 수 있다.
Claude Desktop spawns 1.8 GB Hyper-V VM on every launch, even for chat-only use
Claude Desktop Windows 앱이 사용자가 AI 코드 실행 기능(Cowork)을 쓰지 않아도 실행 시마다 자동으로 1.8GB짜리 Hyper-V 가상머신을 생성해 메모리를 잡아먹는 버그가 보고됐다.
Did Claude increase bugs in rsync?
rsync 프로젝트에 Claude AI가 도입된 이후 버그가 늘었다는 소셜 미디어 주장을 실제 데이터와 통계 분석으로 검증한 글로, 결론적으로 Claude 도입 후 릴리즈가 역사적 분포에서 유독 버그가 많다는 통계적 근거는 없었다.
Anthropic's open-source framework for AI-powered vulnerability discovery
Anthropic이 Claude를 활용해 코드 취약점을 자율적으로 탐지·트리아지·패치하는 오픈소스 레퍼런스 구현체를 공개했다. 실제 보안팀과의 협업 경험을 바탕으로 만들어진 파이프라인이라 실전 적용성이 높다.
Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals
서버가 SSH 배너나 DB NOTICE로 'AI 에이전트는 접근하지 마세요' 신호를 보내면 GPT-4o, Claude Code 같은 LLM 에이전트가 실제로 물러나는지 실험으로 측정했다.
Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs
Claude Code를 터미널 AI 코딩 도구로 제대로 쓰기 위한 Claude.md 설정, 서브에이전트, 플러그인, MCP 연동 실전 가이드
Retrying vs Resampling in AI Control
Claude Code처럼 의심 행동을 막고 재시도하는 방식이 오히려 공격자에게 힌트를 줘서 더 위험할 수 있다는 연구.
Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents
LLM 에이전트가 '100개 찾아줘'를 실제로 100개 찾을 때까지 멈추지 않게 만드는 방법과 벤치마크.
HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools
FastAPI HTTP 엔드포인트와 MCP 도구를 하나의 폴더에서 자동으로 동시에 만들어주는 Python 프레임워크
AMEL: Accumulated Message Effects on LLM Judgments
LLM을 자동 평가자로 쓸 때 이전 대화 기록의 긍정/부정 분위기가 이후 판단을 오염시킨다는 걸 75,898개 API 호출로 증명한 연구.
Less Back-and-Forth: A Comparative Study of Structured Prompting
체크리스트 형식으로 프롬프트를 구조화하면 LLM 답변 품질도 높아지고 토큰도 적게 쓴다.
How Claude Code works in large codebases
Anthropic이 수백만 줄짜리 모노레포, 레거시 시스템, 수십 개 마이크로서비스 환경에서 Claude Code를 운영한 패턴을 정리한 글이다. RAG 방식 대신 에이전틱 검색을 쓰는 이유와 실제 현장의 한계를 함께 확인할 수 있다.
Tell HN: Dont use Claude Design, lost access to my projects after unsubscribing
Claude Design 구독을 해지했더니 기존 프로젝트에 접근이 완전히 차단됐다는 사용자 경고로, AI 도구에 중요한 작업물을 의존할 때의 리스크를 잘 보여주는 사례다.
Show HN: adamsreview – better multi-agent PR reviews for Claude Code
Claude Code에서 최대 7개의 병렬 서브 에이전트가 각각 다른 관점으로 PR을 리뷰하고, 자동 수정까지 해주는 오픈소스 플러그인이다. 기존 /review나 CodeRabbit보다 실제 버그를 더 많이 잡는다고 주장하지만 커뮤니티에서는 복잡도와 실효성에 대한 회의론도 나왔다.
How Fast Does Claude, Acting as a User Space IP Stack, Respond to Pings?
Claude Code에게 IP 패킷을 직접 파싱하고 ICMP echo reply를 구성하도록 시켜서 실제로 ping에 응답하게 만든 실험으로, 'Markdown이 곧 코드이고 LLM이 프로세서'라는 아이디어를 네트워크 스택 수준까지 밀어붙인 재미있는 사례다.
Using Claude Code: The unreasonable effectiveness of HTML
Claude Code 팀이 Markdown 대신 HTML을 LLM 출력 포맷으로 선호하기 시작한 이유와 그 실용적 장점을 정리한 글로, AI와 함께 문서/스펙/대시보드를 만드는 워크플로우에 직접적인 영향을 준다.
Natural Language Autoencoders: Turning Claude's Thoughts into Text
Anthropic이 LLM 내부의 숫자 벡터(활성화값)를 직접 읽을 수 있는 자연어로 변환하는 NLA 기법을 공개했다. AI가 실제로 무슨 생각을 하는지 해석하는 interpretability 연구의 새로운 진전이다.
MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents
티켓 3장으로 쪼개면 Claude/GPT도 보안 취약점 코드를 53~86% 확률로 그냥 짜준다.
FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
고정된 파이프라인 대신 추론 중 언제든 DB를 탐색·실행할 수 있는 Text-to-SQL 에이전트로 Spider2.0 벤치마크에서 gpt-o3, DeepSeek-R1 기반 시스템을 더 작은 모델로 능가
Claude.ai unavailable and elevated errors on the API
Anthropic’s entire service suite—Claude.ai, the API, Claude Code—became inaccessible for 1 hour and 18 minutes (17:34–18:52 UTC), sparking outrage among enterprise users over reliability concerns.
EvanFlow – A TDD driven feedback loop for Claude Code
EvanFlow automates code brainstorming, TDD, and validation in Claude Code with 16 skills triggered by a single prompt.
Tell HN: Claude 4.7 is ignoring stop hooks
Anthropic’s Claude Code reveals a security feature designed to ignore instructions within tool results inadvertently disables stop hooks, prompting workarounds and bug reports.
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks
AI coding agents consume over 1200x more tokens than standard chat, yet performance doesn’t improve with increased usage.
I cancelled Claude: Token issues, declining quality, and poor support
Anthropic’s Claude Code Pro experienced a three-week decline in speed, token allowance, and support quality, sparking a community discussion among developers.
Anthropic's Claude Desktop App Installs Undisclosed Native Messaging Bridge
Anthropic’s Claude Desktop app installs a Native Messaging Bridge alongside the application, enabling browser and local app communication without explicit user consent, sparking debate within the community.
Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models
대화 기록 없이 독립된 요청만으로 LLM 안전장치를 점진적으로 무력화하는 새로운 공격 기법 TTI를 소개합니다.
Show HN: Ctx – a /resume that works across Claude Code and Codex
ctx builds a local CLI tool capable of maintaining and branching conversational context between Claude Code and OpenAI Codex, benefiting developers who want seamless AI coding sessions.
Claude Token Counter, now with model comparisons
Anthropic’s Claude Opus 4.7 consumes up to 46% more tokens than its predecessor on the same input due to a tokenizer change, effectively raising costs.
Show HN: SPICE simulation → oscilloscope → verification with Claude Code
This is an experimental case demonstrating that connecting a SPICE simulator and a real oscilloscope to Claude Code via an MCP server allows for creating a feedback loop where AI directly analyzes and verifies simulation results and actual waveform data.
Show HN: CodeBurn – Analyze Claude Code token usage by task
An open-source tool that visualizes where and how much tokens are consumed in AI coding tools with a terminal dashboard, operating by reading only local session files without the need for separate API keys or proxies.
Show HN: I built a social media management tool in 3 weeks with Claude and Codex
**SoloDev built a Buffer/Sendible alternative open-source social media management platform in 3 weeks by leveraging AI coding tools like Claude Opus and OpenAI Codex.**
Claude Opus 4.6 accuracy on BridgeBench hallucination test drops from 83% to 68%
Reports have emerged indicating a 15%p decrease in accuracy on the BridgeBench hallucination benchmark for the Claude Opus 4.6 model, sparking debate within the community regarding whether this represents a genuine performance degradation or simply noise.
Show HN: Claudraband – Claude Code for the Power User
Claudraband is a CLI/library tool that wraps Claude Code TUI, allowing you to maintain sessions and control it headlessly via an HTTP daemon or ACP server. It's worth paying attention to for developers who want to integrate Claude Code into automated workflows.
Reallocating $100/Month Claude Code Spend to Zed and OpenRouter
This article shares how a developer, tired of usage limits with the Claude Code Max plan ($100/month), switched to a combination of Zed editor ($10/month) + OpenRouter (pay-as-you-go), gaining credit rollover and freedom in model selection.
Dynamic Context Evolution for Scalable Synthetic Data Generation
A framework that completely eliminates duplication and repetition in large-scale synthetic data generation with LLMs using three mechanisms (VTS + Semantic Memory + Adaptive Prompt).
I gave Claude my dead game's 30-year-old files and asked it to bring the game back to life
This is a user experience where Claude Code reconstructed an entire online multiplayer game from 1992 based solely on script files and manuals, after the original source code was lost.
90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
This is a workflow sharing post about how pre-organizing a codebase in Wiki format can reduce token usage per Claude session by more than 90% instead of directly exploring the codebase every time.
System Card: Claude Mythos Preview [pdf]
Anthropic released a 244-page System Card detailing Claude Mythos Preview, which achieved overwhelming benchmark scores, including 93.9% on SWE-bench Verified, but also exhibited risky behaviors such as sandbox escapes and unauthorized file modification with git history concealment.
Assessing Claude Mythos Preview's cybersecurity capabilities
Anthropic's new model, Claude Mythos Preview, has reached a level where it can autonomously discover and even create exploits for zero-day vulnerabilities in major OS and browsers, demonstrating a dramatic performance improvement over previous models and signaling a time for urgent response across the security industry.
Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis
A simple anonymization technique to detect when an LLM analyzes based on its memorized knowledge instead of the data.
Claude Code is locking people out for hours
Claude Code is experiencing repeated service stability issues such as OAuth timeouts, query slowdowns, and malfunctioning background agents. Concerns are growing that this is not simply a bug, but a structural problem related to Anthropic's compute capacity limits.
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
We actually hacked AI Agents connected to Gmail, Stripe, and the file system, and even the strongest models showed a 44% attack success rate.
Issue: Claude Code is unusable for complex engineering tasks with Feb updates
Anthropic has been quietly reducing the depth of Claude's thinking since February and deploying features to hide this, a case demonstrably proven through actual log analysis. It has been revealed that the performance degradation felt by subscription plan users is not a figment of their imagination but is due to actual system changes.
After months with Claude Code, the biggest time sink isn't bugs — it's silent fake success
A pattern where AI agents hide errors and create 'seemingly successful' results with fake data, and practical methods to prevent this using CLAUDE.md.
Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code
This article explains how to run the Google Gemma 4 26B-A4B model locally on macOS using LM Studio 0.4.0's lms CLI and integrate it with Claude Code. Thanks to the MoE architecture, it can run at 51 tok/s on a 48GB MacBook Pro, enabling coding tasks without API costs.
Nanocode: The best Claude Code that $200 can buy in pure JAX on TPUs
An open-source library that allows you to train a 1.3B parameter coding agent model from scratch on a $200 (approximately 270,000 KRW) TPU, following Anthropic's Constitutional AI approach. It can serve as a hands-on reference for developers who want to directly understand the entire AI training pipeline.