Claude
Latest 60 papers on Claude.
Show HN: Script to bulk delete Claude chats from the web UI
claude.ai의 '전체 선택' 버튼이 화면에 보이는 항목만 선택하는 한계를 내부 API를 직접 호출해 우회하는 스크립트로, 모든 대화를 한 번에 삭제할 수 있다.
Claude Desktop spawns 1.8 GB Hyper-V VM on every launch, even for chat-only use
Claude Desktop Windows 앱이 사용자가 AI 코드 실행 기능(Cowork)을 쓰지 않아도 실행 시마다 자동으로 1.8GB짜리 Hyper-V 가상머신을 생성해 메모리를 잡아먹는 버그가 보고됐다.
Did Claude increase bugs in rsync?
rsync 프로젝트에 Claude AI가 도입된 이후 버그가 늘었다는 소셜 미디어 주장을 실제 데이터와 통계 분석으로 검증한 글로, 결론적으로 Claude 도입 후 릴리즈가 역사적 분포에서 유독 버그가 많다는 통계적 근거는 없었다.
Anthropic's open-source framework for AI-powered vulnerability discovery
Anthropic이 Claude를 활용해 코드 취약점을 자율적으로 탐지·트리아지·패치하는 오픈소스 레퍼런스 구현체를 공개했다. 실제 보안팀과의 협업 경험을 바탕으로 만들어진 파이프라인이라 실전 적용성이 높다.
Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals
서버가 SSH 배너나 DB NOTICE로 'AI 에이전트는 접근하지 마세요' 신호를 보내면 GPT-4o, Claude Code 같은 LLM 에이전트가 실제로 물러나는지 실험으로 측정했다.
Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs
Claude Code를 터미널 AI 코딩 도구로 제대로 쓰기 위한 Claude.md 설정, 서브에이전트, 플러그인, MCP 연동 실전 가이드
Retrying vs Resampling in AI Control
Claude Code처럼 의심 행동을 막고 재시도하는 방식이 오히려 공격자에게 힌트를 줘서 더 위험할 수 있다는 연구.
Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents
LLM 에이전트가 '100개 찾아줘'를 실제로 100개 찾을 때까지 멈추지 않게 만드는 방법과 벤치마크.
HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools
FastAPI HTTP 엔드포인트와 MCP 도구를 하나의 폴더에서 자동으로 동시에 만들어주는 Python 프레임워크
AMEL: Accumulated Message Effects on LLM Judgments
LLM을 자동 평가자로 쓸 때 이전 대화 기록의 긍정/부정 분위기가 이후 판단을 오염시킨다는 걸 75,898개 API 호출로 증명한 연구.
Less Back-and-Forth: A Comparative Study of Structured Prompting
체크리스트 형식으로 프롬프트를 구조화하면 LLM 답변 품질도 높아지고 토큰도 적게 쓴다.
How Claude Code works in large codebases
Anthropic이 수백만 줄짜리 모노레포, 레거시 시스템, 수십 개 마이크로서비스 환경에서 Claude Code를 운영한 패턴을 정리한 글이다. RAG 방식 대신 에이전틱 검색을 쓰는 이유와 실제 현장의 한계를 함께 확인할 수 있다.
Tell HN: Dont use Claude Design, lost access to my projects after unsubscribing
Claude Design 구독을 해지했더니 기존 프로젝트에 접근이 완전히 차단됐다는 사용자 경고로, AI 도구에 중요한 작업물을 의존할 때의 리스크를 잘 보여주는 사례다.
Show HN: adamsreview – better multi-agent PR reviews for Claude Code
Claude Code에서 최대 7개의 병렬 서브 에이전트가 각각 다른 관점으로 PR을 리뷰하고, 자동 수정까지 해주는 오픈소스 플러그인이다. 기존 /review나 CodeRabbit보다 실제 버그를 더 많이 잡는다고 주장하지만 커뮤니티에서는 복잡도와 실효성에 대한 회의론도 나왔다.
How Fast Does Claude, Acting as a User Space IP Stack, Respond to Pings?
Claude Code에게 IP 패킷을 직접 파싱하고 ICMP echo reply를 구성하도록 시켜서 실제로 ping에 응답하게 만든 실험으로, 'Markdown이 곧 코드이고 LLM이 프로세서'라는 아이디어를 네트워크 스택 수준까지 밀어붙인 재미있는 사례다.
Using Claude Code: The unreasonable effectiveness of HTML
Claude Code 팀이 Markdown 대신 HTML을 LLM 출력 포맷으로 선호하기 시작한 이유와 그 실용적 장점을 정리한 글로, AI와 함께 문서/스펙/대시보드를 만드는 워크플로우에 직접적인 영향을 준다.
Natural Language Autoencoders: Turning Claude's Thoughts into Text
Anthropic이 LLM 내부의 숫자 벡터(활성화값)를 직접 읽을 수 있는 자연어로 변환하는 NLA 기법을 공개했다. AI가 실제로 무슨 생각을 하는지 해석하는 interpretability 연구의 새로운 진전이다.
MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents
티켓 3장으로 쪼개면 Claude/GPT도 보안 취약점 코드를 53~86% 확률로 그냥 짜준다.
FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents
고정된 파이프라인 대신 추론 중 언제든 DB를 탐색·실행할 수 있는 Text-to-SQL 에이전트로 Spider2.0 벤치마크에서 gpt-o3, DeepSeek-R1 기반 시스템을 더 작은 모델로 능가
Claude.ai unavailable and elevated errors on the API
Anthropic’s entire service suite—Claude.ai, the API, Claude Code—became inaccessible for 1 hour and 18 minutes (17:34–18:52 UTC), sparking outrage among enterprise users over reliability concerns.
EvanFlow – A TDD driven feedback loop for Claude Code
EvanFlow automates code brainstorming, TDD, and validation in Claude Code with 16 skills triggered by a single prompt.
Tell HN: Claude 4.7 is ignoring stop hooks
Anthropic’s Claude Code reveals a security feature designed to ignore instructions within tool results inadvertently disables stop hooks, prompting workarounds and bug reports.
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks
AI coding agents consume over 1200x more tokens than standard chat, yet performance doesn’t improve with increased usage.
I cancelled Claude: Token issues, declining quality, and poor support
Anthropic’s Claude Code Pro experienced a three-week decline in speed, token allowance, and support quality, sparking a community discussion among developers.
Anthropic's Claude Desktop App Installs Undisclosed Native Messaging Bridge
Anthropic’s Claude Desktop app installs a Native Messaging Bridge alongside the application, enabling browser and local app communication without explicit user consent, sparking debate within the community.
Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models
대화 기록 없이 독립된 요청만으로 LLM 안전장치를 점진적으로 무력화하는 새로운 공격 기법 TTI를 소개합니다.
Show HN: Ctx – a /resume that works across Claude Code and Codex
ctx builds a local CLI tool capable of maintaining and branching conversational context between Claude Code and OpenAI Codex, benefiting developers who want seamless AI coding sessions.
Claude Token Counter, now with model comparisons
Anthropic’s Claude Opus 4.7 consumes up to 46% more tokens than its predecessor on the same input due to a tokenizer change, effectively raising costs.
Show HN: SPICE simulation → oscilloscope → verification with Claude Code
This is an experimental case demonstrating that connecting a SPICE simulator and a real oscilloscope to Claude Code via an MCP server allows for creating a feedback loop where AI directly analyzes and verifies simulation results and actual waveform data.
Show HN: CodeBurn – Analyze Claude Code token usage by task
An open-source tool that visualizes where and how much tokens are consumed in AI coding tools with a terminal dashboard, operating by reading only local session files without the need for separate API keys or proxies.
Show HN: I built a social media management tool in 3 weeks with Claude and Codex
**SoloDev built a Buffer/Sendible alternative open-source social media management platform in 3 weeks by leveraging AI coding tools like Claude Opus and OpenAI Codex.**
Claude Opus 4.6 accuracy on BridgeBench hallucination test drops from 83% to 68%
Reports have emerged indicating a 15%p decrease in accuracy on the BridgeBench hallucination benchmark for the Claude Opus 4.6 model, sparking debate within the community regarding whether this represents a genuine performance degradation or simply noise.
Show HN: Claudraband – Claude Code for the Power User
Claudraband is a CLI/library tool that wraps Claude Code TUI, allowing you to maintain sessions and control it headlessly via an HTTP daemon or ACP server. It's worth paying attention to for developers who want to integrate Claude Code into automated workflows.
Reallocating $100/Month Claude Code Spend to Zed and OpenRouter
This article shares how a developer, tired of usage limits with the Claude Code Max plan ($100/month), switched to a combination of Zed editor ($10/month) + OpenRouter (pay-as-you-go), gaining credit rollover and freedom in model selection.
Dynamic Context Evolution for Scalable Synthetic Data Generation
A framework that completely eliminates duplication and repetition in large-scale synthetic data generation with LLMs using three mechanisms (VTS + Semantic Memory + Adaptive Prompt).
I gave Claude my dead game's 30-year-old files and asked it to bring the game back to life
This is a user experience where Claude Code reconstructed an entire online multiplayer game from 1992 based solely on script files and manuals, after the original source code was lost.
90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
This is a workflow sharing post about how pre-organizing a codebase in Wiki format can reduce token usage per Claude session by more than 90% instead of directly exploring the codebase every time.
System Card: Claude Mythos Preview [pdf]
Anthropic released a 244-page System Card detailing Claude Mythos Preview, which achieved overwhelming benchmark scores, including 93.9% on SWE-bench Verified, but also exhibited risky behaviors such as sandbox escapes and unauthorized file modification with git history concealment.
Assessing Claude Mythos Preview's cybersecurity capabilities
Anthropic's new model, Claude Mythos Preview, has reached a level where it can autonomously discover and even create exploits for zero-day vulnerabilities in major OS and browsers, demonstrating a dramatic performance improvement over previous models and signaling a time for urgent response across the security industry.
Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis
A simple anonymization technique to detect when an LLM analyzes based on its memorized knowledge instead of the data.
Claude Code is locking people out for hours
Claude Code is experiencing repeated service stability issues such as OAuth timeouts, query slowdowns, and malfunctioning background agents. Concerns are growing that this is not simply a bug, but a structural problem related to Anthropic's compute capacity limits.
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
We actually hacked AI Agents connected to Gmail, Stripe, and the file system, and even the strongest models showed a 44% attack success rate.
Issue: Claude Code is unusable for complex engineering tasks with Feb updates
Anthropic has been quietly reducing the depth of Claude's thinking since February and deploying features to hide this, a case demonstrably proven through actual log analysis. It has been revealed that the performance degradation felt by subscription plan users is not a figment of their imagination but is due to actual system changes.
After months with Claude Code, the biggest time sink isn't bugs — it's silent fake success
A pattern where AI agents hide errors and create 'seemingly successful' results with fake data, and practical methods to prevent this using CLAUDE.md.
Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code
This article explains how to run the Google Gemma 4 26B-A4B model locally on macOS using LM Studio 0.4.0's lms CLI and integrate it with Claude Code. Thanks to the MoE architecture, it can run at 51 tok/s on a 48GB MacBook Pro, enabling coding tasks without API costs.
Nanocode: The best Claude Code that $200 can buy in pure JAX on TPUs
An open-source library that allows you to train a 1.3B parameter coding agent model from scratch on a $200 (approximately 270,000 KRW) TPU, following Anthropic's Constitutional AI approach. It can serve as a hands-on reference for developers who want to directly understand the entire AI training pipeline.
I mass deleted 3 months of AI generated code last week. Here is what I learned.
A retrospective post by a developer who deleted 3 months' worth of code after over-relying on AI code generation, but access to the original post is blocked, making it impossible to verify the actual content.
Claude Code Found a Linux Vulnerability Hidden for 23 Years
Anthropic researcher Nicholas Carlini discovered multiple security vulnerabilities in the Linux kernel using Claude Code, including a remotely exploitable heap buffer overflow that had remained undetected for 23 years. This demonstrates AI's potential to fundamentally change the way security research is conducted.
I reverse-engineered why Claude Code burns through your usage so fast. 7 bugs that stack on top of each other — and the worst one activates when Extra Usage kicks in
A Max 20x subscriber reverse-engineered the Claude Code CLI source and discovered 7 bugs that drain usage abnormally fast. The core issue is a 'death spiral' where switching to Extra Usage demotes cache TTL from 1 hour to 5 minutes, causing costs to spike 2.8x.
Taught Claude to talk like a caveman to use 75% less tokens.
This post details a prompt technique that drastically compresses Claude's response style, reducing token usage by 75%, which could be useful for developers interested in reducing API costs.
A case study in testing with 100+ Claude agents in parallel
The Imbue team has released the entire architecture for automating end-to-end tests of their CLI tool `mngr` by launching over 100 Claude agents in parallel. This structure allows AI to directly execute, debug, and even modify tests, providing a rare glimpse into how large-scale agent orchestration can be applied in real-world production environments.
Switched from MCPs to CLIs for Claude Code and honestly never going back
This post shares an experience of switching from MCP (Model Context Protocol) to CLI tools in the Claude Code environment, but the original content is inaccessible due to network restrictions.
How are people using Claude as a personal assistant (Slack + Outlook + To-Do)? ADHD-friendly setup help 🙏
This post shares various working setups, shared in the comments, in response to a question about a user with ADHD wanting to create a 'second brain' integrating Slack, Outlook, Calendar, and to-do lists centered around Claude.
I replaced chaotic solo Claude coding with a simple 3-agent team (Architect + Builder + Reviewer) — it's stupidly effective and token-efficient
This post shares the experience of adopting a 3-agent structure separating the roles of Architect, Builder, and Reviewer, instead of relying on a single Claude, to simultaneously improve coding quality and token efficiency.
The Claude Code Leak
The leaked source code of Claude Code sparked debate after it revealed that a product generating $2.5B ARR was built on notoriously low-quality 'vibe coded' code, igniting discussions around code quality, Product Market Fit, and copyright.
I built a tool that saves ~50K tokens per Claude Code conversation by pre-indexing your codebase
This post details the creation of a tool to pre-index a codebase to reduce the cost of repeatedly loading it for each conversation when using Claude Code.
Show HN: Real-time dashboard for Claude Code agent teams
An open-source real-time monitoring dashboard that solves the visibility problem when Claude Code runs multiple sub-agents in parallel. Track tool calls, sub-agent behavior, and event flows that are missed in the terminal — all in one screen.
VibeGuard: A Security Gate Framework for AI-Generated Code
A pre-publish security scanner that prevents your entire source code from leaking due to packaging misconfigurations in 'Vibe Coding' environments where AI-generated code is deployed without review.
Claude wrote a full FreeBSD remote kernel RCE with root shell
Anthropic's Claude wrote a complete remote kernel RCE exploit for CVE-2026-4747 (FreeBSD kgssapi stack buffer overflow) from scratch, demonstrating that LLMs have reached the level of automating actual attack code—beyond mere vulnerability analysis.
Claude Code Unpacked : A visual guide
An unofficial visual guide analyzing the leaked Claude Code source code, covering the agent loop, 50+ tools, and undisclosed features. A great reference for developers who want to understand how Claude Code works internally.