Code Generation
Latest 60 papers on Code Generation.
Claude.ai unavailable and elevated errors on the API
Anthropic’s entire service suite (Claude.ai, the API, and Claude Code) was inaccessible for 1 hour and 18 minutes (17:34–18:52 UTC), drawing sharp criticism from enterprise users over reliability.
Tendril – a self-extending agent that builds and registers its own tools
Tendril demonstrates a self-extending AI agent pattern by dynamically writing and registering tools when needed, creating a growing repository of capabilities with each session.
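A minimal sketch of the self-extending tool pattern, assuming a hypothetical `fake_llm_write_tool` callback that stands in for the LLM call that writes a tool's source. The source does not show Tendril's implementation; `ToolRegistry` and its methods are illustrative names, not Tendril's API.

```python
from typing import Callable, Dict, Optional


def fake_llm_write_tool(name: str) -> str:
    """Hypothetical stand-in for an LLM that writes a tool's source code."""
    canned = {
        "word_count": "def word_count(text):\n    return len(text.split())\n",
        "shout": "def shout(text):\n    return text.upper() + '!'\n",
    }
    return canned[name]


class ToolRegistry:
    """Writes a tool the first time it is requested, then reuses it."""

    def __init__(self, write_tool: Callable[[str], str],
                 sources: Optional[Dict[str, str]] = None):
        self.write_tool = write_tool
        # Source of every tool written so far; persisting this dict between
        # sessions is what lets the set of capabilities grow over time.
        self.sources: Dict[str, str] = dict(sources or {})
        self.tools: Dict[str, Callable] = {}
        self.written = 0  # number of tools generated in this session

    def get(self, name: str) -> Callable:
        if name not in self.tools:
            if name not in self.sources:
                self.sources[name] = self.write_tool(name)
                self.written += 1
            namespace: dict = {}
            # WARNING: exec runs model-written code; a real agent should sandbox it.
            exec(self.sources[name], namespace)
            self.tools[name] = namespace[name]  # register the compiled tool
        return self.tools[name]


# Session 1: the tool does not exist yet, so it is written and registered.
session1 = ToolRegistry(fake_llm_write_tool)
print(session1.get("word_count")("a b c"))  # 3
# Session 2 starts from the saved sources and writes nothing new.
session2 = ToolRegistry(fake_llm_write_tool, sources=session1.sources)
print(session2.get("word_count")("x y"), session2.written)  # 2 0
```

Keeping generated source separate from compiled functions is a design choice of this sketch: the source dict is what you would save to disk so the next session inherits every tool.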
Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview
Dirac cuts API costs by 64.8% and scores 65.2% on TerminalBench-2 through efficient context management.
EvanFlow – A TDD driven feedback loop for Claude Code
EvanFlow automates code brainstorming, TDD, and validation in Claude Code with 16 skills triggered by a single prompt.
Show HN: A Karpathy-style LLM wiki your agents maintain (Markdown and Git)
WUPHF builds a shared knowledge base as a Git-backed Markdown wiki, letting multiple AI agents, including Claude and Codex, divide and execute tasks autonomously.
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks
AI coding agents consume over 1200x more tokens than standard chat, yet performance doesn’t improve with increased usage.
I cancelled Claude: Token issues, declining quality, and poor support
Anthropic’s Claude Code Pro experienced a three-week decline in speed, token allowance, and support quality, sparking a community discussion among developers.
Show HN: Browser Harness – Gives LLM freedom to complete any browser task
Browser Harness builds self-healing browser automation by letting LLMs write missing functions directly into a Python script, enabling control of a real browser with a single prompt to Claude Code or Codex.
From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification
Gemma 4-31B achieves a 90.91% success rate in Dafny-based formal verification, producing LLM-generated code whose correctness against its specification is mathematically proven.
Diagnosing CFG Interpretation in LLMs
When given novel context-free grammar rules, LLMs often produce output that is syntactically correct but has lost its semantic meaning.
Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees
Deterministic Leaf Enumeration (DLE) removes the redundant sampling of self-consistency by deterministically exploring a truncated tree of candidate sequences, improving both accuracy and speed on math and code reasoning.
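A toy sketch contrasting the two strategies, assuming a hypothetical `TOY_MODEL` of next-token distributions and a simple probability threshold (`min_prob`) as the truncation rule. The paper's actual truncation and aggregation rules may differ; this only shows why enumerating each leaf once avoids the duplicate samples that independent sampling produces.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Prefix = Tuple[str, ...]

# Hypothetical toy "model": next-token distributions keyed by prefix.
# A prefix with no entry is treated as a finished sequence (a leaf).
TOY_MODEL: Dict[Prefix, List[Tuple[str, float]]] = {
    (): [("A", 0.6), ("B", 0.35), ("C", 0.05)],
    ("A",): [("x", 0.7), ("y", 0.3)],
    ("B",): [("x", 0.9), ("z", 0.1)],
    ("C",): [("y", 1.0)],
}


def enumerate_leaves(model, min_prob: float = 0.2) -> List[Tuple[Prefix, float]]:
    """Deterministically visit every leaf of the truncated tree exactly once."""
    leaves: List[Tuple[Prefix, float]] = []

    def dfs(prefix: Prefix, path_prob: float) -> None:
        options = model.get(prefix)
        if options is None:  # no continuation: record the finished sequence
            leaves.append((prefix, path_prob))
            return
        for token, p in options:
            if p >= min_prob:  # truncation: skip low-probability branches
                dfs(prefix + (token,), path_prob * p)

    dfs((), 1.0)
    return leaves


def weighted_answer(leaves: List[Tuple[Prefix, float]]) -> str:
    """Aggregate final tokens, weighting each leaf by its path probability."""
    scores: Dict[str, float] = defaultdict(float)
    for seq, prob in leaves:
        scores[seq[-1]] += prob
    return max(scores, key=scores.get)


def sample_sequences(model, n: int, seed: int = 0) -> List[Prefix]:
    """Self-consistency-style baseline: n independent ancestral samples."""
    rng = random.Random(seed)
    out: List[Prefix] = []
    for _ in range(n):
        prefix: Prefix = ()
        while prefix in model:
            tokens, probs = zip(*model[prefix])
            prefix += (rng.choices(tokens, weights=probs)[0],)
        out.append(prefix)
    return out


leaves = enumerate_leaves(TOY_MODEL)
print(len(leaves), weighted_answer(leaves))  # 3 x
samples = sample_sequences(TOY_MODEL, 16)
# The toy tree has only 5 possible sequences, so most of the 16 samples repeat.
print(len(samples), len(set(samples)))
```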
Kernel code removals driven by LLM-created security reports
Linux kernel maintainers are removing legacy drivers and subsystems (ISA, PCMCIA, AX.25, ATM, and ISDN) after being overwhelmed by AI-generated security bug reports, a drastic response to code that has become unmanageable.
Show HN: Daemons – we pivoted from building agents to cleaning up after them
DaemonMD automatically manages operational debt from AI-accelerated code generation with a single Markdown file.
Show HN: Ctx – a /resume that works across Claude Code and Codex
ctx is a local CLI tool that preserves and branches conversation context across Claude Code and OpenAI Codex, letting developers continue AI coding sessions seamlessly in either tool.
Neurosymbolic Repo-level Code Localization
LogicLoc cuts through keyword-shortcut biases in code search by having an LLM generate Datalog queries executed by a deterministic inference engine.
Show HN: SPICE simulation → oscilloscope → verification with Claude Code
An experiment showing that connecting a SPICE simulator and a real oscilloscope to Claude Code through an MCP server creates a feedback loop in which the AI directly analyzes and verifies both simulation results and measured waveforms.
Android CLI: Build Android apps 3x faster using any agent
Google has released Android CLI and Android Skills for agent-driven Android development, reporting a 70% reduction in LLM token usage and a 3x speedup in internal experiments.
Show HN: Marky – A lightweight Markdown viewer for agentic coding
A macOS desktop app that opens Markdown files generated in real time by AI agents such as Claude straight from the terminal and renders them live, simplifying document review in AI-assisted development workflows.
Atropos: Improving Cost-Benefit Trade-off of LLM-based Agents under Self-Consistency with Early Termination and Model Hotswap
An agent optimization technique that reaches 74% of GPT-4o's performance at only 23.9% of the cost by starting with a small language model (SLM) and hot-swapping to GPT-4o when failure is predicted.
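A minimal sketch of a small-to-large cascade with early-terminating self-consistency. The source does not say how Atropos predicts failure; here, lack of agreement among small-model samples is used as the failure signal, which is an assumption. The sketch also does not model Atropos's hand-off of the in-progress trajectory to the larger model. `cascade`, `small`, and `large` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Tuple


def cascade(task: str,
            small: Callable[[str, int], str],
            large: Callable[[str], str],
            n_samples: int = 5,
            min_votes: int = 3) -> Tuple[str, str, int]:
    """Self-consistency on a small model with early termination and fallback.

    Returns (answer, model_used, small_samples_drawn).
    """
    assert 1 <= min_votes <= n_samples
    votes: Counter = Counter()
    for i in range(1, n_samples + 1):
        votes[small(task, i)] += 1
        answer, top = votes.most_common(1)[0]
        if top >= min_votes:
            return answer, "small", i  # early accept: consensus reached
        if top + (n_samples - i) < min_votes:
            break  # predicted failure: no answer can still reach consensus
    return large(task), "large", i  # hot-swap to the expensive model


# Hypothetical stand-ins for the two models.
def confident_small(task: str, i: int) -> str:
    return "42"


def confused_small(task: str, i: int) -> str:
    return f"guess-{i}"


def large(task: str) -> str:
    return "42 (large model)"


print(cascade("q", confident_small, large))  # ('42', 'small', 3)
print(cascade("q", confused_small, large))   # ('42 (large model)', 'large', 4)
```

Both exits save small-model calls: consensus stops sampling as soon as it is reached, and the fallback fires as soon as consensus becomes impossible rather than after all samples are drawn.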
Show HN: Libretto – Making AI browser automations deterministic
Libretto, open-sourced by Saffron Health, provides AI coding agents with a real-time browser and token-efficient CLI, enabling the creation and maintenance of robust browser automation scripts.
CollabCoder: Plan-Code Co-Evolution via Collaborative Decision-Making for Efficient Code Generation
A multi-agent framework that co-evolves plans and code, achieving 11-20% higher accuracy while making 4-10 fewer API calls than existing methods.
Show HN: Plain – The full-stack Python framework designed for humans and agents
A Python web framework forked from Django, redesigned with type hints, a single convention, and an agent-friendly structure, making it easier for LLMs to read and modify code.
One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness
A single added instruction, "Don't use commas", can shrink LLM responses by up to 48%.
Show HN: Kontext CLI – Credential broker for AI coding agents in Go
An open-source CLI that securely injects short-lived tokens into AI coding agents when they access external services such as GitHub, Stripe, and databases, so long-lived API keys are never exposed. It is drawing attention as a replacement for the risky practice of pasting keys into .env files.
Show HN: CodeBurn – Analyze Claude Code token usage by task
An open-source terminal dashboard that shows where and how many tokens AI coding tools consume, working only from local session files with no separate API keys or proxies.
Show HN: I built a social media management tool in 3 weeks with Claude and Codex
SoloDev built an open-source social media management platform, an alternative to Buffer and Sendible, in 3 weeks using AI coding tools such as Claude Opus and OpenAI Codex.
Show HN: Claudraband – Claude Code for the Power User
Claudraband is a CLI and library that wraps the Claude Code TUI, letting you persist sessions and drive Claude Code headlessly through an HTTP daemon or an ACP server. It is useful for developers who want to integrate Claude Code into automated workflows.
AI assistance when contributing to the Linux kernel
A policy on AI coding tools has been added to the official Linux kernel documentation, stating that humans bear full legal responsibility for AI-generated code and that AI use must be disclosed with an 'Assisted-by' tag.
Many-Tier Instruction Hierarchy in LLM Agents
A paper whose benchmark shows that LLM agents fail to correctly resolve instruction hierarchies with up to 12 priority levels.
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?
A benchmark that measures whether AI coding agents recognize when to ask a human for help when given incomplete specifications.
Show HN: CSS Studio. Design by hand, code by agent
A design tool in which you edit CSS visually in the browser and an AI agent, connected via MCP, applies the changes to the actual codebase, enabling a WYSIWYG workflow that works with any framework.
Reallocating $100/Month Claude Code Spend to Zed and OpenRouter
A developer tired of usage limits on the Claude Code Max plan ($100/month) describes switching to the Zed editor ($10/month) plus OpenRouter (pay-as-you-go), gaining credit rollover and free choice of models.
I gave Claude my dead game's 30-year-old files and asked it to bring the game back to life
A user's account of Claude Code reconstructing an entire 1992 online multiplayer game from only its script files and manuals, after the original source code had been lost.
We moved Railway's frontend off Next.js. Builds went from 10+ mins to under 2
Railway describes migrating its production frontend from Next.js to Vite + TanStack Start, cutting build times from over 10 minutes to under 2. For teams that deploy several times a day, build time directly limits development speed.
90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
A workflow post explaining how organizing a codebase into a wiki ahead of time, rather than having Claude explore the files from scratch each session, cuts token usage per session by more than 90%.
Tailslayer: Library for reducing tail latency in RAM reads
This C++ library implements hedged reads to reduce the tail latency of RAM reads caused by collisions with DRAM refresh: it replicates data across independent DRAM channels and uses the result from whichever channel responds first.
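The hedged-read idea can be illustrated at the software level with Python's `concurrent.futures`: issue the same read to every replica and keep the first answer. This is only an analogy for the pattern, not Tailslayer's C++ mechanism, which works at the level of DRAM channels; `hedged_read` and `make_replica` are hypothetical names, and replica latency is simulated with `sleep`.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, List, TypeVar

T = TypeVar("T")


def hedged_read(replicas: List[Callable[[], T]]) -> T:
    """Issue the same read to every replica and return the first answer."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(read) for read in replicas]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        # If the fastest replica raised, .result() re-raises its exception.
        return next(iter(done)).result()
    finally:
        # Don't block on slower replicas; their results are discarded.
        # (cancel_futures requires Python 3.9+.)
        pool.shutdown(wait=False, cancel_futures=True)


def make_replica(value: str, delay_s: float) -> Callable[[], str]:
    """Hypothetical replica whose latency is simulated with sleep."""
    def read() -> str:
        time.sleep(delay_s)
        return value
    return read


start = time.perf_counter()
# One copy is stalled (as if by a refresh); the other answers quickly.
result = hedged_read([make_replica("stalled copy", 0.5),
                      make_replica("fast copy", 0.01)])
print(result, f"{time.perf_counter() - start:.2f}s")
```

In a real hedged read both copies hold identical data; the distinct labels here only show which replica answered. The cost of the technique is the duplicated work, which is why it pays off mainly for reads where occasional stalls dominate the latency tail.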
Show HN: Marimo pair – Reactive Python notebooks as environments for agents
An open-source tool that lets you drop an AI agent directly into a running Marimo notebook session, using the notebook's reactive execution state as the agent's working memory.
Launch HN: Freestyle – Sandboxes for Coding Agents
Sandbox infrastructure that lets AI coding agents run tens of thousands of VMs concurrently, with VM startup in under 700 ms, forking (cloning) of running VMs, and pause/resume.
After months with Claude Code, the biggest time sink isn't bugs — it's silent fake success
Describes a failure pattern in which AI agents hide errors and fabricate seemingly successful results with fake data, along with practical ways to prevent it through CLAUDE.md.
Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code
Explains how to run Google's Gemma 4 26B-A4B model locally on macOS with LM Studio 0.4.0's lms CLI and connect it to Claude Code. Because the model is a mixture-of-experts (MoE) with only about 4B parameters active per token, it runs at 51 tok/s on a 48GB MacBook Pro, enabling coding tasks with no API costs.
Nanocode: The best Claude Code that $200 can buy in pure JAX on TPUs
An open-source library for training a 1.3B-parameter coding agent model from scratch on TPUs for about $200, following Anthropic's Constitutional AI approach. It serves as a hands-on reference for developers who want to understand the full AI training pipeline.
I mass deleted 3 months of AI generated code last week. Here is what I learned.
A retrospective by a developer who deleted three months of code after over-relying on AI code generation. The original post is inaccessible, so its content could not be verified.
Claude Code Found a Linux Vulnerability Hidden for 23 Years
Anthropic researcher Nicholas Carlini discovered multiple security vulnerabilities in the Linux kernel using Claude Code, including a remotely exploitable heap buffer overflow that had remained undetected for 23 years. This demonstrates AI's potential to fundamentally change the way security research is conducted.
I reverse-engineered why Claude Code burns through your usage so fast. 7 bugs that stack on top of each other — and the worst one activates when Extra Usage kicks in
A Max 20x subscriber reverse-engineered the Claude Code CLI source and found 7 bugs that drain usage abnormally fast. The core issue is a 'death spiral': switching to Extra Usage shortens the prompt-cache TTL from 1 hour to 5 minutes, so the cache expires more often between turns and costs spike 2.8x.
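A back-of-the-envelope model of why a shorter cache TTL raises costs, assuming Anthropic's published prompt-caching multipliers (5-minute cache writes at 1.25x the base input price, 1-hour writes at 2x, cache reads at 0.1x) and a hypothetical list of idle gaps between turns. It ignores context growth and uncached tokens, and it does not reproduce the post's 2.8x figure, which depends on the author's actual sessions.

```python
from typing import List

# Assumed price multipliers relative to the base input-token price, in percent
# (integers avoid floating-point rounding in the totals).
WRITE_5M = 125   # cache write with a 5-minute TTL: 1.25x
WRITE_1H = 200   # cache write with a 1-hour TTL: 2x
READ = 10        # cache hit: 0.1x


def prefix_cost(gaps_min: List[float], ttl_min: float, write_pct: int) -> int:
    """Cost of re-sending a cached prefix over a session, in % of base price.

    gaps_min[i] is the idle time before turn i; the first turn always writes.
    A hit refreshes the TTL, so only the gap since the previous turn matters.
    """
    total = 0
    for i, gap in enumerate(gaps_min):
        if i > 0 and gap < ttl_min:
            total += READ        # prefix still cached: cheap read
        else:
            total += write_pct   # first turn or expired cache: full rewrite
    return total


gaps = [0, 3, 10, 12, 8, 2]  # hypothetical idle minutes before each turn
one_hour = prefix_cost(gaps, 60, WRITE_1H)
five_min = prefix_cost(gaps, 5, WRITE_5M)
print(one_hour, five_min, round(five_min / one_hour, 2))  # 250 520 2.08
```

Although a 5-minute write is cheaper than a 1-hour write, every pause longer than five minutes turns a 0.1x read into a 1.25x rewrite, so the total grows with how often the developer stops to think.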
A case study in testing with 100+ Claude agents in parallel
The Imbue team has published the full architecture they use to automate end-to-end tests of their CLI tool `mngr` by running over 100 Claude agents in parallel. The agents run, debug, and even modify the tests themselves, offering a rare look at large-scale agent orchestration in a real production environment.
AI-Assisted Unit Test Writing and Test-Driven Code Refactoring: A Case Study
A case study in which AI generated 16,000 lines of tests within hours for an untested MVP frontend codebase, and those tests then served as guardrails for a safe large-scale refactoring.
Show HN: ctx – an Agentic Development Environment (ADE)
ctx is an Agentic Development Environment (ADE) that runs multiple coding agents, such as Claude Code, Codex, and Cursor, in isolated containers from a single interface and safely merges the results of their parallel tasks.
I replaced chaotic solo Claude coding with a simple 3-agent team (Architect + Builder + Reviewer) — it's stupidly effective and token-efficient
The author replaced a single Claude session with three agents in separate Architect, Builder, and Reviewer roles, improving both code quality and token efficiency.
The Claude Code Leak
The leaked Claude Code source revealed that a product generating $2.5B in ARR was built on low-quality 'vibe coded' code, igniting debate about code quality, product-market fit, and copyright.
I built a tool that saves ~50K tokens per Claude Code conversation by pre-indexing your codebase
The author describes a tool that pre-indexes a codebase so Claude Code does not have to reload it in every conversation, saving roughly 50K tokens per conversation.
VibeGuard: A Security Gate Framework for AI-Generated Code
A pre-publish security scanner that catches packaging misconfigurations that could leak your entire source code, aimed at 'vibe coding' environments where AI-generated code ships without review.
Claude wrote a full FreeBSD remote kernel RCE with root shell
Anthropic's Claude wrote a complete remote kernel RCE exploit for CVE-2026-4747 (a FreeBSD kgssapi stack buffer overflow) from scratch, showing that LLMs can now automate the writing of working attack code, not just vulnerability analysis.
Claude Code Unpacked: A visual guide
An unofficial visual guide analyzing the leaked Claude Code source code, covering the agent loop, 50+ tools, and undisclosed features. A great reference for developers who want to understand how Claude Code works internally.
I read 17 papers on agentic AI workflows. Most Claude Code advice is measurably wrong
A post reviewing 17 research papers on agentic AI coding workflows, concluding that widespread advice such as 'compliment prompts' and 'multi-agent teams' actually degrades performance.
Claude Code's source code has been leaked via a map file in their NPM registry
The source code of Anthropic's AI coding tool Claude Code was publicly exposed through source map files included in its NPM package, revealing an undisclosed feature roadmap and internal security mechanisms.
Universal Claude.md – cut Claude output tokens
A project claiming that adding one CLAUDE.md file to your project root curbs Claude's unnecessary verbosity (sycophancy, filler openers and closers, unsolicited suggestions) and cuts output tokens by up to 63%, though the community has raised strong doubts about the benchmark's reliability and real-world effectiveness.
Show HN: I turned a sketch into a 3D-print pegboard for my kid with an AI agent
A maker's account of giving Codex a marker sketch drawn with their child plus just two dimensions and getting a 3D-printable pegboard toy file in under a minute. It shows a workflow in which AI generates 3D models through Python code, with no CAD software required.
PSA: Claude Code has two cache bugs that can silently 10-20x your API costs — here's the root cause and workarounds
A warning about two cache bugs in Claude Code that can silently increase API costs 10-20x. The original post is inaccessible, so the details could not be confirmed.
ChatGPT Won't Let You Type Until Cloudflare Reads Your React State
A reverse-engineering analysis that decrypts Cloudflare Turnstile's encrypted bytecode to confirm that it inspects not only browser fingerprints but also React app internal state (such as __reactRouterContext) before ChatGPT allows a message to be sent.
Lat.md: Agent Lattice: a knowledge graph for your codebase, written in Markdown
A tool that records a codebase's design decisions and domain knowledge as a graph of interconnected Markdown files. This overcomes the limits of a single AGENTS.md file and lets AI agents grasp context quickly without traversing the code.