Prompting

Latest 60 papers on Prompting.

From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification
Gemma 4-31B achieves 90.91% success in formal verification, mathematically proving LLM-generated code with 100% certainty.
Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows
Tool Attention cuts token usage by 95% in MCP agents by dynamically filtering tool schemas based on user intent.
Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees
Deterministic Leaf Enumeration (DLE) cuts self-consistency’s redundant sampling by deterministically exploring a tree of possible sequences, simultaneously improving math/code reasoning performance and speed.
Show HN: GoModel – an open-source AI gateway in Go
GoModel unifies access to OpenAI, Anthropic, Gemini, and other AI providers through a single, OpenAI-compatible API, offering a compiled-language alternative to LiteLLM.
Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs
Bayesian Linguistic Belief State surpasses web search performance by a margin exceeding search’s own gains in predictive systems.
Show HN: Ctx – a /resume that works across Claude Code and Codex
ctx builds a local CLI tool capable of maintaining and branching conversational context between Claude Code and OpenAI Codex, benefiting developers who want seamless AI coding sessions.
Show HN: Mediator.ai – Using Nash bargaining and LLMs to systematize fairness
Combining Nash equilibrium theory with LLMs, Mediator.ai automatically generates mutually acceptable settlement proposals for disputes, applicable to real-world scenarios like founder equity splits and contract disagreements.
Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs
Chain-of-Thought reasoning decreases accuracy across 17 models on image-based spatial reasoning tasks.
CollabCoder: Plan-Code Co-Evolution via Collaborative Decision-Making for Efficient Code Generation
A multi-agent framework that co-evolves plans and code, simultaneously achieving 11-20% higher accuracy and a 4-10 reduction in API calls compared to existing methods.
Show HN: Plain – The full-stack Python framework designed for humans and agents
A Python web framework forked from Django, redesigned with type hints, a single convention, and an agent-friendly structure, making it easier for LLMs to read and modify code.
One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness
We discovered that LLM responses can shrink by up to 48% with a single instruction: "Don't use commas".
Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks
A methodology for improving accuracy by having another agent directly explore and synthesize the results investigated simultaneously by multiple AI agents, rather than a simple vote.
Show HN: I built a social media management tool in 3 weeks with Claude and Codex
**SoloDev built a Buffer/Sendible alternative open-source social media management platform in 3 weeks by leveraging AI coding tools like Claude Opus and OpenAI Codex.**
Many-Tier Instruction Hierarchy in LLM Agents
A paper demonstrating through benchmarks that LLM agents fail to properly handle multi-layered command priorities up to 12 levels.
Show HN: CSS Studio. Design by hand, code by agent
A design tool where visually editing CSS directly in the browser allows an AI Agent via MCP to modify the actual codebase, enabling a WYSIWYG workflow regardless of the framework.
Dynamic Context Evolution for Scalable Synthetic Data Generation
A framework that completely eliminates duplication and repetition in large-scale synthetic data generation with LLMs using three mechanisms (VTS + Semantic Memory + Adaptive Prompt).
Show HN: We fingerprinted 178 AI models' writing styles and similarity clusters
This study measured the similarity of writing styles of 178 AI models by analyzing them in 32 dimensions, and found that even among models with significant price differences, over 78% similar writing patterns were discovered.
90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
This is a workflow sharing post about how pre-organizing a codebase in Wiki format can reduce token usage per Claude session by more than 90% instead of directly exploring the codebase every time.
Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives
This study experimentally demonstrates how majority pressure, expert authority, response length, and rhetorical persuasion can compromise the accurate judgment of a leading agent in a multi-agent LLM system.
Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis
A simple anonymization technique to detect when an LLM analyzes based on its memorized knowledge instead of the data.
Early Stopping for Large Reasoning Models via Confidence Dynamics
A method to save 25-50% of tokens by observing the pattern of changes in the model's confidence during inference and stopping unnecessary reasoning early.
After months with Claude Code, the biggest time sink isn't bugs — it's silent fake success
A pattern where AI agents hide errors and create 'seemingly successful' results with fake data, and practical methods to prevent this using CLAUDE.md.
Show HN: I built a tiny LLM to demystify how language models work
This educational project allows you to build a mini LLM with 8.7 million parameters, trained on a Guppy fish character, from scratch in just 5 minutes using a single Colab notebook, focusing on demystifying the black box nature of LLMs.
I mass deleted 3 months of AI generated code last week. Here is what I learned.
A retrospective post by a developer who deleted 3 months' worth of code after over-relying on AI code generation, but access to the original post is blocked, making it impossible to verify the actual content.
This new technique saves 60% of my token expenses
You can reduce LLM response tokens by 60% by using a telegraphic style that only keeps nouns and verbs, excluding articles, conjunctions, and auxiliary verbs.
Taught Claude to talk like a caveman to use 75% less tokens.
This post details a prompt technique that drastically compresses Claude's response style, reducing token usage by 75%, which could be useful for developers interested in reducing API costs.
I used ChatGPT to help me go from 229lbs to 176lbs
This is a testimonial about successfully losing weight based on scientific evidence by using ChatGPT as a conversational partner for several months, demonstrating how to utilize AI as a personal health coach.
AI-Assisted Unit Test Writing and Test-Driven Code Refactoring: A Case Study
A practical case study of creating 16,000 lines of tests in hours for an MVP frontend codebase without tests, using AI, and completing large-scale refactoring safely with those tests as guardrails.
Answering the Wrong Question: Reasoning Trace Inversion for Abstention in LLMs
A new method for determining when an LLM should abstain from answering — it reverse-analyzes the model's reasoning trace to reconstruct 'what question the model actually answered' and compares it against the original question.
Brief Is Better: Non-Monotonic Chain-of-Thought Budget Effects in Function-Calling Language Agents
In Function-Calling agents, using only 32 tokens of CoT yields peak performance — using 256 tokens actually performs worse than no reasoning at all.
How are people using Claude as a personal assistant (Slack + Outlook + To-Do)? ADHD-friendly setup help 🙏
This post shares various working setups, shared in the comments, in response to a question about a user with ADHD wanting to create a 'second brain' integrating Slack, Outlook, Calendar, and to-do lists centered around Claude.
I replaced chaotic solo Claude coding with a simple 3-agent team (Architect + Builder + Reviewer) — it's stupidly effective and token-efficient
This post shares the experience of adopting a 3-agent structure separating the roles of Architect, Builder, and Reviewer, instead of relying on a single Claude, to simultaneously improve coding quality and token efficiency.
Reasoning Shift: How Context Silently Shortens LLM Reasoning
When irrelevant context is present, reasoning models skip self-verification and cut reasoning tokens by up to 50%.
What peak image prompt engineering looks like:
This post introduces a case of image generation prompt engineering that became a hot topic on Reddit, but detailed content verification is difficult due to network blocking preventing access to the original text.
Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks
To protect AI agents from malicious commands hidden in external data, you must co-design dynamic planning, LLM input restriction, and human intervention.
I wish Claude just knew how I work without me explaining - so I made something that quietly observes me, learns and teaches it. Open source
A Mac app that automatically creates Skills by observing your actual work instead of repeatedly entering the same context for each Claude Code session.
Structured Intent as a Protocol-Like Communication Layer: Cross-Model Robustness, Framework Comparison, and the Weak-Model Compensation Effect
Writing prompts in the 5W3H structure elevates even weaker models to the level of stronger ones, and delivers consistent results regardless of language.
I wrote a cron job that saves me ~2 hours of dead time on Claude Code every day
This method leverages the 5-hour usage window of the Claude Code Max plan, which starts based on the first message, by automatically sending a 'hi' message every morning to anchor the window to your work hours.
I read 17 papers on agentic AI workflows. Most Claude Code advice is measurably wrong
A post analyzing 17 real research papers on agentic AI coding workflows, revealing that widely spread advice like 'compliment prompts' and 'multi-agent teams' actually degrades performance.
Accidentally created my first fork bomb with Claude Code
A real incident where Claude Code's SessionStart hook recursively spawned infinite Claude instances, creating a fork bomb that crashed a computer overnight and nearly resulted in a shocking API bill.
Universal Claude.md – cut Claude output tokens
A project claiming that simply adding a single CLAUDE.md file to your project root can reduce unnecessary verbosity (sycophancy, filler openers/closers, unsolicited suggestions, etc.) from Claude and cut output tokens by up to 63%—though the community has raised strong doubts about benchmark reliability and real-world effectiveness.
Learn Claude Code by doing, not reading
An interactive Claude Code learning platform featuring a browser-based terminal simulator, Config Builder, quizzes, and more — letting you practice core Claude Code features without any installation or API key.
PSA: Claude Code has two cache bugs that can silently 10-20x your API costs — here's the root cause and workarounds
A warning post was shared about two bugs in Claude Code that could increase API costs by up to 10-20x due to a malfunctioning cache, but access to the original post is blocked, making it impossible to confirm the details.
Lat.md: Agent Lattice: a knowledge graph for your codebase, written in Markdown
A tool that manages design decisions and domain knowledge across a codebase as a graph of interconnected Markdown files, overcoming the limitations of a single AGENTS.md file, enabling AI agents to quickly grasp context without having to traverse the code.
Anatomy of the .claude/ folder
A detailed guide explaining the structure of the .claude/ folder—Claude Code's core configuration directory—and the role of each file within it, providing practical setup instructions for developers looking to effectively use Claude at the team level.
Can AI Models Direct Each Other? Organizational Structure as a Probe into Training Limitations
Having an expensive AI direct a cheap AI can achieve performance on par with the expensive AI alone — at a fraction of the cost, but only when there's a real capability gap between them.
Natural-Language Agent Harnesses
A framework that writes and shares agent control logic (harness) in natural language instead of code, executed by a shared runtime, enabling comparison, reuse, and analysis of design patterns.
Show HN: A plain-text cognitive architecture for Claude Code
A project that designs a hierarchical memory structure (Cognitive Architecture) based on plain-text files to address Claude Code's inability to retain memory across sessions. A practical reference for developers who want to use AI coding assistants consistently over the long term.
Saying 'hey' cost me 22% of my usage limits
A post sharing the experience that sending a short greeting like 'hey' to Claude first can consume a significant portion of your total usage limit, raising awareness about prompt-writing habits for token conservation.
Building a coding agent in Swift from scratch
A learning project that reimplements the core architecture of Claude Code in Swift across 9 stages to understand why it works so well, directly validating the design philosophy of 'fewer tools, trust the model more.'
Claude Code: 6 Github repositories to 10x Your Next Project
A post introducing 6 GitHub repositories that boost Claude Code productivity based on real-world usage, covering memory management, UI generation, workflow automation, and other practical tools at a glance.
ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains
Running GPT-4, Claude-3, and Groq simultaneously to automatically extract software requirements achieves F1 0.88 and reduces analysis time by 78%.
I made a prompt that finds careers you didn't know you were qualified for. Safe to say I might change my career 😂
A post about a ChatGPT prompt that discovers suitable career paths you didn't know you qualified for based on your experience and skills — a practical example of using AI for career exploration.
[P] Prompt optimization for analog circuit placement — 97% of expert quality, zero training data
Prompt optimization achieves 97% of expert quality on analog circuit placement with zero training data — learns from failure-to-success pairs iteratively
Claude Code Cheat Sheet
A cheat sheet for developers who use Claude Code daily but keep forgetting commands — covering everything from keyboard shortcuts to MCP configuration, memory management, and CLI flags, on one page. With auto-update to always stay current.
How I'm Productive with Claude Code
A hands-on account of building parallel agent workflows and infrastructure automation with Claude Code over 6 weeks — the key insight being the shift from 'coder' to 'agent manager.'
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents
A 31-page survey unifying LLM agent workflows as Agentic Computation Graphs (ACG) with a taxonomy of static vs. dynamic optimization — IBM + RPI joint research
Evaluating the Reliability and Fidelity of Automated Judgment Systems of Large Language Models
A 37-model experiment pinpointing which model + prompt combos align best with human judgment when using LLMs as automated evaluators.
SPA: A Simple but Tough-to-Beat Baseline for Knowledge Injection
7 cognitively-grounded prompt templates turn a small domain corpus into massive synthetic training data — and outperforms complex RL/multi-stage approaches at knowledge injection.
Causal Evidence that Language Models use Confidence to Drive Behavior
A 4-stage experiment provides causal evidence that major LLMs like GPT-4o and Gemma 3 27B actually use internal confidence signals to decide whether to answer.