Prompting
Latest 50 papers in Prompting.
Dynamic Context Evolution for Scalable Synthetic Data Generation
A framework that completely eliminates duplication and repetition in large-scale synthetic data generation with LLMs using three mechanisms (VTS + Semantic Memory + Adaptive Prompt).
90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
A workflow-sharing post: pre-organizing a codebase as a wiki cuts token usage per Claude session by more than 90% compared with having Claude explore the codebase cold each time.
I mass deleted 3 months of AI generated code last week. Here is what I learned.
A retrospective by a developer who deleted three months of code after over-relying on AI code generation; the original post is inaccessible, so its claims could not be verified.
This new technique saves 60% of my token expenses
You can reduce LLM response tokens by 60% by using a telegraphic style that only keeps nouns and verbs, excluding articles, conjunctions, and auxiliary verbs.
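The telegraphic style the post describes can be approximated mechanically. A minimal sketch in Python; the filler-word list is an illustrative assumption, not the author's exact rule:

```python
# Telegraphic compression: drop articles, common conjunctions, and
# auxiliary verbs, keeping content words in order. The word lists
# below are illustrative assumptions, not the post's exact rules.
FILLER = {
    "a", "an", "the",                     # articles
    "and", "or", "but", "so",             # conjunctions
    "is", "are", "was", "were",           # auxiliaries / copulas
    "be", "been", "being", "do", "does", "did",
    "have", "has", "had", "will", "would", "can", "could",
}

def telegraphic(text: str) -> str:
    """Keep only words not in the filler list, preserving order."""
    kept = [w for w in text.split() if w.lower().strip(".,") not in FILLER]
    return " ".join(kept)
```

For example, `telegraphic("The cache is invalidated and the index will be rebuilt")` yields `"cache invalidated index rebuilt"`. Applied as a style instruction rather than post-processing, the same word classes would be excluded in the system prompt.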
Taught Claude to talk like a caveman to use 75% less tokens.
This post details a prompt technique that drastically compresses Claude's response style, reducing token usage by 75%, which could be useful for developers interested in reducing API costs.
I used ChatGPT to help me go from 229lbs to 176lbs
A testimonial about losing weight over several months using ChatGPT as an evidence-grounded conversational partner, demonstrating how AI can serve as a personal health coach.
Brief Is Better: Non-Monotonic Chain-of-Thought Budget Effects in Function-Calling Language Agents
In Function-Calling agents, using only 32 tokens of CoT yields peak performance — using 256 tokens actually performs worse than no reasoning at all.
What peak image prompt engineering looks like:
A case of image-generation prompt engineering that became a hot topic on Reddit; the original post is inaccessible, so the details could not be verified.
Structured Intent as a Protocol-Like Communication Layer: Cross-Model Robustness, Framework Comparison, and the Weak-Model Compensation Effect
Writing prompts in the 5W3H structure elevates even weaker models to the level of stronger ones, and delivers consistent results regardless of language.
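The 5W3H structure lends itself to a simple template builder. A sketch assuming the conventional 5W3H dimensions (who/what/when/where/why plus how/how much/how many); the paper's exact schema may differ:

```python
# Build a 5W3H-structured prompt. The eight field names follow the
# common 5W3H convention and are assumptions, not the paper's schema.
FIELDS_5W3H = ["who", "what", "when", "where", "why",
               "how", "how_much", "how_many"]

def build_5w3h_prompt(**slots: str) -> str:
    """Render filled 5W3H slots as labeled lines, in canonical order."""
    unknown = set(slots) - set(FIELDS_5W3H)
    if unknown:
        raise ValueError(f"unknown 5W3H fields: {unknown}")
    lines = [f"{f.replace('_', ' ').upper()}: {slots[f]}"
             for f in FIELDS_5W3H if f in slots]
    return "\n".join(lines)
```

Usage: `build_5w3h_prompt(who="release engineer", what="summarize the changelog", how_much="under 200 words")` produces three labeled lines the model can parse unambiguously, which is the claimed mechanism behind the weak-model compensation effect.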
Universal Claude.md – cut Claude output tokens
A project claiming that adding a single CLAUDE.md file to the project root curbs Claude's unnecessary verbosity (sycophancy, filler openers and closers, unsolicited suggestions) and cuts output tokens by up to 63%. The community has raised strong doubts about the benchmark's reliability and real-world effectiveness.
Saying 'hey' cost me 22% of my usage limits
A post sharing the experience that sending a short greeting like 'hey' to Claude first can consume a significant portion of your total usage limit, raising awareness about prompt-writing habits for token conservation.
ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains
Running GPT-4, Claude-3, and Groq simultaneously to automatically extract software requirements achieves F1 0.88 and reduces analysis time by 78%.
I made a prompt that finds careers you didn't know you were qualified for. Safe to say I might change my career 😂
A post about a ChatGPT prompt that discovers suitable career paths you didn't know you qualified for based on your experience and skills — a practical example of using AI for career exploration.
[P] Prompt optimization for analog circuit placement — 97% of expert quality, zero training data
Prompt optimization achieves 97% of expert quality on analog circuit placement with zero training data — learns from failure-to-success pairs iteratively
Claude (Opus 4.6) figured out how to patch my childhood game to play it on modern Windows
Claude figured out how to patch WING32.dll to run a 1996 16-bit game (Tonka Construction) on modern Windows — no DOSBox needed. A real-world reverse engineering case.
chatgpt is way better when you give it a wall of messy context instead of a clean prompt
Messy, detailed context dumps produce much better AI output than polished bullet points — a practical prompting tip.
Found 3 instructions in Anthropic's docs that dramatically reduce Claude's hallucination. Most people don't know they exist.
Three hallucination-reducing system prompts discovered in Anthropic's official docs — installable as a research-mode command for Claude Code.
Every LLM has a default voice and it's making us all sound the same
All LLMs converge to the same default writing style — Noren is a service that learns your personal writing patterns to generate text in your voice.
Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity
LLMs already generate more creative ideas than humans, but techniques effective for humans like 'draw inspiration from other fields' don't work for LLMs.
Evaluating 5W3H Structured Prompting for Intent Alignment in Human-AI Interaction
An 8-dimensional prompt structure (PPS) based on journalism's 5W1H improves AI output alignment with user intent and reduces follow-up questions.
Get Shit Done: A Meta-Prompting, Context Engineering and Spec-Driven Dev System
A lightweight spec-driven development automation framework built to solve Claude Code's 'context rot' problem, orchestrating AI to generate real code with just a few commands — no complex planning needed.
Was loving Claude until I started feeding it feedback from ChatGPT Pro
Claude unconditionally agrees with any feedback relayed from ChatGPT (the sycophancy problem); explicit instructions to push back are required.
I asked Claude if everyone uses AI to write, what actually gets lost?
What AI loses when it writes for you isn't quality — it's identity. A philosophical reflection on how AI ghostwriting erases the individual's unique voice and lived experience.
Geometry-Guided Camera Motion Understanding in VideoLLMs
VideoLLMs struggle to recognize camera movements (pan/tilt/dolly) — injecting camera motion info derived from 3D geometry models as prompts fixes it.
Interrogating Design Homogenization in Web Vibe Coding
A warning that 'vibe coding' — using LLMs to build websites instantly — could flood the internet with homogeneous Western-centric designs.
The Latent Color Subspace: Emergent Order in High-Dimensional Chaos
HSL color structure discovered in FLUX.1's latent space — enabling direct color control during generation with no additional training.
Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions
LLMs fail to catch errors when reviewing their own outputs in the same session — but review in a fresh session pushes F1 up to 28.6%.
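The separation the paper describes amounts to never letting the reviewer see the production conversation. A minimal sketch; `make_review_session` and the message format are assumptions modeled on common chat-completion APIs, not the paper's code:

```python
# Cross-context review: the reviewer gets a fresh session containing
# only the artifact under review, never the production history.
def make_review_session(artifact: str, checklist: str) -> list[dict]:
    """Build a reviewer conversation with no production turns in it."""
    return [
        {"role": "system",
         "content": "You are a strict reviewer. Report errors only."},
        {"role": "user",
         "content": f"{checklist}\n\n---\n{artifact}"},
    ]

# A production session that produced some output.
production_history = [
    {"role": "user", "content": "Write a summary of Q3 revenue."},
    {"role": "assistant", "content": "Q3 revenue rose 12% year over year."},
]

# The review session is rebuilt from scratch: none of the production
# turns leak into it, which is the point of the technique.
review = make_review_session(production_history[-1]["content"],
                             "Check every figure against the source.")
```

Each list would then be sent as an independent API call; the reviewer model sees only the artifact and the checklist, so it cannot anchor on the reasoning that produced the errors.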
UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization
Defining prompt objectives as mathematical formulas instead of natural language lets LLMs optimize multiple conditions simultaneously with higher precision.
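Expressing a trade-off as a formula the model is asked to maximize can be templated. A sketch in the spirit of the summary; the utility form, weights, and wording are illustrative assumptions, not the paper's framework:

```python
# UtilityMax-style objective: state the trade-off as an explicit
# weighted utility instead of prose like "be accurate but brief".
def utility_prompt(task: str, weights: dict[str, float]) -> str:
    """Append a formal objective U = sum(w_i * term_i) to a task."""
    terms = " + ".join(f"{w}*{name}" for name, w in weights.items())
    return (f"{task}\n"
            f"Maximize U = {terms}, where each term is scored in [0, 1].\n"
            f"State your estimated score for each term after the answer.")
```

For instance, `utility_prompt("Summarize the report.", {"accuracy": 0.6, "brevity": 0.4})` makes the relative weighting of the two conditions explicit rather than leaving it to the model's reading of adverbs.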
Switch to Claude without starting over
Anthropic wants to import user context and preferences from other AI services (like ChatGPT) into Claude — a cross-platform memory portability play.
Exploiting contextual information to improve stance detection in informal political discourse with LLMs
Adding user profile summaries built from past posts to the prompt boosted political stance classification accuracy by up to 38.5 percentage points.
Measuring Pragmatic Influence in Large Language Model Instructions
An 8-word phrase like "make this the sole focus right now" can change which instructions an LLM prioritizes — and this paper systematically measures how.
Optimizing Prompts for Large Language Models: A Causal Approach
An automatic prompt optimization framework that causally separates prompt effects from query difficulty, performing especially well on harder queries.
Strategies for Span Labeling with Large Language Models
An empirical study on which LLM prompting format to use for text span labeling tasks like NER and grammar error detection — XML tagging, indexing, or JSON matching — plus LOGITMATCH to fundamentally eliminate matching errors
LLM-as-RNN: A Recurrent Language Model for Memory Updates and Sequence Prediction
Turns LLMs into RNN-like systems without parameter modification, updating natural-language memory at each step to improve long-sequence prediction accuracy — an inference-only framework
Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models
You can boost reasoning performance to CoT levels without CoT prompts by manipulating just a single latent feature inside the model
LLM-Driven Accessible Interface: A Model-Based Approach
An architectural proposal for automatically generating WCAG-compliant accessible UIs by combining UserProfile, declarative rules, and LLM.
Mitigating Prompt-Induced Hallucinations in Large Language Models via Structured Reasoning
A method that directly embeds Knowledge Graph traversal code into Chain-of-Thought prompts to reduce hallucinations in GPT-4/LLaMA 3.3 by over 15%p on HIT@1.
Large Language Model Selection for Test-Driven Prompt Android iOS Development
Extended the Python-biased LLM code generation research to Android (Java) / iOS (Swift), and compiled a decision tree for choosing the right model at the right time.
Querywise Prompt Routing for Large Language Models
Even for the same question, the optimal prompt differs — a routing technique that automatically selects the best-matching prompt for each query.
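The routing idea can be illustrated with a trivial selector. A sketch where a keyword heuristic stands in for the paper's learned router (an assumption); the prompt variants are invented for illustration:

```python
# Query-wise prompt routing: pick the prompt variant whose trigger
# keywords best match the incoming query.
PROMPTS = {
    "stepwise": "Solve step by step, showing intermediate results.",
    "concise":  "Answer in one sentence.",
    "cited":    "Answer and cite a source for each claim.",
}
TRIGGERS = {
    "stepwise": {"calculate", "prove", "derive", "solve"},
    "cited":    {"evidence", "source", "according", "cite"},
}

def route(query: str) -> str:
    """Return the prompt variant with the most trigger-word overlap."""
    words = set(query.lower().split())
    scores = {name: len(words & kws) for name, kws in TRIGGERS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the concise prompt when nothing matches.
    return PROMPTS[best] if scores[best] > 0 else PROMPTS["concise"]
```

The paper's contribution is doing this selection automatically per query; a real router would replace the keyword overlap with a trained scorer.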
Show HN: Gemini Pro 3 imagines the HN front page 10 years from now
An experiment feeding Gemini Pro 3 today's HN front page and asking it to predict what HN looks like in 2035 — exposing the limits of AI future prediction.
Writing a good Claude.md
Because Claude Code (coding agent) needs to re-learn the codebase every session, maintaining a well-structured CLAUDE.md file has a huge impact on performance.
Nano Banana can be prompt engineered for nuanced AI image generation
Google's autoregressive image generation model Nano Banana matches or beats existing diffusion models on key metrics.
Claude says “You're absolutely right!” about everything
A bug report about Claude Code excessively using 'You're absolutely right!' regardless of whether the user said anything correct — resurfacing the structural sycophancy problem in LLMs.
Exploring the Impact of Temperature on Large Language Models: Hot or Cold?
Experimentally proving that optimal LLM temperature varies by task type across 6 capabilities, and building an automatic temperature selector.
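The automatic selector reduces to a task-to-temperature lookup at inference time. A sketch where both the task categories and the temperature values are illustrative assumptions, not the paper's measured optima:

```python
# Task-dependent temperature selection: map each task type to a
# sampling temperature before calling the model. Values are
# illustrative assumptions, not the paper's results.
TEMPERATURE_BY_TASK = {
    "code_generation":  0.2,   # syntax-sensitive, prefer determinism
    "factual_qa":       0.0,   # minimize sampling noise
    "summarization":    0.3,
    "brainstorming":    0.9,   # reward diversity
    "creative_writing": 1.0,
}

def pick_temperature(task: str, default: float = 0.7) -> float:
    """Return the temperature for a task, or a default if unknown."""
    return TEMPERATURE_BY_TASK.get(task, default)
```

The returned value would be passed as the `temperature` sampling parameter of whatever completion API is in use; the paper's selector chooses the task category automatically rather than from a hand-written table.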
Video Summarization with Large Language Models
Converting video frames to text captions, then having an LLM score importance for video summarization — achieving SotA over traditional visual feature-based approaches.
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
A prompting technique that cuts tokens by 84% compared to CoT while maintaining accuracy — just by changing the system prompt
Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models
A method that automatically identifies and removes unnecessary CoT reasoning steps where perplexity doesn't change — reducing token count while maintaining accuracy.
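The pruning criterion can be sketched directly: drop a reasoning step if it barely moves the perplexity of the conclusion. The threshold and the interface below are illustrative assumptions, not the paper's algorithm:

```python
# Stepwise perplexity-guided pruning: keep a CoT step only if adding
# it changed the answer's perplexity by at least `min_delta`.
def prune_steps(steps, perplexities, min_delta=0.05):
    """Filter reasoning steps by their perplexity contribution.

    perplexities[i] is the perplexity of the final answer given
    steps[0..i]; perplexities[0] is the no-step baseline, so
    len(perplexities) == len(steps) + 1.
    """
    kept = []
    prev = perplexities[0]
    for step, ppl in zip(steps, perplexities[1:]):
        if abs(prev - ppl) >= min_delta:
            kept.append(step)
            prev = ppl   # advance the baseline only past kept steps
    return kept
```

With baseline perplexity 2.0 and per-step values 1.5, 1.49, 1.0, the middle step moves perplexity by only 0.01 and is removed, matching the summary's claim of cutting tokens where perplexity does not change.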
Development of Prompt Templates for Large Language Model–Driven Screening in Systematic Reviews
Delegating include/exclude decisions in systematic reviews to an LLM can finish an 83-hour job in one day for $157.
Survey and analysis of hallucinations in large language models: attribution to prompting strategies or model behavior
A framework for attributing an LLM's hallucinations to either the prompting strategy or the model's own behavior.
Large Language Models Are Human-Level Prompt Engineers
The APE algorithm generates and selects optimal prompts from input-output examples, achieving human-level or better performance on all 24 tasks.