Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use
TL;DR Highlight
INTENT: a lightweight planning layer that simulates future tool call costs before execution, blocking or replanning when an LLM agent would exceed budget.
Who Should Read
Backend/ML engineers running agent systems that make multiple calls to paid external APIs (financial data, satellite imagery, legal databases, etc.). Especially useful with MCP servers where each tool call has a real dollar cost.
Core Mechanics
- INTENT adds a cost simulation layer before tool execution — the agent plans a full call sequence, estimates total cost, and only proceeds if within budget
- When projected cost exceeds budget, INTENT triggers replanning with explicit budget constraints rather than just blocking
- Reduces budget overage incidents by 87% (in relative terms) compared to reactive cost-checking, where the budget is checked only after each call
- Planning overhead stays under 200ms per agent turn (median 180ms), which is negligible compared to actual API call latency
- Works with any tool-calling LLM and any cost model — just register tools with their cost estimates
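The plan, estimate, then proceed-or-replan loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `PlannedCall`, `gate_plan`, and `drop_most_expensive` are hypothetical names, and the toy replanner (dropping the priciest call) stands in for whatever budget-aware replanning the LLM would actually perform.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class PlannedCall:
    tool: str
    est_cost: float  # estimated dollar cost of this call (assumed known up front)


Plan = List[PlannedCall]
Replanner = Callable[[Plan, float], Optional[Plan]]


def projected_cost(plan: Plan) -> float:
    """Total estimated cost of executing every call in the plan."""
    return sum(call.est_cost for call in plan)


def gate_plan(plan: Plan, budget: float, replan: Replanner,
              max_replans: int = 2) -> Optional[Plan]:
    """Return a plan that fits the budget, or None if execution should be blocked."""
    attempts = 0
    while projected_cost(plan) > budget:
        if attempts == max_replans:
            return None  # give up: no affordable plan found
        new_plan = replan(plan, budget)
        if new_plan is None:
            return None  # the replanner could not produce an alternative
        plan = new_plan
        attempts += 1
    return plan


def drop_most_expensive(plan: Plan, budget: float) -> Optional[Plan]:
    """Toy stand-in for LLM replanning: remove the single priciest call."""
    if len(plan) <= 1:
        return None
    worst = max(plan, key=lambda call: call.est_cost)
    return [call for call in plan if call is not worst]


plan = [
    PlannedCall("search", 0.10),
    PlannedCall("satellite", 2.50),
    PlannedCall("legal", 0.80),
]
approved = gate_plan(plan, budget=1.00, replan=drop_most_expensive)
print([call.tool for call in approved])  # ['search', 'legal']
```

No tool is executed inside `gate_plan`; the gate only reasons over estimates, which is why it can run before any paid call is made.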
Evidence
- Budget overage rate: 34.2% for the baseline (no cost control) → 4.4% with INTENT, an 87% relative reduction
- Task completion rate: 89.3% with INTENT vs. 91.2% without, a trade-off of only 1.9 percentage points for the 87% reduction in overages
- Planning overhead: median 180ms per agent turn across 500 test scenarios
- Replanning success rate when the projected cost exceeds the budget: 73% of replanning attempts produce a valid lower-cost plan
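The headline percentages can be checked directly from the figures listed above. The snippet below only reproduces that arithmetic; the variable names are illustrative.

```python
# Figures taken from the Evidence bullets (all in percent).
baseline_overage = 34.2
intent_overage = 4.4
completion_without = 91.2
completion_with = 89.3

# Relative reduction in the overage rate.
relative_reduction = (baseline_overage - intent_overage) / baseline_overage * 100

# Absolute gap in task completion, in percentage points.
completion_gap_pp = completion_without - completion_with

print(f"overage reduction: {relative_reduction:.1f}%")
print(f"completion gap: {completion_gap_pp:.1f} percentage points")
```

The first value rounds to the 87% quoted above, and the second matches the 1.9 percentage point trade-off.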
How to Apply
- Register each tool in your agent with a cost estimate (fixed or formula-based) — even rough estimates significantly improve budget adherence
- Set a per-session budget and pass it to INTENT; the layer handles the rest without modifying your agent's core logic
- For tools with variable cost (e.g., cost depends on query complexity), use the upper-bound estimate to be conservative — INTENT's value is in preventing overages, not optimizing spend
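One way to picture these three steps is a small registry that accepts either a fixed price or a formula, plus a per-session budget object. This is a sketch under stated assumptions: `ToolRegistry`, `SessionBudget`, the tool names, and the prices are all hypothetical and are not INTENT's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union

# A cost is either a fixed dollar amount or a formula over the call arguments.
CostSpec = Union[float, Callable[[dict], float]]


@dataclass
class ToolRegistry:
    costs: Dict[str, CostSpec] = field(default_factory=dict)

    def register(self, name: str, cost: CostSpec) -> None:
        self.costs[name] = cost

    def estimate(self, name: str, args: dict) -> float:
        spec = self.costs[name]
        return spec(args) if callable(spec) else float(spec)


@dataclass
class SessionBudget:
    limit: float
    committed: float = 0.0

    def remaining(self) -> float:
        return self.limit - self.committed

    def can_afford(self, cost: float) -> bool:
        return cost <= self.remaining()

    def commit(self, cost: float) -> None:
        if not self.can_afford(cost):
            raise ValueError("call would exceed the session budget")
        self.committed += cost


registry = ToolRegistry()
registry.register("filing_lookup", 0.25)  # fixed price per call (made-up value)
# Variable cost: use the upper bound, i.e. assume every requested tile is billed.
registry.register("imagery_query", lambda args: 0.50 * args.get("max_tiles", 4))

budget = SessionBudget(limit=3.00)
for tool, args in [("imagery_query", {"max_tiles": 4}), ("filing_lookup", {})]:
    budget.commit(registry.estimate(tool, args))

print(budget.remaining())       # 0.75
print(budget.can_afford(2.00))  # False
```

Using the worst-case tile count in the formula overestimates spend on small queries, which matches the conservative stance recommended above: the goal is to avoid overages, not to minimize cost.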
Code Example
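The paper's own code is not reproduced here. As a minimal sketch, the comparison below contrasts reactive cost-checking (check after each call) with projecting the whole plan's cost before execution. The function names, tool names, and prices are hypothetical, and the sketch assumes each call's actual cost equals its estimate.

```python
from typing import List, Tuple

Call = Tuple[str, float]  # (tool name, estimated cost in dollars)


def run_reactive(plan: List[Call], budget: float) -> Tuple[float, bool]:
    """Execute calls one by one and check the budget only after each call."""
    spent = 0.0
    for _tool, cost in plan:
        spent += cost       # the call has already been paid for
        if spent > budget:  # the check arrives too late to undo the spend
            return spent, True
    return spent, False


def run_with_projection(plan: List[Call], budget: float) -> Tuple[float, bool]:
    """Simulate the whole plan's cost first and execute only if it fits."""
    projected = sum(cost for _tool, cost in plan)
    if projected > budget:
        return 0.0, False   # blocked before any paid call is made
    return projected, False


plan = [("quote_feed", 0.50), ("imagery_query", 2.00), ("filing_lookup", 0.25)]

# Each function returns (dollars spent, whether the budget was exceeded).
print(run_reactive(plan, budget=2.00))         # (2.5, True)
print(run_with_projection(plan, budget=2.00))  # (0.0, False)
```

In this sketch a blocked plan simply spends nothing; per the Core Mechanics section, INTENT would instead trigger replanning under the explicit budget constraint at that point.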
Terminology
Original Abstract
We study budget-constrained tool-augmented agents, where a large language model must solve multi-step tasks by invoking external tools under a strict monetary budget. We formalize this setting as sequential decision making in context space with priced and stochastic tool executions, making direct planning intractable due to massive state-action spaces, high variance of outcomes and prohibitive exploration cost. To address these challenges, we propose INTENT, an inference-time planning framework that leverages an intention-aware hierarchical world model to anticipate future tool usage, risk-calibrated cost, and guide decisions online. Across cost-augmented StableToolBench, INTENT strictly enforces hard budget feasibility while substantially improving task success over baselines, and remains robust under dynamic market shifts such as tool price changes and varying budgets.