ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

TL;DR Highlight

Instead of showing an LLM agent all 100 tools, expose only the one it needs right now: the success rate holds or improves while token usage drops by about 90%.

Who Should Read

Backend and ML engineers adding function calling or tool use to LLM agents, especially developers who see the agent call the wrong tool or fail more often as the tool list grows.

Core Mechanics

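The paper defines 'ToolChoiceConfusion' as the phenomenon where an agent becomes less reliable as its tool menu grows. Even a semantically relevant tool gets in the way if it is not needed at the current step.
CMTF (Causal Minimal Tool Filtering) attaches a lightweight contract to each tool describing its precondition and effect, runs BFS to find the shortest causal path from the current state to the goal, and exposes only the single tool on that path that is executable right now.
Keyword and embedding retrieval pick tools that look relevant but cannot judge whether a tool should be used now. CMTF filters on whether using the tool now moves the agent closer to the goal.
Experiments cover four models: Amazon Nova 2 Lite, Amazon Nova 2 Pro, Claude 3.5 Haiku, and Claude Sonnet 4. Claude 3.5 Haiku in particular went from a success rate of 0.48 with all tools exposed to 0.94 with CMTF.
The method is training-free: it needs no additional training, only a written contract (precondition/effect spec) for each tool.
Because only one tool is shown per step, high-risk actions (send, delete, update, and so on) are not exposed before their conditions are met, which also improves safety.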
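To make the contract idea concrete, here is a minimal sketch of precondition/effect contracts and of the state-aware baseline that keeps every executable tool. The `ToolContract` class, its field names, and the calendar tools are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolContract:
    # Hypothetical contract shape; field names are assumptions for illustration.
    name: str
    requires: frozenset  # precondition: state variables that must already be known
    produces: frozenset  # effect: state variables the tool adds when it runs

TOOLS = [
    ToolContract("search_events", frozenset({"date"}), frozenset({"event_id"})),
    ToolContract("update_event", frozenset({"event_id", "new_time"}), frozenset({"event_updated"})),
    ToolContract("create_event", frozenset({"date", "new_time"}), frozenset({"event_created"})),
    ToolContract("delete_event", frozenset({"event_id"}), frozenset({"event_deleted"})),
]

def executable_tools(state, tools):
    """State-aware baseline: keep every tool whose preconditions already hold."""
    return [t.name for t in tools if t.requires <= state]

state = {"date", "new_time"}
visible = executable_tools(state, TOOLS)
print(visible)  # ['search_events', 'create_event']
```

With a goal of `event_updated`, `create_event` is executable but does not advance the goal. That gap between "can run" and "should run now" is what the shortest-path search in CMTF is meant to close.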

Evidence

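All-tools exposure vs. CMTF: success rate 0.83 → 0.99, wrong-tool calls 1.25 → 0.01, premature actions 0.03 → 0.00, average tokens 24,569 → 2,405 (about a 90% reduction).
Keyword top-5 filtering cuts the menu to 5 tools yet reaches only 0.61 success with 2.36 wrong-tool calls, which is worse than all-tools exposure (0.83).
State-aware filtering (exposing only executable tools) reaches 0.65 success with 1.98 wrong-tool calls, far behind CMTF (0.99, 0.01).
The benchmark combines 102 tasks, 100 tools, 4 LLMs, and 6 filtering strategies, for 2,448 runs in total (102 tasks × 4 models × 6 strategies).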
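The headline figures can be checked with a few lines of arithmetic. The sketch below only recomputes numbers already quoted above; the variable names are ours.

```python
# Run count: every task is run once per model and per filtering strategy.
tasks, models, strategies = 102, 4, 6
total_runs = tasks * models * strategies

# Token reduction of CMTF relative to all-tools exposure (average tokens).
tokens_all_tools = 24_569
tokens_cmtf = 2_405
reduction = 1 - tokens_cmtf / tokens_all_tools

print(total_runs)          # 2448
print(f"{reduction:.1%}")  # 90.2%
```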

How to Apply

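Add required_state_variables (preconditions) and produced_state_variables (effects) fields to your existing tool specs (OpenAPI schema or similar). At the start of a task, define the initial state and the goal state, run BFS to find the shortest path, and pass only the first tool on that path to the LLM.
Tag high-risk tools such as send, delete, share, and update with a risk level. Under CMTF such a tool stays invisible to the LLM until its preconditions are met, which prevents the agent from sending an email or deleting a file by mistake.
When a small model is used as the agent to save cost, failure rates climb sharply with long tool lists. Exposing one tool per step with CMTF can raise the success rate substantially even for a weaker model such as Claude 3.5 Haiku.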
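The steps above can be sketched as a per-step loop: filter, expose one tool, update the state, repeat. This is a simplified sketch, not the paper's implementation. It assumes the LLM always calls the single exposed tool and that the call succeeds and produces exactly its declared effects. The `risk` field values and the helper names `next_tool` and `run_episode` are hypothetical.

```python
from collections import deque

def next_tool(state, goal, tools):
    """Return the first tool on a shortest contract path to the goal, or None."""
    start = frozenset(state)
    queue = deque([(start, None)])  # (reached state, first tool of the path)
    seen = {start}
    while queue:
        current, first = queue.popleft()
        if goal <= current:
            return first
        for tool in tools:
            if tool["required_state_variables"] <= current:
                reached = current | tool["produced_state_variables"]
                if reached not in seen:
                    seen.add(reached)
                    queue.append((reached, first if first is not None else tool))
    return None

def run_episode(state, goal, tools, max_steps=10):
    """Expose one tool per step and record (name, risk) for each exposure."""
    state = set(state)
    trace = []
    for _ in range(max_steps):
        if goal <= state:
            break
        tool = next_tool(state, goal, tools)
        if tool is None:
            break  # no contract path; the caller decides how to fall back
        trace.append((tool["name"], tool["risk"]))
        # Assumption: the agent calls the exposed tool and it succeeds.
        state |= tool["produced_state_variables"]
    return trace, state

tools = [
    {"name": "search_events", "risk": "low",
     "required_state_variables": {"date"}, "produced_state_variables": {"event_id"}},
    {"name": "update_event", "risk": "high",
     "required_state_variables": {"event_id", "new_time"}, "produced_state_variables": {"event_updated"}},
    {"name": "delete_event", "risk": "high",
     "required_state_variables": {"event_id"}, "produced_state_variables": {"event_deleted"}},
]

trace, final_state = run_episode({"date", "new_time"}, {"event_updated"}, tools)
print(trace)  # [('search_events', 'low'), ('update_event', 'high')]
```

In this run the high-risk `update_event` only becomes visible at step 2, once `event_id` is known, and `delete_event` is never exposed because it is not on the path to the goal.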

Code Example

# Core CMTF logic: BFS for the shortest causal path, then return only its first tool
from collections import deque

def causal_minimal_tool_filter(current_state: set, goal: set, tools: list[dict]) -> list[dict]:
    """
    current_state: set of currently known state variables, e.g. {'date', 'event_description'}
    goal: set of goal state variables, e.g. {'event_updated'}
    tools: each tool is a dict with 'name', 'required' (set), and 'produces' (set)
    """
    if goal.issubset(current_state):
        return []  # goal already reached

    # BFS over (reached state, path of tools used so far)
    queue = deque([(frozenset(current_state), [])])
    visited = {frozenset(current_state)}

    while queue:
        state, path = queue.popleft()

        if goal.issubset(state):
            # Return only the first tool on the path (next causal frontier)
            return [path[0]] if path else []

        for tool in tools:
            if tool['required'].issubset(state):  # tool is executable in this state
                new_state = state | tool['produces']
                if new_state not in visited:
                    visited.add(new_state)
                    queue.append((new_state, path + [tool]))

    return []  # no path found; the caller should handle the fallback

# Usage example (calendar task)
tools = [
    {'name': 'search_events', 'required': {'date'}, 'produces': {'event_id'}},
    {'name': 'update_event', 'required': {'event_id', 'new_time'}, 'produces': {'event_updated'}},
    {'name': 'create_event', 'required': {'date', 'new_time'}, 'produces': {'event_created'}},  # distractor
    {'name': 'delete_event', 'required': {'event_id'}, 'produces': {'event_deleted'}},  # distractor
]

current_state = {'date', 'event_description', 'new_time'}
goal = {'event_updated'}

next_tools = causal_minimal_tool_filter(current_state, goal, tools)
print([t['name'] for t in next_tools])  # ['search_events'] (only one tool is returned)

Terminology

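CMTF: Short for Causal Minimal Tool Filtering. A filtering method that selects and shows an LLM agent only the one tool it needs right now.

precondition-effect contract: A short spec stating what a tool needs in order to run (precondition) and what running it produces (effect). In recipe terms, the ingredient list and the finished dish.

causal frontier: The first tool (or tools) that can be executed right now on the shortest path from the current state to the goal. The minimal gate the agent must pass to reach the next action.

state-aware filtering: Exposes a tool only when its inputs are already known in the current state. It shows only what is executable but cannot judge whether the tool should be used now.

function calling: A feature that lets an LLM call an external function or API instead of only generating text. In GPT and Claude APIs it is enabled through the tools parameter (Claude returns the call as a tool_use block).

training-free: A method that can be applied at the prompt or pipeline level without further training of the model. Usable immediately, with no fine-tuning cost.

BFS (Breadth-First Search): A search algorithm that finds the shortest path in an unweighted graph. CMTF uses it to find the shortest sequence of tool calls that reaches the goal.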

Related Resources

ToolChoiceConfusion GitHub Repository (benchmark, filtering implementations, evaluation scripts)

Original Abstract

Large language model agents increasingly rely on external tools, but larger tool menus can reduce reliability and efficiency by increasing wrong-tool calls, premature actions, and token cost. Existing tool-selection methods often optimize semantic relevance, exposing tools whose names or descriptions match the user request. We argue that relevance is insufficient: a tool may be related to the task while still being unnecessary or premature at the current step. We propose Causal Minimal Tool Filtering (CMTF), a training-free method that selects tools by causal sufficiency. CMTF uses lightweight precondition-effect contracts to expose only the minimal next-step tool frontier needed to advance from the current state toward the user goal. Across multi-step tool-use tasks, we compare CMTF with all-tools exposure, keyword retrieval, state-aware filtering, and causal-path ablations, measuring task success, wrong-tool calls, premature actions, tool exposure, and token cost. In the main benchmark with 102 tasks, 100 tools, four LLM backends, and 2448 task-method-model runs, CMTF matches the strongest causal baseline in aggregate success while reducing visible tools from 100 to one per step and reducing token usage by about 90% relative to all-tools exposure.