An Analysis of Entity Binding Failures in Tool-Augmented Agents

TL;DR Highlight

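This paper defines the "entity binding failure", in which an AI agent selects the correct tool but executes it against the wrong target, and evaluates execution policies that prevent it.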

Who Should Read

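Backend and AI engineers building services in which an LLM agent is connected to real systems, for example sending email, editing documents, or managing calendars. It is especially relevant to developers who plan to ship function calling or tool use to production.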

Core Mechanics

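A blind spot in existing metrics: even when the tool choice is right, current benchmarks do not catch errors where the action hits the wrong target, such as which Alex the email goes to (entity binding failure).
Every method in the experiment reached a 0.0% wrong-tool error rate, yet the default execution mode (Direct) ran the tool on the wrong entity in 26.0% of runs. Tool-selection accuracy and safety are separate problems.
Plain RAG and tool filtering (Semantic Filter, CMTF) barely reduce entity binding failures (a small improvement from 26% to 24%).
Confidence-gated binding (an execution gate based on confidence scores) and Entity CMTF+Provenance (which adds provenance tracking) cut wrong-entity errors to 0%. The cost is that direct task completion drops to 31.7% and 26.0%, respectively.
The most dangerous conditions are temporal ambiguity and true ambiguity, where the wrong-entity rate of the baseline methods reaches up to 100%.
Entity-aware methods do not ask for unnecessary clarification on unambiguous tasks (over-clarification 0.0%): they stop only when the request is ambiguous, so the gate does not become a needless interruption.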
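
The separation between tool correctness and entity correctness can be made concrete with a small scoring sketch. The `ToolCall` record, its field names, and the labels returned by `classify_run` are hypothetical and not taken from the paper's benchmark; they only illustrate that a run can pass the tool check and still fail the entity check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolCall:
    tool: str        # name of the tool, e.g. "send_email"
    entity_id: str   # concrete ID the tool acts on (hypothetical field)


def classify_run(predicted: Optional[ToolCall], gold: ToolCall) -> str:
    """Label one run; None means the agent asked for clarification."""
    if predicted is None:
        return "clarify"
    if predicted.tool != gold.tool:
        return "wrong_tool"
    if predicted.entity_id != gold.entity_id:
        return "wrong_entity"   # right tool, wrong target
    return "correct"


gold = ToolCall("send_email", "contact:alex.chen")
print(classify_run(ToolCall("send_email", "contact:alex.kumar"), gold))  # wrong_entity
print(classify_run(ToolCall("send_email", "contact:alex.chen"), gold))   # correct
```

A benchmark that scores only the `tool` field would mark both calls above as successes, which is the blind spot this section describes.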

Evidence

Across 1,800 runs (60 tasks × 5 models × 6 methods), every method had a wrong-tool error rate of 0.0%, but the wrong-entity rate was 26.0% for Direct and 24.0% for Semantic Filter.
Under temporal ambiguity, the wrong-entity rate reached 100.0% for Direct and Entity Retrieval, 97.5% for CMTF-only, and 90.0% for Semantic Filter.
Under true ambiguity, Direct, Entity Retrieval, and CMTF-only acted on the wrong entity in 100% of cases, whereas both entity-aware methods handled every such case safely (ambiguity detection 100%, safe success 100%).
Safe success is 40.0% for the confidence gate and 34.3% for Entity CMTF+Provenance, higher than their task success (31.7% and 26.0%), because a clarification request on an ambiguous case counts as a safe success.
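
The gap between task success and safe success is easier to see with toy numbers. The sketch below assumes that safe success means "completed correctly, or deferred on an ambiguous task", which is inferred from this summary and is not the paper's formal definition. The `Run` record and the sample data are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    outcome: str      # "correct", "wrong_entity", or "clarify" (hypothetical labels)
    ambiguous: bool   # whether the task was ambiguous by design


def rate(runs, predicate) -> float:
    """Share of runs that satisfy predicate, as a percentage."""
    return 100.0 * sum(1 for r in runs if predicate(r)) / len(runs)


def task_success(r: Run) -> bool:
    return r.outcome == "correct"


def safe_success(r: Run) -> bool:
    # Assumed definition: completed correctly, or deferred on an ambiguous task.
    return r.outcome == "correct" or (r.outcome == "clarify" and r.ambiguous)


total_runs = 60 * 5 * 6  # tasks x models x methods
print(total_runs)  # 1800

# Made-up toy data, not the paper's results.
runs = [
    Run("correct", False),
    Run("correct", False),
    Run("clarify", True),
    Run("clarify", True),
    Run("wrong_entity", True),
]
print(rate(runs, task_success))  # 40.0
print(rate(runs, safe_success))  # 80.0
print(rate(runs, lambda r: r.outcome == "wrong_entity"))  # 20.0
```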

How to Apply

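Add an entity confirmation gate before execution in your function calling pipeline: right before the tool call, check whether each entity the tool requires (recipient, document, event, and so on) has been narrowed down to a single candidate, and if the confidence scores of the top two candidates are close, return a clarification request to the user instead of executing.
When implementing high-risk actions such as delete, send, or update in an enterprise agent, declare an entity precondition for each tool (which entity types are required) and add a layer that checks at execution time that each required entity has been resolved to a concrete ID.
Use provenance tracking to log the evidence behind each entity binding decision (which metadata led to choosing this Alex). That makes debugging and auditing possible after a wrong execution and also raises user trust.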
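
The precondition layer and the provenance log from the last two points might look roughly like the following. This is a minimal sketch: the `ENTITY_PRECONDITIONS` registry, the tool names, the ID format, and the record fields are all hypothetical and would need to match your own tool schemas.

```python
import json
from typing import Dict, List

# Hypothetical registry: which entity types each high-risk tool requires.
ENTITY_PRECONDITIONS: Dict[str, List[str]] = {
    "send_email": ["recipient", "attachment"],
    "delete_document": ["document"],
    "update_event": ["event"],
}


def check_preconditions(tool: str, resolved: Dict[str, str]) -> List[str]:
    """Return the required entity types that are not yet bound to a concrete ID."""
    required = ENTITY_PRECONDITIONS.get(tool, [])
    return [etype for etype in required if not resolved.get(etype)]


def guarded_call(tool: str, resolved: Dict[str, str]) -> dict:
    """Allow the call only when every required entity has a concrete ID."""
    missing = check_preconditions(tool, resolved)
    if missing:
        return {"action": "clarify", "missing": missing}
    return {"action": "execute", "tool": tool, "args": dict(resolved)}


def provenance_record(mention: str, entity_id: str, score: float, evidence: dict) -> str:
    """Serialize why a mention was bound to an entity, for audit logs."""
    return json.dumps(
        {"mention": mention, "entity_id": entity_id, "score": score, "evidence": evidence},
        sort_keys=True,
    )


print(guarded_call("send_email", {"recipient": "contact:1042"}))
# {'action': 'clarify', 'missing': ['attachment']}
print(guarded_call("delete_document", {"document": "doc:77"}))
# {'action': 'execute', 'tool': 'delete_document', 'args': {'document': 'doc:77'}}
print(provenance_record("Alex", "contact:1042", 0.91, {"team": "launch"}))
```

Keeping the preconditions in a declarative table means a new tool cannot be registered without stating which entities it acts on.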

Code Example

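# Entity-aware action gate: conceptual example (Python sketch).
# The three helpers below are hypothetical placeholders so the sketch runs;
# swap in a real mention extractor, candidate scorer, and audit log.

def extract_mentions(instruction, tool):
    # Placeholder: a real system derives the required mentions from the tool schema.
    return ['Alex']  # e.g. ['Alex', 'launch doc']


def score_candidates(mention, candidates, env_state):
    # Placeholder: look up a confidence score per candidate entity.
    # e.g. {'Alex Chen': 0.85, 'Alex Kumar': 0.80, 'Alex Park': 0.30}
    return {c: env_state.get('scores', {}).get(c, 0.0) for c in candidates}


def log_provenance(mention, entity, score, evidence):
    # Placeholder: record why this entity was chosen.
    print(f"[provenance] {mention!r} -> {entity!r} (score={score:.2f}, evidence={evidence})")


def entity_aware_gate(instruction, env_state, tool, candidates, tau=0.7, delta=0.2):
    """
    Check that entity binding is sufficiently resolved before running the tool.
    tau: minimum confidence threshold
    delta: minimum score gap between the top two candidates
    """
    bindings = {}
    for mention in extract_mentions(instruction, tool):
        candidate_scores = score_candidates(mention, candidates, env_state)
        ranked = sorted(candidate_scores.items(), key=lambda kv: kv[1], reverse=True)
        if not ranked:
            return {'action': 'clarify',
                    'message': f"No match found for '{mention}'. Who or what do you mean?",
                    'candidates': []}

        best_entity, best_score = ranked[0]
        second_score = ranked[1][1] if len(ranked) > 1 else 0.0
        margin = best_score - second_score

        if best_score < tau or margin < delta:
            # Insufficient evidence: ask the user instead of executing.
            options = ' vs '.join(name for name, _ in ranked[:2])
            return {
                'action': 'clarify',
                'message': f"'{mention}' could refer to: {options}. Which one do you mean?",
                'candidates': ranked[:2],
            }

        # Record provenance for the chosen binding.
        log_provenance(mention, best_entity, best_score,
                       evidence=env_state.get('metadata', {}).get(best_entity))
        bindings[mention] = best_entity

    # Every mention is resolved: allow execution.
    return {'action': 'execute', 'bindings': bindings}


# Usage example (illustrative data)
current_env = {'scores': {'Alex Chen': 0.85, 'Alex Kumar': 0.80, 'Alex Park': 0.30}}
contact_list = ['Alex Chen', 'Alex Kumar', 'Alex Park']
document_store = ['launch-plan.docx']

result = entity_aware_gate(
    instruction="Email the launch doc to Alex",
    env_state=current_env,
    tool='send_email',
    candidates=contact_list + document_store,
)

if result['action'] == 'clarify':
    print(result['message'])  # return the question to the user
else:
    # Map the resolved bindings onto the tool's real arguments before calling it.
    print('execute send_email with', result['bindings'])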

Terminology

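Entity Binding: The process of linking expressions in a natural-language command, such as "that Alex" or "that document", to a concrete object in the real system (an email address, a file ID, and so on). It is the question of whom to pick when three people in the address book are named "Alex".

Tool-Augmented Agent: An LLM-based agent that can call external tools (APIs) such as search, email sending, or database queries. It goes beyond text generation and can affect real systems.

CMTF: Short for Causal Minimal Tool Filtering. A filtering technique that shows the agent only the minimal set of tools causally required for the current task. It reduces confusion in tool selection.

Confidence-gated binding: A gate that allows execution only when the confidence score is high enough and the gap between the top two candidates is large. When several candidates look alike, it asks the user a question instead of executing.

Provenance tracking: Recording the evidence (metadata, scores, supporting facts) for why an entity was chosen, so that after a wrong execution the basis for the decision can be traced.

True ambiguity: A state in which no amount of context reveals which specific target the user intended. The safest action here is to ask the user, not to guess and execute.

Risk-weighted exposure: A score that weights wrong-entity executions by how risky the action is. A delete or send error carries a higher weight than a simple read error.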
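
As a rough illustration of the last term, the sketch below averages a per-action risk weight over runs that hit the wrong entity. The weights and the exact formula are assumptions made for this example; this summary does not give the values or the formula the paper uses.

```python
# Hypothetical risk weights per action type; the paper's actual values are not given here.
RISK_WEIGHTS = {"read": 1.0, "update": 2.0, "send": 3.0, "delete": 3.0}


def risk_weighted_exposure(runs) -> float:
    """Average risk weight incurred by wrong-entity runs.

    runs: iterable of (action_type, wrong_entity) pairs.
    """
    runs = list(runs)
    if not runs:
        return 0.0
    exposure = sum(RISK_WEIGHTS[action] for action, wrong in runs if wrong)
    return exposure / len(runs)


runs = [("read", True), ("send", True), ("delete", False), ("update", False)]
print(risk_weighted_exposure(runs))  # 1.0
```

Under this assumed formula, one wrong send costs as much as three wrong reads, so a method can lower its exposure by deferring on high-risk actions first.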

Related Resources

EntityBindingFailures GitHub Repository (includes the benchmark, evaluation scripts, and prompts)

Original Abstract

Tool-augmented language-model agents are often evaluated by whether they select the correct tool, produce valid API arguments, and complete the requested task. However, an agent may choose the right tool and still act on the wrong external entity. For example, a request to "email Alex about the launch" may lead the agent to contact the wrong Alex, attach the wrong launch document, reply in the wrong thread, or update the wrong customer account. We call these errors entity binding failures. This paper studies entity binding failures as a distinct reliability and safety problem in tool-augmented agents. We formalize the separation between tool correctness and entity correctness, introduce a taxonomy of wrong-entity failures in enterprise workflows, and evaluate entity-aware execution mechanisms including entity-resolution preconditions, confidence-gated binding, clarification under ambiguity, and provenance tracking. In a controlled diagnostic evaluation across 60 tasks, five model backends, and six tool-use methods, all methods achieved 0.0 percent wrong-tool error, yet action-oriented baselines still produced wrong-entity actions in 24.0-26.0 percent of runs. Entity-aware methods eliminated wrong-entity actions and risk-weighted wrong-entity exposure in this setting, but reduced direct task completion by deferring under ambiguity. These findings show that safe tool use requires not only selecting the correct tool, but also reliably binding natural-language references to the correct real-world entity before action.