An AI Agent Execution Environment to Safeguard User Data
TL;DR Highlight
GAAP prevents personal data leaks, even under prompt injection or a malicious AI model, by deterministically blocking unauthorized access with Information Flow Control (IFC) inside an AI Agent execution environment; in evaluation it stopped 100% of tested data-disclosure attacks.
Who Should Read
Backend/platform developers concerned about securing user personal information (payment details, email credentials, etc.) when used by AI Agents. Teams preparing to deploy MCP-based tool-calling agents to production.
Core Mechanics
- GAAP blocks user data transfer to external services at the code execution level using Information Flow Control (IFC), rather than trusting the LLM.
- Instead of directly calling tools, the LLM generates code artifacts (Python scripts), and GAAP statically analyzes the data flow within that code to prevent unauthorized information exposure.
- User personal information is stored separately in a Private Data DB and a Permission DB, and is never included in the LLM context—meaning even the model provider (OpenAI, Anthropic, etc.) cannot access it.
- A Disclosure Log remembers previously shared data, allowing GAAP to track and control responses from external services that may contain previously shared sensitive information.
- Adding semantic annotations to MCP servers—like 'This API should never return passwords'—reduces unnecessary taint and enables more accurate permission requests to the user.
- GAAP supports multi-shot execution, tracking data exposure at each step even when handling complex tasks that involve iteratively generating code with LLM feedback.
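The IFC mechanics above can be illustrated with a minimal taint-tracking sketch. This is an assumption-laden toy, not GAAP's actual API: the `Tainted` wrapper, `send_external`, and the permission set are all illustrative names.

```python
# Minimal sketch of IFC-style taint tracking over a generated code artifact.
# All names here (Tainted, send_external) are illustrative, not GAAP's API.

class Tainted:
    """Wraps a private value and records which data types it derives from."""
    def __init__(self, value, labels):
        self.value = value
        self.labels = set(labels)  # e.g. {"date_of_birth"}

    def __add__(self, other):
        # Taint propagates through computation: a derived value carries
        # the union of its inputs' label sets.
        if isinstance(other, Tainted):
            return Tainted(self.value + other.value, self.labels | other.labels)
        return Tainted(self.value + other, self.labels)

def send_external(party, payload, permissions):
    """Refuse the call unless every taint label is permitted for this party."""
    labels = payload.labels if isinstance(payload, Tainted) else set()
    denied = [lbl for lbl in labels if (lbl, party) not in permissions]
    if denied:
        raise PermissionError(f"blocked disclosure of {denied} to {party}")
    return f"sent to {party}"

permissions = {("date_of_birth", "airline.com")}
dob = Tainted("1990-01-01", {"date_of_birth"})
send_external("airline.com", dob, permissions)     # allowed by permissions
# send_external("ads.example", dob, permissions)   # raises PermissionError
```

The point of the sketch: the decision to release data is made at execution time from taint labels and the permission set, never by asking the LLM.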
Evidence
- Attack blocking: GAAP reduced the success rate of three prompt injection attacks (SSN-leak, Phone-leak, SSN-swap) to 0%. In contrast, the unprotected NP-Agent succumbed to 75% of SSN-leak attacks and 100% of Phone-leak and SSN-swap attacks; LLM-Judge (an LLM-based guardrail) allowed 15% of SSN-leak attacks and partially allowed Phone-leak and SSN-swap; Conseca (policy-based) failed to prevent SSN-swap attacks.
- Utility: GAAP achieved a task completion rate of 76.0%, a negligible difference from the unprotected NP-Agent's 81.0%; LLM-Judge scored 75.0% and Conseca 72.0%.
- Overhead: average latency increased by only 13% compared to NP-Agent, and GAAP's cost on GPT-5 ($0.52) was even lower than NP-Agent's ($0.67), because the code artifact approach reduces repetitive context transmission.
How to Apply
- If you operate MCP-based AI Agents that handle user personal information (email, payment details, etc.), adopt GAAP's Private Data DB pattern: architect your system to pass key names instead of actual values to the LLM prompt, and retrieve the values at runtime.
- If you have tool calls that send data to external services (e.g., airline APIs, payment APIs), add a permission layer like GAAP's Permission DB: manage which data can be sent to which external parties as (data_type, external_party) pairs, request user confirmation once, and automate processing thereafter.
- Implement a Disclosure Log to track data sent to external services, and automatically taint responses from those services if they may contain previously shared sensitive information, preventing indirect leakage.
Code Example
# Example of GAAP's Private Data DB pattern
# Never include personal data directly in the LLM prompt
# ❌ Incorrect method: Including personal information directly in the prompt
prompt = "Complete check-in for a user with a date of birth of 1990-01-01"
# ✅ GAAP method: Only include key names in the prompt, and retrieve values from the DB at runtime
prompt = "Complete check-in using the user's date_of_birth"
# Code artifact generated by the agent (LLM output)
code_artifact = """
dob = priv_data_db.access_date_of_birth() # Retrieve actual value from DB
rewards_num = priv_data_db.access_airline_rewards_number()
airline_mcp = mcp_helper.connect("airline")
# GAAP intercepts this call to check if dob, rewards_num are exposed
# Executes only if permitted, otherwise requests permission from the user
checkin_result = airline_mcp.process_query(
    "complete_checkin",
    args={"dob": dob, "rewards_number": rewards_num},
)
"""
# Example Permission DB structure
permissions = [
# (data_type, external_party, allow)
("date_of_birth", "airline.com", True),
("ssn", "food_order_app.com", False),
("email_password", "*", False), # Never share passwords anywhere
]
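A lookup over such a Permission DB might be sketched as follows, with `*` matching any external party and an explicit deny taking precedence. The `is_allowed` helper is an assumption for illustration, not GAAP's API.

```python
# Hypothetical lookup over a Permission DB of
# (data_type, external_party, allow) triples; "*" matches any party.
def is_allowed(permissions, data_type, party):
    decision = False  # default-deny: no matching grant means no disclosure
    for d, p, allow in permissions:
        if d == data_type and p in (party, "*"):
            if not allow:  # an explicit deny always wins
                return False
            decision = True
    return decision

permissions = [
    ("date_of_birth", "airline.com", True),
    ("ssn", "food_order_app.com", False),
    ("email_password", "*", False),
]
is_allowed(permissions, "date_of_birth", "airline.com")   # True
is_allowed(permissions, "email_password", "airline.com")  # False (wildcard deny)
is_allowed(permissions, "ssn", "airline.com")             # False (no grant)
```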
# Example Disclosure Log structure
disclosure_log = [
{
"data_item": "passport_number",
"external_entity": "airline.com",
"timestamp": "2024-01-15T10:30:00",
"tool_call": "complete_checkin"
}
]
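Consulting the Disclosure Log to taint an inbound response could be sketched as below. The `taint_labels_for` helper is illustrative; GAAP's actual mechanism is not specified here. Note the tainting is conservative: labels are applied based on what was previously disclosed to that entity, regardless of the response's content.

```python
# Sketch (assumed names): every data item previously disclosed to an external
# entity becomes a taint label on any subsequent response from that entity.
def taint_labels_for(disclosure_log, source):
    return {
        entry["data_item"]
        for entry in disclosure_log
        if entry["external_entity"] == source
    }

disclosure_log = [
    {"data_item": "passport_number", "external_entity": "airline.com",
     "timestamp": "2024-01-15T10:30:00", "tool_call": "complete_checkin"},
]
taint_labels_for(disclosure_log, "airline.com")  # {"passport_number"}
taint_labels_for(disclosure_log, "weather.com")  # set()
```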
# Subsequent responses from airline.com are automatically tagged with passport_number taint
Terminology
Related Resources
Original Abstract
AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g., personal and financial information). This poses a serious risk to security and privacy. Adversaries may attack the AI model (e.g., via prompt injection) to exfiltrate user data. Furthermore, sharing private data with an AI agent requires users to trust a potentially unscrupulous or compromised AI model provider with their private data. This paper presents GAAP (Guaranteed Accounting for Agent Privacy), an execution environment for AI agents that guarantees confidentiality for private user data. Through dynamic and directed user prompts, GAAP collects permission specifications from users describing how their private data may be shared, and GAAP enforces that the agent's disclosures of private user data, including disclosures to the AI model and its provider, comply with these specifications. Crucially, GAAP provides this guarantee deterministically, without trusting the agent with private user data, and without requiring any AI model or the user prompt to be free of attacks. GAAP enforces the user's permission specification by tracking how the AI agent accesses and uses private user data. It augments Information Flow Control with novel persistent data stores and annotations that enable it to track the flow of private information both across execution steps within a single task, and also over multiple tasks separated in time. Our evaluation confirms that GAAP blocks all data disclosure attacks, including those that make other state-of-the-art systems disclose private user data to untrusted parties, without a significant impact on agent utility.