An AI Agent Execution Environment to Safeguard User Data
TL;DR Highlight
GAAP prevents personal data leaks, even under prompt injection or a malicious AI model, by deterministically blocking unauthorized access with Information Flow Control (IFC) inside an AI Agent execution environment; in evaluation it stopped 100% of tested data-disclosure attacks.
Who Should Read
Backend/platform developers concerned about securing user personal information (payment details, email credentials, etc.) when used by AI Agents. Teams preparing to deploy MCP-based tool-calling agents to production.
Core Mechanics
- GAAP blocks user data transfer to external services at the code execution level using Information Flow Control (IFC), rather than trusting the LLM.
- Instead of directly calling tools, the LLM generates code artifacts (Python scripts), and GAAP statically analyzes the data flow within that code to prevent unauthorized information exposure.
- User personal information is stored separately in a Private Data DB and a Permission DB, and is never included in the LLM context—meaning even the model provider (OpenAI, Anthropic, etc.) cannot access it.
- A Disclosure Log remembers previously shared data, allowing GAAP to track and control responses from external services that may contain previously shared sensitive information.
- Adding semantic annotations to MCP servers—like 'This API should never return passwords'—reduces unnecessary taint and enables more accurate permission requests to the user.
- GAAP supports multi-shot execution, tracking data exposure at each step even when handling complex tasks that involve iteratively generating code with LLM feedback.
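The IFC mechanics above can be illustrated with a minimal taint-tracking sketch. This is an assumption-laden toy, not GAAP's actual API: the `Tainted` wrapper, `send_external`, and the permission set are all illustrative names.

```python
# Minimal sketch of IFC-style taint tracking over a generated code artifact.
# All names here (Tainted, send_external) are illustrative, not GAAP's API.

class Tainted:
    """Wraps a private value and records which data types it derives from."""
    def __init__(self, value, labels):
        self.value = value
        self.labels = set(labels)  # e.g. {"date_of_birth"}

    def __add__(self, other):
        # Taint propagates through computation: a derived value carries
        # the union of its inputs' label sets.
        if isinstance(other, Tainted):
            return Tainted(self.value + other.value, self.labels | other.labels)
        return Tainted(self.value + other, self.labels)

def send_external(party, payload, permissions):
    """Refuse the call unless every taint label is permitted for this party."""
    labels = payload.labels if isinstance(payload, Tainted) else set()
    denied = [lbl for lbl in labels if (lbl, party) not in permissions]
    if denied:
        raise PermissionError(f"blocked disclosure of {denied} to {party}")
    return f"sent to {party}"

permissions = {("date_of_birth", "airline.com")}
dob = Tainted("1990-01-01", {"date_of_birth"})
send_external("airline.com", dob, permissions)     # allowed by permissions
# send_external("ads.example", dob, permissions)   # raises PermissionError
```

The point of the sketch: the decision to release data is made at execution time from taint labels and the permission set, never by asking the LLM.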
Evidence
- Attack blocking: GAAP reduced the success rate of three prompt injection attacks (SSN-leak, Phone-leak, SSN-swap) to 0%. In contrast, the unprotected NP-Agent succumbed to 75% of SSN-leak attacks and 100% of Phone-leak and SSN-swap attacks; LLM-Judge (an LLM-based guardrail) allowed 15% of SSN-leak attacks and partially allowed Phone-leak and SSN-swap; Conseca (policy-based) failed to prevent SSN-swap attacks.
- Utility: GAAP achieved a task completion rate of 76.0%, a negligible difference from the unprotected NP-Agent's 81.0%; LLM-Judge scored 75.0% and Conseca 72.0%.
- Overhead: average latency increased by only 13% compared to NP-Agent, and GAAP's cost on GPT-5 ($0.52) was even lower than NP-Agent's ($0.67), because the code artifact approach reduces repetitive context transmission.
How to Apply
- If you operate MCP-based AI Agents that handle user personal information (email, payment details, etc.), adopt GAAP's Private Data DB pattern: architect your system to pass key names instead of actual values to the LLM prompt, and retrieve the values at runtime.
- If you have tool calls that send data to external services (e.g., airline APIs, payment APIs), add a permission layer like GAAP's Permission DB: manage which data can be sent to which external parties as (data_type, external_party) pairs, request user confirmation once, and automate processing thereafter.
- Implement a Disclosure Log to track data sent to external services, and automatically taint responses from those services if they may contain previously shared sensitive information, preventing indirect leakage.
Code Example
# Example of GAAP's Private Data DB pattern
# Never include personal data directly in the LLM prompt
# ❌ Incorrect method: Including personal information directly in the prompt
prompt = "Complete check-in for a user with a date of birth of 1990-01-01"
# ✅ GAAP method: Only include key names in the prompt, and retrieve values from the DB at runtime
prompt = "Complete check-in using the user's date_of_birth"
# Code artifact generated by the agent (LLM output)
code_artifact = """
dob = priv_data_db.access_date_of_birth() # Retrieve actual value from DB
rewards_num = priv_data_db.access_airline_rewards_number()
airline_mcp = mcp_helper.connect("airline")
# GAAP intercepts this call to check if dob, rewards_num are exposed
# Executes only if permitted, otherwise requests permission from the user
checkin_result = airline_mcp.process_query(
    "complete_checkin",
    args={"dob": dob, "rewards_number": rewards_num},
)
"""
# Example Permission DB structure
permissions = [
# (data_type, external_party, allow)
("date_of_birth", "airline.com", True),
("ssn", "food_order_app.com", False),
("email_password", "*", False), # Never share passwords anywhere
]
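A lookup over such a Permission DB might be sketched as follows, with `*` matching any external party and an explicit deny taking precedence. The `is_allowed` helper is an assumption for illustration, not GAAP's API.

```python
# Hypothetical lookup over a Permission DB of
# (data_type, external_party, allow) triples; "*" matches any party.
def is_allowed(permissions, data_type, party):
    decision = False  # default-deny: no matching grant means no disclosure
    for d, p, allow in permissions:
        if d == data_type and p in (party, "*"):
            if not allow:  # an explicit deny always wins
                return False
            decision = True
    return decision

permissions = [
    ("date_of_birth", "airline.com", True),
    ("ssn", "food_order_app.com", False),
    ("email_password", "*", False),
]
is_allowed(permissions, "date_of_birth", "airline.com")   # True
is_allowed(permissions, "email_password", "airline.com")  # False (wildcard deny)
is_allowed(permissions, "ssn", "airline.com")             # False (no grant)
```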
# Example Disclosure Log structure
disclosure_log = [
{
"data_item": "passport_number",
"external_entity": "airline.com",
"timestamp": "2024-01-15T10:30:00",
"tool_call": "complete_checkin"
}
]
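Consulting the Disclosure Log to taint an inbound response could be sketched as below. The `taint_labels_for` helper is illustrative; GAAP's actual mechanism is not specified here. Note the tainting is conservative: labels are applied based on what was previously disclosed to that entity, regardless of the response's content.

```python
# Sketch (assumed names): every data item previously disclosed to an external
# entity becomes a taint label on any subsequent response from that entity.
def taint_labels_for(disclosure_log, source):
    return {
        entry["data_item"]
        for entry in disclosure_log
        if entry["external_entity"] == source
    }

disclosure_log = [
    {"data_item": "passport_number", "external_entity": "airline.com",
     "timestamp": "2024-01-15T10:30:00", "tool_call": "complete_checkin"},
]
taint_labels_for(disclosure_log, "airline.com")  # {"passport_number"}
taint_labels_for(disclosure_log, "weather.com")  # set()
```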
# Subsequent responses from airline.com are automatically tagged with passport_number taint
Terminology
Related Resources
Original Abstract
AI agents promise to serve as general-purpose personal assistants for their users, which requires them to have access to private user data (e.g., personal and financial information). This poses a serious risk to security and privacy. Adversaries may attack the AI model (e.g., via prompt injection) to exfiltrate user data. Furthermore, sharing private data with an AI agent requires users to trust a potentially unscrupulous or compromised AI model provider with their private data. This paper presents GAAP (Guaranteed Accounting for Agent Privacy), an execution environment for AI agents that guarantees confidentiality for private user data. Through dynamic and directed user prompts, GAAP collects permission specifications from users describing how their private data may be shared, and GAAP enforces that the agent's disclosures of private user data, including disclosures to the AI model and its provider, comply with these specifications. Crucially, GAAP provides this guarantee deterministically, without trusting the agent with private user data, and without requiring any AI model or the user prompt to be free of attacks. GAAP enforces the user's permission specification by tracking how the AI agent accesses and uses private user data. It augments Information Flow Control with novel persistent data stores and annotations that enable it to track the flow of private information both across execution steps within a single task, and also over multiple tasks separated in time. Our evaluation confirms that GAAP blocks all data disclosure attacks, including those that make other state-of-the-art systems disclose private user data to untrusted parties, without a significant impact on agent utility.