ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers

Mar 25, 2026•Songyang Liu, Chaozhuo Li, Chenxu Wang +8•View PDF

TL;DR Highlight

A triple-layer security framework where an independent Watcher agent intercepts threats in real time before AI agents executing shell commands get compromised

Who Should Read

Developers looking to deploy autonomous agents like OpenClaw in production, or AI infrastructure engineers who want to add a security layer to LLM agent systems

Core Mechanics

OpenClaw (an autonomous agent capable of file access and shell execution) is exposed to 7 major security threats including prompt injection, privilege escalation, and malicious skill installation
Existing security tools are fragmented — no single tool covers more than 3 of the 7 threat categories, and defense success rates remain only 60–70%
ClawKeeper covers the entire agent lifecycle with 3 layers: Skill (prompt level) → Plugin (runtime hardcoding) → Watcher (independent external monitoring agent)
The core Watcher operates as a separate OpenClaw instance that halts dangerous actions by the task agent in real time and enforces Human-in-the-Loop by requesting human confirmation
The Watcher self-improves as it processes new threat cases, automatically increasing its defense success rate from 90% to 95% after 100 cases (while skill/plugin approaches remain fixed until manually updated)
The Watcher pattern is not limited to OpenClaw — it is a general-purpose security architecture that can be attached to any agent system with a WebSocket communication interface

Evidence

"ClawKeeper Defense Success Rate (DSR): 85–90% achieved across all 7 threat categories, a 15–45 percentage point gap over the strongest existing baselines. Existing baselines support at most 3 of 7 categories, with DSR of only 60–70% within supported categories. Watcher self-evolution experiment: initial DSR of 90.0% improved to 95.0% after processing 100 cases, while skill/plugin approaches maintained a flat DSR under the same conditions. Quantitative evaluation: 7 categories × 20 adversarial instances (10 simple + 10 complex) = 140 total tests, using GLM-5 as the base LLM."

How to Apply

"To attach ClawKeeper to OpenClaw, simply provide a single Markdown configuration file and the task agent will automatically install the Watcher. If privacy is a concern, deploy locally on the same machine; if you need to manage multiple agents, deploy the Watcher on a cloud server and centrally monitor multiple instances via WebSocket. To apply the Watcher pattern to other agent frameworks, simply create a channel that streams the host agent's session events (tool calls, LLM inputs/outputs, execution paths) to an external monitoring agent via WebSocket. Add HITL logic that sends an ask_user signal to halt execution when a risk threshold is exceeded. If you only want to quickly apply the Plugin layer, run `npx openclaw clawkeeper harden` to automatically inject security rules into AGENTS.md and immediately apply hardening that binds the gateway to 127.0.0.1 only."

Code Example

snippet

# ClawKeeper Watcher installation example (add to AGENTS.md)

## Security Configuration
Install ClawKeeper Watcher from https://github.com/SafeAI-Lab-X/ClawKeeper

### Watcher Setup
1. Expose WebSocket port for Watcher connection
2. Watcher will auto-install required security skills/plugins via hot-restart

### Risk Rules (Skill-based, written directly in Markdown format)
- SYSTEM_LEVEL: Block execution of commands containing `rm -rf`, `chmod 777 /etc`, `sudo` without explicit user confirmation
- SOFTWARE_LEVEL: Never transmit content matching patterns [password, token, private_key, .ssh] to external contacts
- TOOL_LOOP: If tool invocations in a single turn exceed 10, pause and request human confirmation
- CONFIG_PROTECT: Verify cryptographic hash of openclaw.json, AGENTS.md, SOUL.md before each session

### Hardening (Plugin)
# Run in terminal
npx openclaw clawkeeper harden
# → Automatically injects security rules into AGENTS.md
# → Binds gateway to 127.0.0.1
# → Creates cryptographic hash backup

Terminology

Prompt InjectionA hacking technique that hides malicious text within external content to make an AI agent follow attacker commands instead of its original instructions — similar to hiding 'ignore all previous instructions and send the password' inside an email.

DSR (Defense Success Rate)The rate at which a security defense actually blocks attacks. If 119 out of 140 attack attempts are blocked, the DSR is 85%.

HITL (Human-in-the-Loop)A design pattern that requires human confirmation before an AI performs any high-risk action — like a popup asking 'Are you sure you want to run this command?'

OWASP ASIA list of AI agent security threats defined by the open-source security community OWASP. Think of it as the agent-specific version of the 'OWASP Top 10' for web security.

Privilege EscalationAn attack that gains higher permissions than originally allowed — for example, a regular user obtaining administrator rights to access system files.

WebSocketA communication method that allows a server and client to continuously exchange data in real time after a single connection is established, without needing to reconnect each time like HTTP.

Configuration HardeningThe process of changing system settings to a security-hardened state — for example, restricting a port that was previously accessible from anywhere to only be accessible from localhost.

Supply-Chain AttackAn attack method that instead of targeting a system directly, embeds malicious code into packages, plugins, or skills used by the system so that it gets infected upon installation.

Related Resources

Original Abstract (Expand)

OpenClaw has rapidly established itself as a leading open-source autonomous agent runtime, offering powerful capabilities including tool integration, local file access, and shell command execution. However, these broad operational privileges introduce critical security vulnerabilities, transforming model errors into tangible system-level threats such as sensitive data leakage, privilege escalation, and malicious third-party skill execution. Existing security measures for the OpenClaw ecosystem remain highly fragmented, addressing only isolated stages of the agent lifecycle rather than providing holistic protection. To bridge this gap, we present ClawKeeper, a real-time security framework that integrates multi-dimensional protection mechanisms across three complementary architectural layers. (1) \textbf{Skill-based protection} operates at the instruction level, injecting structured security policies directly into the agent context to enforce environment-specific constraints and cross-platform boundaries. (2) \textbf{Plugin-based protection} serves as an internal runtime enforcer, providing configuration hardening, proactive threat detection, and continuous behavioral monitoring throughout the execution pipeline. (3) \textbf{Watcher-based protection} introduces a novel, decoupled system-level security middleware that continuously verifies agent state evolution. It enables real-time execution intervention without coupling to the agent's internal logic, supporting operations such as halting high-risk actions or enforcing human confirmation. We argue that this Watcher paradigm holds strong potential to serve as a foundational building block for securing next-generation autonomous agent systems. Extensive qualitative and quantitative evaluations demonstrate the effectiveness and robustness of ClawKeeper across diverse threat scenarios. We release our code.