ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers
TL;DR Highlight
A three-layer security framework in which an independent Watcher agent intercepts threats in real time, before an AI agent that executes shell commands can be compromised
Who Should Read
Developers looking to deploy autonomous agents like OpenClaw in production, or AI infrastructure engineers who want to add a security layer to LLM agent systems
Core Mechanics
- OpenClaw (an autonomous agent capable of file access and shell execution) is exposed to 7 major security threats including prompt injection, privilege escalation, and malicious skill installation
- Existing security tools are fragmented: no single tool covers more than 3 of the 7 threat categories, and defense success rates reach only 60–70% even within the categories a tool supports
- ClawKeeper covers the entire agent lifecycle with 3 layers: Skill (prompt-level policies) → Plugin (hard-coded runtime enforcement) → Watcher (independent external monitoring agent)
- The core Watcher operates as a separate OpenClaw instance that halts dangerous actions by the task agent in real time and enforces Human-in-the-Loop by requesting human confirmation
- The Watcher self-improves as it processes new threat cases, automatically increasing its defense success rate from 90% to 95% after 100 cases (while skill/plugin approaches remain fixed until manually updated)
- The Watcher pattern is not limited to OpenClaw — it is a general-purpose security architecture that can be attached to any agent system with a WebSocket communication interface
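The Watcher's halt-or-confirm behavior described in this list can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ClawKeeper's actual code: the `Event` type, the `score_risk` heuristic, both thresholds, and the `confirm` callback are hypothetical names introduced here, and the real Watcher relies on an LLM-based agent rather than string matching.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical thresholds; the source does not state concrete values.
ASK_THRESHOLD = 0.5
HALT_THRESHOLD = 0.9


@dataclass
class Event:
    """One observed action of the task agent (hypothetical shape)."""
    tool: str
    argument: str


def score_risk(event: Event) -> float:
    """Toy heuristic standing in for the Watcher's own judgment."""
    if event.tool == "shell" and "rm -rf" in event.argument:
        return 1.0
    if event.tool == "shell" and event.argument.startswith("sudo "):
        return 0.6
    return 0.0


def watch(event: Event, confirm: Callable[[Event], bool]) -> str:
    """Return 'allow' or 'halt' for one event, asking a human when unsure."""
    risk = score_risk(event)
    if risk >= HALT_THRESHOLD:
        return "halt"
    if risk >= ASK_THRESHOLD:
        # Human-in-the-Loop: the action proceeds only if a person approves.
        return "allow" if confirm(event) else "halt"
    return "allow"
```

Because the decision function only receives events and a confirmation callback, it stays decoupled from the task agent's internals, which is the property the Watcher layer is built around.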
Evidence
- ClawKeeper Defense Success Rate (DSR): 85–90% achieved across all 7 threat categories, a 15–45 percentage point gap over the strongest existing baselines.
- Existing baselines support at most 3 of 7 categories, with a DSR of only 60–70% within supported categories.
- Watcher self-evolution experiment: the initial DSR of 90.0% improved to 95.0% after processing 100 cases, while skill/plugin approaches maintained a flat DSR under the same conditions.
- Quantitative evaluation: 7 categories × 20 adversarial instances (10 simple + 10 complex) = 140 total tests, using GLM-5 as the base LLM.
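The arithmetic behind these figures can be checked with a few lines. The sketch assumes DSR means the percentage of adversarial instances that were successfully defended, which the summary implies but does not define; the counts passed to `defense_success_rate` (18 and 19 out of 20) are illustrative values chosen to reproduce 90% and 95%, not numbers reported in the paper.

```python
CATEGORIES = 7
SIMPLE_PER_CATEGORY = 10
COMPLEX_PER_CATEGORY = 10

# 7 categories x (10 simple + 10 complex) adversarial instances
total_tests = CATEGORIES * (SIMPLE_PER_CATEGORY + COMPLEX_PER_CATEGORY)


def defense_success_rate(defended: int, attempted: int) -> float:
    """DSR as a percentage: defended attacks divided by attempted attacks."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return 100.0 * defended / attempted


print(total_tests)                   # 140
print(defense_success_rate(18, 20))  # 90.0
print(defense_success_rate(19, 20))  # 95.0
```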
How to Apply
- To attach ClawKeeper to OpenClaw, provide a single Markdown configuration file; the task agent then installs the Watcher automatically.
- If privacy is a concern, deploy the Watcher locally on the same machine. If you need to manage multiple agents, deploy it on a cloud server and monitor the instances centrally via WebSocket.
- To apply the Watcher pattern to other agent frameworks, create a channel that streams the host agent's session events (tool calls, LLM inputs/outputs, execution paths) to an external monitoring agent via WebSocket, and add HITL logic that sends an ask_user signal to halt execution when a risk threshold is exceeded.
- If you only want to apply the Plugin layer quickly, run `npx openclaw clawkeeper harden` to inject security rules into AGENTS.md automatically and bind the gateway to 127.0.0.1 only.
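The event-streaming channel and the ask_user signal described above might look like the following sketch. To keep it runnable with the standard library alone, a pair of `asyncio.Queue` objects stands in for the WebSocket connection; the message fields (`type`, `command`, `signal`), the `risk_of` scorer, and the threshold value are assumptions made for illustration and are not taken from ClawKeeper.

```python
import asyncio
import json

RISK_THRESHOLD = 0.8  # hypothetical; the source gives no concrete value


def risk_of(event: dict) -> float:
    """Toy scorer standing in for the monitoring agent's judgment."""
    return 1.0 if "rm -rf" in event.get("command", "") else 0.0


async def host_agent(to_watcher: asyncio.Queue, from_watcher: asyncio.Queue) -> list:
    """Stream each tool call as JSON and wait for the Watcher's verdict."""
    executed = []
    for command in ["ls -la", "rm -rf /tmp/build", "echo done"]:
        await to_watcher.put(json.dumps({"type": "tool_call", "command": command}))
        verdict = json.loads(await from_watcher.get())
        if verdict["signal"] == "ask_user":
            break  # halt here until a human confirms
        executed.append(command)
    await to_watcher.put(json.dumps({"type": "session_end"}))
    return executed


async def watcher(from_host: asyncio.Queue, to_host: asyncio.Queue) -> None:
    """Score every streamed event and reply with 'continue' or 'ask_user'."""
    while True:
        event = json.loads(await from_host.get())
        if event["type"] == "session_end":
            return
        signal = "ask_user" if risk_of(event) >= RISK_THRESHOLD else "continue"
        await to_host.put(json.dumps({"signal": signal}))


async def main() -> list:
    up, down = asyncio.Queue(), asyncio.Queue()
    executed, _ = await asyncio.gather(host_agent(up, down), watcher(up, down))
    return executed


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['ls -la']
```

In a real deployment the two queues would be replaced by a WebSocket client and server, and the host would resume or abort after the human answers instead of simply stopping.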
Code Example
# ClawKeeper Watcher installation example (add to AGENTS.md)
## Security Configuration
Install ClawKeeper Watcher from https://github.com/SafeAI-Lab-X/ClawKeeper
### Watcher Setup
1. Expose WebSocket port for Watcher connection
2. Watcher will auto-install required security skills/plugins via hot-restart
### Risk Rules (Skill-based, written directly in Markdown format)
- SYSTEM_LEVEL: Block execution of commands containing `rm -rf`, `chmod 777 /etc`, `sudo` without explicit user confirmation
- SOFTWARE_LEVEL: Never transmit content matching patterns [password, token, private_key, .ssh] to external contacts
- TOOL_LOOP: If tool invocations in a single turn exceed 10, pause and request human confirmation
- CONFIG_PROTECT: Verify cryptographic hash of openclaw.json, AGENTS.md, SOUL.md before each session
### Hardening (Plugin)
# Run in terminal
npx openclaw clawkeeper harden
# → Automatically injects security rules into AGENTS.md
# → Binds gateway to 127.0.0.1
# → Creates cryptographic hash backup
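The four risk rules in the configuration above are written as natural-language policies for the agent, but their intent can be approximated in code. The following is a rough sketch under stated assumptions: the function names are hypothetical, plain substring and regex matching is a simplification of whatever the agent actually does, and SHA-256 is assumed because the source only says "cryptographic hash".

```python
import hashlib
import re

# Patterns copied from the rules above; the matching logic is an assumption.
DANGEROUS_COMMANDS = ("rm -rf", "chmod 777 /etc", "sudo")
SECRET_PATTERN = re.compile(r"password|token|private_key|\.ssh", re.IGNORECASE)
MAX_TOOL_CALLS_PER_TURN = 10


def needs_confirmation(command: str) -> bool:
    """SYSTEM_LEVEL: flag commands containing a dangerous substring."""
    return any(marker in command for marker in DANGEROUS_COMMANDS)


def leaks_secret(outgoing_text: str) -> bool:
    """SOFTWARE_LEVEL: flag outgoing content that matches a secret pattern."""
    return SECRET_PATTERN.search(outgoing_text) is not None


def tool_loop_exceeded(calls_this_turn: int) -> bool:
    """TOOL_LOOP: flag a turn with more than 10 tool invocations."""
    return calls_this_turn > MAX_TOOL_CALLS_PER_TURN


def config_unchanged(content: bytes, expected_sha256: str) -> bool:
    """CONFIG_PROTECT: compare a file's SHA-256 digest with a stored one."""
    return hashlib.sha256(content).hexdigest() == expected_sha256
```

Substring matching of this kind is easy to evade (for example through shell aliases or encoded payloads), which is one reason to pair fixed rules with an independent monitoring layer.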
Original Abstract
OpenClaw has rapidly established itself as a leading open-source autonomous agent runtime, offering powerful capabilities including tool integration, local file access, and shell command execution. However, these broad operational privileges introduce critical security vulnerabilities, transforming model errors into tangible system-level threats such as sensitive data leakage, privilege escalation, and malicious third-party skill execution. Existing security measures for the OpenClaw ecosystem remain highly fragmented, addressing only isolated stages of the agent lifecycle rather than providing holistic protection. To bridge this gap, we present ClawKeeper, a real-time security framework that integrates multi-dimensional protection mechanisms across three complementary architectural layers. (1) Skill-based protection operates at the instruction level, injecting structured security policies directly into the agent context to enforce environment-specific constraints and cross-platform boundaries. (2) Plugin-based protection serves as an internal runtime enforcer, providing configuration hardening, proactive threat detection, and continuous behavioral monitoring throughout the execution pipeline. (3) Watcher-based protection introduces a novel, decoupled system-level security middleware that continuously verifies agent state evolution. It enables real-time execution intervention without coupling to the agent's internal logic, supporting operations such as halting high-risk actions or enforcing human confirmation. We argue that this Watcher paradigm holds strong potential to serve as a foundational building block for securing next-generation autonomous agent systems. Extensive qualitative and quantitative evaluations demonstrate the effectiveness and robustness of ClawKeeper across diverse threat scenarios. We release our code.