Agent Skills: A Data-Driven Analysis of Claude Skills for Extending Large Language Model Functionality
TL;DR Highlight
A study analyzing 40,285 Claude Skills from public marketplaces — what they do, what they use, and where the risks are, with hard numbers.
Who Should Read
Backend/DevOps developers building automated workflows with Claude Code or AI agent skills. Also relevant for anyone assessing security risks before adopting third-party skills in an agent system.
Core Mechanics
- 40,285 Claude Skills analyzed — the majority (62%) are simple single-tool wrappers, but 18% chain multiple tools with external API calls
- Most-used tool categories: web search (34%), code execution (28%), file system access (19%)
- 15.3% of skills request permissions far beyond what their stated functionality requires — a privilege escalation risk
- Prompt injection vulnerabilities found in 8.7% of analyzed skills — malicious content in tool outputs can hijack agent behavior
- Skills with high user ratings are not safer — popularity and security are uncorrelated in the dataset
Evidence
- Of 40,285 skills analyzed, 6,164 (15.3%) request unnecessary elevated permissions (file write, shell exec) for their stated purpose
- Prompt injection risk detected in 3,505 skills (8.7%) via static analysis of tool output handling
- Top 100 most-downloaded skills: 11 have confirmed privilege escalation patterns, 7 have prompt injection patterns
- Correlation between user rating and security score: r = -0.04 (essentially zero)
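The near-zero correlation can be checked on any skill dataset with a plain Pearson r. A minimal sketch of the computation; the sample data below is purely illustrative and not from the paper:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative ratings and security scores only -- not the study's data.
ratings = [4.8, 4.5, 3.9, 4.9, 4.2, 3.5, 4.7, 4.0]
security = [40, 85, 70, 30, 90, 60, 55, 75]
r = pearson_r(ratings, security)  # a value near 0 means no linear relationship
```

A value of r = -0.04 over 40,285 skills means rating explains essentially none of the variance in security score, which is why popularity is not a usable safety signal.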
How to Apply
- Before installing any third-party Skill, check what permissions it requests — if it asks for file write or shell exec but claims to only do web search, that's a red flag
- Sandbox skills that handle external data (web content, user uploads) in a restricted environment to limit prompt injection blast radius
- Treat third-party skill code with the same scrutiny as third-party npm packages — read the source or at minimum review the tool calls it makes
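The first check above can be automated. A minimal sketch of a permission-mismatch flagger, assuming a hypothetical manifest dict with `category` and `permissions` fields (real Skill manifests may be shaped differently):

```python
# Permissions that should raise questions for read-only skills.
ELEVATED = {"file_write", "shell_exec", "network_post"}

# Claimed categories that should never need elevated permissions.
# These category names are illustrative assumptions, not a marketplace taxonomy.
READ_ONLY_CLAIMS = {"web_search", "summarization", "lookup"}

def flag_permission_mismatch(manifest: dict) -> list[str]:
    """Return any elevated permissions requested by a skill that claims
    to be read-only; an empty list means no mismatch was detected."""
    if manifest.get("category") not in READ_ONLY_CLAIMS:
        return []
    return sorted(ELEVATED & set(manifest.get("permissions", [])))

suspicious = flag_permission_mismatch({
    "name": "weather-search",
    "category": "web_search",
    "permissions": ["web_fetch", "shell_exec"],
})
# suspicious == ["shell_exec"] -> review the source before installing
```

This is exactly the "claims web search but asks for shell exec" pattern described above; a non-empty result is a reason to read the skill's source, not proof of malice.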
Code Example
# Skills security audit prompt (based on Appendix E)
SECURITY_AUDIT_PROMPT = """
You are an expert AI Security Auditor.
Classify this agent skill into exactly one risk level:
L0: Read-only, public data (e.g. get_weather, search_wikipedia)
L1: Read sensitive/private data (e.g. read_emails, get_calendar)
L2: Write/action with limited scope (e.g. send_draft, add_event)
L3: Destructive/high-impact (e.g. delete_db, exec_shell, transfer_money)
Skill Name: {skill_name}
Skill Description: {skill_description}
Full Skill Document:
{skill_markdown}
Return ONLY valid JSON:
{{"skill_name": "{skill_name}", "risk_level": "L0|L1|L2|L3", "reasoning": "one sentence"}}
"""
# Usage example (Python + Anthropic SDK)
import anthropic, json
client = anthropic.Anthropic()
def audit_skill(skill_name: str, skill_desc: str, skill_md: str) -> dict:
    prompt = SECURITY_AUDIT_PROMPT.format(
        skill_name=skill_name,
        skill_description=skill_desc,
        skill_markdown=skill_md,
    )
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returned only the JSON object the prompt asked for;
    # json.loads will raise if it added any surrounding text.
    return json.loads(response.content[0].text)
# Result example
# {"skill_name": "mcp-builder", "risk_level": "L2", "reasoning": "Creates and modifies MCP config files on local filesystem."}
Original Abstract
Agent skills extend large language model (LLM) agents with reusable, program-like modules that define triggering conditions, procedural logic, and tool interactions. As these skills proliferate in public marketplaces, it is unclear what types are available, how users adopt them, and what risks they pose. To answer these questions, we conduct a large-scale, data-driven analysis of 40,285 publicly listed skills from a major marketplace. Our results show that skill publication tends to occur in short bursts that track shifts in community attention. We also find that skill content is highly concentrated in software engineering workflows, while information retrieval and content creation account for a substantial share of adoption. Beyond content trends, we uncover a pronounced supply-demand imbalance across categories, and we show that most skills remain within typical prompt budgets despite a heavy-tailed length distribution. Finally, we observe strong ecosystem homogeneity, with widespread intent-level redundancy, and we identify non-trivial safety risks, including skills that enable state-changing or system-level actions. Overall, our findings provide a quantitative snapshot of agent skills as an emerging infrastructure layer for agents and inform future work on skill reuse, standardization, and safety-aware design.