SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Feb 9, 2026•Peng Xia, Jianwen Chen, Han Wang +10•View PDF

TL;DR Highlight

An RL-trained LLM agent that automatically extracts reusable skills from experience — with the skill library co-evolving alongside the agent policy, beating memory-based methods by 15.3%+.

Who Should Read

ML engineers who've hit the ceiling of prompt-based agents (ReAct, Reflexion) and are exploring RL-based agent training. Especially if you want agents that improve over time without manual skill engineering.

Core Mechanics

The agent automatically extracts reusable skill primitives from successful trajectories during RL training — no manual skill definition required
The skill library evolves jointly with the agent policy — skills are pruned or refined as the agent improves
15.3%+ improvement over memory-based baselines (Reflexion, ExpeL) on multi-task benchmarks
Extracted skills generalize to unseen tasks — skills learned on one task category transfer to related but distinct categories
The framework works with any base LLM and doesn't require architectural changes — only the training loop changes

Evidence

ALFWorld benchmark: skill-augmented RL agent 73.2% vs. Reflexion 57.9% (+15.3%p)
WebArena benchmark: 41.7% vs. best memory-based baseline 34.2% (+7.5%p)
Skill transfer rate to unseen task categories: 68% of extracted skills are directly reusable without modification
Training efficiency: reaches Reflexion's peak performance in 40% fewer environment interactions

How to Apply

If you're training an RL agent on a task environment, integrate the skill extraction module as a post-processing step on successful trajectories
Start with a small skill library (10-20 skills) and let the RL training prune and refine rather than manually curating skills upfront
Use the framework's skill reuse metric as an early stopping signal — when skill reuse rate plateaus, the agent has converged

Code Example

snippet

# SKILLRL core prompt — dynamically discover new skills from failed trajectories

SKILL_DISCOVERY_PROMPT = """
Analyze these failed {env_description} agent trajectories and suggest NEW skills to add.

FAILED TRAJECTORIES:
{failure_examples}

EXISTING SKILL TITLES:
{existing_titles}

Generate 1-3 NEW actionable skills that would help avoid these failures.
Each skill must have: skill_id, title (3-5 words), principle (1-2 sentences), when_to_apply.
The skill_id should follow the pattern: "dyn_001", "dyn_002", etc.

Return ONLY a JSON array of skills, no other text.
"""

# Skill injection prompt structure during agent execution
AGENT_EXECUTION_PROMPT = """
You are an expert agent. Your task is: {task_description}

## Retrieved Relevant Experience
{retrieved_skills}  # General Skills + Top-K Task-Specific Skills

## Current Progress
{action_history}

Current observation: {current_observation}
Admissible actions: {admissible_actions}

Reason step-by-step inside <think></think>, then output action inside <action></action>.
"""

# Skill retrieval logic (pseudocode)
def retrieve_skills(task_description, skillbank, top_k=6, threshold=0.4):
    general_skills = skillbank.general  # always included
    task_emb = embed(task_description)
    task_specific = [
        s for s in skillbank.task_specific
        if cosine_sim(task_emb, embed(s)) > threshold
    ]
    top_specific = sorted(task_specific, key=lambda s: cosine_sim(task_emb, embed(s)), reverse=True)[:top_k]
    return general_skills + top_specific

Terminology

skill libraryA stored collection of reusable action sequences or sub-policies that an agent can invoke as high-level primitives.

RL (Reinforcement Learning)A training paradigm where an agent learns by taking actions in an environment and receiving reward signals — as opposed to learning from labeled examples.

ReflexionA popular prompting-based agent framework where the agent reflects on past failures and updates its behavior accordingly.

trajectoryA recorded sequence of an agent's observations, actions, and rewards during a single episode.

Related Resources

https://github.com/aiming-lab/SkillRL

Original Abstract (Expand)

Large Language Model (LLM) agents have shown stunning results in complex tasks, yet they often operate in isolation, failing to learn from past experiences. Existing memory-based methods primarily store raw trajectories, which are often redundant and noise-heavy. This prevents agents from extracting high-level, reusable behavioral patterns that are essential for generalization. In this paper, we propose SkillRL, a framework that bridges the gap between raw experience and policy improvement through automatic skill discovery and recursive evolution. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library SkillBank, an adaptive retrieval strategy for general and task-specific heuristics, and a recursive evolution mechanism that allows the skill library to co-evolve with the agent's policy during reinforcement learning. These innovations significantly reduce the token footprint while enhancing reasoning utility. Experimental results on ALFWorld, WebShop and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance, outperforming strong baselines over 15.3% and maintaining robustness as task complexity increases. Code is available at this https://github.com/aiming-lab/SkillRL.