Effective Strategies for Asynchronous Software Engineering Agents
TL;DR Highlight
A multi-agent framework that applies git's branch-and-merge pattern directly to AI collaboration, improving accuracy over single-agent baselines by up to 26.7 percentage points (absolute).
Who Should Read
ML engineers deploying LLM-based coding agents in production, or developers designing multi-agent systems that split complex software tasks across multiple agents.
Core Mechanics
- CAID (Centralized Asynchronous Isolated Delegation): a manager agent builds a dependency graph, each engineer agent works in an isolated git worktree, and the results are integrated via git merge
- Doubling the iteration budget for a single agent yields little or no improvement; on PaperBench it can even cause a regression
- Soft isolation (instructing agents to avoid overlapping files) is insufficient: it scores 55.5% on PaperBench, below the single-agent baseline of 57.2%. Physical isolation via git worktree achieves 63.3%
- More agents are not always better: on the simpy repo, N=4 achieves 92.1% but N=8 drops to 44.3%
- Fallback strategy (single agent first, then multi-agent on failure) is inefficient — nearly doubles cost and time with no meaningful performance gain over multi-agent alone
- The manager's task delegation quality is the critical bottleneck — missing a core dependency file like autodiff.py tanks the entire pipeline
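The dependency-aware splitting described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: in CAID the manager is an LLM that produces the plan, whereas here a hypothetical helper `plan_rounds` groups files into rounds with Python's standard `graphlib`. The file names are borrowed from the code example further down, and the `max_agents` cap is an assumption.

```python
from graphlib import TopologicalSorter


def plan_rounds(deps, max_agents):
    """Group files into rounds so that every file's dependencies
    are finished in an earlier round (hypothetical helper)."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    rounds = []
    while sorter.is_active():
        # Every file whose dependencies are all done is ready now.
        ready = sorted(sorter.get_ready())
        # Never hand out more parallel tasks than there are agents.
        for i in range(0, len(ready), max_agents):
            rounds.append(ready[i:i + max_agents])
        sorter.done(*ready)
    return rounds


# Maps each file to the set of files it depends on.
deps = {
    "src/autodiff.py": {"src/operators.py", "src/tensor_data.py"},
    "src/operators.py": set(),
    "src/tensor_data.py": set(),
}

rounds = plan_rounds(deps, max_agents=4)
for number, files in enumerate(rounds, start=1):
    print(f"round {number}: {files}")
```

Under these assumptions the two independent files land in the first round and autodiff.py waits for the second. If a dependency edge is missing from `deps`, the dependent file is scheduled too early, which is the failure mode the last bullet describes.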
Evidence
- PaperBench CAID vs single-agent: Claude Sonnet 4.5 57.2%→63.3%, MiniMax 2.5 10.4%→36.7%, GLM 4.7 38.0%→45.4%
- Commit0-Lite CAID vs single-agent: Claude Sonnet 4.5 53.1%→59.1%, MiniMax 2.5 42.3%→57.0% (+14.7pp, p=0.007)
- Doubling iterations 100→200: Claude Sonnet 4.5 on PaperBench drops 3.0pp, MiniMax 2.5 gains only 1.5pp
- simpy repo pass rate by agent count: N=2 → 0.0%, N=4 → 92.1%, N=8 → 44.3%
How to Apply
- For large coding tasks: first build a file-level dependency graph, assign only independent files to parallel agents, and configure each agent to work in a separate git worktree
- When deciding agent count: identify the number of independently developable modules in the repo and deploy fewer agents than that. On the simpy repo, 4 agents clearly outperformed 8
- When conflict issues arise in multi-agent pipelines: replace soft isolation (file-overlap instructions) with physical isolation via git worktree and handle merge conflicts explicitly at integration time
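A minimal sketch of the physical-isolation step, assuming one branch and one worktree per engineer. The helper names (`worktree_commands`, `merge_commands`, `run_all`) are hypothetical, and the branch and directory naming scheme follows the code example below rather than anything confirmed by the paper. The helpers only build the git command lines; `run_all` executes them when pointed at a real repository.

```python
import subprocess


def branch_name(engineer_id):
    # Naming scheme assumed for illustration: engineer_1 -> engineer-1-branch
    return engineer_id.replace("_", "-") + "-branch"


def worktree_commands(engineer_ids):
    """One `git worktree add` per engineer, each on its own new branch."""
    return [
        ["git", "worktree", "add", "-b", branch_name(eid), f"../workspace_{eid}"]
        for eid in engineer_ids
    ]


def merge_commands(engineer_ids):
    """One `git merge` per engineer branch, run from the main checkout."""
    return [["git", "merge", branch_name(eid)] for eid in engineer_ids]


def run_all(commands, repo_dir):
    """Run each command inside repo_dir and stop at the first failure.
    `git merge` exits non-zero on a conflict, so check=True surfaces
    conflicts as CalledProcessError instead of hiding them."""
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)


engineers = ["engineer_1", "engineer_2"]
setup = worktree_commands(engineers)
merges = merge_commands(engineers)
for argv in setup + merges:
    print(" ".join(argv))
```

Building the commands separately from running them keeps the plan inspectable before anything touches the repository, and letting a failed merge raise is one way to handle conflicts explicitly at integration time.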
Code Example
# CAID manager prompt core structure (JSON delegation format)
# 1. Dependency graph-based task splitting
delegation_plan = {
    "delegation_plan": {
        "first_round": {
            "num_agents": 4,
            "reasoning": "tensor_data.py and operators.py are independent and can be parallelized; autodiff.py depends on them and is assigned later",
            "tasks": [
                {
                    "engineer_id": "engineer_1",
                    "task_id": "task-operators",
                    "file_path": "src/operators.py",
                    "functions_to_implement": ["add", "mul", "neg"],
                    "complexity": "simple",
                    "instruction": "Implement basic operators. No dependency on tensor_data.py."
                },
                {
                    "engineer_id": "engineer_2",
                    "task_id": "task-tensor-data",
                    "file_path": "src/tensor_data.py",
                    "functions_to_implement": ["TensorData.__init__", "shape"],
                    "complexity": "medium",
                    "instruction": "Implement tensor data structure. Independent from operators.py."
                }
            ]
        },
        "remaining_tasks": [
            {
                "task_id": "task-autodiff",
                "file_path": "src/autodiff.py",
                "functions_to_implement": ["backpropagate"],
                "complexity": "complex",
                "depends_on": ["task-operators", "task-tensor-data"]
            }
        ]
    }
}
# 2. Each engineer runs in a separate git worktree
# git worktree add ../workspace_engineer_1 -b engineer-1-branch
# git worktree add ../workspace_engineer_2 -b engineer-2-branch
# 3. Merge after completion
# git merge engineer-1-branch  # If a conflict occurs, the respective engineer resolves it directly
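The plan above leaves open how later rounds are released. One possible reading, sketched here under that assumption, is that a remaining task becomes assignable once every task listed in its `depends_on` has been merged. The helper `unblocked_tasks` is hypothetical and is not taken from the paper.

```python
def unblocked_tasks(remaining_tasks, completed_ids):
    """Return the task_ids whose dependencies have all completed
    (hypothetical helper; tasks without depends_on are always ready)."""
    done = set(completed_ids)
    return [
        task["task_id"]
        for task in remaining_tasks
        if task["task_id"] not in done
        and set(task.get("depends_on", [])) <= done
    ]


# Same shape as the "remaining_tasks" entry in the plan above.
remaining = [
    {
        "task_id": "task-autodiff",
        "file_path": "src/autodiff.py",
        "depends_on": ["task-operators", "task-tensor-data"],
    }
]

after_one = unblocked_tasks(remaining, ["task-operators"])
after_both = unblocked_tasks(remaining, ["task-operators", "task-tensor-data"])
print(after_one)
print(after_both)
```

With only task-operators merged, nothing is released; task-autodiff becomes assignable only after both of its dependencies are in.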
Original Abstract
AI agents have become increasingly capable at isolated software engineering (SWE) tasks such as resolving issues on Github. Yet long-horizon tasks involving multiple interdependent subtasks still pose challenges both with respect to accuracy, and with respect to timely completion. A natural approach to solving these long-horizon tasks in a timely manner is asynchronous multi-agent collaboration, where multiple agents work on different parts of the task at the same time. But effective application of multi-agent systems has proven surprisingly difficult: concurrent edits by multiple agents interfere with each other, dependencies are difficult to synchronize, and combining partial progress into a coherent whole is challenging. On the other hand, human developers have long relied on mature collaboration infrastructure to manage these challenges in large software projects. Inspired by these collaboration primitives, we introduce Centralized Asynchronous Isolated Delegation (CAID), a structured multi-agent coordination paradigm grounded in three core SWE primitives: centralized task delegation, asynchronous execution, and isolated workspaces. CAID constructs dependency-aware task plans through a central manager, executes subtasks concurrently in isolated workspaces, and consolidates progress via structured integration with executable test-based verification. In empirical evaluation, we find that CAID improves accuracy over single-agent baselines by 26.7% absolute on paper reproduction tasks (PaperBench) and 14.3% on Python library development tasks (Commit0). Through systematic analysis, we find that branch-and-merge is a central coordination mechanism for multi-agent collaboration, and that SWE primitives such as git worktree, git commit, and git merge enable it to be realized in a reliable and executable manner.