Multi-Agent Collaboration Mechanisms: A Survey of LLMs
TL;DR Highlight
A comprehensive survey systematically categorizing LLM collaboration methodologies — cooperation types, structures, strategies, and orchestration.
Who Should Read
Backend/AI engineers adopting or designing multi-agent frameworks like AutoGen, CrewAI, or LangGraph. Developers scaling from single LLM calls to collaborative agent systems.
Core Mechanics
- Proposed a unified framework classifying agent collaboration by type (cooperative/competitive/coopetitive), structure (centralized/distributed/hierarchical), and strategy (rule-based/role-based/model-based)
- Role-based strategies (MetaGPT, AgentVerse) excel at specialized subtasks; rule-based suits predictable environments; model-based (Theory of Mind) fits uncertain dynamic environments
- Competitive structures (debate, Critic-Explainer patterns) improve reasoning quality over pure cooperation, but poorly designed ones can lose to a single agent with strong prompts
- MoE (Mixture of Experts) is the prime example of 'coopetition' — agents compete and a gating network selects the best output
- Cascading hallucination is the core risk: one agent's hallucination propagating and amplifying across other agents
- Open-source frameworks like AutoGen, CAMEL, CrewAI, OpenAI Swarm, and Microsoft Magentic-One are accelerating real-world adoption
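The "coopetition" idea in the MoE bullet reduces to a few lines: several agents produce competing answers and a gating function scores them and keeps the winner. The sketch below is framework-free and illustrative — the scoring heuristic stands in for a learned gating network, and all names are hypothetical:

```python
# Coopetition sketch: agents compete, a gate selects the best output.
# length_penalty_score is a toy stand-in for a learned gating network.

def length_penalty_score(answer: str) -> float:
    """Toy gate: prefer non-empty answers close to ~20 words."""
    if not answer:
        return 0.0
    return 1.0 / (1.0 + abs(len(answer.split()) - 20))

def gate_select(candidates: dict[str, str], score=length_penalty_score) -> tuple[str, str]:
    """Return the (agent_name, answer) pair with the highest gate score."""
    return max(candidates.items(), key=lambda kv: score(kv[1]))

candidates = {
    "agent_a": "Paris.",
    "agent_b": "The capital of France is Paris, a city on the Seine known for art and culture and food.",
}
winner, answer = gate_select(candidates)
```

In a real MoE setup the candidates would come from parallel LLM calls and the gate would be a trained model, but the select-the-winner control flow is the same.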
Evidence
- Fine-tuning Mistral-7B with multi-agent synthetic data from Orca-AgentInstruct yielded up to 54% performance improvement across multiple benchmarks
- Agent-as-a-Judge framework showed higher alignment with human expert evaluation than LLM-as-a-Judge on DevAI benchmark (55 AI dev tasks)
- DyLAN (Dynamic LLM-Agent Network) improved final answer quality by dynamically deactivating low-contribution agents
- Multi-agent debate (multi-round discussion) improved factuality and reasoning over single-model baselines (Du et al., 2023)
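The DyLAN result above rests on a simple control-loop idea: score each agent's contribution per round and deactivate the weakest before the next round. A toy sketch of that pruning step (the scores and the top-k rule here are illustrative assumptions, not DyLAN's actual learned importance scoring):

```python
# Dynamic deactivation sketch: keep only the top-k contributors per round.

def prune_agents(scores: dict[str, float], keep_top: int) -> list[str]:
    """Rank agents by contribution score and keep the top-k for the next round."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep_top]

# Hypothetical per-round contribution scores for three agents
round_scores = {"solver": 0.9, "verifier": 0.7, "rambler": 0.1}
active = prune_agents(round_scores, keep_top=2)
```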
How to Apply
- For code generation pipelines, design a static role-based architecture like MapCoder (retrieval → planning → coding → debugging agents in sequence) to reduce errors versus single LLM calls.

- To boost LLM response quality, apply the Explainer + Critic competition pattern: first agent generates an answer, second agent challenges and verifies it in a loop.
- For systems with dynamic tasks, consider a Magentic-One style architecture where an Orchestrator agent generates a DAG at runtime and dynamically delegates to sub-agents.
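The static role-based pipeline in the first bullet is, at its core, function composition: each stage consumes the previous stage's output. A minimal framework-free sketch, where plain functions stand in for LLM agent calls and all names are hypothetical:

```python
# Static role-based pipeline sketch: retrieval -> planning -> coding -> debugging.
# Each function is a placeholder for an LLM call with a role-specific prompt.

def retrieve(task: str) -> str:
    """Recall similar solved examples for the task."""
    return f"examples similar to '{task}'"

def plan(task: str, examples: str) -> str:
    """Draft a step-by-step plan from the task and retrieved examples."""
    return f"plan for '{task}' using {examples}"

def write_code(plan_text: str) -> str:
    """Generate code following the plan."""
    return f"code implementing {plan_text}"

def debug(code_text: str) -> str:
    """Review and patch the generated code."""
    return f"{code_text} (debugged)"

def pipeline(task: str) -> str:
    examples = retrieve(task)
    return debug(write_code(plan(task, examples)))

result = pipeline("binary search")
```

The fixed ordering is the point: each agent sees only its predecessor's output, which keeps errors localized to a stage instead of compounding across a free-form conversation.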
Code Example
# Example: Actor-Critic competitive pattern with AutoGen (pyautogen)
import autogen

config_list = [{"model": "gpt-4", "api_key": "YOUR_KEY"}]

actor = autogen.AssistantAgent(
    name="Actor",
    system_message="You generate answers to the given problem and revise them when the Critic finds flaws.",
    llm_config={"config_list": config_list},
)

critic = autogen.AssistantAgent(
    name="Critic",
    system_message="You identify logical errors, hallucinations, and missing edge cases in the Actor's responses and point them out specifically. Also acknowledge the good points.",
    llm_config={"config_list": config_list},
)

# Critic poses the task → Actor answers → Critic reviews → Actor revises → end
critic.initiate_chat(
    actor,
    message="Write a binary search function in Python.",
    max_turns=4,  # two Actor-Critic exchanges
)
Related Resources
- https://github.com/microsoft/autogen
- https://github.com/camel-ai/camel
- https://github.com/crewAIInc/crewAI
- https://cookbook.openai.com/examples/orchestrating_agents
- https://www.microsoft.com/en-us/research/articles/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/
- https://i-am-bee.github.io/bee-agent-framework/
- https://python.langchain.com/docs/tutorials/agents/
- https://www.microsoft.com/en-us/research/blog/orca-agentinstruct-agentic-flows-can-be-effective-synthetic-data-generators/
Original Abstract
With recent advances in Large Language Models (LLMs), Agentic AI has become phenomenal in real-world applications, moving toward multiple LLM-based agents to perceive, learn, reason, and act collaboratively. These LLM-based Multi-Agent Systems (MASs) enable groups of intelligent agents to coordinate and solve complex tasks collectively at scale, transitioning from isolated models to collaboration-centric approaches. This work provides an extensive survey of the collaborative aspect of MASs and introduces an extensible framework to guide future research. Our framework characterizes collaboration mechanisms based on key dimensions: actors (agents involved), types (e.g., cooperation, competition, or coopetition), structures (e.g., peer-to-peer, centralized, or distributed), strategies (e.g., role-based or model-based), and coordination protocols. Through a review of existing methodologies, our findings serve as a foundation for demystifying and advancing LLM-based MASs toward more intelligent and collaborative solutions for complex, real-world use cases. In addition, various applications of MASs across diverse domains, including 5G/6G networks, Industry 5.0, question answering, and social and cultural settings, are also investigated, demonstrating their wider adoption and broader impacts. Finally, we identify key lessons learned, open challenges, and potential research directions of MASs towards artificial collective intelligence.