Scaling LLM-based Multi-Agent Collaboration: MACNET

Scaling Large-Language-Model-based Multi-Agent Collaboration

Jun 11, 2024 • Cheng Qian, Zihao Xie, Yifei Wang +7

TL;DR Highlight

Experiments that scale the number of agents into the thousands show that performance follows a 'collaborative scaling law'.

Who Should Read

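AI engineers designing multi-agent systems or deciding how many agents to use and how to connect them (topology). Developers who want better LLM reasoning performance without retraining.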

Core Mechanics

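Proposes the MACNET framework, which organizes agents as a DAG (directed acyclic graph) and splits roles by position: an actor (executor) sits on each node and a critic (supervisor) sits on each edge.
Identifies a 'collaborative scaling law': as agents are added, performance rises along a logistic (S-shaped) curve, with roughly 16 nodes (2^4) as a practical sweet spot.
Irregular, random topologies actually outperform regular structures such as the fully connected mesh, an advantage attributed to the 'small-world' property familiar from social networks.
Memory control prevents context explosion: only the final artifact, not the whole dialogue history, is passed to the next agent, which cuts context length from O(n²) to O(n).
Ran collaboration among more than 1,000 GPT-3.5-based agents and outperformed most existing methods, including CoT, AutoGPT, and AgentVerse, on benchmarks covering MMLU, HumanEval, software development, and creative writing.
Emergence appears at a much smaller scale (hundreds of agents) than under neural scaling laws, presumably because each agent already carries knowledge from pretraining.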
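To make the "actor on each node, critic on each edge" layout concrete, here is a minimal sketch that builds the chain, star, tree, mesh, and random topologies discussed in this summary as DAG edge lists, and counts agents as one actor per node plus one critic per edge. The function names and the random-DAG construction (a chain backbone plus randomly kept forward edges) are illustrative assumptions, not MACNET's actual topology generator.

```python
import random
from itertools import combinations

Edge = tuple[int, int]


def chain(n: int) -> list[Edge]:
    """Chain: 0 -> 1 -> ... -> n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def star(n: int) -> list[Edge]:
    """Star: hub node 0 points to every other node."""
    return [(0, i) for i in range(1, n)]


def tree(n: int) -> list[Edge]:
    """Binary tree: node i points to children 2i+1 and 2i+2 (when they exist)."""
    return [(i, c) for i in range(n) for c in (2 * i + 1, 2 * i + 2) if c < n]


def mesh(n: int) -> list[Edge]:
    """Mesh: every node points to every later node, the densest possible DAG."""
    return list(combinations(range(n), 2))


def random_dag(n: int, p: float = 0.2, seed: int = 0) -> list[Edge]:
    """Hypothetical random DAG: a chain backbone keeps the graph connected, then
    each remaining forward edge i -> j (j > i + 1) is kept with probability p.
    Edges only ever point forward, so no cycle can form."""
    rng = random.Random(seed)
    extra = [(i, j) for i, j in combinations(range(n), 2) if j > i + 1 and rng.random() < p]
    return chain(n) + extra


def agent_count(n: int, edges: list[Edge]) -> int:
    """One actor per node plus one critic per edge."""
    return n + len(edges)


if __name__ == "__main__":
    n = 64
    builders = [("chain", chain), ("star", star), ("tree", tree), ("mesh", mesh), ("random", random_dag)]
    for name, build in builders:
        edges = build(n)
        print(f"{name:>6}: {len(edges):4d} edges, {agent_count(n, edges):4d} agents")
```

With n = 64, the mesh has 64·63/2 = 2016 edges and therefore 2,080 agents under this counting, which lines up with the "over 1,000 agents in the mesh" figure in the Evidence section, while chain, star, and tree each stay at 63 edges (127 agents).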

Evidence

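MACNET-RANDOM has the highest average Quality of all methods (0.6522), and MACNET-CHAIN scores 0.6632 on MMLU, far ahead of CoT (0.3544), AutoGPT (0.4485), and AgentVerse (0.2977).
Memory control reduces context-token complexity from O(n²) to O(n), and the random topology takes 51.92% less time than the mesh structure.
As the node count was scaled from 2^0 = 1 to 2^6 = 64 (more than 1,000 agents in the mesh topology), artifact length in tokens grew 7.51× over the 2^0 → 2^4 range.
Actors adopt a critic's suggestion 93.10% of the time, and as the network grows, the number of aspects considered jumps from about a dozen to several dozen.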
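The O(n²) → O(n) claim can be checked with simple arithmetic. The sketch below assumes a chain of n agents in which every turn and every artifact has a fixed, hypothetical size of 500 tokens: without memory control, agent k rereads all k-1 earlier turns; with memory control, it reads only the latest artifact. Real artifacts grow as the network scales (the 7.51× figure above), so treat the output as an illustration of growth rates, not of actual token counts.

```python
def context_tokens_full_history(n: int, tokens_per_turn: int) -> int:
    """Total context read when agent k (1-indexed) sees all k-1 earlier turns.
    Closed form: tokens_per_turn * n * (n - 1) / 2, i.e. O(n^2)."""
    return sum((k - 1) * tokens_per_turn for k in range(1, n + 1))


def context_tokens_artifact_only(n: int, artifact_tokens: int) -> int:
    """Total context read when every agent after the first sees only the
    latest artifact (memory control): (n - 1) * artifact_tokens, i.e. O(n)."""
    return (n - 1) * artifact_tokens


if __name__ == "__main__":
    TOKENS = 500  # hypothetical fixed size of one turn / artifact
    for n in (1, 4, 16, 64, 1024):
        full = context_tokens_full_history(n, TOKENS)
        lean = context_tokens_artifact_only(n, TOKENS)
        print(f"n={n:5d}  full history={full:>12,}  artifact only={lean:>9,}")
```

At n = 64 this gives 1,008,000 tokens with full history versus 31,500 with artifact-only passing. The ratio is n/2, so the gap widens linearly as agents are added.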

How to Apply

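For code review or software-development pipelines, use a chain or layer topology; for creative writing or idea generation that needs divergent thinking, try switching to a star or tree topology.
If context blows up as the number of agents grows, apply the memory-control pattern: limit what each agent passes on to the final artifact rather than the full conversation.
When you need to balance performance and cost, choose a random topology over full connectivity: compared with mesh it saves 51.92% of the time while delivering higher performance.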
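The memory-control advice generalizes from a chain to any DAG. A minimal sketch, assuming a hypothetical `refine(node, task, inputs)` callback that stands in for a critic-actor exchange (here a string-joining stub so the sketch runs offline): nodes run in topological order via the standard library's `graphlib.TopologicalSorter` (Python 3.9+), and each node receives only its direct predecessors' artifacts. How several incoming artifacts are merged at a convergent node is also an assumption here (simple concatenation).

```python
from graphlib import TopologicalSorter
from typing import Callable

Edge = tuple[int, int]
Refine = Callable[[int, str, list[str]], str]


def run_dag(n: int, edges: list[Edge], task: str, refine: Refine) -> dict[int, str]:
    """Run every node in topological order. A node sees only the artifacts of its
    direct predecessors (memory control), never their dialogue history."""
    preds: dict[int, list[int]] = {v: [] for v in range(n)}
    for u, v in edges:
        preds[v].append(u)

    artifacts: dict[int, str] = {}
    # TopologicalSorter takes a mapping of node -> predecessors and raises
    # graphlib.CycleError if the graph is not a DAG.
    for node in TopologicalSorter(preds).static_order():
        inputs = [artifacts[u] for u in preds[node]] or [task]  # source nodes start from the task
        artifacts[node] = refine(node, task, inputs)
    return artifacts


def sink_artifacts(n: int, edges: list[Edge], artifacts: dict[int, str]) -> list[str]:
    """Final outputs: artifacts of nodes with no outgoing edge."""
    has_out = {u for u, _ in edges}
    return [artifacts[v] for v in range(n) if v not in has_out]


def stub_refine(node: int, task: str, inputs: list[str]) -> str:
    # Hypothetical placeholder for an LLM call: merge incoming artifacts and tag the node.
    return " + ".join(inputs) + f" -> n{node}"


if __name__ == "__main__":
    diamond = [(0, 1), (0, 2), (1, 3), (2, 3)]
    out = run_dag(4, diamond, "spec", stub_refine)
    print(sink_artifacts(4, diamond, out))
```

Swapping `stub_refine` for a function that calls a model keeps the same control flow; only the per-node work changes.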

Code Example


# MACNET core pattern (simplified sketch): critic-actor dual-agent interaction
# Node = actor (produces the artifact), edge = critic (gives instructions and feedback)

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-3.5-turbo"

def critic_actor_interaction(task: str, artifact: str, critic_role: str, actor_role: str, rounds: int = 3) -> str:
    """
    Core MACNET loop: the critic issues instructions and the actor refines the artifact, repeated `rounds` times.
    Only the artifact is handed to the next node (not the full dialogue history).
    """
    critic_system = {"role": "system", "content": f"You are {critic_role}. Review the artifact and give specific improvement instructions."}
    actor_system = {"role": "system", "content": f"You are {actor_role}. Refine the artifact based on the critic's instructions."}

    current_artifact = artifact

    for _ in range(rounds):
        # The critic reviews the current artifact and issues instructions.
        # Each round starts from the system prompt alone, so earlier turns never accumulate
        # (memory control: only the latest artifact stays in context).
        critic_messages = [
            critic_system,
            {"role": "user", "content": f"Task: {task}\nCurrent artifact:\n{current_artifact}\n\nProvide specific improvement instructions:"},
        ]
        critic_response = client.chat.completions.create(model=MODEL, messages=critic_messages)
        instruction = critic_response.choices[0].message.content or ""

        # The actor applies the instructions to produce a refined artifact.
        actor_messages = [
            actor_system,
            {"role": "user", "content": f"Task: {task}\nCurrent artifact:\n{current_artifact}\n\nCritic's instruction: {instruction}\n\nProvide refined artifact:"},
        ]
        actor_response = client.chat.completions.create(model=MODEL, messages=actor_messages)
        # Keep the previous artifact if the model returns no content.
        current_artifact = actor_response.choices[0].message.content or current_artifact

    return current_artifact  # only the artifact is passed to the next node

# Chain topology example (software development).
# Simplification: adjacent roles in the list play critic and actor for each step.
def macnet_chain(task: str, agents: list[str]) -> str:
    if len(agents) < 2:
        raise ValueError("A chain needs at least two roles.")
    artifact = task
    for i in range(len(agents) - 1):
        critic_role, actor_role = agents[i], agents[i + 1]
        artifact = critic_actor_interaction(task, artifact, critic_role, actor_role)
        print(f"Step {i + 1} done: {critic_role} -> {actor_role}")
    return artifact

# Usage example (makes real API calls)
if __name__ == "__main__":
    agent_chain = ["Requirements Analyst", "System Architect", "Senior Developer", "Code Reviewer", "QA Engineer"]
    result = macnet_chain("Build a REST API for user authentication", agent_chain)
    print(result)

Terminology

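DAG: Directed Acyclic Graph. A graph in which information flows in one direction only and there are no cycles, which keeps information between agents from flowing backward.

collaborative scaling law: The law describing how performance changes as the number of agents grows. It traces an S-shaped curve: slow at first, steep in the middle, then saturating.

neural scaling law: The empirical finding that test loss falls as a power law as model parameters (neurons), dataset size, and training compute increase. It underpins the push toward large models such as ChatGPT.

topology: The network structure describing how agents are connected. Common shapes include chain (a single line), tree (a hierarchy), and mesh (fully connected).

artifact: The output agents produce by collaborating, such as code, documents, or answers.

context explosion: The problem where the messages agents exchange grow quadratically (O(n²)) as agents are added, until the LLM can no longer process them, leading to context-window overflow errors.

small-world property: As in social networks, adding only a few random connections sharply shortens the average path length between nodes. One reason random topologies perform well.

emergence: A capability absent from individual components that suddenly appears when many are combined. In LLMs, it refers to new abilities that appear abruptly once model size crosses a certain threshold.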
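The small-world property from the glossary can be seen in a few lines of code. This sketch follows the spirit of the Watts–Strogatz model on an undirected ring rather than MACNET's DAGs (and adds shortcuts instead of rewiring edges): start from a ring where each node links only to its two neighbours, add a handful of random shortcuts, and compare the average shortest-path length computed by breadth-first search. The graph size, shortcut count, and seed are arbitrary choices for illustration.

```python
import random
from collections import deque

Graph = dict[int, set[int]]


def ring_lattice(n: int) -> Graph:
    """Undirected ring: node v is linked only to v-1 and v+1 (mod n)."""
    adj: Graph = {v: set() for v in range(n)}
    for v in range(n):
        u = (v + 1) % n
        adj[v].add(u)
        adj[u].add(v)
    return adj


def add_shortcuts(adj: Graph, m: int, seed: int = 0) -> Graph:
    """Return a copy of adj with m extra random edges between non-adjacent nodes."""
    n = len(adj)
    existing = sum(len(ns) for ns in adj.values()) // 2
    if m > n * (n - 1) // 2 - existing:
        raise ValueError("not enough non-adjacent pairs")
    rng = random.Random(seed)
    new: Graph = {v: set(ns) for v, ns in adj.items()}
    nodes = list(new)
    added = 0
    while added < m:
        a, b = rng.sample(nodes, 2)
        if b not in new[a]:
            new[a].add(b)
            new[b].add(a)
            added += 1
    return new


def avg_path_length(adj: Graph) -> float:
    """Mean shortest-path length over all ordered pairs, via BFS from every node.
    Assumes the graph is connected."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    ring = ring_lattice(100)
    print(f"ring only:          {avg_path_length(ring):.2f}")
    print(f"ring + 5 shortcuts: {avg_path_length(add_shortcuts(ring, 5)):.2f}")
```

On a 100-node ring the average distance is 2500/99 ≈ 25.3 hops. Because a shortcut between non-adjacent nodes can only shorten paths, the average strictly drops once shortcuts are added.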

Related Resources

https://github.com/OpenBMB/ChatDev/tree/macnet

Original Abstract

Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses each individual through collective reasoning. Inspired by the neural scaling law--increasing neurons enhances performance, this study explores whether the continuous addition of collaborative agents can yield similar benefits. Technically, we utilize directed acyclic graphs to organize agents into a multi-agent collaboration network (MacNet), upon which their interactive reasoning is topologically orchestrated for autonomous task solving. Extensive evaluations reveal that it effectively supports collaboration among over a thousand agents, with irregular topologies outperforming regular ones. We also identify a collaborative scaling law--the overall performance follows a logistic growth pattern as agents scale, with collaborative emergence occurring earlier than traditional neural emergence. We speculate this may be because scaling agents catalyzes their multidimensional considerations during interactive reflection and refinement, thereby producing more comprehensive artifacts. The code is available at https://github.com/OpenBMB/ChatDev/tree/macnet.