The Rise and Potential of LLM-Based Agents: A Comprehensive Survey

The Rise and Potential of Large Language Model Based Agents: A Survey

Sep 14, 2023 • Zhiheng Xi, Wenxiang Chen, Xin Guo +27

TL;DR Highlight

An 86-page survey of AI agents that use an LLM as their brain, covering architecture, applications, and agent-society simulation.

Who Should Read

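Backend and AI developers who are designing LLM-based agent systems or evaluating whether to adopt them. A good fit for anyone who wants the big picture, from single agents to multi-agent collaboration architectures.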

Core Mechanics

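Defines the structure of an LLM agent as three modules: Brain (reasoning, memory, knowledge), Perception (multimodal input), and Action (text output, tool use, embodied action).
Presents three memory-management strategies: working around the Transformer length limit, summarizing and compressing memories, and storing them in vectors or a database. These can serve as a direct reference when implementing long-running conversational agents.
Organizes reasoning and planning techniques such as Chain-of-Thought (CoT), ReAct, and Reflexion into two stages: Plan Formulation and Plan Reflection.
Splits multi-agent patterns along two axes, cooperation (ChatDev, MetaGPT) and competition/debate (ChatEval, Du et al.), and explains which kinds of problems each one suits.
Groups real deployments such as AutoGPT, Voyager (a Minecraft survival agent), and ChemCrow (a chemistry research agent) into three categories: task-oriented, innovation-oriented, and lifecycle-oriented.
Reports that emergent social phenomena (cooperation, information diffusion, norm formation) appear in agent-society simulations, and also warns of misuse, job displacement, and safety risks.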
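
The Brain / Perception / Action split described above can be sketched as three small components wired into one step function. This is a minimal illustration under stated assumptions, not the survey's reference implementation: the class names mirror the survey's module names, while `encoders`, `tools`, the `llm` callable (prompt string in, decision string out), and the "tool_name argument" decision format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Perception:
    """Turns raw inputs of any modality into text the brain can read."""
    # Hypothetical encoders keyed by modality, e.g. {"image": caption_fn}.
    encoders: Dict[str, Callable[[object], str]] = field(default_factory=dict)

    def observe(self, modality: str, raw: object) -> str:
        encode = self.encoders.get(modality, str)  # fall back to str() for plain text
        return f"[{modality}] {encode(raw)}"


@dataclass
class Brain:
    """Keeps a simple memory log and asks the LLM what to do next."""
    llm: Callable[[str], str]  # assumed signature: prompt in, decision text out
    memory: List[str] = field(default_factory=list)

    def decide(self, observation: str) -> str:
        self.memory.append(observation)
        prompt = "\n".join(self.memory) + "\nNext action:"
        decision = self.llm(prompt)
        self.memory.append(f"[decision] {decision}")
        return decision


@dataclass
class Action:
    """Maps a decision to a tool call, or returns it as a plain text reply."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def act(self, decision: str) -> str:
        # Assumed decision format: "<tool_name> <argument>"
        name, _, arg = decision.partition(" ")
        if name in self.tools:
            return self.tools[name](arg)
        return decision  # no matching tool: treat the decision as text output


def step(perception: Perception, brain: Brain, action: Action,
         modality: str, raw: object) -> str:
    """One perceive -> decide -> act cycle."""
    return action.act(brain.decide(perception.observe(modality, raw)))
```

Keeping the three parts separate means a new input modality or a new tool can be added without touching the decision logic, which is the main practical appeal of the decomposition.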

Evidence

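Voyager, built on GPT-4, explores and learns continuously in Minecraft without human intervention, and is reported to be well ahead of earlier baseline agents in both the number of items obtained and the distance explored (the survey cites the original Voyager paper).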
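GPT-4 showed zero-shot capability across diverse domains including abstract reasoning, coding, mathematics, medicine, and law; the survey cites Microsoft Research's 'Sparks of AGI' report in treating this as a 'spark' of AGI.
PaLM-E is co-trained on robot data and general vision-language data, and demonstrates zero-shot and one-shot generalization to novel object combinations.
Multi-agent software development systems such as ChatDev and MetaGPT report higher code quality and completeness than a single agent, achieved by dividing roles (PM, developer, tester).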

How to Apply

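If your agent needs long-term memory, the Generative Agents approach is easy to try: rank memories by a weighted sum of three scores, Recency, Relevance, and Importance.
Instead of handing a complex task to a single LLM, split it across role-specific agents (planner, executor, reviewer) following the ChatDev/MetaGPT pattern, which can improve output quality and make error correction more efficient.
When building a tool-using agent (the territory of Toolformer and ReAct), structuring the prompt template as a ReAct-style 'Thought → Action → Observation' loop makes debugging easier.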
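
The Recency / Relevance / Importance ranking mentioned above can be sketched as follows. This is a rough illustration, not the Generative Agents code: the equal default weights, the `decay` value, the 0..1 importance scale, and the idea that embeddings arrive as plain float lists from some embedding model are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MemoryItem:
    text: str
    embedding: Sequence[float]  # assumed to come from some embedding model
    importance: float           # assumed 0..1, e.g. an LLM rating rescaled
    hours_since_access: float   # time since this memory was last retrieved


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(item: MemoryItem, query: Sequence[float], decay: float = 0.99,
          w_recency: float = 1.0, w_relevance: float = 1.0,
          w_importance: float = 1.0) -> float:
    """Weighted sum of the three signals; weights and decay are illustrative."""
    recency = decay ** item.hours_since_access  # exponential decay over time
    relevance = cosine(item.embedding, query)
    return (w_recency * recency
            + w_relevance * relevance
            + w_importance * item.importance)


def retrieve(memories: List[MemoryItem], query: Sequence[float],
             k: int = 3) -> List[MemoryItem]:
    """Return the k highest-scoring memories for the query embedding."""
    return sorted(memories, key=lambda m: score(m, query), reverse=True)[:k]
```

In practice the three signals usually need to be put on a comparable scale before summing; here each one already falls in roughly the same range, so the weights act as simple priorities.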

Code Example

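# Example ReAct-style agent prompt template and loop
import re

SYSTEM_PROMPT = """
You are an agent that solves tasks step by step.
For each step, output in this format:
Thought: [reasoning about what to do next]
Action: tool_name(args)
After an Action, stop. The tool result is sent back to you as "Observation: ...".
When you know the answer, output "Final Answer: <answer>" instead of an Action.

Available tools:
- search(query): Search the web
- calculator(expr): Evaluate math expression
- read_file(path): Read a file
"""

# Matches a line such as: Action: search(llm agents)
ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)

def execute_tool(name: str, arg: str, tools: dict) -> str:
    # Assumption: `tools` maps a tool name to a callable that takes one string.
    # Errors are returned as text so the model can see them and recover.
    if name not in tools:
        return f"Error: unknown tool '{name}'"
    try:
        return str(tools[name](arg))
    except Exception as exc:
        return f"Error: {exc}"

def run_agent(task: str, llm, tools: dict, max_steps: int = 10) -> str:
    # Assumption: `llm` is any callable that takes the message list and returns a string.
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Task: {task}"},
    ]

    for _ in range(max_steps):
        response = llm(messages)
        messages.append({"role": "assistant", "content": response})

        # Stop as soon as the model gives a final answer
        if "Final Answer:" in response:
            return response.split("Final Answer:")[-1].strip()

        # Parse the first Action line and run the matching tool
        match = ACTION_RE.search(response)
        if match:
            arg = match.group(2).strip().strip("\"'")
            observation = execute_tool(match.group(1), arg, tools)
        else:
            observation = "No valid Action found. Use: Action: tool_name(args)"
        messages.append({"role": "user", "content": f"Observation: {observation}"})

    return "Max steps reached"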
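
The role-split pattern from How to Apply (planner, executor, reviewer) can be sketched in the same spirit. This is a simplified illustration and not how ChatDev or MetaGPT are actually implemented: the role prompts, the `APPROVED` convention, the `max_rounds` limit, and the `llm(system_prompt, user_content)` signature are all assumptions.

```python
from typing import Callable

# Assumed signature: (system_prompt, user_content) -> reply text
LLM = Callable[[str, str], str]

# Hypothetical role prompts; real systems use much richer instructions.
ROLES = {
    "planner": "You are a planner. Break the task into numbered steps.",
    "executor": "You are an executor. Carry out the plan and return the result.",
    "reviewer": "You are a reviewer. Reply APPROVED, or list concrete problems.",
}


def run_pipeline(task: str, llm: LLM, max_rounds: int = 3) -> str:
    """Plan once, execute, then loop review -> revise until approved."""
    plan = llm(ROLES["planner"], task)
    draft = llm(ROLES["executor"], f"Task: {task}\nPlan:\n{plan}")

    for _ in range(max_rounds):
        review = llm(ROLES["reviewer"], f"Task: {task}\nDraft:\n{draft}")
        if review.strip().upper().startswith("APPROVED"):
            break
        # Feed the reviewer's feedback back to the executor for a revision
        draft = llm(
            ROLES["executor"],
            f"Task: {task}\nPlan:\n{plan}\nDraft:\n{draft}\n"
            f"Fix these issues:\n{review}",
        )
    return draft
```

Capping the review loop with `max_rounds` keeps cost bounded when the reviewer never approves; the last draft is returned either way.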

Terminology

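LLM-based Agent: An AI system that uses an LLM (large language model) as its brain to plan and act on its own. Given only a goal, it breaks the work into steps and carries them out without step-by-step human instruction.

Chain-of-Thought (CoT): A technique that gets the LLM to output its intermediate reasoning, for example by adding "think step by step" to the prompt. It is like writing out the working when solving a math problem.

ReAct: An agent pattern that alternates Reasoning and Acting. It repeats a loop of think → use a tool → observe the result → think again.

In-context Learning (ICL): A way to get a model to perform a new task by placing a few examples in the prompt, without changing model parameters. Similar to showing someone a manual instead of retraining them.

Embodied Agent: An agent that goes beyond processing text: it perceives its environment through cameras and sensors and acts through a body such as a robot arm or a game character.

Multi-Agent System (MAS): A system in which multiple AI agents cooperate or compete to solve complex problems that a single agent cannot. The structure resembles a development team dividing up roles.

Catastrophic Forgetting: The phenomenon where a model forgets what it learned earlier as it learns something new. Similar to your skill in one language slipping while you study a new one.

Hallucination: The phenomenon where an LLM confidently generates content that is not true, such as citing papers that do not exist or stating wrong information in a plausible way.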
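
To make the CoT and ICL entries above concrete, here is a small sketch of a prompt builder that combines a few worked examples (in-context learning) with a step-by-step cue (chain-of-thought). The `build_prompt` name, the Q/A layout, and the exact cue wording are illustrative assumptions, not a format prescribed by the survey.

```python
from typing import List, Tuple


def build_prompt(examples: List[Tuple[str, str, str]], question: str) -> str:
    """Build a few-shot prompt whose examples include their reasoning.

    Each example is (question, reasoning, answer). The final question ends
    with a step-by-step cue so the model continues with its own reasoning.
    """
    parts = []
    for q, reasoning, answer in examples:
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)
```

No model parameters change here: the examples live only in the prompt, which is what distinguishes in-context learning from fine-tuning.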

Related Resources

LLM-Agent-Paper-List (GitHub)

Original Abstract

For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents, comprising three main components: brain, perception, and action, and the framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository for the related papers at https://github.com/WooooDyy/LLM-Agent-Paper-List.