A Survey of Personalized Large Language Models: Progress and Future Directions

Feb 17, 2025 • Jiahong Liu, Zexuan Qiu, Zhongyang Li +5

TL;DR Highlight

A comprehensive survey that systematically organizes three families of techniques for personalizing LLMs: Prompting, Adaptation, and Alignment.

Who Should Read

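ML engineers and AI product developers who want to add personalization to chatbots, recommendation systems, or education/healthcare AI. It is a particularly good fit if you want to improve per-user response quality but are unsure which approach to use.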

Core Mechanics

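LLM personalization techniques are classified into three levels: the input level (Personalized Prompting), the model level (Personalized Adaptation), and the objective level (Personalized Alignment).
Four prompting approaches: injecting a summarized user profile (Profile-Augmented), retrieving relevant records from memory (Retrieval-Augmented), injecting soft embeddings (Soft-Fused), and contrasting outputs with and without personalization (Contrastive). Each has different strengths and weaknesses.
Fine-tuning-based personalization splits into 'One PEFT All Users' (a shared adapter) and 'One PEFT Per User' (one adapter per user), where PEFT is a technique that trains only a small number of parameters. The latter performs better but carries higher storage and privacy costs.
Personalized Alignment can be implemented with MORLHF, which extends RLHF to multiple objectives, or by combining several policy models at inference time without training (Weight Merging, Model Ensemble).
Classifying queries into three types, Extraction (extracting explicit facts), Abstraction (inferring implicit preferences), and Generalization (generating by combining external knowledge), makes it easier to decide which technique to use.
The three-way trade-off among performance, privacy, and efficiency is the key bottleneck: strong personalization implies privacy risk or high compute cost, and optimizing all three at once remains an open research problem.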
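
The inference-time weight merging mentioned above can be pictured as a weighted average of policy parameters. The following is a minimal sketch, assuming each policy is a plain dict of parameter lists with identical keys and shapes. The function `merge_policies`, the parameter names, and the example weights are hypothetical illustrations, not taken from the survey.

```python
from typing import Dict, List

# Assumption: a "policy" is a dict mapping parameter names to flat lists of floats.
Params = Dict[str, List[float]]


def merge_policies(policies: List[Params], weights: List[float]) -> Params:
    """Linearly interpolate policy parameters using user preference weights."""
    if len(policies) != len(weights):
        raise ValueError("need exactly one weight per policy")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalize so the weights sum to 1

    merged: Params = {}
    for name in policies[0]:
        size = len(policies[0][name])
        merged[name] = [
            sum(w * p[name][i] for w, p in zip(norm, policies))
            for i in range(size)
        ]
    return merged


# Two hypothetical policies, each tuned for one preference dimension.
concise = {"layer.weight": [1.0, 0.0], "layer.bias": [0.0]}
friendly = {"layer.weight": [0.0, 1.0], "layer.bias": [2.0]}

# A user who values conciseness three times as much as friendliness.
merged = merge_policies([concise, friendly], weights=[3.0, 1.0])
print(merged)
```

Because the merge only needs a new weight vector, a new user can be served without retraining any of the underlying policies, which is the property the survey highlights for this family of methods.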

Evidence

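On the LaMP benchmark, RAG-based personalization yields consistent gains over a non-personalized base LLM on key metrics such as ROUGE-L and F1 (aggregated from multiple experiments in Table 7 of the paper).
PRISM dataset: a large-scale alignment dataset that maps the preferences of 1,500 participants from 75 countries to responses from 21 LLMs.
ALOE dataset: builds personalized alignment training data by generating and expanding 3,310 diverse user personas.
MULTIFACETED COLLECTION dataset: covers diverse user values with 197,000 system messages and validates that adapting to per-user preferences without retraining is feasible.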
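
For readers unfamiliar with ROUGE-L, the metric cited above, it scores a generated text by the longest common subsequence (LCS) it shares with a reference. The sketch below is a simplified illustration, assuming whitespace tokenization and an equally weighted F-measure. It is not the evaluation code used by LaMP or the survey, and the function names are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as the harmonic mean of LCS-based precision and recall."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("the cat lay on the mat", "the cat sat on the mat")
print(round(score, 3))
```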

How to Apply

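When user data is a text history (past reviews, conversations): try Profile-Augmented Prompting first (generate a user summary with GPT-3.5/4 and inject it into the prompt), and switch to Retrieval-Augmented prompting (retrieve only the relevant records with FAISS or BM25) if you hit the context-length limit.
For a writing-assistant service where per-user tone and style differentiation is central: apply One PEFT Per User (LoRA) on LLaMA-2-7B with the OPPU approach, a two-stage pipeline that first trains a global LoRA and then tunes an additional LoRA per user.
When you must handle diverse preferences without retraining cost: apply Personalized Soups or MOD, which combine several policy models at inference time using user preference weights. No retraining of existing models is needed when a new user arrives.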
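
The Retrieval-Augmented fallback in the first item can be sketched without any external dependency. The code below is a minimal illustration, assuming a regex tokenizer and a hand-written Okapi BM25 scorer with commonly used defaults (`k1=1.5`, `b=0.75`). The names `bm25_scores` and `build_retrieval_prompt` and the prompt wording are hypothetical, and a production system would more likely use a dedicated retrieval library.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    if not tokenized:
        return []
    avg_len = sum(len(d) for d in tokenized) / len(tokenized) or 1.0
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in tokenized for term in set(d))
    query_terms = tokenize(query)
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            n = doc_freq[term]
            idf = math.log((len(tokenized) - n + 0.5) / (n + 0.5) + 1.0)
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def build_retrieval_prompt(query: str, history: List[str], top_k: int = 2) -> str:
    """Keep only the top_k history records that match the query and put them in the prompt."""
    scores = bm25_scores(query, history)
    ranked = sorted(range(len(history)), key=lambda i: scores[i], reverse=True)
    retrieved = [history[i] for i in ranked[:top_k] if scores[i] > 0]
    context = "\n".join(f"- {record}" for record in retrieved)
    return f"Relevant user history:\n{context}\n\nUser query: {query}"


history = [
    "I loved the Python data analysis course with pandas.",
    "The hotel breakfast was disappointing.",
    "Great, concise book on Python machine learning.",
    "My new running shoes are comfortable.",
]
query = "Can you recommend a python machine learning book?"
prompt = build_retrieval_prompt(query, history, top_k=2)
print(prompt)
```

Only the retrieved records enter the prompt, so prompt length is bounded by `top_k` rather than by the size of the user's history.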

Code Example

# Profile-Augmented Prompting example (uses the OpenAI API)
# Requires the openai package (v1 or later) and the OPENAI_API_KEY environment variable.
import openai


def build_user_profile_summary(user_history: list[str]) -> str:
    """Generate a profile summary from the user's history."""
    history_text = "\n".join(user_history)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Analyze the following user's past activity and summarize their preferences, interests, and writing style in 3-5 sentences."},
            {"role": "user", "content": f"User history:\n{history_text}"}
        ]
    )
    return response.choices[0].message.content


def personalized_response(query: str, user_history: list[str]) -> str:
    """Generate a personalized response with the user profile injected into the prompt."""
    profile = build_user_profile_summary(user_history)

    # Inject the summarized profile into the system prompt.
    system_prompt = (
        "You are a personalized AI assistant.\n\n"
        f"User profile:\n{profile}\n\n"
        "Use the profile above to respond in a way that matches the user's style and preferences."
    )

    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query}
        ]
    )
    return response.choices[0].message.content


# Usage example
user_history = [
    "I prefer concise, technical explanations.",
    "I am very interested in Python and data analysis.",
    "I like complex concepts explained with code examples."
]

result = personalized_response("What is overfitting in machine learning?", user_history)
print(result)

Terminology

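PEFT: A technique that trains only a small set of added modules instead of retraining the whole model. The original model stays untouched while thin layers are inserted and tuned for a specific purpose.

LoRA: A lightweight fine-tuning technique that leaves the model's parameters unmodified and learns only the update, expressed as two small matrices. Model size stays the same while the number of trainable parameters drops dramatically.

RLHF: A method that tunes an LLM by using human feedback (choosing which answer is better) as the reinforcement-learning reward. It is a major reason ChatGPT responds in a human-friendly way.

DPO: A simpler method that optimizes the model directly on preference data (pairs of a good and a bad answer), without RLHF's complex reinforcement-learning process.

RAG: An approach in which relevant information is retrieved from an external database and placed in the prompt when the LLM generates an answer. It lets the model use information it does not know on its own.

MoE: Short for Mixture of Experts. An architecture with multiple expert modules that selects and activates the appropriate experts depending on the input. It improves efficiency by using several small experts as needed instead of one large model.

Federated Learning: A privacy-preserving training approach in which personal data is never sent to the server; each device trains locally and sends only model parameters to the server for aggregation. The data stays on your phone and only the learned knowledge is shared.

Contrastive Prompting: A prompting technique that compares the response generated with personalization information against the one generated without it, measures how much the personalization affects the output, and amplifies that difference.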
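
To make the LoRA entry above concrete, the sketch below builds an effective weight as the frozen matrix plus a scaled low-rank product, W' = W + (alpha / r) · B · A, and counts trainable parameters. It is a toy illustration in pure Python. The helper names, the tiny matrices, and the 4096-dimensional layer size are assumptions chosen for the example, not values from the survey.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w: Matrix, b_mat: Matrix, a_mat: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the rank (number of rows of A)."""
    rank = len(a_mat)
    scale = alpha / rank
    delta = matmul(b_mat, a_mat)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """B is d_out x rank and A is rank x d_in; only these two matrices are trained."""
    return rank * (d_out + d_in)


# Frozen 2x2 weight with a rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]
b_mat = [[1.0], [0.0]]   # shape 2 x 1
a_mat = [[0.0, 2.0]]     # shape 1 x 2
w_eff = lora_effective_weight(w, b_mat, a_mat, alpha=1.0)
print(w_eff)

# Parameter savings for a hypothetical 4096 x 4096 layer at rank 8.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, rank=8)
print(full, lora, lora / full)
```

In a 'One PEFT Per User' setup, only the small `B` and `A` matrices need to be stored per user, which is where the storage cost discussed in Core Mechanics comes from.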

Original Abstract

Large Language Models (LLMs) excel in handling general knowledge tasks, yet they struggle with user-specific personalization, such as understanding individual emotions, writing styles, and preferences. Personalized Large Language Models (PLLMs) tackle these challenges by leveraging individual user data, such as user profiles, historical dialogues, content, and interactions, to deliver responses that are contextually relevant and tailored to each user's specific needs. This is a highly valuable research topic, as PLLMs can significantly enhance user satisfaction and have broad applications in conversational agents, recommendation systems, emotion recognition, medical assistants, and more. This survey reviews recent advancements in PLLMs from three technical perspectives: prompting for personalized context (input level), finetuning for personalized adapters (model level), and alignment for personalized preferences (objective level). To provide deeper insights, we also discuss current limitations and outline several promising directions for future research. Updated information about this survey can be found at https://github.com/JiahongLiu21/Awesome-Personalized-Large-Language-Models.