A Survey of Personalized Large Language Models: Progress and Future Directions
TL;DR Highlight
A comprehensive survey that systematically categorizes LLM personalization techniques into three families: Prompting, Adaptation, and Alignment.
Who Should Read
ML engineers and AI product developers adding personalization to chatbots, recommendation systems, or education/healthcare AI. Especially useful if you're unsure which approach to use for improving per-user response quality.
Core Mechanics
- Classifies LLM personalization into three levels: input level (Personalized Prompting), model level (Personalized Adaptation), and objective level (Personalized Alignment)
- Four prompting approaches, each with different tradeoffs: user-profile summary injection, relevant-record retrieval from memory, soft-embedding injection, and personalization contrast
- Fine-tuning-based personalization: 'One PEFT All Users' (shared adapter) vs. 'One PEFT Per User' (per-user adapter); the latter performs better but incurs higher storage and privacy costs
- Personalized Alignment can be implemented via MORLHF (multi-objective RLHF extension) or inference-time model combination without training
- Classifying query types into Extraction, Abstraction, and Generalization helps decide which technique to use
- The performance-privacy-efficiency triangle is the core bottleneck: optimizing all three simultaneously remains unsolved
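The second prompting approach above, retrieving relevant records from a user's memory and injecting them into the prompt, can be sketched as follows. This is a minimal illustration, not the survey's implementation: it uses a toy bag-of-words similarity where a real system would use a sentence encoder, and the function names are my own.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_records(query: str, memory: list[str], k: int = 2) -> list[str]:
    """Return the k user records most similar to the query."""
    q = embed(query)
    return sorted(memory, key=lambda r: cosine(q, embed(r)), reverse=True)[:k]

def build_prompt(query: str, memory: list[str]) -> str:
    # Inject only the retrieved records, keeping the prompt within context limits.
    context = "\n".join(f"- {r}" for r in retrieve_records(query, memory))
    return f"Relevant user history:\n{context}\n\nQuery: {query}"
```

The key design point, per the tradeoffs above, is that retrieval keeps the injected context short as the user's history grows, whereas profile-summary injection compresses the whole history into a fixed-size summary.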
Evidence
- LaMP benchmark: RAG-based personalization consistently improved ROUGE-L, F1, etc. vs non-personalized baseline LLM
- PRISM dataset: large-scale alignment dataset mapping preferences of 1,500 participants from 75 countries to 21 LLM responses
- ALOE dataset: 3,310 diverse user personas generated for personalization alignment training data
- MULTIFACETED COLLECTION: 197K system messages covering diverse user values — validated personalization adaptation without retraining
How to Apply
- For text-history user data: try Profile-Augmented Prompting first (generate a user summary with GPT-3.5/4 and inject it into the prompt); switch to Retrieval-Augmented Prompting when you hit context-length limits.
- For writing-assistant services where per-user tone/style differentiation is key: apply OPPU's two-stage 'One PEFT Per User' pipeline with LLaMA-2-7B-based LoRA adapters.
- Without a retraining budget but needing preference diversity: apply Personalized Soups or MOD, which combine multiple policy models at inference time using user preference weights; no retraining is needed for new users.
Code Example
# Profile-Augmented Prompting Example (Using OpenAI API)
import openai

def build_user_profile_summary(user_history: list[str]) -> str:
    """Generate a profile summary from user history"""
    history_text = "\n".join(user_history)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Analyze the following user's past activities and summarize their preferences, interests, and writing style in 3-5 sentences."},
            {"role": "user", "content": f"User History:\n{history_text}"},
        ],
    )
    return response.choices[0].message.content

def personalized_response(query: str, user_history: list[str]) -> str:
    """Generate a personalized response with the user profile injected"""
    profile = build_user_profile_summary(user_history)
    system_prompt = f"""You are a personalized AI assistant.
User Profile:
{profile}
Refer to the profile above and respond in a way that matches the user's style and preferences."""
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

# Usage example (requires OPENAI_API_KEY in the environment)
user_history = [
    "I prefer concise and technical explanations.",
    "I'm very interested in Python and data analysis.",
    "I'd love it if complex concepts were explained with code examples.",
]
result = personalized_response("What is overfitting in machine learning?", user_history)
print(result)
Original Abstract
Large Language Models (LLMs) excel in handling general knowledge tasks, yet they struggle with user-specific personalization, such as understanding individual emotions, writing styles, and preferences. Personalized Large Language Models (PLLMs) tackle these challenges by leveraging individual user data, such as user profiles, historical dialogues, content, and interactions, to deliver responses that are contextually relevant and tailored to each user's specific needs. This is a highly valuable research topic, as PLLMs can significantly enhance user satisfaction and have broad applications in conversational agents, recommendation systems, emotion recognition, medical assistants, and more. This survey reviews recent advancements in PLLMs from three technical perspectives: prompting for personalized context (input level), finetuning for personalized adapters (model level), and alignment for personalized preferences (objective level). To provide deeper insights, we also discuss current limitations and outline several promising directions for future research. Updated information about this survey can be found at the https://github.com/JiahongLiu21/Awesome-Personalized-Large-Language-Models.