A Survey of Personalized Large Language Models: Progress and Future Directions
TL;DR Highlight
A comprehensive survey that systematically categorizes LLM personalization techniques into three families: Prompting, Adaptation, and Alignment.
Who Should Read
ML engineers and AI product developers adding personalization to chatbots, recommendation systems, or education/healthcare AI. Especially useful if you're unsure which approach to use for improving per-user response quality.
Core Mechanics
- Classifies LLM personalization into three levels: input level (Personalized Prompting), model level (Personalized Adaptation), and objective level (Personalized Alignment)
- Four prompting approaches, each with different tradeoffs: user-profile summary injection, relevant-record retrieval from memory, soft-embedding injection, and personalization contrast
- Fine-tuning-based personalization: 'One PEFT All Users' (shared adapter) vs. 'One PEFT Per User' (per-user adapter); the latter performs better but incurs higher storage and privacy costs
- Personalized Alignment can be implemented via MORLHF (multi-objective RLHF extension) or inference-time model combination without training
- Classifying query types into Extraction, Abstraction, and Generalization helps decide which technique to use
- The performance-privacy-efficiency triangle is the core bottleneck: optimizing all three simultaneously remains unsolved
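The second prompting approach above, retrieving relevant records from a user's memory and injecting them into the prompt, can be sketched as follows. This is a minimal illustration, not the survey's implementation: it uses a toy bag-of-words similarity where a real system would use a sentence encoder, and the function names are my own.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_records(query: str, memory: list[str], k: int = 2) -> list[str]:
    """Return the k user records most similar to the query."""
    q = embed(query)
    return sorted(memory, key=lambda r: cosine(q, embed(r)), reverse=True)[:k]

def build_prompt(query: str, memory: list[str]) -> str:
    # Inject only the retrieved records, keeping the prompt within context limits.
    context = "\n".join(f"- {r}" for r in retrieve_records(query, memory))
    return f"Relevant user history:\n{context}\n\nQuery: {query}"
```

The key design point, per the tradeoffs above, is that retrieval keeps the injected context short as the user's history grows, whereas profile-summary injection compresses the whole history into a fixed-size summary.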
Evidence
- LaMP benchmark: RAG-based personalization consistently improved ROUGE-L, F1, etc. vs non-personalized baseline LLM
- PRISM dataset: large-scale alignment dataset mapping preferences of 1,500 participants from 75 countries to 21 LLM responses
- ALOE dataset: 3,310 diverse user personas generated for personalization alignment training data
- MULTIFACETED COLLECTION: 197K system messages covering diverse user values — validated personalization adaptation without retraining
How to Apply
- For text-history user data: try Profile-Augmented Prompting first (generate a user summary with GPT-3.5/4 and inject it into the prompt); switch to Retrieval-Augmented Prompting when you hit context-length limits.
- For writing-assistant services where per-user tone/style differentiation is key: apply OPPU's two-stage 'One PEFT Per User' pipeline with LLaMA-2-7B-based LoRA adapters.
- Without a retraining budget but needing preference diversity: apply Personalized Soups or MOD, which combine multiple policy models at inference time using user preference weights; no retraining is needed for new users.
Code Example
# Profile-Augmented Prompting Example (Using OpenAI API)
import openai

def build_user_profile_summary(user_history: list[str]) -> str:
    """Generate a profile summary from user history"""
    history_text = "\n".join(user_history)
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Analyze the following user's past activities and summarize their preferences, interests, and writing style in 3-5 sentences."},
            {"role": "user", "content": f"User History:\n{history_text}"},
        ],
    )
    return response.choices[0].message.content

def personalized_response(query: str, user_history: list[str]) -> str:
    """Generate a personalized response with the user profile injected"""
    profile = build_user_profile_summary(user_history)
    system_prompt = f"""You are a personalized AI assistant.
User Profile:
{profile}
Refer to the profile above and respond in a way that matches the user's style and preferences."""
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

# Usage example (requires OPENAI_API_KEY in the environment)
user_history = [
    "I prefer concise and technical explanations.",
    "I'm very interested in Python and data analysis.",
    "I'd love it if complex concepts were explained with code examples.",
]
result = personalized_response("What is overfitting in machine learning?", user_history)
print(result)
Original Abstract
Large Language Models (LLMs) excel in handling general knowledge tasks, yet they struggle with user-specific personalization, such as understanding individual emotions, writing styles, and preferences. Personalized Large Language Models (PLLMs) tackle these challenges by leveraging individual user data, such as user profiles, historical dialogues, content, and interactions, to deliver responses that are contextually relevant and tailored to each user's specific needs. This is a highly valuable research topic, as PLLMs can significantly enhance user satisfaction and have broad applications in conversational agents, recommendation systems, emotion recognition, medical assistants, and more. This survey reviews recent advancements in PLLMs from three technical perspectives: prompting for personalized context (input level), finetuning for personalized adapters (model level), and alignment for personalized preferences (objective level). To provide deeper insights, we also discuss current limitations and outline several promising directions for future research. Updated information about this survey can be found at the https://github.com/JiahongLiu21/Awesome-Personalized-Large-Language-Models.