Preserving In-Context Learning ability in Large Language Model Fine-tuning

TL;DR Highlight

A study of how to preserve ICL (in-context learning: performing new tasks from just a few examples in the prompt), an ability that tends to break during fine-tuning.

Who Should Read

Researchers and engineers who need to fine-tune LLMs for specific tasks while preserving their general few-shot learning capabilities.

Core Mechanics

Fine-tuning on task-specific data often sharply degrades a model's in-context learning (ICL) ability, a form of catastrophic forgetting
ICL capability is stored in a distributed way across attention layers; task-specific fine-tuning overwrites these distributed representations
Key finding: fine-tuning on diverse data preserves ICL better than fine-tuning on homogeneous task-specific data
LoRA fine-tuning preserves ICL significantly better than full fine-tuning because its low-rank constraint limits the space of parameter updates
Adding a small fraction (~10%) of general instruction-following examples to the task-specific fine-tuning data substantially preserves ICL ability
The paper proposes an ICL-preservation regularization term that can be added to any fine-tuning objective
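The summary does not give the form of the paper's regularization term. Purely as a hedged illustration, the sketch below uses a common stand-in: a KL-divergence penalty that keeps the fine-tuned model's next-token distributions on few-shot prompts close to those of the frozen base model. The function names, the weight `lam`, and the choice of KL are assumptions, not the paper's method; distributions are plain Python lists so the example stays self-contained.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    # Terms with p_i == 0 contribute nothing; eps guards against log(0) when q_i == 0.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def regularized_loss(
    task_loss: float,
    base_probs: List[Sequence[float]],
    tuned_probs: List[Sequence[float]],
    lam: float = 0.1,
) -> float:
    """Hypothetical objective: task loss + lam * mean KL(base || tuned).

    base_probs[t] and tuned_probs[t] are the next-token distributions of the
    frozen base model and of the model being fine-tuned at position t of a
    few-shot prompt. The penalty is zero when the two models agree.
    """
    kls = [kl_divergence(b, t) for b, t in zip(base_probs, tuned_probs)]
    return task_loss + lam * sum(kls) / len(kls)
```

In a real training loop, the two distributions would come from the softmax outputs of the base and fine-tuned models on the same batch of few-shot prompts, and `lam` trades task accuracy against drift from the base model's ICL behavior.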

Evidence

Full fine-tuning on task-specific data: ICL performance dropped by 31% on average across held-out tasks
LoRA fine-tuning: ICL performance dropped by only 9%, a much smaller loss
Adding 10% general data to fine-tuning mix: ICL performance preserved within 5% of baseline while maintaining task-specific performance
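The summary does not say whether these drops are relative or absolute. The sketch below assumes relative drops averaged over held-out tasks, which is one way to compute figures like "dropped by 31% on average" or "within 5% of baseline". The function names and scores are illustrative, and the 5% default tolerance simply mirrors the threshold quoted above.

```python
from typing import Dict


def relative_drop(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Per-task drop in few-shot score, as a percentage of the pre-fine-tuning score.

    Assumes every score in `before` is positive.
    """
    return {task: 100.0 * (before[task] - after[task]) / before[task] for task in before}


def average_drop(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Mean relative drop across all held-out tasks."""
    drops = relative_drop(before, after)
    return sum(drops.values()) / len(drops)


def within_baseline(before: Dict[str, float], after: Dict[str, float], tolerance_pct: float = 5.0) -> bool:
    """True if the average relative drop stays within tolerance_pct of the baseline."""
    return average_drop(before, after) <= tolerance_pct


# Hypothetical held-out scores, for illustration only.
before = {"task_a": 0.80, "task_b": 0.60}
after = {"task_a": 0.78, "task_b": 0.57}
print(average_drop(before, after))     # roughly 3.75 (2.5% and 5.0% drops)
print(within_baseline(before, after))  # True
```

Averaging relative drops weights every task equally regardless of its baseline accuracy; if you instead report absolute percentage-point drops, state that explicitly so results are comparable.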

How to Apply

When fine-tuning, prefer LoRA or another PEFT method over full fine-tuning if preserving ICL matters for your use case; the rank constraint acts as a natural regularizer.
Mix ~10% of diverse general instruction-following data (e.g., from the FLAN or Alpaca datasets) into your task-specific fine-tuning data; this simple step substantially preserves ICL.
Evaluate ICL preservation explicitly in your fine-tuning pipeline: before and after fine-tuning, test the model on held-out few-shot tasks unrelated to your fine-tuning domain.
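The ~10% mixing step can be sketched in plain Python. This is a minimal sketch, assuming both datasets are already loaded as in-memory lists of examples in a shared format; `mix_general_data` and `general_fraction` are hypothetical names. Because the fraction is taken over the final mix, the number of general examples is n_general = n_task × f / (1 − f), not n_task × f.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_general_data(
    task_examples: Sequence[T],
    general_examples: Sequence[T],
    general_fraction: float = 0.10,
    seed: int = 0,
) -> List[T]:
    """Return a shuffled training list where ~general_fraction of items are general data."""
    if not 0 <= general_fraction < 1:
        raise ValueError("general_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_general / (n_task + n_general) = general_fraction for n_general.
    n_general = round(len(task_examples) * general_fraction / (1 - general_fraction))
    n_general = min(n_general, len(general_examples))  # cannot sample more than we have
    mixed = list(task_examples) + rng.sample(list(general_examples), n_general)
    rng.shuffle(mixed)  # interleave so batches contain both kinds of data
    return mixed
```

For example, 900 task examples with the default 10% yields 100 general examples and a 1,000-example training set. Sampling without replacement keeps the general portion diverse rather than repeating a few items.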

Code Example

# Example of mixing ICL-format samples into fine-tuning data (based on HuggingFace datasets)

from datasets import Dataset, concatenate_datasets, load_dataset
from peft import LoraConfig, get_peft_model

# Target task data (placeholder name; assumes string "input" and "output" columns)
task_dataset = load_dataset("your_task_dataset", split="train")

# Function to convert to ICL format
def to_icl_format(examples, num_shots=3):
    """
    Convert consecutive records into few-shot prompts, e.g.
    [Example1 Q&A] [Example2 Q&A] [Example3 Q&A] [Actual Question]
    """
    # Iterating a Dataset yields one dict per row; slicing it directly
    # would return a dict of columns instead, which breaks the loop below.
    data = list(examples)
    icl_samples = []
    for i in range(num_shots, len(data)):
        shots = data[i - num_shots:i]
        shot_text = "\n".join(f"Q: {s['input']}\nA: {s['output']}" for s in shots)
        query = f"{shot_text}\nQ: {data[i]['input']}\nA:"
        icl_samples.append({"text": query, "label": data[i]["output"]})
    return icl_samples

# Plain (zero-shot) samples in the same text/label schema
plain = task_dataset.map(
    lambda ex: {"text": f"Q: {ex['input']}\nA:", "label": ex["output"]},
    remove_columns=task_dataset.column_names,
)

# Mix ICL-format data at 20% of the total dataset: n_icl / (n_plain + n_icl) = 0.2
ICL_RATIO = 0.2
n_icl = int(len(plain) * ICL_RATIO / (1 - ICL_RATIO))
icl = Dataset.from_list(to_icl_format(task_dataset)).shuffle(seed=42)
icl = icl.select(range(min(n_icl, len(icl))))
train_dataset = concatenate_datasets([plain, icl]).shuffle(seed=42)

# LoRA configuration example (LoRA recommended over full fine-tuning; rank 8 to 16)
lora_config = LoraConfig(
    r=16,                    # rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, lora_config)
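The before/after check recommended under How to Apply can be sketched as a small harness. This is a minimal sketch, assuming each held-out task is a list of {"input", "output"} dicts and that `generate` is a hypothetical callable wrapping your model (prompt in, completion out). It reuses the Q:/A: prompt format of `to_icl_format` and scores by exact match, which is an assumption best suited to short-answer tasks.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Dict[str, str]  # assumed schema: {"input": ..., "output": ...}


def build_prompt(shots: Sequence[Example], query: str) -> str:
    """k-shot prompt in the same Q:/A: format as to_icl_format."""
    shot_text = "\n".join(f"Q: {s['input']}\nA: {s['output']}" for s in shots)
    return f"{shot_text}\nQ: {query}\nA:"


def few_shot_accuracy(generate: Callable[[str], str], task: List[Example], num_shots: int = 3) -> float:
    """Exact-match accuracy, using the task's first num_shots examples as demonstrations."""
    shots, queries = task[:num_shots], task[num_shots:]
    if not queries:
        raise ValueError("task needs more examples than num_shots")
    correct = sum(generate(build_prompt(shots, ex["input"])).strip() == ex["output"] for ex in queries)
    return correct / len(queries)


def icl_report(
    generate_before: Callable[[str], str],
    generate_after: Callable[[str], str],
    tasks: Dict[str, List[Example]],
    num_shots: int = 3,
) -> Dict[str, Tuple[float, float]]:
    """Per-task (before, after) few-shot accuracy on held-out tasks."""
    return {
        name: (few_shot_accuracy(generate_before, t, num_shots), few_shot_accuracy(generate_after, t, num_shots))
        for name, t in tasks.items()
    }
```

Pick held-out tasks unrelated to the fine-tuning domain, and keep the demonstrations and queries fixed between the two runs so that any difference reflects the model rather than the prompts.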

Terminology

ICL (In-Context Learning): The ability to perform a new task from just a few examples in the prompt, without any weight updates.

Catastrophic Forgetting: When fine-tuning on new data causes a model to lose previously learned capabilities.

LoRA: Low-Rank Adaptation, a PEFT method that adds small trainable low-rank matrices while keeping the base model's weights frozen.

PEFT: Parameter-Efficient Fine-Tuning, a family of methods that update only a small fraction of model parameters during fine-tuning.

FLAN: A large collection of instruction-formatted tasks used to fine-tune models for instruction following across many tasks.
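To make "small trainable matrices" concrete: LoRA represents the update to a frozen weight W (d_out × d_in) as the product B·A, where B is d_out × r and A is r × d_in, so it trains r·(d_in + d_out) parameters instead of d_in·d_out. The sketch below assumes a hypothetical model with hidden size 4096, 32 layers, and square q_proj/v_proj matrices, matching the r=16, two-target configuration in the code example; bias terms are ignored, consistent with bias="none".

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in B (d_out x r) plus A (r x d_in) for one adapted weight matrix."""
    return r * (d_in + d_out)


def lora_fraction(hidden: int, n_layers: int, r: int, n_targets: int = 2) -> float:
    """Trainable LoRA parameters relative to the targeted weights they adapt.

    Assumes every targeted projection (e.g. q_proj and v_proj) is hidden x hidden.
    """
    lora = n_layers * n_targets * lora_trainable_params(hidden, hidden, r)
    full = n_layers * n_targets * hidden * hidden
    return lora / full


print(lora_trainable_params(4096, 4096, 16))  # 131072 per adapted matrix
print(lora_fraction(4096, 32, 16))            # 0.0078125, i.e. about 0.8%
```

Under these assumptions, updates are confined to under 1% of the targeted weights, and the rest of the network is untouched, which is the intuition behind the claim that the rank constraint acts as a regularizer.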