Spurious Forgetting in Continual Learning of Language Models
TL;DR Highlight
LLM performance drops after learning new tasks not because of knowledge loss, but because task alignment breaks — and simply freezing lower layers mostly prevents it.
Who Should Read
ML engineers who fine-tune LLMs sequentially, or who plan additional fine-tuning after safety alignment — especially those who have seen performance on earlier tasks drop suddenly after training on a new one.
Core Mechanics
- Performance drops when learning new tasks stem from 'task alignment collapse', not 'knowledge loss' — shown by experiments where accuracy recovers with just 10 examples
- The first ~150 steps of new-task training are the critical window — alignment to previous tasks is rapidly overwritten there
- Lower layers (including embeddings) handle task alignment; orthogonal weight updates in these layers cause spurious forgetting
- Freezing lower layers alone improved SEQ Task 0 accuracy from 11% to 44% — all existing methods (EWC, LAMOL, Gradient Projection) stayed below 22%
- Applied to LLaMa-2-7B-Chat safety alignment: jailbreak rate dropped from 99.80% to 1.15% (6 layers frozen)
- Validated on LLaMa-3-8B-Instruct, Qwen2.5-7B-Instruct, and Mistral-8B for math/code SFT — the Freeze strategy mitigated general-capability degradation
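The "orthogonal weight updates" claim above can be probed on checkpoint pairs. A minimal sketch (the function name `update_cosine` and the random stand-in checkpoints are illustrative, not from the paper): flatten the weight deltas from two successive training phases and measure their cosine similarity — a value near zero means the second update is near-orthogonal to the first.

```python
import torch

def update_cosine(w_before: torch.Tensor, w_mid: torch.Tensor,
                  w_after: torch.Tensor) -> float:
    """Cosine similarity between two successive weight updates.

    A value near 0 means the second update is (near-)orthogonal to the
    first — the geometry the paper associates with spurious forgetting.
    """
    d1 = (w_mid - w_before).flatten()  # update from phase 1
    d2 = (w_after - w_mid).flatten()   # update from phase 2
    return torch.nn.functional.cosine_similarity(d1, d2, dim=0).item()

# Illustrative stand-in checkpoints (random; a real analysis would load a
# layer's weights saved before Task 0, after Task 0, and after Task 1).
torch.manual_seed(0)
w0 = torch.randn(256, 256)
w1 = w0 + 0.01 * torch.randn(256, 256)
w2 = w1 + 0.01 * torch.randn(256, 256)

print(f"cos(update_0, update_1) = {update_cosine(w0, w1, w2):+.3f}")
```

In real use you would apply this per layer to saved checkpoints; low cosine values concentrated in the bottom layers would match the paper's diagnosis.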
Evidence
- Biography synthetic dataset: Freeze (7 layers + early stop) achieved Task 0 accuracy 44.22% vs SEQ 11.18%, best competitor (Task Vector) 30.75%
- Safety Alignment: freezing 6 layers dropped jailbreak rate from 99.80% to 1.15% (LLaMa-2-7B-Chat)
- Recovery experiment: the recovered Task 0 accuracy (~96%) persisted even after 150 steps of Task 1 training — the knowledge itself is intact
- LLaMa-3-8B-Instruct math SFT: general capability avg 64.15 vs 66.11 with Freeze; math ability maintained (80.29 vs 80.17)
How to Apply
- During sequential fine-tuning: freeze the bottom 1-3 layers (plus embeddings) immediately after the first task, then train subsequent tasks with those layers fixed. The more similar the task formats, the more layers can be safely frozen.
- If additional fine-tuning is needed after safety alignment, freeze the lower 6 layers during fine-tuning to greatly suppress safety alignment collapse.
- Even for single-task fine-tuning like code/math SFT: freezing just the bottom 1 layer reduces general capability degradation while maintaining target performance.
Code Example
# Example of freezing the bottom N layers with HuggingFace Transformers
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
N_FREEZE_LAYERS = 2  # number of bottom layers to freeze

# Freeze embeddings + the bottom N transformer layers
for param in model.model.embed_tokens.parameters():
    param.requires_grad = False
for i in range(N_FREEZE_LAYERS):
    for param in model.model.layers[i].parameters():
        param.requires_grad = False

# Check the fraction of trainable parameters
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable/total:.1%}")
# Proceed with standard fine-tuning afterwards
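The freezing recipe can be sanity-checked end to end on a toy stand-in model before committing GPU time — a sketch where `embed`, `layers`, and `head` are illustrative substitutes for the HuggingFace module tree, not real model attributes. After a few optimizer steps, frozen layers must be bit-identical to their starting weights while unfrozen layers move.

```python
import torch
from torch import nn

# Toy stand-in for a transformer: embedding + a stack of "layers" + head.
# In a real run, these correspond to model.model.embed_tokens,
# model.model.layers, and model.lm_head.
torch.manual_seed(0)
embed = nn.Embedding(100, 16)
layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
head = nn.Linear(16, 100)

N_FREEZE_LAYERS = 2
for p in embed.parameters():
    p.requires_grad = False
for i in range(N_FREEZE_LAYERS):
    for p in layers[i].parameters():
        p.requires_grad = False

# Only pass trainable parameters to the optimizer.
params = [p for m in (embed, *layers, head) for p in m.parameters()]
opt = torch.optim.AdamW((p for p in params if p.requires_grad), lr=1e-2)

frozen_before = layers[0].weight.detach().clone()
trained_before = layers[2].weight.detach().clone()

for _ in range(10):  # stay well inside the critical early window
    x = torch.randint(0, 100, (8,))
    h = embed(x)
    for layer in layers:
        h = torch.relu(layer(h))
    loss = nn.functional.cross_entropy(head(h), x)
    opt.zero_grad()
    loss.backward()
    opt.step()

assert torch.equal(layers[0].weight, frozen_before)       # frozen: untouched
assert not torch.equal(layers[2].weight, trained_before)  # unfrozen: updated
```

The same two checks (save a frozen layer's weights before training, compare after) are worth keeping in a real fine-tuning script, since a single missed `requires_grad = False` silently undoes the freeze.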
Original Abstract
Recent advancements in large language models (LLMs) reveal a perplexing phenomenon in continual learning: despite extensive training, models experience significant performance declines, raising questions about task alignment and underlying knowledge retention. This study first explores the concept of "spurious forgetting", proposing that such performance drops often reflect a decline in task alignment rather than true knowledge loss. Through controlled experiments with a synthesized dataset, we investigate the dynamics of model performance during the initial training phases of new tasks, discovering that early optimization steps can disrupt previously established task alignments. Our theoretical analysis connects these shifts to orthogonal updates in model weights, providing a robust framework for understanding this behavior. Ultimately, we introduce a Freezing strategy that fixes the bottom layers of the model, leading to substantial improvements in four continual learning scenarios. Our findings underscore the critical distinction between task alignment and knowledge retention, paving the way for more effective strategies in continual learning.