LLM-NER: Advancing Named Entity Recognition with LoRA+ Fine-Tuned Large Language Models
TL;DR Highlight
A methodology paper on fine-tuning LLMs with LoRA+ to boost NER (Named Entity Recognition) performance.
Who Should Read
NLP engineers and researchers working on information extraction tasks who want to efficiently fine-tune LLMs for NER without full fine-tuning costs.
Core Mechanics
- Standard LoRA trains the A and B adapter matrices with the same learning rate, which is suboptimal and slows convergence on NER tasks
- LoRA+ addresses this by giving the B matrix a higher learning rate than the A matrix, leading to faster convergence and better final performance
- LLMs fine-tuned with LoRA+ for NER significantly outperform smaller specialized NER models on domain-specific datasets
- The approach is particularly effective for low-resource NER domains where training data is limited — LoRA+'s efficiency matters more here
- Few-shot LoRA+ fine-tuning (even with 100-500 examples) achieves results competitive with full fine-tuning on specialized NER benchmarks
- The paper provides practical guidance on LoRA rank selection, learning rate ratios, and training duration for NER
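The different-learning-rate mechanic above can be sketched in a few lines. Scalars stand in for the A and B matrices here, and the learning rates, gradient, and init values are illustrative, not taken from the paper:

```python
# Sketch of the LoRA+ update rule (illustrative values, scalars in
# place of matrices). The adapter update is delta_W = B * A; LoRA+
# trains B with a learning rate `ratio` times larger than A's.
lr_A = 2e-4
ratio = 16                # LoRA+ B/A learning-rate ratio
lr_B = ratio * lr_A

a, b = 0.01, 0.0          # LoRA init: A small random, B zero
grad_w = 1.0              # stand-in gradient w.r.t. delta_W = b * a

grad_a = b * grad_w       # chain rule through delta_W = b * a
grad_b = a * grad_w
a -= lr_A * grad_a        # A takes the small step
b -= lr_B * grad_b        # B takes the 16x larger step

print(lr_B / lr_A)        # 16.0
print(a == 0.01, b < 0)   # True True: A's grad is 0 at init, B moved
```

Because B starts at zero, A receives no gradient on the first step; the larger learning rate on B is what gets the adapter moving quickly, which is the intuition behind the faster convergence claim.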
Evidence
- On CoNLL-2003 NER: LoRA+ fine-tuned LLM achieved F1 92.4 vs. LoRA 91.1 vs. specialized NER model 91.8
- On biomedical NER (low-resource): LoRA+ F1 85.2 vs. LoRA 82.7 vs. BioBERT 83.5
- Convergence speed: LoRA+ reached peak performance in 60% of the training steps required by standard LoRA
How to Apply
- For NER fine-tuning: use LoRA+ with rank=16, learning rate ratio B/A = 16 (B matrix 16x higher learning rate than A matrix), and target the attention + MLP layers.
- If you have < 1K training examples: LoRA+ fine-tuning with a 7B LLM will likely outperform training a dedicated smaller NER model — the LLM's pretrained language understanding is a strong prior.
- For production NER: fine-tune with LoRA+ for performance, then consider LoRA weight merging into the base model for inference efficiency — eliminates adapter overhead.
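The merging step in the last point can be sketched numerically. This is what adapter merging (e.g. PEFT's merge_and_unload) does conceptually, shown with tiny made-up matrices and a hand-rolled matmul so it is self-contained:

```python
# Sketch of LoRA weight merging: fold scale * (B @ A) into the base
# weight W once, so inference runs a single matmul with no adapter
# overhead. Shapes and values are illustrative (d=2, r=1).
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

W = [[1.0, 2.0], [3.0, 4.0]]      # frozen base weight (d x d)
A = [[0.1, 0.2]]                  # (r x d)
B = [[0.5], [0.25]]               # (d x r)
alpha, r = 32, 1
scale = alpha / r                 # LoRA scaling factor

BA = matmul(B, A)
W_merged = [[W[i][j] + scale * BA[i][j] for j in range(2)] for i in range(2)]

x = [[1.0], [1.0]]                # sample input column vector
y_adapter = matmul(W, x)          # base path ...
delta = matmul(B, matmul(A, x))   # ... plus adapter path
y_adapter = [[y_adapter[i][0] + scale * delta[i][0]] for i in range(2)]
y_merged = matmul(W_merged, x)    # single matmul after merging

ok = all(abs(y_adapter[i][0] - y_merged[i][0]) < 1e-9 for i in range(2))
print(ok)                         # True: merged forward matches adapter forward
```

The two forward paths agree up to floating-point rounding, which is why merging is safe to do once before deployment.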
Code Example
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "meta-llama/Llama-2-7b-hf" # or another LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True, device_map="auto")
# Note: lora_alpha only rescales the adapter output; it does not implement
# LoRA+. The LoRA+ correction lives in the optimizer: give the lora_B
# parameters a higher learning rate than lora_A (recent PEFT releases ship
# a create_loraplus_optimizer helper for this).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    # attention + MLP projections, matching the guidance above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Prints the adapter parameter count: tens of millions of trainable
# params, well under 1% of the ~7B total
# Example instruction format for NER
ner_prompt = """Extract named entities from the sentence below. Format: [TYPE: entity]
Sentence: Apple's CEO Tim Cook gave a presentation in Seoul.
Answer:"""
# Expected output: [ORG: Apple] [PER: Tim Cook] [LOC: Seoul]
Terminology
NER: Named Entity Recognition — the NLP task of identifying and classifying named entities (people, places, organizations, etc.) in text.
LoRA: Low-Rank Adaptation — a parameter-efficient fine-tuning method that adds small trainable matrices to a frozen pretrained model.
LoRA+: An improvement over LoRA that uses different learning rates for the A and B adapter matrices to address training imbalance.
F1 Score: A metric combining precision and recall — the harmonic mean of both. The standard metric for NER evaluation.
Low-Resource NER: NER for domains or languages with limited labeled training data.
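As a quick sanity check of the F1 definition above (entity counts are made up):

```python
# F1 as used in NER evaluation: harmonic mean of precision and recall
# over predicted entity spans. Counts below are illustrative.
tp, fp, fn = 90, 10, 12            # true positives, false positives, false negatives
precision = tp / (tp + fp)         # 0.9
recall = tp / (tp + fn)            # ~0.882
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))                # 0.891
```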