LLM-NER: Advancing Named Entity Recognition with LoRA+ Fine-Tuned Large Language Models

TL;DR Highlight

A methodology paper on fine-tuning LLMs with LoRA+ to boost NER (Named Entity Recognition) performance.

Who Should Read

NLP engineers and researchers working on information extraction tasks who want to efficiently fine-tune LLMs for NER without full fine-tuning costs.

Core Mechanics

Standard LoRA trains the A and B adapter matrices with the same learning rate; this imbalance slows convergence on NER tasks
LoRA+ addresses this by setting different learning rates for the A matrix (lower) and B matrix (higher), leading to faster and better convergence
LLMs fine-tuned with LoRA+ for NER outperform smaller specialized NER models on domain-specific datasets (for example, F1 85.2 vs. 83.5 for BioBERT on biomedical NER)
The approach is particularly effective in low-resource NER domains, where training data is limited and LoRA+'s efficiency matters more
Few-shot LoRA+ fine-tuning (even with 100 to 500 examples) achieves results competitive with full fine-tuning on specialized NER benchmarks
The paper provides practical guidance on LoRA rank selection, learning rate ratios, and training duration for NER
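
The split learning rates described above can be sketched as two optimizer parameter groups. This is a minimal illustration, not the paper's implementation: it assumes PEFT-style parameter names containing `lora_A` and `lora_B`, and the helper `build_loraplus_param_groups`, the base learning rate of 1e-4, and the stand-in parameters are hypothetical. The ratio of 16 follows the recommendation under How to Apply.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_loraplus_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    base_lr: float,
    lr_ratio: float,
) -> List[Dict[str, Any]]:
    """Split LoRA parameters into two optimizer groups (hypothetical helper).

    Parameters whose name contains "lora_B" get base_lr * lr_ratio;
    every other trainable parameter (including "lora_A") gets base_lr.
    """
    a_group: List[Any] = []
    b_group: List[Any] = []
    for name, param in named_params:
        # Skip frozen base-model weights; objects without the attribute
        # (such as the stand-ins below) are treated as trainable.
        if not getattr(param, "requires_grad", True):
            continue
        if "lora_B" in name:
            b_group.append(param)
        else:
            a_group.append(param)
    return [
        {"params": a_group, "lr": base_lr},
        {"params": b_group, "lr": base_lr * lr_ratio},
    ]


# Stand-in (name, parameter) pairs that mimic PEFT-style names (assumption).
fake_params = [
    ("layers.0.self_attn.q_proj.lora_A.default.weight", "A0"),
    ("layers.0.self_attn.q_proj.lora_B.default.weight", "B0"),
    ("layers.0.self_attn.v_proj.lora_A.default.weight", "A1"),
    ("layers.0.self_attn.v_proj.lora_B.default.weight", "B1"),
]
groups = build_loraplus_param_groups(fake_params, base_lr=1e-4, lr_ratio=16)
for group in groups:
    print(group["lr"], group["params"])
```

With a real PEFT model, `model.named_parameters()` could be passed as `named_params`, and the returned list handed to an optimizer such as `torch.optim.AdamW(groups)`, which accepts per-group `lr` values.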

Evidence

On CoNLL-2003 NER: LoRA+ fine-tuned LLM achieved F1 92.4 vs. LoRA 91.1 vs. specialized NER model 91.8
On biomedical NER (low-resource): LoRA+ F1 85.2 vs. LoRA 82.7 vs. BioBERT 83.5
Convergence speed: LoRA+ reached peak performance in 60% of the training steps required by standard LoRA

How to Apply

For NER fine-tuning: use LoRA+ with rank=16, learning rate ratio B/A = 16 (B matrix 16x higher learning rate than A matrix), and target the attention + MLP layers.
If you have fewer than 1K training examples: LoRA+ fine-tuning with a 7B LLM will likely outperform training a dedicated smaller NER model, because the LLM's pretrained language understanding is a strong prior.
For production NER: fine-tune with LoRA+ for performance, then consider merging the LoRA weights into the base model for inference efficiency, which eliminates adapter overhead.
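
The weight merging mentioned in the last point amounts to folding the low-rank update into the frozen weight, W_merged = W + (lora_alpha / r) * B A. The sketch below shows that arithmetic on toy matrices with plain Python lists so it runs without dependencies; the helper names `matmul` and `merge_lora` and all numeric values are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Multiply two matrices given as nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, lora_alpha: float, r: int) -> Matrix:
    """Return W + (lora_alpha / r) * B @ A.

    w is (out x in), b is (out x r), a is (r x in).
    """
    scale = lora_alpha / r
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy shapes: out=2, in=3, r=1 (real adapters use a larger rank, e.g. r=16).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # r x in
B = [[0.5], [-1.0]]     # out x r
merged = merge_lora(W, A, B, lora_alpha=2.0, r=1)
print(merged)
```

After merging, a single matrix multiply with `merged` gives the same result as the base weight plus the adapter path, so no extra adapter computation is needed at inference. In PEFT, `merge_and_unload()` on the adapted model performs this step for LoRA layers.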

Code Example

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-hf"  # or another causal LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the base model in 8-bit (requires bitsandbytes) to reduce GPU memory use
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

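# Note: LoraConfig alone sets up standard LoRA. lora_alpha only scales the adapter
# update (by lora_alpha / r); it does not give A and B different learning rates.
# For LoRA+, the optimizer must use a higher learning rate for the lora_B weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    # Attention + MLP projections, matching the guidance in How to Apply
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Example output: trainable params: 39,976,960 || all params: 6,778,392,576 || trainable%: 0.5898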

# Example instruction format for NER
ner_prompt = """Extract named entities from the sentence below. Format: [TYPE: entity]

Sentence: Apple's CEO Tim Cook gave a presentation in Seoul.

Answer:"""
# Expected output: [ORG: Apple] [PER: Tim Cook] [LOC: Seoul]
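
The `[TYPE: entity]` answer format used in the prompt above has to be parsed back into structured entities before it can be scored or stored. A minimal sketch of such a parser follows; `parse_entities` and its regular expression are hypothetical, and they assume uppercase type labels and entity text that contains no `]` character.

```python
import re
from typing import List, Tuple

# Matches "[TYPE: entity text]" with an uppercase type label.
ENTITY_PATTERN = re.compile(r"\[([A-Z]+):\s*([^\]]+?)\s*\]")


def parse_entities(generated: str) -> List[Tuple[str, str]]:
    """Return (type, text) pairs found in a model completion."""
    return [(m.group(1), m.group(2)) for m in ENTITY_PATTERN.finditer(generated)]


completion = "[ORG: Apple] [PER: Tim Cook] [LOC: Seoul]"
entities = parse_entities(completion)
print(entities)
```

Text that does not match the pattern is ignored, so a malformed completion yields fewer entities rather than an exception; stricter validation may be preferable in production.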

Terminology

NER: Named Entity Recognition, the NLP task of identifying and classifying named entities (people, places, organizations, etc.) in text.

LoRA: Low-Rank Adaptation, a parameter-efficient fine-tuning method that adds small trainable matrices to a frozen pretrained model.

LoRA+: An improvement over LoRA that uses different learning rates for the A and B adapter matrices to address training imbalance.

F1 Score: A metric combining precision and recall as their harmonic mean. Standard metric for NER evaluation.

Low-Resource NER: NER for domains or languages with limited labeled training data.
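
The F1 definition above can be made concrete with a small entity-level calculation. This is a simplified sketch: it compares sets of (type, text) pairs by exact match and ignores token positions and duplicates, which standard NER scorers do account for. The function `entity_f1` and the example predictions are illustrative assumptions.

```python
from typing import Set, Tuple

Entity = Tuple[str, str]


def entity_f1(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match entity sets."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {("ORG", "Apple"), ("PER", "Tim Cook"), ("LOC", "Seoul")}
pred = {("ORG", "Apple"), ("PER", "Tim Cook"), ("LOC", "Korea"), ("ORG", "Cook")}
p, r, f1 = entity_f1(pred, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here two of four predictions are correct and two of three gold entities are found, so precision is 1/2, recall is 2/3, and F1 is 4/7.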