F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
TL;DR Highlight
Multilingual embedding models covering 200 languages with minimal English bias, outperforming Qwen3-Embedding at equal or smaller model sizes.
Who Should Read
Engineers building multilingual search, retrieval, or RAG systems that need strong embedding models for non-English languages.
Core Mechanics
- Most existing embedding models have strong English bias — performance degrades significantly for non-European languages
- The proposed model supports 200 languages with balanced performance across language families, including low-resource languages
- Achieves better performance than Qwen3-Embedding at equivalent or smaller model sizes on multilingual benchmarks
- Training used a curated multilingual dataset with explicit balancing to prevent high-resource language dominance
- A modified architecture avoids language-specific components, yielding truly language-agnostic representations
- Particularly strong on code-switching scenarios where multiple languages appear in the same document
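If representations are truly language-agnostic, a sentence and its translation should be mutual nearest neighbors in embedding space. A minimal sketch of that alignment check, with toy vectors standing in for real model outputs (the function name is illustrative, not from the paper):

```python
import numpy as np

def cross_lingual_alignment(src_emb, tgt_emb):
    """Fraction of source sentences whose nearest target-language
    neighbor (by cosine similarity) is their own translation."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = src @ tgt.T  # pairwise cosine similarities
    return float((sims.argmax(axis=1) == np.arange(len(src))).mean())

# Toy vectors standing in for embeddings of 3 translation pairs
src = np.array([[1.0, 0.1], [0.1, 1.0], [0.7, 0.7]])
tgt = np.array([[0.9, 0.2], [0.2, 0.9], [0.6, 0.8]])
print(cross_lingual_alignment(src, tgt))  # 1.0 — each pair aligns
```

A score near 1.0 on real translation pairs indicates well-aligned cross-lingual representations; English-biased models tend to score lower on distant language pairs.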
Evidence
- On MIRACL (multilingual retrieval benchmark): average nDCG@10 across 18 languages — new model 68.4 vs Qwen3-Embedding 64.1 at comparable model size
- Low-resource language performance gap: new model shows <5% degradation from English baseline vs. 20-30% for comparable models
- Code-switching retrieval tasks: +12% recall@10 over best comparable multilingual model
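For reference, nDCG@10 (the MIRACL metric cited above) is computed per query from graded relevance labels of the ranked results; a minimal self-contained sketch:

```python
import numpy as np

def ndcg_at_k(relevances, k=10):
    """nDCG@k for one query: `relevances` are graded relevance labels
    of the retrieved documents, in ranked order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    dcg = ((2 ** rel - 1) / np.log2(np.arange(2, len(rel) + 2))).sum()
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = ((2 ** ideal - 1) / np.log2(np.arange(2, len(ideal) + 2))).sum()
    return float(dcg / idcg) if idcg > 0 else 0.0

# Relevant docs at ranks 1 and 3 out of 10 retrieved
print(round(ndcg_at_k([1, 0, 1, 0, 0, 0, 0, 0, 0, 0]), 3))  # ≈ 0.92
```

Benchmark numbers like the 68.4 above are this quantity averaged over all queries (and here over 18 languages).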
How to Apply
- Drop-in replacement for any English-centric embedding model in multilingual RAG pipelines — especially valuable for Asian, African, and Middle Eastern language support.
- For multilingual document stores: index with this model to get cross-lingual retrieval for free — queries in one language can retrieve documents in another.
- Evaluate on your specific language mix before deploying — while broadly better, performance varies and your target languages may have specific characteristics worth checking.
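Cross-lingual retrieval then reduces to nearest-neighbor search in the shared embedding space. A minimal sketch of the scoring step, using toy vectors in place of real query and document embeddings (the helper name is illustrative):

```python
import numpy as np

def top_k(query_emb, doc_embs, k=3):
    """Return indices of the k most similar documents by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(-sims)[:k].tolist()

# Toy store of 3 "documents"; doc 2 points closest to the query direction
docs = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.1]])
query = np.array([1.0, 0.0])
print(top_k(query, docs, k=2))  # [2, 1]
```

In production you would replace the brute-force matrix product with an ANN index (e.g. FAISS), but the language of the query and documents no longer matters at this stage.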
Code Example
from sentence_transformers import SentenceTransformer
# Load model (HuggingFace: codefuse-ai/F2LLM-v2-1.7B)
model = SentenceTransformer('codefuse-ai/F2LLM-v2-1.7B', trust_remote_code=True)
# Basic embeddings (full dimension)
# The same sentence in Korean, English, and Arabic
sentences = [
'안녕하세요, 오늘 날씨가 좋네요.',
'Hello, the weather is nice today.',
'مرحبا، الطقس جميل اليوم.',
]
embeddings = model.encode(sentences)
print(f'Full embedding shape: {embeddings.shape}') # (3, 2048)
# MRL: truncate to 128 dimensions for lightweight usage
embeddings_small = embeddings[:, :128]
# Re-normalize after truncation so cosine similarities stay meaningful
embeddings_small = embeddings_small / ((embeddings_small ** 2).sum(axis=1, keepdims=True) ** 0.5)
print(f'Truncated embedding shape: {embeddings_small.shape}') # (3, 128)
# Apply retrieval instruction (query only)
query = '날씨에 관한 인사말을 찾아줘'  # "Find greetings about the weather"
query_embedding = model.encode(
query,
prompt='Instruct: Given a query, retrieve relevant passages\nQuery: '
)
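Matryoshka (MRL) training is what makes the prefix truncation above meaningful: leading dimensions are trained to carry most of the signal, so a 128-dim prefix remains usable. The mechanics can be sketched offline with random stand-in vectors (this illustrates the arithmetic only, not the trained behavior):

```python
import numpy as np

# Random stand-ins for a full 2048-dim embedding and a near-duplicate
rng = np.random.default_rng(0)
a = rng.standard_normal(2048)
b = a + 0.1 * rng.standard_normal(2048)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

full = cos(a, b)               # similarity at full dimension
trunc = cos(a[:128], b[:128])  # similarity after prefix truncation
print(round(full, 3), round(trunc, 3))  # both close to 1.0
```

With a trained MRL model the truncated similarity tracks the full one far more tightly than chance, which is what enables the 16x smaller index footprint.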
Original Abstract
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques, we present models that are far more efficient than previous LLM-based embedding models while retaining competitive performances. Extensive evaluations confirm that F2LLM-v2-14B ranks first on 11 MTEB benchmarks, while the smaller models in the family also set a new state of the art for resource-constrained applications. To facilitate open-source embedding model research, we release all models, data, code, and intermediate checkpoints.