F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
TL;DR Highlight
Multilingual embedding models covering 200 languages with minimal English bias, outperforming Qwen3-Embedding at equal or smaller model sizes.
Who Should Read
Engineers building multilingual search, retrieval, or RAG systems that need strong embedding models for non-English languages.
Core Mechanics
- Most existing embedding models have strong English bias — performance degrades significantly for non-European languages
- The proposed model supports 200 languages with balanced performance across language families, including low-resource languages
- Achieves better performance than Qwen3-Embedding at equivalent or smaller model sizes on multilingual benchmarks
- Training used a curated multilingual dataset with explicit balancing to prevent high-resource language dominance
- A modified architecture avoids language-specific components, yielding truly language-agnostic representations
- Particularly strong on code-switching scenarios where multiple languages appear in the same document
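If representations are truly language-agnostic, a sentence and its translation should be mutual nearest neighbors in embedding space. A minimal sketch of that alignment check, with toy vectors standing in for real model outputs (the function name is illustrative, not from the paper):

```python
import numpy as np

def cross_lingual_alignment(src_emb, tgt_emb):
    """Fraction of source sentences whose nearest target-language
    neighbor (by cosine similarity) is their own translation."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = src @ tgt.T  # pairwise cosine similarities
    return float((sims.argmax(axis=1) == np.arange(len(src))).mean())

# Toy vectors standing in for embeddings of 3 translation pairs
src = np.array([[1.0, 0.1], [0.1, 1.0], [0.7, 0.7]])
tgt = np.array([[0.9, 0.2], [0.2, 0.9], [0.6, 0.8]])
print(cross_lingual_alignment(src, tgt))  # 1.0 — each pair aligns
```

A score near 1.0 on real translation pairs indicates well-aligned cross-lingual representations; English-biased models tend to score lower on distant language pairs.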
Evidence
- On MIRACL (multilingual retrieval benchmark): average nDCG@10 across 18 languages — new model 68.4 vs Qwen3-Embedding 64.1 at comparable model size
- Low-resource language performance gap: new model shows <5% degradation from English baseline vs. 20-30% for comparable models
- Code-switching retrieval tasks: +12% recall@10 over best comparable multilingual model
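For reference, nDCG@10 (the MIRACL metric cited above) is computed per query from graded relevance labels of the ranked results; a minimal self-contained sketch:

```python
import numpy as np

def ndcg_at_k(relevances, k=10):
    """nDCG@k for one query: `relevances` are graded relevance labels
    of the retrieved documents, in ranked order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    dcg = ((2 ** rel - 1) / np.log2(np.arange(2, len(rel) + 2))).sum()
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = ((2 ** ideal - 1) / np.log2(np.arange(2, len(ideal) + 2))).sum()
    return float(dcg / idcg) if idcg > 0 else 0.0

# Relevant docs at ranks 1 and 3 out of 10 retrieved
print(round(ndcg_at_k([1, 0, 1, 0, 0, 0, 0, 0, 0, 0]), 3))  # ≈ 0.92
```

Benchmark numbers like the 68.4 above are this quantity averaged over all queries (and here over 18 languages).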
How to Apply
- Drop-in replacement for any English-centric embedding model in multilingual RAG pipelines — especially valuable for Asian, African, and Middle Eastern language support.
- For multilingual document stores: index with this model to get cross-lingual retrieval for free — queries in one language can retrieve documents in another.
- Evaluate on your specific language mix before deploying — while broadly better, performance varies and your target languages may have specific characteristics worth checking.
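Cross-lingual retrieval then reduces to nearest-neighbor search in the shared embedding space. A minimal sketch of the scoring step, using toy vectors in place of real query and document embeddings (the helper name is illustrative):

```python
import numpy as np

def top_k(query_emb, doc_embs, k=3):
    """Return indices of the k most similar documents by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(-sims)[:k].tolist()

# Toy store of 3 "documents"; doc 2 points closest to the query direction
docs = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.1]])
query = np.array([1.0, 0.0])
print(top_k(query, docs, k=2))  # [2, 1]
```

In production you would replace the brute-force matrix product with an ANN index (e.g. FAISS), but the language of the query and documents no longer matters at this stage.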
Code Example
from sentence_transformers import SentenceTransformer
# Load model (HuggingFace: codefuse-ai/F2LLM-v2-1.7B)
model = SentenceTransformer('codefuse-ai/F2LLM-v2-1.7B', trust_remote_code=True)
# Basic embeddings (full dimension)
# The same sentence in Korean, English, and Arabic
sentences = [
'안녕하세요, 오늘 날씨가 좋네요.',
'Hello, the weather is nice today.',
'مرحبا، الطقس جميل اليوم.',
]
embeddings = model.encode(sentences)
print(f'Full embedding shape: {embeddings.shape}') # (3, 2048)
# MRL: truncate to 128 dimensions for lightweight usage
embeddings_small = embeddings[:, :128]
# Re-normalize after truncation so cosine similarities stay meaningful
embeddings_small = embeddings_small / ((embeddings_small ** 2).sum(axis=1, keepdims=True) ** 0.5)
print(f'Truncated embedding shape: {embeddings_small.shape}') # (3, 128)
# Apply retrieval instruction (query only)
query = '날씨에 관한 인사말을 찾아줘'  # "Find greetings about the weather"
query_embedding = model.encode(
query,
prompt='Instruct: Given a query, retrieve relevant passages\nQuery: '
)
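Matryoshka (MRL) training is what makes the prefix truncation above meaningful: leading dimensions are trained to carry most of the signal, so a 128-dim prefix remains usable. The mechanics can be sketched offline with random stand-in vectors (this illustrates the arithmetic only, not the trained behavior):

```python
import numpy as np

# Random stand-ins for a full 2048-dim embedding and a near-duplicate
rng = np.random.default_rng(0)
a = rng.standard_normal(2048)
b = a + 0.1 * rng.standard_normal(2048)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

full = cos(a, b)               # similarity at full dimension
trunc = cos(a[:128], b[:128])  # similarity after prefix truncation
print(round(full, 3), round(trunc, 3))  # both close to 1.0
```

With a trained MRL model the truncated similarity tracks the full one far more tightly than chance, which is what enables the 16x smaller index footprint.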
Original Abstract
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on previously underserved mid- and low-resource languages. By integrating a two-stage LLM-based embedding training pipeline with matryoshka learning, model pruning, and knowledge distillation techniques, we present models that are far more efficient than previous LLM-based embedding models while retaining competitive performances. Extensive evaluations confirm that F2LLM-v2-14B ranks first on 11 MTEB benchmarks, while the smaller models in the family also set a new state of the art for resource-constrained applications. To facilitate open-source embedding model research, we release all models, data, code, and intermediate checkpoints.