Qwen3-Embedding-0.6B
Strong MTEB score, 1024 dimensions, 32k context, and a realistic serving footprint.
MTEB is the shortlist, not the deployment answer. In production, choose embeddings by retrieval quality, vector size, context length, serving latency, re-embedding cost, and whether your data can leave your infrastructure.
KaLM-Embedding-Gemma3-12B has the highest listed MTEB average and retrieval score, but it is expensive to serve and store.
Managed APIs remove ops burden; test retrieval on your own corpus before buying the leaderboard story.
Qwen3-Embedding-0.6B and bge-m3 both keep vectors compact at 1024 dimensions; Qwen3 is stronger on aggregate, while BGE has mature retrieval tooling.
Scores are from the local MTEB snapshot used on the CodeSOTA MTEB page. Storage assumes float32 vectors (4 bytes per dimension) before quantization or compression, with 1 GB counted as 2^30 bytes.
| Model | Production pick | MTEB avg | Retrieval | Rerank | Dims | Context | Params | Storage | Latency |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3-Embedding-0.6B | Default self-hosted production | 64.34 | 64.65 | 61.41 | 1024 | 32k | 0.6B | 3.8 GB / 1M vectors | Low |
| KaLM-Embedding-Gemma3-12B | Offline quality ceiling | 72.32 | 75.66 | 67.27 | 3840 | 32k | 11.76B | 14.3 GB / 1M vectors | High |
| Qwen3-Embedding-8B | High-quality self-hosted | 70.58 | 70.88 | 65.63 | 4096 | 32k | 8B | 15.3 GB / 1M vectors | High |
| bge-m3 | Practical multilingual baseline | 59.56 | 57.89 | 56.78 | 1024 | 8k | 568M | 3.8 GB / 1M vectors | Low |
| text-embedding-3-large | Managed API default | 58.96 | 56.12 | 54.12 | 3072 | 8k | API | 11.4 GB / 1M vectors | Network |
| voyage-3.5 | Managed API for long-context RAG | 58.46 | 55.89 | 53.45 | 1024 | 32k | API | 3.8 GB / 1M vectors | Network |
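
The Storage column can be reproduced from the Dims column alone. The sketch below assumes raw float32 vectors (4 bytes per value), no index or metadata overhead, and binary gigabytes (2^30 bytes), which is the convention under which the listed figures line up. The function name `float32_storage_gib` is illustrative, not part of any library.

```python
def float32_storage_gib(dims: int, num_vectors: int = 1_000_000) -> float:
    """Raw size of `num_vectors` float32 embeddings in GiB (2**30 bytes).

    Assumption: 4 bytes per dimension, no index, metadata, or compression.
    """
    bytes_per_value = 4  # float32
    return dims * bytes_per_value * num_vectors / 2**30


# Dimensions taken from the table above.
table_dims = {
    "Qwen3-Embedding-0.6B": 1024,
    "KaLM-Embedding-Gemma3-12B": 3840,
    "Qwen3-Embedding-8B": 4096,
    "text-embedding-3-large": 3072,
}

for model, dims in table_dims.items():
    print(f"{model}: {float32_storage_gib(dims):.1f} GiB / 1M vectors")
# Qwen3-Embedding-0.6B: 3.8 GiB / 1M vectors
# KaLM-Embedding-Gemma3-12B: 14.3 GiB / 1M vectors
# Qwen3-Embedding-8B: 15.3 GiB / 1M vectors
# text-embedding-3-large: 11.4 GiB / 1M vectors
```

Scaling is linear in both arguments, so a 10M-vector corpus at 1024 dimensions needs roughly ten times the first figure before any quantization.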
For RAG, measure recall@20 and answer quality on your own documents. A model with a lower aggregate MTEB score can win if it retrieves your domain language better or keeps vector storage small enough to allow a larger candidate pool.
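As a rough illustration of the recall@20 measurement, the sketch below scores ranked document IDs against a set of known-relevant IDs per query. The data layout (`runs` mapping query IDs to ranked document IDs, `qrels` mapping query IDs to relevant IDs) and the function names are assumptions made for this example; adapt them to however your evaluation set is stored.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 20) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    # Use a set so a document returned twice is only counted once.
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    runs: dict[str, list[str]],   # hypothetical layout: query id -> ranked doc ids
    qrels: dict[str, set[str]],   # hypothetical layout: query id -> relevant doc ids
    k: int = 20,
) -> float:
    """Average recall@k over every query that has relevance judgments."""
    scores = [recall_at_k(runs.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Toy example with made-up document ids.
runs = {"q1": ["d3", "d7", "d1"], "q2": ["d9", "d2"]}
qrels = {"q1": {"d1", "d4"}, "q2": {"d2"}}
print(mean_recall_at_k(runs, qrels, k=20))  # 0.75
```

Running the same queries through each shortlisted model and comparing the averages gives a corpus-specific ranking that may differ from the leaderboard order.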
Add a reranker when precision matters. Bi-encoder embeddings are the recall layer; cross-encoders or LLM rerankers are the precision layer.
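The two-layer arrangement can be sketched as a retrieve-then-rerank function. Everything here is a simplified assumption: the embeddings are plain Python lists, the first stage is a brute-force cosine scan rather than a vector index, and `rerank_score` is a placeholder callable standing in for whatever cross-encoder or LLM reranker you actually use.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: Sequence[str],
    doc_vecs: Sequence[Sequence[float]],
    rerank_score: Callable[[str, str], float],  # placeholder for a real reranker
    recall_k: int = 20,
    final_k: int = 5,
) -> list[str]:
    """Stage 1 picks a wide candidate pool; stage 2 reorders it for precision."""
    # Stage 1 (recall layer): rank all documents by embedding similarity.
    by_similarity = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = by_similarity[:recall_k]

    # Stage 2 (precision layer): rescore only the candidates with the reranker.
    reranked = sorted(
        candidates,
        key=lambda i: rerank_score(query, docs[i]),
        reverse=True,
    )
    return [docs[i] for i in reranked[:final_k]]
```

Because the reranker only sees `recall_k` candidates, its cost stays bounded regardless of corpus size, which is why a larger first-stage pool is usually affordable.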