Recent studyBlind TTS Elo is live. Compare two anonymous voice samples, vote after listening, and help separate real preference signal from noise.Vote in the study ->
Codesota - NLP - Machine TranslationWMT - FLORES-200 - COMETTask page
00 - Translation

Machine translation task router

Choose a translation model by language pair, domain, document context, deployment boundary, and evaluation metric. WMT is the public benchmark signal; production quality still needs COMET, terminology checks, and human edit rate.

helloEnglishMT policycontext + glossaryczescPolishinput textmodel selectiontranslated text
01 - Benchmarks

Public scores are a shortlist, not the deployment answer.

BLEU is still reported for continuity, but modern translation decisions should look at COMET, human preference, terminology errors, and edit distance on your own corpus.

BenchmarkRoleMetricBest use
WMTPrimary competitionCOMET / human eval / BLEUHigh-resource language pairs and annual system comparisons.
FLORES-200Coverage testspBLEU / chrF++Broad multilingual coverage, especially underrepresented languages.
COMETQuality estimatorreference-based / QERanking fluent translations when BLEU misses meaning and adequacy.
Human reviewProduction gateedits / accept rateLegal, medical, marketing, and other high-liability translation.
02 - Models

Translation systems by lane.

Dedicated MT, open multilingual models, and LLM translators solve different versions of the task. The right model depends more on pair and corpus than on one global rank.

Model / systemLaneGood fitWatch out
HY-MT1.5WMT frontierCompetition-grade text translation; use as current WMT reference signal.Availability and deployment path depend on the publisher.
DeepLProduction APIEuropean languages, business copy, and editorial workflows.Narrower language coverage than massively multilingual systems.
Google Cloud TranslationProduction APIBroad language support, low operational burden, and AutoML-style customization.Quality varies by pair and domain; test domain terminology.
NLLB-200Open modelSelf-hosted translation across 200 languages and low-resource coverage.License and quality constraints; needs evaluation per pair.
MADLAD-400Open modelVery broad language coverage when you need one multilingual backbone.Larger deployment footprint and uneven pair quality.
Claude / GPTLLM translationDocument-level context, terminology instructions, style transfer, and format repair.Higher cost and more latency; require review for factual fidelity.
03 - Routing

Pick by operational need.

Fast website or app localization

DeepL or Google Cloud Translation

Low ops, mature APIs, glossary support, and predictable latency.

Low-resource language coverage

NLLB-200 or MADLAD-400

Open multilingual models cover many pairs that commercial APIs handle weakly.

Document with style and terminology

LLM translation with glossary + review

Long context preserves names, tone, and paragraph-level coherence.

Regulated or private corpus

Self-hosted MT + domain fine-tune

Keeps data inside your boundary and lets terminology be audited.

04 - Related

Need the lower-level explainer?

The building-block page explains encoder-decoder translation, LLM translation, beam search, and implementation options.

Open translation explainer ->