Read about

Blog/GenAI

It’s not about the MT engine anymore: Translation Automation in 2025

September 5, 2025

The translation industry is at an inflection point. This is not just another update on translation technology—the data reveals a measurable shift in enterprise translation strategy.

In our 2025 State of Translation Automation report (previously “State of Machine Translation”), we evaluated 46 Machine Translation engines and Large Language Models across 11 language pairs. The results point to something different than what we’ve assumed for years.

The end of baseline translation

For the first time in our annual evaluation, we’ve abandoned sentence-level scoring entirely. Why? Because traditional metrics no longer capture what matters in enterprise translation.

When we evaluated full-text translations against real-world requirements (general translation quality, terminology, tone of voice, formatting (tag handling), and full-text consistency), the results were clear. Most baseline models, regardless of their underlying technology, fail to meet professional standards with multiple errors per text.

But here’s what’s interesting: when properly customized with prompt engineering and RAG-based* glossaries, these same models transform dramatically. Error rates drop by 60-90%, making automated translation not just viable but mandatory for many use cases.

*Retrieval-Augmented Generation

It’s not about the MT engine – It’s about the solution

Automatic translation is no longer about choosing the “best engine” — it’s about building solutions that meet specific translation requirements.

We evaluated 80 solutions across 7 full texts in 11 language pairs, which resulted in the analysis of 6,000+ text translations. Our comparison included off-the-shelf models (both NMT and LLM), the same models adjusted to meet specific requirements using available customization options (excluding fine-tuning), multi-agent translation systems, and human translation.

14 MT and LLM solutions show the best results across all language pairs, with LLMs now representing 89% of top performers (vs 55% in 2024).

We focused on real-world enterprise needs of our customers – general translation quality alongside “standard” preferential requirements like terminology, tone of voice, and formatting. And yes, we used full-text translation because we’re in 2025 and segment-based translation is now obsolete.

What we found

Solutions customized to specific requirements outperformed others. While traditional metrics like COMET predictably fail to capture these advantages, the most telling indicator emerged from our human evaluation: reviewers often couldn’t distinguish AI from human translation—and sometimes rated AI translations higher.

The multi-agent approach (utilizing multiple AI agents to verify and test requirements, architected to prevent compounding hallucinations) delivered the best average performance. However, it wasn’t universally superior. These agents still require customization for specific languages and language pairs.

One of today’s most critical challenges is standardizing this customization process—without it, we’re stuck with excessive custom engineering and debugging overhead, making it feasible only for the largest-scale use cases.

The rise of LLMs in translation

LLMs now represent 89% of top performers (vs 55% in 2024). Models like Claude Opus 4, Claude Sonnet 3.7, and Gemini 2.5 Pro consistently exceed traditional MT quality, especially when customized with terminology and tone-of-voice instructions.

What’s particularly noteworthy is that LLMs fine-tuned for translation—such as Lara and DeepL next-gen—are achieving latency and price points competitive with traditional MT engines. This convergence of quality, speed, and cost represents a significant shift in the economics of translation automation.

The requirements-driven approach

The most significant development we’ve observed is the shift from data-driven to requirements-based translation. Instead of relying solely on historical data (which often plateaus in quality improvements), this approach uses AI agents to:

Fine-tune MT with historical translations, glossary support, and tone-of-voice control
Adapt and edit content both before and after translation
Reduce post-editing by up to 95%
Work effectively even with minimal training data

This addresses the three major limitations of traditional data-driven approaches:

Diminishing returns on quality improvements
Requirements that can’t be learned from data alone
Slow adaptation when requirements change

What this means for you

If you’re managing translation for a global organization, this has significant implications:

Customization is now essential, not optional. Our analysis shows baseline systems produce 10-15 errors per text, while customized solutions only have 0-2 errors. This 5-10x improvement in accuracy makes customization essential for any professional translation workflow.
The speed-quality trade-off is dissolving. While LLMs are still slower than specialized MT models, the newest translation-focused LLMs are approaching MT speeds. For many use cases, the quality improvements justify the additional processing time.
Multi-agent systems deliver the best results. Our multi-agent solution achieved “best” ratings in 9 out of 11 language pairs tested. These systems combine several agents for terminology integration, tone adjustment, and post-editing to consistently produce higher quality translations.
The market is expanding rapidly. We now track 103 vendors offering 549 unique languages across 195,928 language pairs. This expansion reflects the industry’s push toward truly global coverage.

The bottom line

The future of translation lies not in any single model or technology, but in intelligent orchestration of multiple AI systems working together. Successful translation automation now requires:

Adaptive workflows that select the best model for each specific use case
Requirements-based customization that goes beyond traditional training data
Quality-first approaches that prioritize consistency and accuracy over raw speed
Multi-agent orchestration that leverages the strengths of different AI models

This isn’t just an incremental improvement — it’s a fundamental shift in how we approach translation automation.

Want to see the full analysis with detailed metrics for all 80 evaluated solutions? Download our complete State of Translation Automation 2025 report.

Already a member? Sign In

Blog/GenAI

It’s not about the MT engine anymore: Translation Automation in 2025

The end of baseline translation

It’s not about the MT engine – It’s about the solution

What we found

The rise of LLMs in translation

The requirements-driven approach

What this means for you

The bottom line

The future of GRC is multilingual: AI agents for translation now live in MetricStream

Expert Q&A: Debunking myths about LLMs in Machine Translation

Data-driven vs requirements-driven translation

We know how to make your business multilingual and productive. Let's talk.