Data-driven vs requirements-driven translation

April 23, 2025

Let’s talk about the data-driven approach to automatic translation: what it is, why it’s useful, where it falls short, and how we can fix it with something we call requirements-based translation.

If you’d rather watch than read, this content is also available as a video.

The idea behind data-driven translation is simple: use your past translations — your translation memories — to improve future ones. There are many ways to do this: domain adaptation, fine-tuning, and retrieval-augmented generation. But the principle is the same — feed your data into the model to make it work better for you.
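As a rough illustration of the retrieval-augmented variant, here is a minimal sketch that pulls the closest translation memory entries for a new sentence and places them in a prompt as examples. Everything in it is an assumption made for illustration: the tiny in-memory `TRANSLATION_MEMORY`, the helper names `retrieve_examples` and `build_prompt`, and the use of standard-library fuzzy matching in place of a real TM store or embedding search.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: (source, approved translation) pairs.
TRANSLATION_MEMORY = [
    ("Click Save to apply your changes.",
     "Klicken Sie auf Speichern, um Ihre Änderungen zu übernehmen."),
    ("Your session has expired.",
     "Ihre Sitzung ist abgelaufen."),
    ("Click Cancel to discard your changes.",
     "Klicken Sie auf Abbrechen, um Ihre Änderungen zu verwerfen."),
]


def similarity(a: str, b: str) -> float:
    """Character-level fuzzy match score in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve_examples(source: str, memory, k: int = 2, min_score: float = 0.5):
    """Return up to k memory entries whose source text is closest to `source`."""
    scored = [(similarity(source, src), src, tgt) for src, tgt in memory]
    scored = [item for item in scored if item[0] >= min_score]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, tgt) for _, src, tgt in scored[:k]]


def build_prompt(source: str, examples) -> str:
    """Assemble a few-shot prompt; the wording is illustrative only."""
    lines = ["Translate from English to German. "
             "Follow the style of these approved translations:"]
    for src, tgt in examples:
        lines.append(f"EN: {src}")
        lines.append(f"DE: {tgt}")
    lines.append(f"EN: {source}")
    lines.append("DE:")
    return "\n".join(lines)


new_sentence = "Click Delete to remove your changes."
examples = retrieve_examples(new_sentence, TRANSLATION_MEMORY)
prompt = build_prompt(new_sentence, examples)
print(prompt)
```

The resulting prompt would then be sent to whichever model you use. Domain adaptation and fine-tuning use the same data differently, by updating the model itself rather than the prompt.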

In this setup, every dollar spent on post-editing isn’t just fixing errors — it’s investing in better AI, as that data goes back to improve future output.

Sounds like a win-win, right? So what’s the catch?

Three major limitations of data-driven translation

Limitation #1: Diminishing returns on quality improvements
After the initial quality boost from historical data, improvements tend to taper off. Even the biggest enterprises don’t generate enough new content to keep that upward curve going.

Limitation #2: Some requirements can’t be learned from data alone
Some requirements are not reflected in your historical data, so the model has nothing to learn them from. You still need post-editors, not just to fix errors, but to fill in the gaps where the AI didn’t get the memo.

Limitation #3: Issues with change management
Training on past data means you’re walking forward while looking backward. If your requirements change, which they always do, much of your historical data can become irrelevant overnight. And you haven’t produced enough new content under the new rules for the AI to catch up.

Or take the more common case: you want to add a new language and apply the same requirements you use for ten others. With a data-driven approach, you’re out of luck — you simply don’t have the training data.

So yes, data-driven translation is the classic way to apply AI to enterprise localization. It fits well in human-first workflows. You’ll save on post-editing rates — but you’ll also keep doing post-editing. For all your content. Forever.

The recent shift from training to instruction

So how do we automate localization the AI-first way — where humans handle the hard stuff, but most of the work runs on autopilot?

To answer that, let’s step back and ask: why did we all start with data-driven translation in the first place?

Simple — until recently, the only way to steer AI output was to train it on our data. And the only thing AI could do was generate translation drafts. If you wanted to improve those drafts, you had to bring in humans — and explain exactly what needed to be fixed.

But today, we can instruct large language models much like we instruct human translators. They don’t just generate drafts: they can follow specific translation requirements, check whether those requirements are met, and revise accordingly.

Redefining quality: Moving from metrics to requirements

This changes the game — including how we define automatic translation quality.

Until now, we’ve had a double standard. For human translation, quality meant meeting specific goals: following the style guide, using the right tone, staying compliant. For machine translation, it was just a score — a number between 0 and 1. Useful for triage, useful to compare different AI models, but not enough to trust the output without review.

That’s no longer the case. Today, we can apply the same standards to both. AI can check — and fix — specific requirements like terminology, tone of voice, and regulatory compliance. That closes the loop on quality management — automatically.
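To make the difference between a score and a requirement concrete, here is a minimal sketch of a terminology check that returns named violations instead of a single number. The glossary, the `Violation` type, and `check_terminology` are hypothetical, and the plain substring match is a simplifying assumption: a production check would need to handle inflection and word boundaries in the target language.

```python
import re
from dataclasses import dataclass


@dataclass
class Violation:
    requirement: str  # which requirement was missed
    detail: str       # what exactly needs to be fixed


# Hypothetical glossary (assumption): source term -> required target term.
GLOSSARY = {"dashboard": "Dashboard", "workspace": "Arbeitsbereich"}


def check_terminology(source: str, translation: str, glossary: dict) -> list:
    """Flag glossary terms that occur in the source but whose required
    target term is missing from the translation (naive substring match)."""
    violations = []
    for src_term, tgt_term in glossary.items():
        in_source = re.search(rf"\b{re.escape(src_term)}\b", source, re.IGNORECASE)
        if in_source and tgt_term.lower() not in translation.lower():
            violations.append(
                Violation("terminology",
                          f"'{src_term}' must be translated as '{tgt_term}'")
            )
    return violations


violations = check_terminology(
    "Open the dashboard in your workspace.",
    "Öffnen Sie das Dashboard in Ihrem Workspace.",
    GLOSSARY,
)
for v in violations:
    print(f"{v.requirement}: {v.detail}")
# prints: terminology: 'workspace' must be translated as 'Arbeitsbereich'
```

Unlike a score between 0 and 1, each violation says which requirement failed and what to change, so it can be passed on as an instruction for revision.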

How requirements-driven translation works

This shift enables what we at Intento call a requirements-based approach to translation automation.

In the data-driven model, you collect all your data, clean it up, feed it into the AI — and hope for the best.

In the requirements-based model, you flip the script. You start by defining your requirements — what really matters in your translations beyond basic fluency.

Then you:

1. Implement automatic checks for those requirements so you always know what’s missing.
2. Train the AI on available data where it helps. The data still matters.
3. Fill the remaining gaps by adding AI agents instructed to check and fix missing requirements.
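The steps above can be sketched as a small check-and-fix loop. This is an illustration under stated assumptions, not a description of Intento's implementation: the function names, the formal-address requirement, and the retry limit are all hypothetical, and `stub_fixer` is a rule-based stand-in for an instructed AI agent so that the example runs without any model.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A check returns None when the requirement is met, otherwise a message
# describing what is missing.
Check = Callable[[str, str], Optional[str]]
# A fixer receives (source, translation, issue) and returns a revision.
Fixer = Callable[[str, str, str], str]

INFORMAL_WORDS = {"du", "dein", "deine", "dich", "dir"}


def check_formal_address(source: str, translation: str) -> Optional[str]:
    """Hypothetical tone requirement: German output uses formal address."""
    words = {w.strip(".,!?").lower() for w in translation.split()}
    if words & INFORMAL_WORDS:
        return "use formal address (Sie/Ihr) instead of informal (du/dein)"
    return None


def find_issues(source: str, translation: str, checks: Dict[str, Check]) -> List[str]:
    """Run every check and collect the messages of those that fail."""
    issues = []
    for check in checks.values():
        message = check(source, translation)
        if message is not None:
            issues.append(message)
    return issues


def enforce_requirements(source: str, draft: str, checks: Dict[str, Check],
                         fixer: Fixer, max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Check the draft, ask the fixer to address each issue, and re-check.
    Returns the final text plus any issues still open after max_rounds."""
    translation = draft
    for _ in range(max_rounds):
        issues = find_issues(source, translation, checks)
        if not issues:
            return translation, []
        for issue in issues:
            translation = fixer(source, translation, issue)
    return translation, find_issues(source, translation, checks)


def stub_fixer(source: str, translation: str, issue: str) -> str:
    """Stand-in for an AI agent (assumption). A real agent would receive
    `issue` as an instruction; this stub only rewrites one pronoun."""
    return translation.replace("Dein ", "Ihr ").replace("dein ", "Ihr ")


checks = {"formal_address": check_formal_address}
final, unresolved = enforce_requirements(
    "Your password has been changed.",
    "Dein Passwort wurde geändert.",
    checks,
    stub_fixer,
)
print(final)       # Ihr Passwort wurde geändert.
print(unresolved)  # []
```

Returning the unresolved issues rather than raising an error is a deliberate choice in this sketch: anything the automatic fix cannot close can be routed to a human reviewer.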

With this setup, your most critical, high-volume content can be translated fully automatically — and still meet every requirement.

That’s what we mean by requirements-based translation. And that’s exactly where our AI agents step in.

This approach doesn’t just improve translation quality. It also aligns expectations. Between you and your internal stakeholders. Between us and our customers. Between promises and delivery.

This isn’t a theory. We already have several real-world cases in production. And you’ll hear about them — at webinars, conferences, and anywhere else you find us.

This is the future of localization, and it’s already here.
