Intento

How we used machine translation to make our website speak 7 languages (and you can too!)

Daria Sinitsyna

Lead Computational Linguist at Intento

Part 1. How to Automatically Translate a Website?

At Intento, we have a tool called Website Translator. This simple web widget enables real-time, on-the-fly website translation using custom workflows configured in our Language Hub. It’s commonly used for translating internal enterprise websites, employee and community portals, help center content, and any web application that requires on-the-fly translation but has no dedicated Language Hub connector.

We decided to demo the Website Translator on our own website. Unlike our large enterprise customers, we lacked dedicated resources to review machine translations in all languages. In the beginning, we simply enabled the widget. We used our “default MT routing,” selecting the best Machine Translation engines from our State of MT report to handle translation requests from the Website Translator.

However, the output quality was quite poor. The cobbler’s children may famously go barefoot, but we wanted something better.

Should you use human translators to improve website translation? Yes, but only after you’ve done your homework and exhausted all automatic options. As the next step, we invested time in implementing the Website Translator for our site just as we do for our customers.

We want to explain this process. Check out the results on our website in your language. If you still find issues, let us know using the feedback form!

Part 2. How to Know Your Website Has Translation Issues

What should you do if you suspect that the translation could be better but lack the specifics an expert linguist review would provide? In 2024, use an LLM! We’ve created a Language Skill that provides actionable feedback on website localization. It takes screenshots of the original and translated website and then lists the specific issues it finds. Here’s a sample output for our Japanese website:

We don’t know if all the comments for every page and language (which we generated automatically) were accurate or if there are more issues, but it’s a good starting point. Based on these comments, we identified the following sources of website translation problems:

  • Some phrases were mistranslated
  • Product names and terminology were translated inconsistently
  • The tone of voice was wrong in some languages
  • The source was challenging (apparently, Machine Translation doesn’t know localization terminology in some languages, including the “MT” acronym)
  • The website layout was poor: some phrases were split by block tags, and the layout was sensitive to changes in text length after translation
  • Translation was applied to the entire website (some elements should remain untranslated)

This put us squarely in our customers’ shoes: these are the same issues anyone might face when applying a default MT engine to a website.

Part 3. Preparing a Website for Automatic Translation

Often, the root cause of many issues isn’t the choice of MT engine, its configuration, or even the widget implementation – it’s the original website, which was built without automatic translation in mind. Here’s what you should do before applying machine translation to your website – we did, and it helped!

First, you need to analyze your website content and its translation at scale. Theoretically, you can do this by copying and pasting it into ChatGPT or Claude, but that would require too many keystrokes.

At Intento, we have a tool called Translation Storage, which is a write-through cache. Every translated sentence is stored in the Translation Storage, eliminating the need for repeated (machine) translations. Additionally, you can write a prompt and apply it to your stored translations at scale – this could be prompts to analyze translations, modify them, or even explain changes.
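
To make the caching idea concrete, here is a minimal sketch of a store that saves each translation the first time it is produced, reuses it afterwards, and lets you run an action over everything stored for a language. It is not Intento’s implementation: the class `TranslationStorage`, its methods, and the `fake_mt` stand-in engine are all hypothetical names invented for this illustration.

```python
from typing import Callable, Dict, Tuple


class TranslationStorage:
    """Toy translation store keyed by (target language, source text)."""

    def __init__(self, translate: Callable[[str, str], str]) -> None:
        self._translate = translate  # hypothetical MT call: (text, lang) -> text
        self._store: Dict[Tuple[str, str], str] = {}
        self.mt_calls = 0            # how many times the engine was actually hit

    def get(self, text: str, lang: str) -> str:
        key = (lang, text)
        if key not in self._store:
            # Miss: translate once and save the result for every later request.
            self.mt_calls += 1
            self._store[key] = self._translate(text, lang)
        return self._store[key]

    def apply(self, lang: str, action: Callable[[str, str], str]) -> None:
        """Run action(source, translation) over all stored segments for lang."""
        for (stored_lang, source), translation in list(self._store.items()):
            if stored_lang == lang:
                self._store[(stored_lang, source)] = action(source, translation)


def fake_mt(text: str, lang: str) -> str:
    # Stand-in for a real MT engine, for illustration only.
    return f"[{lang}] {text}"


storage = TranslationStorage(fake_mt)
first = storage.get("Book a demo", "de")
second = storage.get("Book a demo", "de")  # served from the store, no second MT call
```

In this sketch, `apply` plays the role of a stored prompt: any function that takes the source and the current translation and returns a revised translation could be plugged in.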

After enabling Translation Storage with Website Translator on our website, we browsed through all the pages in the seven languages we decided to work with. This populated the storage with translations. Now it’s time to get some actionable insights!

We scanned the translations in the Storage interface and quickly found that the issues with mistranslated product names and text chunking (split sentences) appeared for all languages. To fix this once and for all, we built a proper noun glossary for Intento product names and adjusted the website layout, replacing block tags with inline tags where appropriate and reducing the level of tag nesting in the website content.
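
The effect of block tags on chunking can be illustrated with a small sketch. It assumes, as a simplification, that a translation widget starts a new segment at every block-level tag; the tag set in `BLOCK_TAGS` and the sample markup are assumptions made for this example, not a description of how the Website Translator segments pages.

```python
from html.parser import HTMLParser
from typing import List

# Assumed set of block-level tags that end a translation segment.
BLOCK_TAGS = {"div", "p", "li", "h1", "h2", "h3", "section", "br"}


class SegmentExtractor(HTMLParser):
    """Collects text segments, starting a new one at every block-level tag."""

    def __init__(self) -> None:
        super().__init__()
        self.segments: List[str] = []
        self._buffer: List[str] = []

    def flush(self) -> None:
        # Join buffered text, normalize whitespace, and emit it as one segment.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        self._buffer.append(data)


def segments(html: str) -> List[str]:
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any text left after the last tag
    return parser.segments


split_by_blocks = segments("<div>Translate your</div><div>website in real time</div>")
kept_together = segments("<p>Translate your <span>website</span> in real time</p>")
```

With the nested `div` layout the engine would receive two sentence fragments without shared context, while the inline `span` version keeps the sentence in one segment.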

Once this was done, we cleared the Translation Storage of old translations and proceeded to improve specific languages.

Part 4. Optimizing Automatic Website Translation

We first optimized the website layout, excluding certain areas (legal texts, API Catalog) from automatic translation and making the layout responsive to accommodate varying translation lengths.
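
One rough way to find the places where translation length may stress the layout is to compare character counts of source and translated strings. The sketch below is only a heuristic: the function name `flag_expansion`, the 1.3 threshold, and the German sample strings are assumptions chosen for illustration, and character count is a crude proxy for rendered width.

```python
from typing import Dict, List, Tuple


def flag_expansion(pairs: Dict[str, str], max_ratio: float = 1.3) -> List[Tuple[str, str, float]]:
    """Return (source, translation, ratio) where the translation is much longer."""
    flagged = []
    for source, translation in pairs.items():
        if not source:
            continue  # skip empty sources to avoid dividing by zero
        ratio = len(translation) / len(source)
        if ratio > max_ratio:
            flagged.append((source, translation, round(ratio, 2)))
    return flagged


# Made-up UI strings, not taken from the Intento website.
sample = {
    "Sign in": "Anmelden",
    "Settings": "Einstellungen",
    "Book a demo": "Demo buchen",
}
too_long = flag_expansion(sample)
```

Here only the "Settings" entry is flagged, which points at short labels such as buttons and menu items as the first places to test.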

Next, we checked whether our general routing based on the MT report was suitable for our AI-focused content with inline markup, which differs from the content used in the report.

As we don’t have a localization program or translation memories at Intento, we were limited to stock MT engines with glossaries.

We conducted seven evaluations, translating the website with several new MT engines and automatically assessing translation risk using the “Evaluate translation with GPT-4” action in Storage. Based on the results, we adjusted the routing. The final table is shown below:

Pair             Before   After
EN > German      DeepL    DeepL + glossary, Storage
EN > Chinese     Google   Google + glossary, Storage
EN > Japanese    DeepL    DeepL + glossary, Storage
EN > Spanish     Google   DeepL + glossary, ToV, Storage
EN > French      DeepL    DeepL + glossary, source quality improvement, Storage
EN > Italian     DeepL    Microsoft + glossary, Storage
EN > Russian     DeepL    DeepL + glossary, Storage

After maximizing the performance of the available stock engines, we identified the top four remaining issues. Here are the details:

4.1 Achieving correct translation for proper nouns

We automatically extracted a list of proper nouns and acronyms from the website and prepared a set of MT glossaries. Since our Glossary Management system unifies glossary usage across multiple MT engines, we simply uploaded the glossaries to Intento and enabled them in the MT routing for all languages.

As mentioned earlier, MT engines struggle with localization-specific terminology for some languages, such as French, including the “MT” acronym. For those languages, we also added expansion glossaries, which are applied on our side before machine translation.
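
The idea of an expansion glossary, rewriting the source before it reaches the MT engine, can be sketched as a whole-word replacement. The function `expand_terms` and the single glossary entry below are hypothetical; the article only states that the “MT” acronym caused problems, not what the actual glossary contains.

```python
import re
from typing import Dict


def expand_terms(text: str, expansions: Dict[str, str]) -> str:
    """Replace whole-word acronyms with their expanded form before MT."""
    if not expansions:
        return text
    # Longest terms first, so a longer term wins over a shorter one it contains.
    terms = sorted(expansions, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda match: expansions[match.group(1)], text)


# Hypothetical entry for illustration.
expansions = {"MT": "machine translation"}
expanded = expand_terms("Choose the best MT engine for HTML content.", expansions)
```

Matching on word boundaries and exact case keeps the replacement from touching longer tokens that merely contain the acronym.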

4.2 Translation breaks tags for certain languages

The translations severely impacted the text markup for languages such as Russian, German, and French. While we could theoretically experiment with various website layouts to determine the optimal one, we opted for a simpler solution since these issues only occurred in specific languages.

We applied the “Fix HTML markup after MT” GenAI action, which is available in the Translation Storage.
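
Before fixing markup, it helps to know which segments were damaged. The sketch below is not the GenAI action itself; it is a simple, assumed pre-check that compares the tags found in the source with the tags found in the translation. The names `TagCollector`, `tag_counts`, and `markup_preserved`, and the sample sentences, are invented for this example.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import List


class TagCollector(HTMLParser):
    """Records every opening and closing tag name, ignoring attributes and text."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def tag_counts(html: str) -> Counter:
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return Counter(collector.tags)


def markup_preserved(source: str, translation: str) -> bool:
    # Word order can change in translation, so compare tag counts, not positions.
    return tag_counts(source) == tag_counts(translation)


source = 'Try the <b>Website Translator</b> on <a href="/demo">your site</a>.'
intact = 'Testen Sie den <b>Website Translator</b> auf <a href="/demo">Ihrer Website</a>.'
broken = 'Testen Sie den <b>Website Translator auf <a href="/demo">Ihrer Website</a>.'

intact_ok = markup_preserved(source, intact)
broken_ok = markup_preserved(source, broken)
```

A count-based check will not catch tags that are present but wrongly nested, so it only narrows down which segments deserve a closer look or an automatic fix.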

4.3 Incorrect tone of voice in some languages

Our translation risk assessment identified that the tone of voice needed adjustment in Spanish. Since tone of voice settings are part of Intento’s translation workflows, we enabled them for the MT engine used for this language.

Part 5. How to Do Even Better?

After applying all changes to the website layout and MT configuration, we reran the GenAI-based evaluation using screenshots of the original and translated website pages. The LLM conclusion confirmed that we improved the translation quality:

The translation quality has significantly improved. The encoding issue has been resolved, and the main headline is now more natural and engaging in Japanese. The enhanced translation better conveys the intended message and is more user-friendly.

You can view the results directly on our website. If you speak one of these languages, we value your feedback. If you still notice numerous bizarre issues, we may consider engaging human linguists for further refinement.

While there are a few areas that could be further improved, we currently believe they don’t warrant additional investment. One such area is translating images. For customers who see ROI in translating images, we use two approaches:

1. Pre-translating images: Website images can be pre-translated using OCR technology and uploaded under different URLs. The URL glossary can then be uploaded to Intento and enabled in the translation workflow.

2. Replacing images with HTML layout: In most cases, text can be extracted from images and added via HTML and CSS. This makes the images on the website natively translatable on the fly. However, this requires extensive localization testing, so we decided to postpone it until it’s reported as an issue.
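
The first approach, a URL glossary for pre-translated images, amounts to a lookup from original image URLs to localized ones. The sketch below shows that lookup applied to `src` attributes; the function `localize_image_urls`, the paths, and the regex-based matching are assumptions for illustration and not how Intento applies URL glossaries.

```python
import re
from typing import Dict


def localize_image_urls(html: str, url_glossary: Dict[str, str]) -> str:
    """Swap src attribute values that have a localized counterpart."""

    def swap(match: "re.Match[str]") -> str:
        url = match.group(2)
        # Fall back to the original URL when no localized image exists.
        return match.group(1) + url_glossary.get(url, url) + match.group(3)

    # Crude attribute match; a real implementation would use an HTML parser.
    return re.sub(r'(\ssrc=")([^"]*)(")', swap, html)


# Hypothetical paths for a Japanese version of one image.
ja_glossary = {"/img/hero.png": "/img/ja/hero.png"}
page = '<img src="/img/hero.png" alt="Hero"><img src="/img/logo.svg">'
localized = localize_image_urls(page, ja_glossary)
```

Images without an entry in the glossary are left untouched, so the glossary can be filled in gradually, starting with the images that carry the most text.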

Conclusions

Our goal was to improve our website translation as much as possible on a limited budget, and also to provide a prime example of a lean approach to website translation. Here are some quick tips as a summary.

Quick wins: Top tips for smoother website translation

1. Keep HTML simple (avoid excessive and nested HTML formatting). Complex markup can lead to inaccurate translations by default MT.

2. Translations can make text longer, which might cause layout issues. Plan your design to handle this, especially for multilingual sites.

3. Use glossaries to keep product names and terminology consistent across all languages.

4. Use CSS classes to mark elements that shouldn’t be translated, such as usernames, product names, or legal text.
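
The last tip can be sketched as a filter that collects only the text outside elements carrying a marker class. The class name `notranslate`, the `VOID_TAGS` set, and the sample markup are assumptions for this example; check your translation tool’s documentation for the class it actually recognizes. The sketch also assumes well-formed HTML.

```python
from html.parser import HTMLParser
from typing import List

NO_TRANSLATE_CLASS = "notranslate"  # assumed marker class
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag


class TranslatableText(HTMLParser):
    """Collects text that is not inside an element marked as untranslatable."""

    def __init__(self) -> None:
        super().__init__()
        self.texts: List[str] = []
        self._skip_stack: List[bool] = []  # one flag per open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        parent_skipped = self._skip_stack[-1] if self._skip_stack else False
        # An element is skipped if it, or any ancestor, carries the marker class.
        self._skip_stack.append(parent_skipped or NO_TRANSLATE_CLASS in classes)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._skip_stack:
            self._skip_stack.pop()

    def handle_data(self, data):
        skipped = self._skip_stack[-1] if self._skip_stack else False
        if not skipped and data.strip():
            self.texts.append(data.strip())


def translatable_texts(html: str) -> List[str]:
    parser = TranslatableText()
    parser.feed(html)
    parser.close()
    return parser.texts


page = (
    '<p>Signed in as <span class="notranslate">jdoe42</span></p>'
    '<footer class="legal notranslate">Terms <b>apply</b></footer>'
    "<p>Book a demo</p>"
)
to_translate = translatable_texts(page)
```

Because the flag is inherited by nested elements, marking a container such as a legal footer is enough to exclude everything inside it.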

Ready to go global? Here’s how to set up your multilingual site

Website Translator works with HTML content and needs a JavaScript snippet added to your site. It supports direct integration with HTML-based websites, ensuring a smooth translation process for most standard web structures.

To learn more about the Website Translator and how to set up the integration, book a demo call, and we’ll guide you through it.

 
