We were super busy with product development over the last half year, but it’s finally time to catch up on what has changed in the Machine Translation market. We ran our regular benchmark of pre-trained Machine Translation engines and are eager to share our findings:
- 📊 Overall MT Quality significantly improved for Finnish<>English, German<>French, Romanian>English, Russian>English, Chinese>English. We have updated some of the datasets to WMT-2019, which led to quality fluctuations, but the language pairs above improved beyond that.
- 🏆 The best MT provider has changed for 19 language pairs since January 2019. To get the best quality across 48 language pairs, one needs 8 different engines.
- 🌏 Many engines increased their language coverage: Google, Amazon, Kakao, Systran PNMT, SDL, PROMT, ModernMT, IBM. Google also added Brazilian Portuguese (although this is not reflected in its docs).
- 📦 New pre-trained engines: Alibaba, Tencent, Tilde (EN-PL) and Cloud Translation.
Read the report on Slideshare.
If Slideshare is blocked by your government or company, we have another copy here: https://docsend.com/view/z3f2pes
Overall MT Quality
The number of language pairs with an average score over 0.7 continues to increase. Based on our recent projects with LQA, scores below 0.7 clearly indicate linguistic errors, so MT still has a lot of room to grow.
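To make the counting concrete, here is a minimal sketch of how figures like "pairs above 0.7" and "engines needed for best quality" could be derived from a score table. The scores, engine names, and language pairs below are made up for illustration and are not taken from the report. The sketch also assumes that each value is an engine's average score for a language pair and that a pair counts as "above 0.7" when its best engine exceeds the threshold.

```python
# Hypothetical data: {language_pair: {engine: average_score}}.
# These numbers are illustrative only, not results from the benchmark.
scores = {
    "en-de": {"engine_a": 0.74, "engine_b": 0.71},
    "en-fi": {"engine_a": 0.62, "engine_b": 0.68},
    "ru-en": {"engine_a": 0.69, "engine_c": 0.73},
}

THRESHOLD = 0.7  # below this, LQA projects showed clear linguistic errors


def best_engine_per_pair(scores):
    """Return {pair: (engine, score)} for the top-scoring engine of each pair."""
    return {
        pair: max(engines.items(), key=lambda item: item[1])
        for pair, engines in scores.items()
    }


def summarize(scores, threshold=THRESHOLD):
    """Return (pairs whose best score exceeds threshold, engines needed overall)."""
    best = best_engine_per_pair(scores)
    pairs_above = sorted(pair for pair, (_, score) in best.items() if score > threshold)
    engines_needed = sorted({engine for engine, _ in best.values()})
    return pairs_above, engines_needed


pairs_above, engines_needed = summarize(scores)
print(f"{len(pairs_above)} of {len(scores)} pairs score above {THRESHOLD}")
print(f"{len(engines_needed)} engines needed for the best quality on every pair")
```

With the toy data, no single engine wins every pair, which is the same effect the report describes at a larger scale.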