At AMTA 2022 in Orlando, Andy Nikulin (Director of Customer Success and Professional Services, Intento) and Kevin Bruner (Global Vice President of Software Engineering, ULG) came together to share insights from a solution Intento developed with ULG to fulfill the heightened demands of a key ULG customer, improving project velocity while maintaining the quality of the e-learning experience.
Keep reading to learn about the current state of speech technology as it relates to the production and localization of e-learning content. This post highlights gaps in the typical human process for multilingual e-learning video production and the solution Intento and ULG have developed to automate the process while increasing performance standards.
• • •
The objective and initial concerns
The objective was to localize ~30 multimedia courses into 10 languages with up to 6 distinctive female and male voices in each language. The client was concerned about the overall cost of human voiceover and decided that text-to-speech (TTS) would be a more cost-effective solution. ULG was concerned that the traditional process of implementing TTS would require too much effort.
• • •
The biggest challenge was getting everything set up. Because this was new to both Intento and ULG, we needed to learn how to work with a new vendor: what the existing processes were on the ULG side, and what materials were being exchanged with voice providers. Intento strove to reuse the traditional ULG processes and script formats for human voiceovers to minimize the cost and effort on the ULG side.
We learned that the main loop consists of several steps:
- Script exchange > Script processing > TTS > Audio exchange
To ensure high-quality results in the face of many unknown elements, Intento added a quality assurance step:
- Initial loop > Quality analysis > Correction loop
• • •
Usually, TTS solutions require a massive, costly integration investment that could disrupt established processes. To avoid any extra costs, ULG and Intento agreed to stick to ULG’s established process of exchanging files by email.
The tailored email integration processed incoming emails, extracted the script and role, and produced an audio file that was then sent back, requiring zero effort from ULG and causing no disruption to the traditional workflow.
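To make the email loop more concrete, here is a minimal sketch using only Python's standard `email` package. It is an illustration under our own assumptions, not the integration built for this project: the subject-line convention (`lang=... | role=...`), the function names, and the MP3 attachment type are all hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def parse_job(raw: bytes) -> dict:
    """Pull the sender, job parameters, and script attachments out of one email."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Hypothetical subject convention: "TTS job | lang=de-DE | role=female_1"
    params = {}
    for field in str(msg["Subject"]).split("|"):
        if "=" in field:
            key, value = field.split("=", 1)
            params[key.strip()] = value.strip()
    # Every attachment is treated as a script file; keep its raw bytes.
    scripts = {
        part.get_filename(): part.get_payload(decode=True)
        for part in msg.iter_attachments()
    }
    return {"reply_to": str(msg["From"]), "params": params, "scripts": scripts}


def build_reply(job: dict, audio: dict) -> EmailMessage:
    """Attach generated audio (file name -> bytes) to a reply for the sender."""
    reply = EmailMessage()
    reply["To"] = job["reply_to"]
    reply["Subject"] = "TTS audio ready"
    reply.set_content("Audio files attached.")
    for name, data in audio.items():
        reply.add_attachment(data, maintype="audio", subtype="mpeg", filename=name)
    return reply
```

The TTS call itself sits between these two functions, so the sender only ever sees an email go out and an email come back.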
• • •
Challenge: script processing
Down the line, we discovered that traditional scripts are not TTS-friendly, meaning it’s not always easy to see which data is relevant for extraction. The solution was to implement a custom pipeline that processes the traditional script and extracts the necessary information.
One of the challenges was that scripts came in different formats: 5 unique formats had to be identified and supported at the preparation stage. The custom pipeline was designed to make it easy to add a new script format if needed. This flexibility became a key factor, as 3 more script types turned up during the production phase.
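One common way to keep new formats cheap to add is a parser registry, where each format is a single function that returns the same list of (role, text) pairs. The sketch below is an assumption about how such a pipeline could be organized; the two format names and their layouts are invented for illustration and are not the actual ULG script formats.

```python
from typing import Callable, Dict, List, Tuple

Line = Tuple[str, str]  # (role, text to be spoken)
PARSERS: Dict[str, Callable[[str], List[Line]]] = {}


def register(name: str):
    """Decorator that adds a parser function to the registry under `name`."""
    def wrap(fn):
        PARSERS[name] = fn
        return fn
    return wrap


@register("pipe_table")
def parse_pipe_table(raw: str) -> List[Line]:
    # Hypothetical layout: "NARRATOR | Welcome to the course."
    lines = []
    for row in raw.splitlines():
        if "|" in row:
            role, text = row.split("|", 1)
            lines.append((role.strip(), text.strip()))
    return lines


@register("colon_dialogue")
def parse_colon_dialogue(raw: str) -> List[Line]:
    # Hypothetical layout: "Narrator: Welcome"; "[...]" rows are notes, not speech.
    lines = []
    for row in raw.splitlines():
        row = row.strip()
        if row.startswith("[") or ":" not in row:
            continue
        role, text = row.split(":", 1)
        lines.append((role.strip(), text.strip()))
    return lines


def parse_script(fmt: str, raw: str) -> List[Line]:
    """Dispatch to the parser registered for `fmt`."""
    if fmt not in PARSERS:
        raise ValueError(f"unknown script format: {fmt}")
    return PARSERS[fmt](raw)
```

With this shape, supporting a format discovered mid-production means writing one more decorated function; nothing downstream of `parse_script` has to change.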
• • •
Challenge: source quality
Human actors are used to inconsistencies in a script and can use common sense to iron out any imperfections. Machines, on the other hand, treat flaws literally and reproduce them as errors in the finished product. This made the QA step vital to ensuring the highest possible quality.
The pipeline was implemented so that once the QA team spotted errors, it was incredibly easy to update scripts and regenerate audio files in an automated fashion.
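A simple way to make regeneration cheap is to fingerprint each script segment and re-synthesize only what changed. The following is a sketch of that idea, not a description of the production pipeline; the segment layout, the cache, and the `synthesize` callback are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def fingerprint(voice: str, text: str) -> str:
    """Stable hash of everything that affects the generated audio."""
    return hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()


def regenerate_changed(
    segments: Dict[str, Tuple[str, str]],   # segment id -> (voice, text)
    cache: Dict[str, str],                  # segment id -> last fingerprint
    synthesize: Callable[[str, str, str], None],
) -> List[str]:
    """Call `synthesize` only for new or edited segments; return their ids."""
    regenerated = []
    for seg_id, (voice, text) in segments.items():
        fp = fingerprint(voice, text)
        if cache.get(seg_id) != fp:
            synthesize(seg_id, voice, text)
            cache[seg_id] = fp
            regenerated.append(seg_id)
    return regenerated
```

After a QA fix touches one line of a script, a second run would re-synthesize just that segment and leave the rest of the audio untouched.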
• • •
Challenge: voice selection
The market for human voices is practically unlimited. For neural TTS, a good number of voices are offered for popular languages, while rare languages can be a challenge. Custom model training is always a good option but comes at a higher cost. Adjusting pitch and speaking rate helped generate distinctive voices in cases where enough stock voices were not readily available.
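Pitch and rate adjustments are typically expressed through the SSML `<prosody>` element. As a rough sketch, the snippet below derives several "characters" from one base voice by wrapping the text in different prosody settings. The preset names and values are invented for illustration, and support for the `pitch` and `rate` attributes varies by vendor and voice, so treat the exact values as assumptions to check against each provider's documentation.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical presets: distinct characters built on a single base voice.
VARIANTS = {
    "narrator": {"pitch": "+0%", "rate": "100%"},
    "coach": {"pitch": "-10%", "rate": "95%"},
    "learner": {"pitch": "+10%", "rate": "105%"},
}


def to_ssml(text: str, variant: str) -> str:
    """Wrap `text` in an SSML <prosody> element for the chosen variant."""
    settings = VARIANTS[variant]
    return (
        "<speak>"
        f"<prosody pitch={quoteattr(settings['pitch'])} rate={quoteattr(settings['rate'])}>"
        f"{escape(text)}"
        "</prosody></speak>"
    )


print(to_ssml("Q&A time", "coach"))
# <speak><prosody pitch="-10%" rate="95%">Q&amp;A time</prosody></speak>
```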
We discovered that voice selection should be planned. Having the customer choose the voice they prefer for their content beforehand saves time when facing an array of voices available on the market.
If the material is heavy with unique terminology, phoneme curation can be used to reduce the risk of pronunciation issues.
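In practice, phoneme curation often means keeping a small lexicon of terms with their phonetic spellings and wrapping each occurrence in an SSML `<phoneme>` element. Here is a minimal sketch of that approach; the term "Acmezol" and its IPA transcription are made up for the example, and whether a given TTS voice honors the `ipa` alphabet is an assumption to verify per vendor.

```python
import re
from typing import Dict
from xml.sax.saxutils import escape, quoteattr


def apply_lexicon(text: str, lexicon: Dict[str, str]) -> str:
    """Wrap each lexicon term found in `text` in an SSML <phoneme> element."""
    if not lexicon:
        return f"<speak>{escape(text)}</speak>"
    # Longest terms first so multi-word terms win over their prefixes.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        term = match.group(1)
        parts.append(escape(text[pos:match.start()]))
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(lexicon[term])}>'
            f"{escape(term)}</phoneme>"
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "<speak>" + "".join(parts) + "</speak>"


# Hypothetical product name with a hand-curated pronunciation.
print(apply_lexicon("Take Acmezol & rest.", {"Acmezol": "ˈækmizɔl"}))
# <speak>Take <phoneme alphabet="ipa" ph="ˈækmizɔl">Acmezol</phoneme> &amp; rest.</speak>
```

Matching here is case-sensitive and whole-word only, which keeps the lexicon from rewriting ordinary words that merely contain a term.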
• • •
For reference, see below the entire workflow implemented by ULG and Intento for this project. Please note that in this post, we are discussing the speech aspects of the project.
• • •
Support from Intento
Intento is well known for providing the best-fit MT. In this case, we were able to automate and streamline workflows, keep data secure, and equip the team with intelligent technology to create and translate content 4x faster.
Intento additionally brought implementation support for four TTS vendors: Google, Microsoft, IBM, and AWS. Through these vendors, we were able to offer the client 326 neural TTS voices across 42 languages and language variants and 295 standard TTS voices across 36 languages and language variants.
This image gives you an idea of the current TTS landscape, from global to more niche providers:
Result — $1.5M and 2 years of labor saved
The client received quality audio voice-over, generated quickly and at a reduced cost compared to the traditional human approach. Saving an estimated $1.5M and 2 years of labor, the client was extremely pleased with the result, and ULG has expanded its TTS offerings through Intento.
In the end, Intento solidified the relationship between an enterprise client and trusted service provider moving forward into an increasingly digital, automated future.
Insights from ULG
ULG, as a language service provider, realized that the demand for faster and more reliable service is driving innovations throughout the industry. Times are changing, and you need to be ready to get on the train before it leaves the station.
The biggest surprise was the sheer speed and efficiency of generating TTS files and the overall quality of the result. Voiceover is just a piece of the e-learning workflow but is generally considered expensive and time-consuming. TTS reduces this common problem.
Budgets are constantly being evaluated, so finding new and efficient solutions allows ULG to do much more in localization than in previous years.
• • •
Book a demo to see what new solutions Intento can bring to the table to help your business and customers create 20x more content with AI.