Zero-Shot Translation Is Both More and Less Important Than You Think
Recent advances in neural machine translation (NMT) represent a significant step forward in machine translation capabilities. Although most media coverage has significantly oversold the technology, one of Google’s announcements may actually be the most important one in the long run – the first successful deployment of zero-shot translation (ZST).
Just what is zero-shot translation? It is the capability of a translation system to translate between arbitrary languages, including language pairs for which it has not been trained.
To understand why this is important, consider how traditional statistical machine translation (SMT) systems work. They build up bilingual phrase tables that correlate text in two languages. Because they connect individual tongues on a one-to-one basis, these systems require training data and a separate customized instance of the MT system for each pair. They cannot translate between two languages for which no engine exists unless they can use a shared third language – called the pivot.
For example, if a system needs to translate from Finnish into Greek but has no Finnish↔Greek training data, it might use a Finnish↔English engine to translate into English – the pivot – and then a separate English↔Greek engine to reach the target. Although this approach can produce readable output, the results are often unreliable and of lower quality, because any errors the first engine makes are passed on to the second, which compounds them.
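The pivot setup can be pictured as simple function composition. In this minimal Python sketch, an "engine" is just any function from text to text; the helper names `pivot` and `pivot_accuracy` and the toy engines are hypothetical, and the accuracy estimate assumes, purely for illustration, that the two steps fail independently and that an error in either step spoils the final output.

```python
from typing import Callable

# Hypothetical type: an "engine" is any function that maps source text to target text.
Engine = Callable[[str], str]


def pivot(src_to_pivot: Engine, pivot_to_tgt: Engine) -> Engine:
    """Chain two direct engines through a pivot language (e.g. Finnish -> English -> Greek)."""
    def translate(text: str) -> str:
        # The second engine only ever sees the first engine's output,
        # so any mistake made in the first step is passed along.
        return pivot_to_tgt(src_to_pivot(text))
    return translate


def pivot_accuracy(p_first: float, p_second: float) -> float:
    """Rough share of correct pivot translations, ASSUMING the two steps
    fail independently and an error in either step ruins the result."""
    return p_first * p_second


if __name__ == "__main__":
    # Toy stand-ins for real engines: they only tag the text with the output language.
    fi_to_en: Engine = lambda text: f"en({text})"
    en_to_el: Engine = lambda text: f"el({text})"

    fi_to_el = pivot(fi_to_en, en_to_el)
    print(fi_to_el("moi"))                       # el(en(moi))
    print(round(pivot_accuracy(0.8, 0.8), 2))    # 0.64
```

Under those simplifying assumptions, two engines that are each right 80% of the time (an illustrative figure, not a measured one) yield a pivoted result that is right only about 64% of the time, which is one way to picture how errors compound.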
By contrast, Google’s neural system feeds all training data into one engine, which allows it to build connections across multiple languages rather than separately between each pair. If no data exists for a particular pair, the system can draw on what it has learned from other pairs that do contain relevant data. To return to the previous example, if no useful Finnish→Greek data exists, it would rely on correlations it has observed between other languages to produce output. That output is likely to be worse than for pairs with direct training data, but it is better than nothing, and it helps fill gaps in the more than 3,000 language pairs that Google’s 113 languages make possible.
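One published way to let a single network serve many translation directions, described in Google’s multilingual NMT research, is to prepend an artificial token naming the target language (for example `<2el>` for Greek) to each source sentence and train one model on the merged data. This minimal Python sketch shows only that data-preparation step under that assumption; the function names and placeholder sentences are hypothetical, no model is trained, and it is not Google’s production code.

```python
from typing import Dict, List, Tuple

# Hypothetical corpus layout: (source language, target language) -> list of (source, target) sentences.
Corpora = Dict[Tuple[str, str], List[Tuple[str, str]]]


def tag_source(text: str, tgt_lang: str) -> str:
    """Prepend a token that tells the shared model which language to produce."""
    return f"<2{tgt_lang}> {text}"


def build_training_set(corpora: Corpora) -> List[Tuple[str, str]]:
    """Merge every language pair into one training set for a single shared model."""
    examples = []
    for (_src_lang, tgt_lang), pairs in corpora.items():
        for src, tgt in pairs:
            examples.append((tag_source(src, tgt_lang), tgt))
    return examples


def is_zero_shot(corpora: Corpora, src_lang: str, tgt_lang: str) -> bool:
    """True if the model saw no training data for this direction."""
    return (src_lang, tgt_lang) not in corpora


if __name__ == "__main__":
    corpora: Corpora = {
        ("fi", "en"): [("<Finnish sentence>", "<English sentence>")],
        ("en", "el"): [("<English sentence>", "<Greek sentence>")],
    }
    for example in build_training_set(corpora):
        print(example)
    # A Finnish->Greek request is tagged the same way, even though no fi->el data exists.
    print(is_zero_shot(corpora, "fi", "el"), tag_source("<Finnish sentence>", "el"))
```

Because every direction shares one model in this setup, a Finnish→Greek request needs no extra engine: it is simply a tagged input for a direction the model was never explicitly trained on.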
Two crucial aspects stand out:
- Google’s system does not use a pivot. It does not translate from one language into a second and then feed that output back in to reach a third. Contrary to claims in Google’s press announcements and by tech bloggers, the system did not invent its own language (much less one called “Interlingua,” a term that has a specific meaning in MT research) that serves as a pivot. Instead, it uses all available data to move directly from one language to another.
- It can leverage data from multiple language pairs. Unlike the pivot scenario, where only one language pair at a time participates in the translation, Google’s system can potentially draw on every language pair that contains relevant data simultaneously. The result is better than trying to bridge a gap through a single intermediary.
Why is ZST important? In a field often driven by hype and hyperbole, this may be the rare case where a development is more important than media coverage makes it out to be. The biggest benefit comes for under-resourced language pairs such as Finnish↔Greek that have remained stubbornly out of reach for SMT systems.
These benefits are likely to be especially important in the European Union, which faces ongoing difficulty in providing access to its institutions in all of its 24 official and working languages. Covering all of them with SMT technology would require 276 bidirectional engines, one for each language pair, but the European Commission does not have sufficient training data for most of these pairs. It had planned to create training data for some pairs and use pivot translation for others. Zero-shot systems should produce better results at a much lower cost and help the EU overcome its language barriers.
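The engine counts follow from simple combinatorics: with n languages there are n(n − 1)/2 unordered pairs, each needing one bidirectional engine (or two one-way engines), whereas pivoting through a single hub language needs only 2(n − 1) one-way engines, and a single multilingual model covers every direction. This short Python sketch computes those counts; the function names are illustrative, and the arithmetic deliberately ignores data availability and translation quality.

```python
def bidirectional_engines(n: int) -> int:
    """One engine per unordered language pair: n choose 2."""
    return n * (n - 1) // 2


def one_way_engines(n: int) -> int:
    """One engine per translation direction."""
    return n * (n - 1)


def pivot_engines(n: int) -> int:
    """One-way engines into and out of a single pivot language (e.g. English)."""
    return 2 * (n - 1)


if __name__ == "__main__":
    for n in (24, 113):
        print(n, bidirectional_engines(n), one_way_engines(n), pivot_engines(n))
    # 24 276 552 46
    # 113 6328 12656 224
```

The pivot count grows only linearly, which is why it is attractive despite the compounding errors; the pairwise counts grow quadratically.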
Google’s development significantly raises the bar for machine translation. But Google needs to be careful not to characterize the system’s accomplishments in ways that may play well with the tech press but that ultimately oversell its “cool factor” while underselling what is truly disruptive.
About the Author