Microsoft Custom Translator: A Big Step Forward

18 Nov

Most technologies follow a typical S-curve as they develop. They start off making very slow improvements as developers struggle with fundamental challenges and figure out new ways of doing things. At some point, however, their capability starts to increase at an astonishing rate, and announcements of new advances become frequent. At this stage, it is easy for developers to extrapolate their gains to truly lofty heights, but sooner or later the growth rate slows as the easy gains are exhausted and implementers turn to tweaks and marginal improvements. Eventually the technology matures and reaches a plateau. There is little significant improvement until someone comes along with a major disruptive approach that begins another S-curve.

Neural machine translation (NMT) has followed this pattern so far. In late 2015 most developers still treated it as a future technology that would come someday, but by the middle of 2016 several developers had released functional neural systems that seemed to outperform even the best statistical MT (SMT) engines. Since that point, momentum has shifted decisively in favor of NMT because it delivered better fluency in its output and seemed to promise the end of incomprehensible MT output. Various tech vendors loudly proclaimed that their systems were even outperforming human translators – although the fine print in their studies made it clear those claims weren’t quite what they seemed. However, this rate of stellar progress has slowed lately, with the latest improvements more incremental than disruptive.

For this reason, we were encouraged to see an announcement with truly stunning improvements from Microsoft in its Custom Translator offering. Aimed at companies that can provide their own training data, the new version solves some fundamental challenges in improving NMT performance and output quality for customized engines:

The time to train a new engine coupled with improved quality. The new version results in dramatic improvements in BLEU scores. Although CSA Research is skeptical of the value of small gains in BLEU scores (that is, anything less than four points), Microsoft reports significantly higher increases in many domains (see below). The breakthrough that enabled these gains is a new system that lets the company add incremental training on top of an existing engine without retraining it entirely. This approach reduces the time for full custom NMT training from several days to a few hours and thus makes it more affordable. Previously, the company handled custom training with a separate system on top of the NMT layer that functioned differently, and much more slowly.

[Figure: BLEU score gains reported by Microsoft; image not reproduced]

The size of the training dataset. Microsoft’s results show that large-scale improvements are still possible, even with relatively limited datasets (tens of thousands of sentences rather than millions). Because most companies have limited training data – at best a few hundred thousand sentences rather than the ideal millions required for nuanced customization – this enhancement enlarges the pool of organizations that could benefit from an NMT boost to their translation processing.
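To make the four-point BLEU threshold mentioned above more concrete, here is a minimal sketch of how a BLEU score can be computed and how a gain between a baseline and a customized engine might be checked against that rule of thumb. It is a simplified illustration, not the scoring method Microsoft or CSA Research used: it assumes whitespace tokenization, a single reference translation per sentence, and no smoothing, and the names `corpus_bleu` and `is_meaningful_gain` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumption: whitespace tokenization and no smoothing, so the numbers
    will not match those produced by production scoring tools.
    """
    matches = [0] * max_n   # clipped n-gram matches, per n-gram order
    totals = [0] * max_n    # n-grams produced by the MT system, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each count to the reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any n-gram order with zero matches gives BLEU 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def is_meaningful_gain(baseline, customized, threshold=4.0):
    """Apply the four-point rule of thumb to two BLEU scores."""
    return (customized - baseline) >= threshold
```

In practice, one would score the same held-out test set with the generic engine and with the customized engine, then pass both scores to `is_meaningful_gain`. A gain below the threshold is the kind of small improvement the text treats with skepticism.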

Expect to see other providers moving swiftly to match what Microsoft has done: The gains are so significant in practical terms that others will be sure to follow. It is not often that a production system sees such dramatic improvements between versions. And Microsoft has made it effectively free for most of its clients to upgrade their systems, which ups the ante for others to deliver similar service.

Looking forward, we know of several MT software vendors that are working on full-document context systems: systems that process entire documents at a time rather than focusing on single sentences. This development has the potential to drive even greater quality gains, if the companies building these systems can overcome the exponential increase in computing power the approach currently requires. Although it will likely be some months or years before these systems leave the laboratory, the improvement in contextual relevance and quality they promise is likely to drive another S-curve in this space.

Microsoft’s announcement is likely to matter most to enterprise users of the technology and to the specialist LSPs that train engines on their behalf. Those that rely on generic engines will see no benefit, which highlights how important it is for providers to invest in deep domain knowledge and build their data-management capabilities (“The Future of Language Services”). The new capabilities will also benefit providers who work in an augmented translation paradigm by delivering more relevant and correct MT to linguists. More broadly, the announcement shows how fresh takes on even established technology can deliver striking gains.