Neural MT Leaves the Lab for Beta and Third-Party Assessment
SYSTRAN announced the beta test of its Pure Neural Machine Translation (PNMT) software with 30 language pairs (18 with English, 12 with French), a dozen corporate clients in diverse industries, and online public access to its software. This beta program caps a year of industry and media attention to deep learning, artificial intelligence, and more recently neural MT (NMT). SYSTRAN announced its PNMT product in August and followed it with this October beta. Google revealed its single-pair NMT solution in late September to widespread mainstream media coverage, along with some dashes of cold water in assessments from California to China. Other companies such as Baidu, Facebook, Microsoft, and SDL have offered their own pronouncements, research, and solutions in the current round of machine translation evolution and revolution.
Last month CEO Jean Senellart briefed us at our office in Cambridge, MA, where we discussed a variety of topics including SYSTRAN's business, its NMT technology, output assessment, open source, its current product set, and trust.
- The business. In mid-2014 Korea-based CSLi announced that it would acquire SYSTRAN, long the staid doyenne of the MT sector. The merged entity adopted the SYSTRAN name, and the nexus of development remained at the company's Paris headquarters. Senellart told us that the acquisition reinvigorated the European unit, which revamped its marketing, operations, sales, and development organizations. Post-merger SYSTRAN is visibly energized on the technology front. He told us of major wins at Continental Corporation, Hewlett Packard Enterprise, PwC, and Xerox Litigation Services. Business for 2016 is up 20% over last year, and the company now generates 25% of its revenue from e-discovery and governance applications.
- Neural technology. SYSTRAN is navigating the delicate balance between deep-learning hype and usable product. Its beta program will put the technology into the hands of MT-savvy companies whose interests range from improving post-edited output to exploring the technology's broader transformative potential. Senellart acknowledged NMT's performance challenges, such as tasks that take about 10 times longer than with statistical MT. Those delays mean NMT won't yet be competitive in some real-time applications. While today's NMT may be slow and resource-intensive, CSA Research sees this as only the start of the development curve. Software optimizations and improved techniques should eliminate this performance penalty over time, as they have in other computation- and memory-intensive applications.
- Third-party assessment. While neural MT has delivered on its promise in the lab, it's now time to prove its value in the field with objective assessments. SYSTRAN has engaged a third-party expert, CrossLang in Belgium, to oversee a comprehensive program to assess PNMT's performance. It will release the results of the evaluation on December 12th in Paris. Public scrutiny of the results should eliminate the gaming we often see in self-assessments, where developers choose the test that shows their performance in the most positive light and ignore the others. Most MT assessments do not correlate with usefulness or human understanding, a gap we hope CrossLang will address in its program.
- Open source. SYSTRAN is contributing to an open-source project on NMT with Harvard University. Senellart told us, "All the code regarding our neural MT engine will be open source, including the core technology but also our algorithms and some additional software that we developed for integration. The product itself remains proprietary." As in other IT sectors, open-sourcing can accelerate the development of complex technologies.
- Current technology. Should current users of SYSTRAN's products be worried? Senellart said the company will continue selling and supporting its current hybrid (rules-based and statistical) MT software. Some customizations, such as user dictionaries, filters, and workflow, cannot easily be changed or migrated to the PNMT product. SYSTRAN will work on a transition plan over the next six months.
The last point we discussed was whether information publishers and consumers can trust neural MT output. Senellart said that SYSTRAN's blind tests with human evaluators in the technology domain have shown better results than human translators. However, he is eager to prove that the company's NMT is better and more trustworthy than today's dominant statistical MT. We ended our conversation with a corollary of Asimov's first law of robotics: MT should cause no harm.
The bottom line: CSA Research contends that NMT will work its way slowly into the technology stack, benefiting from mainstream investment in artificial intelligence and deep learning in much the same way that statistical MT (SMT) benefited from the widespread use of big data. Looking past the hype, we see promise in this emerging technology.
About the Author