Responsive Machine Translation: The Next Frontier for MT
CSA Research’s recent survey-based examinations of machine translation deployment at language service providers, enterprises, government agencies, and among freelancers revealed ever-widening engagement with the technology. Although it didn’t surprise us, we also found widespread skepticism of claims that MT has reached human parity, with numerous calls in open-ended survey comments for “truth in advertising.” Just as significantly, we saw a widespread desire for MT to be more suitable for the use cases in which it finds itself, as well as a call for more guidance about when and how to use it. These perceptions of a technology that is at once over- and under-sold are a consequence of the very real improvements it has made in recent years.
In our conversations, we uncovered three trends that will drive the next act for machine translation:
- Increased adoption of MT as a platform service within other applications. This shift means that machine translation must serve a growing number of use cases for ever larger and more varied audiences.
- The shift to context-driven MT. Although most developers think of context as being about working with larger chunks of the text (such as paragraphs, pages, or whole documents), our analysis shows that the ability to address multiple kinds of context will lead to radical improvements in machine translation.
- The emergence of metadata-aware MT. Today most machine translation engines consider very little metadata in their training, but in the future, MT will be able to account for everything from the gender, age, or location of speakers or authors to the formality and register of text or the specific product lines it applies to. It will do this without needing domain-trained or product-trained engines, which are comparatively crude.
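One way a single engine can account for such metadata is by encoding it as control tokens prepended to the source text, so the model conditions its output on them without per-domain retraining. The sketch below is illustrative only; the `tag_source` function and the token format are hypothetical, not a specific vendor's API.

```python
# Minimal sketch, assuming a metadata-aware engine that reads
# source-side control tokens. The token format "<key=value>" and the
# metadata keys used here are hypothetical examples.

def tag_source(segment: str, metadata: dict) -> str:
    """Prepend control tokens (e.g. formality, audience, domain) to the
    source segment so the engine can condition its translation on them."""
    tokens = [f"<{key}={value}>" for key, value in sorted(metadata.items())]
    return " ".join(tokens + [segment])

tagged = tag_source(
    "How are you?",
    {"formality": "formal", "audience": "customer", "domain": "support"},
)
print(tagged)
# <audience=customer> <domain=support> <formality=formal> How are you?
```

The same text could then be rendered formally or informally, or with domain-appropriate terminology, simply by changing the metadata rather than switching engines.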
Taken together, these trends point to a future in which machine translation can respond intelligently to stakeholder requirements at multiple levels and deliver the best possible output for given contexts. The next step forward – we call it “responsive machine translation” – builds on the history of MT, including augmented translation (which CSA Research defined in 2016), but goes beyond to create something that is applicable in many more areas.
What Characterizes Responsive MT?
This new approach uses multiple levels and types of context and metadata to:
- Automatically adapt to domains and text types at the segment level. Rather than relying on document-level features and the selection of a single engine for a document, every segment can leverage the training data that is best and most relevant to it. A short legal passage in a marketing text can be machine-translated using legal training data, and a technical note can be rendered appropriately even if it appears in an annual report.
- Consider context beyond the segment. Current development efforts to address context have focused on only one kind – what occurs before or after a segment. However, responsive MT will use a wide variety of context types encoded in metadata, such as information about who (or what) created the text, what kind of document it occurs in, and the formality of the text, adjusting on the fly to select the most relevant training data and provide the best result.
- Adjust itself in response to user or consumer feedback. Unlike current one-size-fits-all MT, responsive MT incorporates the capabilities of adaptive neural MT to learn over time. But it goes further to integrate various sources of relevant feedback in order to deliver optimal results.
- Incorporate user-supplied resources without a full retraining cycle. Similarly, responsive MT is able to incorporate new translation memory or terminology materials without the need for full retraining. Integrating these materials ensures that engines are up-to-date and provide relevant results without the need to rebuild engines.
- Meet other stakeholder requirements for applicability and usability. Responsive MT will assess its own usability. In cases where the results do not meet usefulness and serviceability requirements as defined by measures such as MQM or a company’s own guidelines, it will flag that output for attention and cleanup by a professional linguist.
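Two of the capabilities above – per-segment domain routing and flagging low-quality output for a linguist – can be sketched together. This is a toy illustration under stated assumptions: the keyword-based `classify_domain`, the stub `qe_score`, and the `engines` mapping are all hypothetical stand-ins, where a real system would use trained classifiers, a quality-estimation model, and actual MT engines.

```python
# Illustrative sketch of segment-level routing plus a quality gate.
# All components here are hypothetical stand-ins, not real MT APIs.

def classify_domain(segment: str) -> str:
    """Stand-in classifier: route legal-sounding segments to a
    legal-trained engine, everything else to a general one."""
    legal_cues = ("pursuant to", "herein", "liability")
    return "legal" if any(cue in segment.lower() for cue in legal_cues) else "general"

def qe_score(source: str, target: str) -> float:
    """Stand-in quality estimate; a real system would apply an MQM-based
    or learned quality-estimation model to the output."""
    return 0.9 if target else 0.0

def translate(segment: str, engines: dict, threshold: float = 0.7) -> dict:
    """Pick the engine per segment, then flag weak output for review."""
    domain = classify_domain(segment)
    target = engines[domain](segment)
    return {
        "domain": domain,
        "target": target,
        # Below-threshold output goes to a professional linguist.
        "needs_review": qe_score(segment, target) < threshold,
    }

engines = {
    "legal": lambda s: f"[legal-engine] {s}",
    "general": lambda s: f"[general-engine] {s}",
}
result = translate("Pursuant to the agreement, fees are due monthly.", engines)
print(result["domain"])  # legal
```

The point of the sketch is the control flow: routing happens per segment rather than per document, and the quality gate turns the engine's own assessment into a workflow signal instead of silently shipping weak output.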
These advances require MT software developers to build in capabilities to ingest and apply metadata within training data and to analyze incoming content for the same features. They will elevate MT beyond the current generation of domain- or company-trained engines that are fit only for narrow purposes toward general-purpose solutions that can be applied more broadly because they deliver the disparate functionality of many engines at once.
The advantage of these approaches will be MT that is both more fit for purpose and suitable for more applications. For LSPs and linguists, it will mean better input for augmented translation workflows. That improvement will make work simpler for professional translators and free them up to focus on the more interesting and challenging aspects of their jobs.
Although no systems yet meet the requirements for responsive MT, many of the components are available in individual systems or are under active development in research institutions. Taken together, they will deliver better and more useful output and lead MT into its next frontier.