CSA Research

24 May

Is GenAI Going to Replace NMT?


It is incredible to think that, less than eight years after the first publicly available neural machine translation (NMT) systems appeared on the scene, some media coverage already treats NMT as so 2015. When generative AI (GenAI) exploded into public view in 2022, it was not surprising that the imagination of an overactive tech press would see it as the be-all and end-all of technology. Our recent survey of freelance linguists reflects this view, with many language workers expressing negative sentiments about their future because of GenAI. But is GenAI actually ready to replace NMT and language professionals? In this post, we explore several ways to approach this question and offer a more nuanced picture of what GenAI can (and cannot) do.

Before going on, we have to acknowledge that the rate of change in this field is stunning: what is true in May 2023 may no longer be true by June 2023, much less May 2024. Based on the current state of the art, however, CSA Research maintains that GenAI is not yet a viable mainstream translation method, although it does offer some intriguing possibilities in other areas.

It is also important to note that GenAI is a diverse category, not a monolith. Some systems, such as ChatGPT, can translate; others cannot; and still others translate sporadically but may go off on wild tangents. One system we tested translated a short text about a teacher for a few lines, then composed a poem in honor of teachers, and then interjected some text on how to use JavaScript. For the purposes of this post, we focus on applications that translate consistently, using OpenAI’s state-of-the-art GPT-4 and ChatGPT as representatives of that group.

How Well Does GenAI Translate?

The answer is, it depends. It depends on the language, on the subject matter, and sometimes on the phase of the moon or some other imponderable factor. In CSA Research’s tests, we found the following:

GenAI does well for common languages but struggles with others. It can deliver strikingly good results for language combinations such as English and German. But as you move to less common language pairs, the results can be wildly inconsistent. When we asked GPT-4 to list the languages it could translate, it included some for which its output was complete nonsense, not even usable for identifying the broad subject field of a text. For example, in the image shown below, we used a Biblical verse in Komi (the only text we could find in this language), and GPT-4 turned it into a statement about fishing.

[Figure 1: GPT-4’s poor translation of a Komi Bible verse]

GenAI tolerates “noise” very well. We ran tests using a Hungarian text written with 16th-century orthography. This sort of text can be very difficult for a modern reader to decipher without some training, yet ChatGPT was able to return reasonable translations (albeit with a few errors), easily besting current neural MT systems. In other cases, we used texts with misspellings and grammatical errors, and it was able to return passable translations.

[Figure 2: Translation of noisy input text]

GenAI’s broader “attention windows” allow it to handle some things very well. Standard MT translates each sentence of a text in isolation. As a result, it struggles with items such as pronouns in gendered languages, whose correct form may depend on a referent in another sentence. In our tests, GPT-4 excelled at getting these right. This ability arises because it examines all of the sentences in a prompt simultaneously, allowing it to use information from one sentence to help translate another.
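To make the difference concrete, here is a minimal Python sketch contrasting the two request styles. It assumes a naive regex sentence splitter and uses hypothetical function names and prompt wording (split_sentences, segment_level_prompts, document_level_prompt); it only builds the prompts and does not call any GenAI service. In the sentence-level version, the first request never sees “She,” so a German translation has no basis for choosing the feminine “Ärztin” over the default “Arzt” for “doctor.”

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter for illustration: break after ., !, or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def segment_level_prompts(text: str, target_lang: str) -> list[str]:
    # One request per sentence: each prompt sees only its own sentence.
    return [f"Translate into {target_lang}:\n{s}" for s in split_sentences(text)]

def document_level_prompt(text: str, target_lang: str) -> str:
    # One request for the whole passage, so cross-sentence cues stay visible.
    body = "\n".join(split_sentences(text))
    return (f"Translate the following text into {target_lang}, "
            f"one output line per input line:\n{body}")

source = "The doctor arrived late. She apologized to everyone."
for prompt in segment_level_prompts(source, "German"):
    print(prompt, end="\n---\n")
print(document_level_prompt(source, "German"))
```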

Prompt engineering lets GenAI carry out “advanced” translation tasks. Want to translate using a particular level of formality, in a particular style, or targeting a specific reading level that differs from the source? This can be put in a prompt, such as “translate the following into German using the informal in a light tone and targeting a teenage audience with a fifth-grade reading level.” The results will vary, but the capability is still notable, because such tasks are time-consuming and expensive when human linguists are involved.
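A prompt like the one above can be assembled from parameters. The following sketch is hypothetical (the function name, parameters, and wording are illustrative, not any vendor’s API) and simply shows how style requirements might be attached to a translation request; it says nothing about how reliably a model will follow them.

```python
def build_translation_prompt(text: str, target_lang: str, *, formality=None,
                             tone=None, audience=None, reading_level=None) -> str:
    # Collect only the style constraints the caller actually set.
    constraints = []
    if formality:
        constraints.append(f"use the {formality} form of address")
    if tone:
        constraints.append(f"write in a {tone} tone")
    if audience:
        constraints.append(f"target {audience}")
    if reading_level:
        constraints.append(f"aim for a {reading_level} reading level")
    instruction = f"Translate the following into {target_lang}."
    if constraints:
        instruction += " Requirements: " + "; ".join(constraints) + "."
    # The source text follows the instructions after a blank line.
    return f"{instruction}\n\n{text}"

print(build_translation_prompt("Welcome back!", "German", formality="informal",
                               tone="light", audience="a teenage audience",
                               reading_level="fifth-grade"))
```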

How Well Does GenAI Integrate with Localization Processes?

As of May 2023, integration is a weak point for most GenAI systems. Although several TMS and CAT tool developers have provided API access to ChatGPT and other systems, their tools are not really set up to work with GenAI. Most translation tools transmit sentences in isolation, which negates much of the advantage that GenAI provides. Changing this will require developers to significantly reengineer their data formats and the way they integrate systems, and it may (finally) prompt a much-needed rethinking of technology approaches that date to the 1990s.
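One way a tool could keep its segment-based data model while still giving GenAI document-level context is to send numbered segments in a single request and map the numbered reply back onto segments. The sketch below assumes a hypothetical “[n] text” marker convention and assumes the model preserves the markers, which is not guaranteed; the check for missing numbers keeps a reply that stops partway through from silently misaligning segments.

```python
import re

# Matches lines such as "[3] translated text" in the model's reply.
MARKER = re.compile(r"^\[(\d+)\]\s?(.*)$", re.MULTILINE)

def pack_segments(segments: list[str]) -> str:
    # Number each CAT segment so the whole document goes out in one request.
    return "\n".join(f"[{i}] {seg}" for i, seg in enumerate(segments, start=1))

def unpack_segments(response: str, expected: int) -> list[str]:
    # Map numbered lines of the reply back onto the original segments.
    found = {int(num): body.strip() for num, body in MARKER.findall(response)}
    missing = [i for i in range(1, expected + 1) if i not in found]
    if missing:
        # A truncated or mangled reply must not be written back as if complete.
        raise ValueError(f"segments missing from response: {missing}")
    return [found[i] for i in range(1, expected + 1)]

print(pack_segments(["The doctor arrived late.", "She apologized to everyone."]))
```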

Even if all of that happens, the slow speed and high latency of GenAI limit its applicability. Most MT providers aim for latency of less than one second, but GPT-4 at times averages about one word per second of output, which is too slow for on-demand use in CAT tools. We also found that many systems stop in the middle of responses without warning. Finally, content moderation systems that may be useful for public-facing chat interfaces can get in the way of translation jobs that legitimately need to handle potentially problematic content.
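The latency gap is simple arithmetic. This sketch estimates how long it takes to stream a translation at a given throughput; the function names and the one-second budget are illustrative assumptions based on the figures above, and the estimate ignores the extra delay before the first word appears.

```python
def generation_seconds(output_words: int, words_per_second: float) -> float:
    # Time to stream the full output at a constant throughput.
    return output_words / words_per_second

def fits_latency_budget(output_words: int, words_per_second: float,
                        budget_seconds: float = 1.0) -> bool:
    # Illustrative one-second budget, roughly the MT target mentioned above.
    return generation_seconds(output_words, words_per_second) <= budget_seconds

# A 20-word segment at about one word per second:
print(generation_seconds(20, 1.0))   # 20.0
print(fits_latency_budget(20, 1.0))  # False
```

At that rate, even a short segment takes about twenty times longer than the one-second target, before any network overhead.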

Integrating language resources such as terminology can also pose a challenge, because GenAI is not built with translation or interpreting in mind. There are kludges to force GenAI to use specified terminology, but they require careful prompt engineering, something that translation tools simply are not set up to do today. Translation technology providers may add these capabilities, but doing so will take time, and because GenAI itself is a moving target, development will be slower still.
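A common kludge is to list required terms in the prompt and then verify the output, since the model may ignore the list. The following sketch assumes a hypothetical glossary format and prompt wording and uses naive case-insensitive substring matching, which misses inflected forms and can match inside compounds; a production check would need proper morphological handling.

```python
def add_glossary(prompt: str, glossary: dict[str, str]) -> str:
    # Append the required source -> target term pairs to the prompt.
    pairs = "\n".join(f"- {src} -> {tgt}" for src, tgt in glossary.items())
    return f"{prompt}\n\nUse these term translations exactly:\n{pairs}"

def terminology_violations(source: str, target: str,
                           glossary: dict[str, str]) -> list[str]:
    # Return each glossary term found in the source text whose
    # mandated translation is missing from the target text.
    src_l, tgt_l = source.lower(), target.lower()
    return [src for src, tgt in glossary.items()
            if src.lower() in src_l and tgt.lower() not in tgt_l]

glossary = {"invoice": "Rechnung", "account": "Konto"}  # hypothetical glossary
source = "Please check the invoice for your account."
target = "Bitte prüfen Sie die Faktura für Ihr Konto."
print(terminology_violations(source, target, glossary))  # ['invoice']
```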

How Does GenAI Compare on a Price Basis?

Different GenAI systems have different pricing models, but CSA Research has serious concerns about the viability and sustainability of current financial models. GenAI providers are currently running their systems at a financial loss to attract customers and build demand that they expect to monetize in the future. Although precise per-word figures are difficult to determine, this strategy shows up in our estimates of translation costs for ChatGPT and GPT-4 at the published API prices: translation with ChatGPT’s GPT-3.5 Turbo model costs roughly 1/40 as much as Google Translate, yet it consumes more energy. GPT-4 ranges from the same price as Google Translate to about eight times as expensive, depending on the languages and model involved, and it consumes far more energy. Unless OpenAI can massively improve system optimization (or overcome fundamental laws of physics), these prices will not be sustainable. LSPs or enterprises that build their content strategies on GenAI need to be aware that the current financial models cannot last.
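GenAI APIs are typically priced per token rather than per word, which is one likely reason costs vary by language: tokenizers often split text in languages other than English into more tokens per word. A rough per-job estimate can be sketched as follows; every price and tokens-per-word ratio is a placeholder you would need to look up or measure on your own content, and the sketch ignores prompt instructions, retries, and energy costs.

```python
def estimated_cost(words: int, src_tokens_per_word: float,
                   tgt_tokens_per_word: float, input_price_per_1k: float,
                   output_price_per_1k: float) -> float:
    # Input tokens: the source text sent in the prompt (instructions ignored).
    input_tokens = words * src_tokens_per_word
    # Output tokens: assumes the translation has about as many words as the source.
    output_tokens = words * tgt_tokens_per_word
    # Input and output tokens are billed separately, per 1,000 tokens.
    return (input_tokens * input_price_per_1k
            + output_tokens * output_price_per_1k) / 1000
```

To compare the result with an MT engine that charges per character, both figures need converting to a common unit, such as cost per 1,000 source words.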

[Figure 3: Price comparison]

Machine Translation Providers Are Not Sitting Still

GenAI offers some advantages not seen in previous generations of NMT, but MT developers are not sitting still. We have already seen developers such as Microsoft and Translated start to enlarge the attention windows of their systems, which brings many of the advantages of GenAI without the overhead. Other providers have developed capabilities to handle things such as formality levels (DeepL) or profanity filtering (Amazon). And, of course, MT developers have long excelled at allowing deep customization of their engines, which can result in output optimized for specific domains or companies.

The advent of GenAI has pushed MT development forward, and we are now closer than ever to the truly responsive MT that CSA Research predicted in 2021. One promising approach we have encountered is for MT providers to build customizations on top of optimized GenAI applications. This would offer the best of both worlds: queries could modify and control the MT output, while providers could still train their systems and deliver much leaner (and therefore faster and less power-hungry) solutions.

The Real Power from GenAI Comes from Tasks Other Than Translation

GenAI may not yet be ready as a primetime translation solution, but that does not mean it lacks value. Most LSPs can benefit from using it for a variety of sales, marketing, and operational tasks, as well as for one-off linguistic needs such as rewriting content. For example, LSP sales staff can use GenAI to help them draft sales pitches or to support marketing by building customer personas. The key challenges are developing best practices for prompt engineering that produce the desired outcomes and then adapting those results to specific needs.

Similarly, freelance linguists can use GenAI for research, for help building terminology resources, or to draft content for their websites and social media presences. This is especially helpful for linguists whose native language is not English, because editing GenAI output is typically much easier than drafting content in English.

The Net: Gain Experience with GenAI and Keep an Eye on the Future

GenAI can be tremendously useful, but don’t expect it to revolutionize MT or replace human linguists. Instead, treat it as a useful resource whose output needs to be checked and monitored carefully. It can help speed up processes, but, for now at least, it is not going to reach “human parity” or remove the need for experts at the core of processes. If you are a language professional, try out GenAI and gain experience with it. Decide how you will use it and where your value will lie. Become a specialist in fields where GenAI cannot take your place. Although GenAI may take some work away from professionals, it will also create new opportunities for them.