What Does Talking to Whales Tell Us about Machine Translation?
Recent advances in machine translation (MT) have been truly astounding. Even if claims that the technology has reached parity with human translators are premature and an artifact of particular methods of evaluation, it is clear that the results are much more useful than they were even a few years ago.
Whether integrated with human linguists in augmented translation workflows or providing unedited translation on demand, MT is everywhere. Results that seemed like science fiction twenty years ago are now routine and, as our research has shown, they deliver real benefit to people for whom the alternative to MT is not human translation but zero translation.
Now researchers are pursuing approaches similar to MT to facilitate communication with animals. In April, National Geographic described Project CETI, an attempt to use the same fundamental technologies behind MT to translate between human language and the clicks uttered by sperm whales. With the largest brains in the animal kingdom and complex social structures, sperm whales are a good target for communication. However, the effort is hampered by the lack of any parallel corpus pairing whale utterances with a human language, not to mention the possibility that whales might have dozens or hundreds of languages and dialects.
The challenge these researchers face is akin to what human translators would face if they received a packet with a few thousand pages of text in an unknown tongue and had to work out how to converse with whoever wrote it, with no way to interact or ask questions. Such problems generally remain intractable without some sort of Rosetta Stone, as the infamous Voynich manuscript illustrates: it may or may not even encode an actual language.
Communicating with a whale is manifestly harder still. We have no conception of what a whale’s internal world might be like or what concepts it does or does not have. We also have no way to start the conversation (or even to indicate that we want to start one). As a result, these efforts will face an uphill battle. Despite many claims in the mainstream tech press, current-generation machine learning does not work like the human brain and does not understand what it is analyzing. As Dr. Aljoscha Burchardt of the German Research Center for Artificial Intelligence (DFKI) put it in the German press, “Machine translation is a parrot: a clever parrot, but still a parrot.” In other words, it can only repeat the sorts of associations it observes between text in one language and text in another.
In the case of sperm whales, this limitation may prove insurmountable. On the other hand, researchers hope that by observing enough whale utterances (called “codas”) and tracking the animals’ behavior, they may learn enough to associate certain portions of codas with concrete behaviors. And if they get far enough, they may be able to frame an invitation to sit down to a nice chat over tea and giant squid. Or it may turn out that what a whale ponders and what a human thinks are too different to bridge with any form of machine learning or machine translation.
In more practical terms, this difficulty also shows why MT is not going to displace human linguists any time soon. Because MT can only repeat what it finds in its training data, it cannot learn on its own and requires a constant influx of observations of humans engaged in multilingual communicative acts. If MT were to attempt to translate from Kurdish to Kiswahili without training data, it would face the same challenge that Project CETI faces. And because MT does not understand what it takes in or spits out, it is always at risk of getting things wrong when the input does not match closely enough what it has seen before.
At CSA Research, we coined the term “augmented translation” to describe the close union of human and machine in a mutually beneficial symbiosis that leads to better outcomes for both human and machine. This approach requires a ubiquitous assemblage of technologies that work together to deliver just-in-time information to linguists and that in turn learn from them in real time. Maybe eventually we will see the equivalent with human divers entering the water with special equipment that lets them talk to the whales. But until then, we will have to content ourselves with making human translators more efficient through the use of machine learning.
About the Author