

23 Oct

Artificial Languages and the Dream of Universal Knowledge

What do Barsoomian, Esperanto, Klingon, Ku, Na’vi, and Tenctonese have in common? They’re all languages created for sci-fi films, except for Esperanto, whose developer, Ludwig Zamenhof, sought to create a universal means of communication. They all also represent a human desire to explore or undo the effects of Babel. Although such languages seem thoroughly modern, they belong to a scholarly tradition that has been underway since at least the Middle Ages and has spawned a whole series of “a priori philosophical languages.” These artificial tongues aspired to mirror the nature of reality directly in their structure. Their most ambitious creators sought linguistic forms in which the meaning of a word could be understood from its form and sounds alone, even by a speaker who had never encountered the word. For a masterful account of such creations, see Umberto Eco’s The Search for the Perfect Language, from which the examples below are taken.

For example, in Cave Beck’s 1657 The Universal Character, the sentence leb toreónfo pee tofosénsen and pif tofosénsen would unambiguously convey the Biblical commandment “honor your father and your mother,” with the individual words and pieces of words essentially serving as a lookup table for a universal dictionary of concepts not tied to any natural language. Another proposal, presented in George Dalgarno’s 1661 Ars signorum, contained “words” such as Neik (“quadruped animals”) and NeiPTeik (“warm-blooded animals”), Nƞk/pot (“horse = animal with an uncleft hoof-courageous”), and Nƞk/sof/pad (“mule = animal with an uncleft hoof-deprived-sex”), in each of which the individual letters and parts carry their own meaning.
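The "lookup table" idea can be made concrete with a small sketch. The Python below is only an illustration, not a reconstruction of Dalgarno's actual system: it uses just the four glosses that can be read off the horse and mule examples quoted above, and the names `PARTS` and `gloss` are hypothetical.

```python
# Glosses read off the Dalgarno examples quoted in the text.
# The table layout and function name are illustrative assumptions.
PARTS = {
    "Nƞk": "animal with an uncleft hoof",
    "pot": "courageous",
    "sof": "deprived",
    "pad": "sex",
}


def gloss(word: str) -> str:
    """Look up each slash-separated part and join the glosses with hyphens."""
    return "-".join(PARTS[part] for part in word.split("/"))


print(gloss("Nƞk/pot"))      # the "horse" example
print(gloss("Nƞk/sof/pad"))  # the "mule" example
```

A reader who knows the table can decode a word never seen before, which is exactly the property these schemes aimed for. Any part missing from the table raises a `KeyError`, a small reminder of how much the approach depends on a complete inventory of concepts.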

Although none of these creations was usable in practical terms, and they may seem laughably naïve today, their creators put them forward as a way to eliminate ambiguity and promote world peace. In an era of nearly constant war, they believed that if everyone just understood and agreed upon the nature of reality and how to discuss it, there would be no need for conflict. This train of thought inspired countless imitators, some of whom worked to create actual usable languages, such as Volapük (the bane of companies that simply dump ISO language lists into their software) and Esperanto, which recently received a high-profile shot in the arm when U.S. President Trump misspelled the name of his defense secretary, Mark Esper, in a tweet.

Even outside the realm of “let’s give peace a chance” projects such as Esperanto, the intellectual children of these efforts live on in the structure of library call numbers, the Standard Industrial Classification (SIC) codes used to classify vertical industries, bar codes, and almost any information cataloguing and retrieval activity. Computer programming languages show their influence as well. Most of these derivatives have given up on the idea of universal knowledge classification to focus instead on narrower applications that they can realistically manage.


However, the development of big data, web-scale architecture, and automated content enrichment (ACE) is now reviving dreams of universal knowledge and understanding. Machines that can process large amounts of text and successfully disambiguate it can then link words (theoretically in any language) to authoritative references about concepts. ACE allows machines to “understand” content, deliver more relevant results to readers, and act upon the meaning of the text. Artificial intelligence promises to usher in an era when intent and meaning can be conveyed accurately, although human language consistently refuses to cooperate with these plans.
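As a rough illustration of what linking words to concepts involves, the toy Python sketch below disambiguates a word by its context and returns a language-independent concept identifier. Everything in it is an assumption made for illustration: the `LEXICON` table, the `concept:` identifiers, the cue words, and the `link` function are hypothetical, and real enrichment systems rely on far larger resources and statistical models.

```python
from typing import Optional

# Hypothetical toy lexicon: (language, surface form) -> candidate concepts,
# each paired with context words that hint at that reading.
LEXICON = {
    ("en", "bank"): [
        ("concept:financial_institution", {"money", "loan"}),
        ("concept:riverbank", {"river", "water"}),
    ],
    ("de", "bank"): [
        ("concept:financial_institution", {"geld", "kredit"}),
        ("concept:bench", {"park", "sitzen"}),
    ],
    ("de", "ufer"): [
        ("concept:riverbank", set()),
    ],
}


def link(lang: str, token: str, context: list[str]) -> Optional[str]:
    """Return the concept whose cue words overlap most with the context."""
    candidates = LEXICON.get((lang, token.lower()))
    if not candidates:
        return None  # no resource exists for this word or language
    ctx = {word.lower() for word in context}
    # On a tie, max() keeps the first candidate listed.
    best = max(candidates, key=lambda cand: len(cand[1] & ctx))
    return best[0]


print(link("en", "bank", ["the", "river", "water"]))
print(link("de", "Ufer", []))
print(link("de", "Bank", ["im", "Park", "sitzen"]))
```

In this sketch the English "bank" beside a river and the German "Ufer" resolve to the same identifier, while the German "Bank" in a park resolves to a different one. A word from a language with no entry in the table simply returns `None`.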

Unfortunately, far too many of the resources that are meant to liberate content from the shackles of its superficial written form work only in English or Chinese, the main languages of academic research and the targets of most funding. As content creators move beyond a handful of major languages, they find that the resources they need to add intelligence-driven features for customers are few and far between, if they exist at all. Standard formats, such as schema.org or Dublin Core, rely on the surface form of words and so cannot provide intelligence across language boundaries. One of the next great challenges for researchers and the language industry, on par with going to the moon or Mars if not more difficult, will be to extend such resources to cover the hundreds or thousands of tongues that would benefit from them.

As we celebrate the dramatic advances of AI into areas we could scarcely conceive of a few years ago, it is useful to consider why the artificial languages of the past failed to live up to their promises, and to find ways to ensure that today’s ACE- and machine learning-driven approaches avoid the same problems and benefit the entire world. Chatbots and personal virtual assistants may not create world peace, but they can deliver real benefit. Language service providers (LSPs) will play a major role in providing the insight and data needed to realize these goals. If organizations succeed, they may finally realize the dreams of countless scholars from almost a thousand years of world history.

About the Author

Arle Lommel

Senior Analyst

Focuses on language technology, artificial intelligence, translation quality, and overall economic factors impacting globalization
