CSA Research

22 Jan

Small AI Is Beautiful – Lots of Data Complements Buckets of Money

A reporter at a major business magazine recently asked CSA Research, “Which of the mega-tech companies won the AI war? Which of them will likely prevail in the battle over the next 10 years?” Our answer was that their users were the real victors – and those users typically run apps from Amazon, Google, and Microsoft.

Our first reaction was that this question will seem hopelessly quaint very soon, much like asking in 1983, “Who won the PC wars?” or declaring Sears the winner of American retail sales in the early 1990s. Stuff continues to happen.

It’s the same in artificial intelligence and with those enormous platform companies. In the multi-supplier world of high tech and social media, users of their products regularly and necessarily touch different parts of the AI innovation provided by competing suppliers. And the companies themselves are the first runners-up in the AI wars: a competitive market compels them to keep investing and innovating not only in the same app categories, but everywhere.

The world of artificial intelligence is bigger than this platform battle royale. Much of today’s AI innovation is centered on making computing and communications faster, cheaper, and more accessible, for example by automating processes that a machine can handle more efficiently. That often requires evaluating language in its original form, transforming it for use in other languages or channels, and analyzing what it means. These requirements drove deep analytics and the widespread development of machine translation. In fact, natural language processing (NLP) has been at the core of machine translation since the earliest MT research in the 1930s, continuing through the decades with the Turing Test, Chomsky’s Syntactic Structures, Woods’ augmented transition networks, and dozens of other influential experiments and innovations.

It Takes Money and Data to Build AI

Of course, there’s more to artificial intelligence than these enormous technology providers. Success in any market relies on an entire ecosystem of developers, practitioners, users, and sometimes victims, all of whom create value by consuming and exploiting resources. AI is no different, and here success comes from leveraging two assets: free cash flow and enormous amounts of data.

The first asset, free cash flow (FCF), is the money that a company generates after paying to support operations and maintain assets. FCF provides capital for investment in areas of near-term or future opportunity (or simply to disburse as executive compensation, or to shareholders as dividends or stock buybacks). Of course, in larger enterprises this cash has to be spread across multiple projects, initiatives, spending packages, and other business needs. The amount of free cash flow can be staggering: in 2018, the last full year for which data is available, FCF at Microsoft pushed past US$32 billion while Google parent Alphabet yielded nearly US$23 billion. By way of contrast, among publicly traded language technology (langtech) developers and language service providers (LSPs), SDL returned nearly £32 million in FCF that year, while RWS (not a langtech company) had access to almost £35 million. It can be misleading to compare these smaller companies with the mega-tech firms, but the size of the disparity says a lot about what the larger companies can do with all that money, and how.
The second asset is, in a way, more democratic: enterprises that process enormous amounts of data benefit from what CSA Research labels a “rich and reliable data flow.” This rich data flow is the legacy of big-data initiatives that began in the 1990s, along with the growing and relatively cheap availability of CPUs-on-demand, starting with AWS in 2006 and followed by Azure in 2010. While the mega-tech companies have leveraged this data into MT and speech platforms, AI bots, and other innovations, the langtech vendors have focused on using the data that passes through their systems to optimize how they process source and target content. They want to use this data in aggregate, along with machine learning, to lessen the cognitive load on linguists and project managers, letting them concentrate instead on higher-value and hopefully more meaningful tasks.

What AI Means to the Language Crowd

Smaller langtech companies don’t have a lot of free cash flow, but they do have that rich data flow. The mega-platforms measure their free cash flow in billions of dollars where langtech vendors measure theirs in millions, yet the langtech independent software vendors (ISVs) that have paid attention to data collection, structure, curation, and analysis (“Bridging the Multilingual Training Divide”) have massive, leverageable amounts of data that they can use to inform, enrich, optimize, and otherwise improve interactions. The most perspicacious among them have assiduously collected and curated data even when there was no immediate or apparent need, betting that someday it would have value. They were right.

Such enhancements don’t go unnoticed by users, who begin to expect that every tool they use will offer similar capabilities. They will look for them wherever they go, both in the same app and in different ones (AI that they find in Microsoft Office will set their expectations for the same functions in Google Docs, and vice versa). They will likely look for those same advances in NLP in other software as well, such as CAT tools, translation management systems, terminology databases, and language quality checkers. For example, predictive type-ahead search in a browser was the model for the look-ahead adaptive MT tools from Lilt and SDL.

The bottom line for smaller langtech developers is that money is scarce, but data is plentiful. The mega-reams of multilingual content flowing through their software, if monitored, analyzed, and applied creatively, provide the foundation for their frequently less flashy but nonetheless important innovations.

Small Langtech AI in Practice

As part of our research into this smaller form of AI, we contacted leaders at several language software vendors to learn what they’re doing with the massive amounts of data that pass through their systems. In a report on small AI, we will expand on their experiences and those of other langtech vendors and language service providers with similarly acquisitive data systems.

Hideo Yanagi, Founder and CEO: “Cistate shows how companies that use even off-the-shelf components benefit from crafting utilities to address persistent problems with MT. The company has built a suite of small applications to address everything from expanding abbreviations to correcting punctuation, bridging the last gap between Google Translate and what customers need.”

Ivan Smolnikov, Founder and CEO: “Smartcat started with the goal of reducing waste in project management and uses AI to preemptively address delivery and production problems, match content to the best linguists, and eliminate non-productive overhead for translation managers and translators, all in a free-to-use system.”

Jack Welde, Founder and CEO: “Smartling sees the role of AI as eliminating extraneous human work, for example by reducing or eliminating clicks. Even something as basic as using machine learning to automatically identify file types improves the client experience and reduces errors. Similarly, we have a tool that identifies the grammatical gender of strings for translation, with higher accuracy rates than humans, and automatically tags the strings accordingly: This speeds up translation, reduces rework rates, and – most critically – saves a human from having to do ‘scut work.’ The savings from each individual service may be small, but their cumulative impact is enormous.”

Andrzej Zydroń, CTO: “XTM has been investing in small AI. These applications focus on relatively discrete tasks such as terminology extraction, improved corpus alignment, better tag handling, and adaptive MT and post-editing. Taken individually, none of these are revolutionary, but they combine to make translation far more efficient.”

José Vega, CEO and co-founder: “At Wordbee I question the applicability of the term ‘AI’ to most of what is going on in the industry. Our work focuses on automating or speeding up as much of the management process as we can, with an emphasis on the financial aspects that consume an inordinate amount of linguists’ time and that are a major source of dissatisfaction for all parties.”

Small Is Beautiful

The mega-tech companies provide platforms, APIs, education, cloud servers, funding for innovation, and a community of buyers and users. Leveraging that base along with their own data repositories and analytics, the smaller langtech companies are standing on the shoulders of these giants and adding value of their own. This network effect produces a virtuous cycle for artificial intelligence: as AI technology is developed, deployed, evolved, rethought, renovated, optimized, and spread more widely, its value increases. Who wins? The user.