The Coming Content Cataclysm - Our Analysts' Insights
X

Our Analysts' Insights

Blogs & Events / Blog

Listen to This Blog
 

 

A friend asked, “what will happen when the cloud gets filled up and can’t hold any more videos?” Given the cloud metaphor, that’s not an illogical question – when stratocumulus clouds fill with water, we get rain or snow. As new technologies hoover up ever more data and content – numbers, text, images, audio, video – and suppliers sell unlimited cloud storage, you might wonder when it will start raining bits.

In any case, we’re not running out of data storage now but it’s clear that we’ll need ever more energy-intensive data centers to keep pace with the annual doubling of digital information (“The Calculus of Translation”). And if current data flows weren’t enough, new streams of content are gushing into data centers in all the world’s digital languages – and are likely to be joined by a variety of transcribed variants in some of the other several thousand non-digital languages.

CSA Research is tracking three content gushers:

  • A flood of non-textual data. Multimedia content and especially video is expanding dramatically, whether recorded or live-streamed, planned or ad-hoc. Voice recognition and synthetic voice solutions, aided by AI and machine learning, are improving rapidly and being embraced by buyers and some of their services and tech vendors alike. The original language for some of this business- and entertainment-oriented content won’t reach enough of your global market, so localization for dozens of languages has the potential to increase the amount of multimedia content by orders of magnitude (“Commercial Language Solutions for Multimedia Companies”). If it’s English only, you’ll leave 47% of your TAM to your competitors

jan3

  • A surge of transcribed speech. Transcription is the process of converting audio from conversations or recordings of speech to written text. Generating this record of oral text makes information more permanent (documenting testimonies or interviews), searchable (enabling broadcasters’ and podcasters’ content to be found), and easier to process (captioning or translating movies and videos) (“TechStack: Automated Transcription Solutions”). If it’s important or newsworthy enough, transcribed speech will find its way into multiple languages. 
  • A wave of automata-generated content. Large data models are rolling out with promises that they can “summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.” While CSA Research questions the efficacy of such models, their output will only add to the content deluge as it raises concerns about the trustworthiness of digitized data sources. The problem compounds as machine-generated content finds its way into repositories, is viewed as truth, is translated or otherwise transformed, or becomes the basis for further data models.

The Trust Challenge Posed by Oceans of Content

These three additions will further stretch the seams of repositories already bursting with a ballooning body of textual and structured content. But it’s not just size that matters. Having spent a good part of my life working for database management ISVs, I’ve internalized several fundamental concepts about data. Confidence in the data heads the list.

  • Enterprises learned lessons about managing Big Data and instilling trust… What that means is that the AWS Aurora, IBM DB2, Microsoft SQL Server, Oracle, and SAP HANA database management systems that underpin the operations of many organizations are called “the system of record,” the authoritative source for operations and decisions. Using these DBMSes, IT departments formalize how they process, manage, transform, deprecate, monitor, and perform a variety of other business-critical activities to keep that data trusted by its users. 
  • …but these learnings have yet to be applied to “unstructured” content. Most textual, non-textual, transcribed, and automata-generated content resides in repositories less formally structured or formal. Many such storehouses draw on a mélange of multiple sources, in many cases no longer traceable to their origin. Their shortcomings begin with the source and continue through a series of transformations such as summarizations, excerpting, repurposing, and, of course, translation. And all of this occurs against the usual backdrop of budgets more focused on creating and managing the original source, leaving only crumbs for required transformations. We observe this mismatch of resources in the content life cycle at many organizations.

It’s Time to Plan for Ever More Trusted Multilingual Content

The realities and requirements of all that source content, its sundry mutations including translation, and the lack of business alignment between the two guided us in writing our annual look-ahead for the language market, “Seven Trends for Globalization in 2023.” The language services and technology sector earned US$52 billion in 2022, growing to a projected US$65 billion by 2026. 

  • Language underpins global communication, commerce, and community. We prefaced our analysis of trends with the observation that the business-critical language sector is subject to the same enduring facts of life that affect any commercial enterprise. Those include a full array of macroeconomic forces such as inflation and war, global competition that requires local differentiation, consolidation through merger and acquisition, and price pressure, among others. VUCA – volatility, uncertainty, complexity, and ambiguity – continue to influence behaviors (“Weakened Confidence among LSPs”). 
  • The language sector struggles to connect internally or with external sources. In my DBMS days I was one of the founders of a startup that did pioneering work across multi-vendor networks. Our focus on system heterogeneity introduced me to the concept of “bits is bits.” That is, no matter which operating system or application manages it, the language they’re in, the format – none of those attributes should matter to anyone managing the bits. That’s been the driving force behind CSA Research’s long-term, seemingly quixotic demand for universal source and multilingual connectivity. Within an enterprise, it’s doubly important due to the potential for sharing multilingual content assets across internal and external groups and partners for product development, branding, support, HR, and other business-critical functions. 
  • People, process, and technology must be harnessed to power global content. The lack of heterogeneity and connectivity for source, derived, translated, adaptive, responsive, and otherwise transformed content results in a sloppy assortment of siloed solutions. In our research over the last 20 years, we’ve recommended solutions that we’ve called enterprise content management, business globalization, translation, and localization. But what’s in a name? In our experience, a content strategy called “localization” doesn’t sell as well as LangOps, so we’ve adopted this shorthand term derived from the accepted combination of IT and software development popularly known as DevOps. However, LangOps requires a change in mentality to bring language into everything. If it simply replaces “localization” – with its narrow focus and outlook – it will be just another meaningless bit of jargon. 
  • Carbon-based life forms are key to a unified content strategy. LangOps produces, manages, transforms, and delivers content for communication, commerce, and community. Humans sit at the core of processes developing and refining content, ideally in a position to give feedback to consumers, customers, or constituents when it doesn’t resonate or feels false. With humans occupying that key position, LangOps must address the same sustainability issues as other industries. Expect to hear the call for accessibility, corporate social responsibility (CSR), environmental social governance (ESG), diversity-equity-inclusion (DEI), and related terms like content moderation, fair trade translation, and trustworthy source as humans participate at the intersection of many content uses.

Conclusion

Why do we think these long-term content, connectivity, and operational issues will begin to get more attention in the next few years? The materials and means to do so have fallen into place over the last decade. We are in the center of a perfect storm that has inundated the world with enormous amounts of data and content, algorithms that are beginning to make sense (and nonsense) of it, and enough computing power with increasingly affordable GPUs to process it.

As organizations organize and leverage these three assets across their applications, CSA Research views them as seeding a future Cambrian explosion of creativity and innovation derived from deep learning, optimizations, financial and social profit, and as yet unanticipated epistemic benefits from their collective multilingual source and transformed content. And, of course, transformations linguistic and otherwise will expand the total available market into the 53% where English just doesn’t cut it.

It won’t happen in 2023, but with the right vision and investment starting sooner rather than later, we can hope to experience those future advances instead of being mired in the pundits’ dystopia of translators and proofreaders working as janitors who edit sketchy MT output or muck out automata-generated logorrhea. Per aspera ad astra!

Join Us for a Discussion

Want to hear more about these topics? Join me, Arle Lommel, and Alison Toon for a discussion of these topics on January 17th. Register here: https://csa-research.zoom.us/webinar/register/WN_o-3kgI2oSkKl62CNOWDxOg

Photo credit: Jimmy Chan

About the Author

Donald A. DePalma

Donald A. DePalma

Chief Research Officer

Focuses on market trends, business models, and business strategy

Related

When No Doesn’t Mean “No”

When No Doesn’t Mean “No”

The word No seems to have been going around a lot in conversations lately. From discussions (welfare...

Read More >
“Misusing” Large Language Models and the Future of MT

“Misusing” Large Language Models and the Future of MT

Large language models have been in the news a lot in November and December and the coverage has been...

Read More >
Don’t Let Your Children Grow Up to Be Content Moderators

Don’t Let Your Children Grow Up to Be Content Moderators

Would you encourage a colleague, a friend, or your children to work in a profession that is extremel...

Read More >
How Important Will Language Be in Web3 and the Metaverse? Same As It Ever Was.

How Important Will Language Be in Web3 and the Metaverse? Same As It Ever Was.

Done right, website localization involves extending brand voice and all its attributes to leverage c...

Read More >
What Separates Language from Accessibility and Responsibility?

What Separates Language from Accessibility and Responsibility?

All companies have many regulations and business requirements to comply with today – plus additiona...

Read More >
Let Data Save Your Budget

Let Data Save Your Budget

It’s planning time once again, but this task is especially fraught this year as companies are facin...

Read More >

Subscribe

Name

Categories

Follow Us on Twitter