Home > Blogs & Events > Blogs

The Coming Content Cataclysm

CSA Research January 4, 2023

0:00

Listen to This Blog

A friend asked, “what will happen when the cloud gets filled up and can’t hold any more videos?” Given the cloud metaphor, that’s not an illogical question – when stratocumulus clouds fill with water, we get rain or snow. As new technologies hoover up ever more data and content – numbers, text, images, audio, video – and suppliers sell unlimited cloud storage, you might wonder when it will start raining bits.

In any case, we’re not running out of data storage now but it’s clear that we’ll need ever more energy-intensive data centers to keep pace with the annual doubling of digital information (“The Calculus of Translation”). And if current data flows weren’t enough, new streams of content are gushing into data centers in all the world’s digital languages – and are likely to be joined by a variety of transcribed variants in some of the other several thousand non-digital languages.

CSA Research is tracking three content gushers:

A flood of non-textual data. Multimedia content and especially video is expanding dramatically, whether recorded or live-streamed, planned or ad-hoc. Voice recognition and synthetic voice solutions, aided by AI and machine learning, are improving rapidly and being embraced by buyers and some of their services and tech vendors alike. The original language for some of this business- and entertainment-oriented content won’t reach enough of your global market, so localization for dozens of languages has the potential to increase the amount of multimedia content by orders of magnitude (“Commercial Language Solutions for Multimedia Companies”). If it’s English only, you’ll leave 47% of your TAM to your competitors.

A surge of transcribed speech. Transcription is the process of converting audio from conversations or recordings of speech to written text. Generating this record of oral text makes information more permanent (documenting testimonies or interviews), searchable (enabling broadcasters’ and podcasters’ content to be found), and easier to process (captioning or translating movies and videos) (“TechStack: Automated Transcription Solutions”). If it’s important or newsworthy enough, transcribed speech will find its way into multiple languages.

A wave of automata-generated content. Large data models are rolling out with promises that they can “summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.” While CSA Research questions the efficacy of such models, their output will only add to the content deluge as it raises concerns about the trustworthiness of digitized data sources. The problem compounds as machine-generated content finds its way into repositories, is viewed as truth, is translated or otherwise transformed, or becomes the basis for further data models.

The Trust Challenge Posed by Oceans of Content

These three additions will further stretch the seams of repositories already bursting with a ballooning body of textual and structured content. But it’s not just size that matters. Having spent a good part of my life working for database management ISVs, I’ve internalized several fundamental concepts about data. Confidence in the data heads the list.

Enterprises learned lessons about managing Big Data and instilling trust… What that means is that the AWS Aurora, IBM DB2, Microsoft SQL Server, Oracle, and SAP HANA database management systems that underpin the operations of many organizations are called “the system of record,” the authoritative source for operations and decisions. Using these DBMSes, IT departments formalize how they process, manage, transform, deprecate, monitor, and perform a variety of other business-critical activities to keep that data trusted by its users.

…but these learnings have yet to be applied to “unstructured” content. Most textual, non-textual, transcribed, and automata-generated content resides in repositories less formally structured or formal. Many such storehouses draw on a mélange of multiple sources, in many cases no longer traceable to their origin. Their shortcomings begin with the source and continue through a series of transformations such as summarizations, excerpting, repurposing, and, of course, translation. And all of this occurs against the usual backdrop of budgets more focused on creating and managing the original source, leaving only crumbs for required transformations. We observe this mismatch of resources in the content life cycle at many organizations.

It’s Time to Plan for Ever More Trusted Multilingual Content

The realities and requirements of all that source content, its sundry mutations including translation, and the lack of business alignment between the two guided us in writing our annual look-ahead for the language market, “Seven Trends for Globalization in 2023.” The language services and technology sector earned US$52 billion in 2022, growing to a projected US$65 billion by 2026.

Language underpins global communication, commerce, and community. We prefaced our analysis of trends with the observation that the business-critical language sector is subject to the same enduring facts of life that affect any commercial enterprise. Those include a full array of macroeconomic forces such as inflation and war, global competition that requires local differentiation, consolidation through merger and acquisition, and price pressure, among others. VUCA – volatility, uncertainty, complexity, and ambiguity – continue to influence behaviors (“Weakened Confidence among LSPs”).

The language sector struggles to connect internally or with external sources. In my DBMS days I was one of the founders of a startup that did pioneering work across multi-vendor networks. Our focus on system heterogeneity introduced me to the concept of “bits is bits.” That is, no matter which operating system or application manages it, the language they’re in, the format – none of those attributes should matter to anyone managing the bits. That’s been the driving force behind CSA Research’s long-term, seemingly quixotic demand for universal source and multilingual connectivity. Within an enterprise, it’s doubly important due to the potential for sharing multilingual content assets across internal and external groups and partners for product development, branding, support, HR, and other business-critical functions.

People, process, and technology must be harnessed to power global content. The lack of heterogeneity and connectivity for source, derived, translated, adaptive, responsive, and otherwise transformed content results in a sloppy assortment of siloed solutions. In our research over the last 20 years, we’ve recommended solutions that we’ve called enterprise content management, business globalization, translation, and localization. But what’s in a name? In our experience, a content strategy called “localization” doesn’t sell as well as LangOps, so we’ve adopted this shorthand term derived from the accepted combination of IT and software development popularly known as DevOps. However, LangOps requires a change in mentality to bring language into everything. If it simply replaces “localization” – with its narrow focus and outlook – it will be just another meaningless bit of jargon.

Carbon-based life forms are key to a unified content strategy. LangOps produces, manages, transforms, and delivers content for communication, commerce, and community. Humans sit at the core of processes developing and refining content, ideally in a position to give feedback to consumers, customers, or constituents when it doesn’t resonate or feels false. With humans occupying that key position, LangOps must address the same sustainability issues as other industries. Expect to hear the call for accessibility, corporate social responsibility (CSR), environmental social governance (ESG), diversity-equity-inclusion (DEI), and related terms like content moderation, fair trade translation, and trustworthy source as humans participate at the intersection of many content uses.

Conclusion

Why do we think these long-term content, connectivity, and operational issues will begin to get more attention in the next few years? The materials and means to do so have fallen into place over the last decade. We are in the center of a perfect storm that has inundated the world with enormous amounts of data and content, algorithms that are beginning to make sense (and nonsense) of it, and enough computing power with increasingly affordable GPUs to process it.

As organizations organize and leverage these three assets across their applications, CSA Research views them as seeding a future Cambrian explosion of creativity and innovation derived from deep learning, optimizations, financial and social profit, and as yet unanticipated epistemic benefits from their collective multilingual source and transformed content. And, of course, transformations linguistic and otherwise will expand the total available market into the 53% where English just doesn’t cut it.

It won’t happen in 2023, but with the right vision and investment starting sooner rather than later, we can hope to experience those future advances instead of being mired in the pundits’ dystopia of translators and proofreaders working as janitors who edit sketchy MT output or muck out automata-generated logorrhea. Per aspera ad astra!

Join Us for a Discussion

Want to hear more about these topics? Join me, Arle Lommel, and Alison Toon for a discussion of these topics on January 17th. Register here: https://csa-research.zoom.us/webinar/register/WN_o-3kgI2oSkKl62CNOWDxOg

Photo credit: Jimmy Chan

Artificial intelligence Business climate Buyer strategic planning Content technology Digital transformation Global content Machine translation Translation market size Translation technology Corporate social responsibility

When Automation Meets Translation Quality

Learn how Automated Quality Estimation and Automated Post-Editing redefine translation efficiency and accuracy.

Explore The Report

Ready to Explore CSA Research Insights?

Access exclusive data, reports, and analyses that power smarter decisions across the global content industry.

Visit the platform

Meet Our Analyst

CSA Research

Recent Blogs

July 16, 2026 Peter Coleman

Your Sales Team Was Trained to Sell Translation. The Market Now Expects More.

LSPs are developing more sophisticated solutions, but many commercial teams still lack the confidence and language to sell their business value.

June 23, 2026 Alison Toon

Dublin 2026!

The week of June 8, I was in Dublin for two industry events: LocWorld55 where I presented “The Governance Gap in the Age of AI and Global Content” and XTM Live...

May 18, 2026 CSA Research

CSA Research’s GenAI Program: Built for Leaders. Designed for What’s Ahead.

CSA Research’s GenAI Program is a continuously updated research initiative that helps enterprises, GCSPs and LSPs understand and respond to AI-driven market cha...

May 8, 2026 Alison Toon

Reliable Automation Needs Measured Language Risk

Are you assessing, measuring, and mitigating language risk within your organization? AI, chatbots, machine translation, and other forms of automation make it su...

May 4, 2026 Don DePalma

Language Erosion Spans the Pond

Language programs are declining in the UK and US. Explore the structural causes and implications of growing reliance on English in global communication.

May 4, 2026 Don DePalma

The Quiet Erosion of US Language Capacity

US language programs are declining. Explore what shrinking enrollments and program closures reveal about national language capability.

March 12, 2026 Alison Toon

What Are the Language Needs for Healthcare Tourists and Their Providers

The language needs of healthcare tourists are straightforward in theory but surprisingly difficult in practice. Patients must understand procedures and risks, d...

January 20, 2026 CSA Research

AI and Global Content Predictions for 2026

Artificial intelligence (AI) is no longer an emerging topic in localization or global content operations. In 2026, AI maturity will become a decisive factor sep...

April 11, 2025 Peter Coleman

Powerling and OXO Merge: A Partnership towards A Global Content Service Provider (GCSP)

On April 10, 2025, Powerling and OXO (ranked #73 and #93, respectively, on CSA’s Ranking of the Largest LSPs in the World for 2024) announced their strategic me...

Turn Research Into Action

Our consulting team helps you apply CSA Research insights to your organization’s specific challenges, from growth strategy to operational excellence.

Contact our team