
Generative AI and Copyright: Unraveling the Complexities

Arle Lommel July 20, 2023


Disclaimer: I am not a lawyer, and this blog post does not contain legal advice. CSA Research cannot provide you with legal advice. If you have concerns about any issues raised in this post, consult your legal counsel concerning these matters and how they apply to your business.

A common worry about generative AI (GenAI) is that the content that it creates may be subject to copyright claims. Our recent survey of freelance linguists reflected this concern: Copyright issues are their second most important concern with GenAI, with 74% viewing the technology negatively or strongly negatively in this regard. However, an examination of claims about copyright and how GenAI works reveals a different picture.

Media coverage of high-profile lawsuits from companies such as Getty Images against AI companies for alleged copyright violation is a major driver of this concern. In general, such legal actions assume a “Google on Steroids” vision of how GenAI works. In it, GenAI systems vacuum up large amounts of content, store it, index it, and then copy from it to generate output. This viewpoint is evident in a statement from a website announcing a lawsuit against Stability AI: “Stable Diffusion contains unauthorized copies of millions – and possibly billions – of copyrighted images. These copies were made without the knowledge or consent of the artists” (emphasis added).

In reviewing these lawsuits, it is important to remember that anyone can file a lawsuit and that the claims made in these suits have yet to be adjudicated in court. In addition, lawsuits often function as a legal tactic to bring parties to the bargaining table and may never make it to court, so you should never take their claims at face value. Although trying to predict court outcomes in complex cases is a fool’s game, we anticipate that these lawsuits will face an uphill battle to succeed. Why? Simply put, the “Google on Steroids” understanding of GenAI gets some fundamental details wrong – and those details matter.

GenAI Doesn’t Store Copies of Training Data

Taking the description of the lawsuit above at face value, readers could be forgiven for assuming that the Stable Diffusion system actually “contains” “copies” of training data, but it – like other GenAI systems – does not.

GenAI processes the training data but doesn’t store it. It doesn’t store specific documents, books, or sources, and it also doesn’t know which documents were in its training set or the specifics about any individual data source. Instead, it synthesizes its input data into what machine learning specialists call “correlations.” These are statistical measures of how strongly variables are associated with one another, and GenAI uses them to capture relationships between different features in a dataset. A high correlation between two variables suggests a strong relationship.
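To make the idea of a correlation concrete, here is a minimal Python sketch that computes the standard Pearson correlation coefficient for two pairs of made-up feature lists. The function name and the numbers are assumptions chosen for illustration. This is a toy showing what the statistic measures, not a description of how any GenAI system is actually trained.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical feature values, invented for this illustration
feature_a = [1, 2, 3, 4, 5]
feature_b = [2, 4, 6, 8, 10]  # rises in lockstep with feature_a
feature_c = [5, 1, 4, 2, 3]   # only loosely related to feature_a

strong = pearson(feature_a, feature_b)  # close to 1: strong relationship
weak = pearson(feature_a, feature_c)    # close to 0: weak relationship

print(f"a vs. b: {strong:.2f}")
print(f"a vs. c: {weak:.2f}")
```

Note that the result is a single number describing how the lists relate. The lists themselves cannot be recovered from it, which is the point of the distinction between statistical associations and stored copies.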

Its use of training data is similar to a student taking notes. Consider two students in a history course who each read several books before taking an at-home examination on the causes of the Hundred Years’ War. The first copies a paragraph verbatim from one of the books to answer an essay question. This action would be considered plagiarism and cheating. The second takes very good notes from the books for future reference and assimilates them into her understanding. When she answers that same essay question, she draws on her understanding of various sources and answers in her own words. Although she drew on sources to create those notes, the notes are not copies of the sources and she is not guilty of plagiarism. Although she could not have written her essay without access to the books she read, she was not violating copyright when she read them and made notes.

What All This Means for Localization

A proper understanding of how GenAI works complicates any narrative based on the “Google on Steroids” theory, which relies on the existence of copies. But it also shows why GenAI has so much difficulty with “hallucination” (making things up): It can only rely on the probabilistic correlations it established, and these may be incorrect for a given prompt. In addition, statistically likely – but incorrect – outcomes can easily drown out factually correct – but statistically unlikely – outcomes.
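The way a statistically likely answer can drown out a correct one can be sketched in a few lines of Python. The prompt, the candidate words, and the probabilities below are all invented for illustration (imagine a model completing “The capital of Australia is”), and real systems work over far larger vocabularies. Treat this as a toy under those assumptions rather than a model of any actual product.

```python
import random
from collections import Counter

# Hypothetical, hand-picked probabilities for the next word. In this toy,
# the wrong answer is assumed to be the more frequent association.
next_word_probs = {"Sydney": 0.6, "Canberra": 0.3, "Melbourne": 0.1}
correct_answer = "Canberra"


def most_likely(probs):
    """Always pick the single highest-probability word."""
    return max(probs, key=probs.get)


def sample_many(probs, n, seed=0):
    """Draw n words at random, weighted by probability, and count them."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    words = list(probs)
    weights = [probs[w] for w in words]
    return Counter(rng.choices(words, weights=weights, k=n))


greedy = most_likely(next_word_probs)
counts = sample_many(next_word_probs, 1000)

print("Most likely word:", greedy)
print("Matches the correct answer:", greedy == correct_answer)
print("Counts over 1,000 samples:", dict(counts))
```

In this sketch the correct answer is available to the system but loses out most of the time, simply because a competing association carries more statistical weight.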

Despite the challenges these lawsuits face, they may yet set precedents that would affect how LSPs and corporate localization groups use GenAI. For example, if courts were to decide that the creators of GenAI systems are guilty of copyright violation and must pay large settlements or license fees to rights holders, it would raise the cost of generative systems considerably and restrict their usefulness. However, most of the ways that LSPs are using GenAI – for translation, marketing, or sales – are unlikely to attract copyright claims directly. If it is questionable whether GenAI developers are guilty of copyright violation, it is even more so for users of their products, who never had direct access to the training data.

The risk is especially low for translation, where groups are translating their own copyrighted materials and where the output will have a clear pedigree in that content. It will be somewhat higher for cases where LSPs or enterprises use GenAI to create new public marketing content that may end up closely resembling existing material on the web. If the output closely matches the terms or phrasing of existing published material, you might face a copyright claim from a rights holder hoping to get lucky. In most cases companies facing such a claim will simply delete the offending copy rather than run the risk that a court will find against them. One way to mitigate this possibility is to use GenAI for ideation and research rather than to write content directly.

Should You Use GenAI?

For now, at least, copyright concerns are not a reason for LSPs or enterprise translation groups to avoid using GenAI altogether. Accordingly, the decision should be made based on how well it meets your needs.

CSA Research maintains that GenAI is not yet a viable mainstream translation solution – due to its slow speed, cost, and lack of integration with translation processes, among other factors – but copyright currently ranks very low on the list of concerns about the technology. Regulatory uncertainty is a much bigger concern.

LSPs and enterprise localization groups should monitor legal cases around GenAI and copyright, but they should not hold off on using it solely out of fear that they may be violating copyright. If they do have concerns specific to their use, they should consult with their legal counsel, but ultimately the industry’s long experience with statistical and neural MT – which face the same issues – shows that copyright is not likely to be a roadblock for using GenAI in the language sector.

So, do you need to worry about copyright and GenAI? Probably not as much as you do about other factors such as security, irrational hype, factually incorrect output, or regulatory factors. These should rank higher in your list of concerns than current lawsuits based on a faulty understanding of how the technology works.


