
AI and Copyright: Who Owns What AI Creates?

The legal and ethical questions surrounding AI training data, copyright infringement claims, and whether AI-generated content can be copyrighted.

The Training Data Problem

Every large AI model is trained on vast amounts of human-created content — text, images, code, music, video — much of it copyrighted. The legal question that will define AI's relationship with creative industries is whether training an AI model on copyrighted works constitutes fair use (in the US) or an exception to copyright (in other jurisdictions).

The New York Times sued OpenAI and Microsoft in December 2023, alleging that ChatGPT and Bing Chat were trained on millions of Times articles without permission and can reproduce them nearly verbatim when prompted. Getty Images sued Stability AI, alleging that Stable Diffusion was trained on more than 12 million Getty photographs without a license. Visual artists filed a class-action suit against Midjourney, Stability AI, and DeviantArt. Authors including Sarah Silverman and Michael Chabon sued OpenAI and Meta, and the Authors Guild sued OpenAI.

AI companies argue that training is transformative use: the model learns patterns from the works rather than storing copies of them. Rights holders argue that the models could not generate their output without first ingesting the protected works, which makes training copyright infringement on a mass scale.
