The New York Times Sues OpenAI and Microsoft for Copyright Infringement in AI Model Development

The New York Times Company (“The Times”) has initiated legal proceedings against Microsoft Corporation and various OpenAI entities, accusing them of illegally using its copyright-protected material. According to the lawsuit, this material was used to develop AI models such as ChatGPT and Bing Chat, in violation of The Times’s copyrights. The case highlights the possible negative effects of AI on traditional media, including lost revenue and a decline in reader trust.

The Times, a recipient of 135 Pulitzer Prizes and publisher of over 250 original articles daily, emphasizes how resource-intensive crucial news reporting is. It generates significant revenue through content licensing, including some free licenses for academic and non-profit use. The lawsuit alleges that The Times’s articles were used to train ChatGPT. Once trained, the system presents those articles to users, either in whole or in summary form, and imitates their style. The complaint includes striking examples of the alleged infringement in which ChatGPT’s output differed from the original articles only by small alterations.

Training Data

The first step in training an AI model is to gather a large dataset. This data can be anything from text, images, and sounds to more complex data such as user interactions or sensor readings. The quality and quantity of this data significantly affect the model’s performance.

Earlier versions of ChatGPT relied on substantial content from The Times (e.g., one training dataset contained 333,160 entries pointing to The Times). Later versions diversified their data sources (mostly social media posts and comments), yet The Times remained a critical source of reliable information. These entries were used for training without The Times’s permission. One of the sources used is a large collection of online material called “Common Crawl,” which, the suit alleges, contains 16 million unique records of content from sites published by The Times. Some of the articles were copied in full.

According to the lawsuit, Microsoft and OpenAI worked together to build a complex, customized supercomputing system for training the GPT models. This system stored and replicated copies of the training dataset, including The Times’s content. The suit alleges that The Times’s articles were copied and ingested multiple times during training.

The defendants publicly defend their actions as fair use. They argue that using copyrighted material in AI training serves a transformative purpose because the AI-generated output has a different character from the input. The Times counters that the use is not transformative, because it creates competing products without compensation.

(Non)Profit

A competing product? OpenAI initially presented itself as an altruistic, non-profit organization, but it changed course in 2019 when it established a for-profit affiliate. Since that transition, OpenAI has stopped releasing its models as open source, starting with GPT-3 in 2020, and has kept the designs and training details of subsequent models secret. As of August 2023, OpenAI was on pace to generate more than USD 1 billion in revenue over the next twelve months, and its valuation has reportedly reached as high as USD 90 billion. Because users might obtain the same or similar articles from both The Times and ChatGPT, ChatGPT could disrupt The Times’s market.

Licensing Negotiations

These differing positions led to negotiations. Along with numerous other media outlets, The Times began talks with the AI developers on the price and terms for licensing its content. The negotiations focused on a possible partnership around the real-time display of Times articles (with attribution) in ChatGPT. Under this arrangement, The Times would gain a new way to reach existing and new readers, and ChatGPT users would gain access to its reporting. However, the talks with The Times did not produce an agreement. The Associated Press, by contrast, reached a licensing agreement with OpenAI.

Claims

The Times contends that the success of OpenAI and Microsoft’s AI models heavily relies on copyright infringement.

Professionals are already discussing ChatGPT’s “hallucinations,” meaning confident but false outputs. The lawsuit also addresses this issue and alleges that false attributions of such content to The Times cause it commercial harm.

The lawsuit asserts various legal claims, including vicarious and contributory copyright infringement and trademark dilution. It also requests the destruction of all infringing AI models that are based on The Times’s articles.

AI is neither inherently good nor bad, and many nuances in its creation and use remain unresolved. This court case should clarify several of them and bring greater legal certainty to the field of new technologies.

OpenAI Responds

On 8 January 2024, OpenAI published a blog post setting out its position. It claimed that:

It collaborates with news organizations and is creating new opportunities for them;
Training is fair use, but it provides an opt-out because “it’s the right thing to do”;
“Regurgitation” (reproducing articles almost unchanged) is a rare bug that it is working to eliminate;
The New York Times is not telling the whole story. Here OpenAI emphasized the content of the negotiations and its own good-faith conduct, such as taking down ChatGPT to fix bugs affecting The Times’s content.

The defendants’ formal response before the court in New York is still pending.

The entire tech and IP world is watching how these interdependent interests will be resolved.

The information in this document does not constitute legal advice on any particular matter and is provided for general informational purposes only.

By Rastko Petakovic, Senior Partner, Goran Radosevic, Partner, and Nikola Kliska, Senior Associate, Karanovic & Partners
