
AI Developers Face Growing Legal Scrutiny Over Copyrighted Datasets

Client Update, March 3, 2025

Written by: Haim Ravia, Dotan Hammer

Recent legal developments point to a judicial trend of scrutinizing the datasets used to train AI models, as copyright infringement litigation against AI developers continues.

In Thomson Reuters v. Ross Intelligence, the U.S. District Court for the District of Delaware found that Ross Intelligence infringed Thomson Reuters's copyrights by using Westlaw headnotes to train its AI-based legal research tool. This is the first court decision to clearly hold that using copyrighted material to train an AI system was infringing. The court rejected Ross's fair use defense, emphasizing that Ross intended its product to serve as a market substitute for Westlaw. That competitive purpose harmed Westlaw's potential market and weighed decisively against fair use.

In a separate high-profile case, Kadrey v. Meta, recently unredacted documents revealed Meta's alleged systematic use of illegally obtained copyrighted material to train its AI model, Llama. The plaintiffs, who include authors Richard Kadrey, Sarah Silverman, and Christopher Golden, allege that Meta used their copyrighted books, obtained without permission from sources such as Library Genesis (LibGen). Internal Meta correspondence now appears to corroborate these claims, showing that Meta's engineers downloaded the dataset via torrent. The practice was allegedly approved by Meta's CEO, Mark Zuckerberg, but kept quiet out of concern over negative publicity. Meta's engineers also reportedly acknowledged stripping copyright notices from the material to create a version free of copyright information for training Llama.

In another ongoing lawsuit brought by the same plaintiffs against OpenAI, a federal judge ordered OpenAI to give the plaintiffs access to the dataset used to train GPT-4. Earlier rounds of discovery had already found copyrighted material in datasets used to train the model. This evidence may shape later rulings in the case, with significant implications for OpenAI and for training practices across the AI industry.
