Apple sued for allegedly training AI models on pirated books
Apple is facing a new lawsuit in the US over claims that it trained its artificial intelligence (AI) systems on copyrighted books without permission.
Two authors have accused the company of using a dataset that contained pirated works.
The lawsuit was filed on Friday in federal court in the Northern District of California. Authors Grady Hendrix and Jennifer Roberson allege that Apple's large language model, OpenELM, was trained using illegally obtained copies of their books.
The filing seeks class action status against Apple.
According to the complaint, Apple’s model card for OpenELM, published on Hugging Face, noted that the training included RedPajama, a dataset sourced from the internet. The plaintiffs claim that RedPajama contained Books3, described as “a known body of pirated books.” They argue that their works were part of this dataset.
The lawsuit requests that the court allow a jury trial and award class statutory damages, compensatory damages, restitution, disgorgement, and other relief. It also calls for the destruction of any Apple AI models trained on the disputed data.
Apple, which released OpenELM as an open-source model in 2024, has previously said that the model does not power Apple Intelligence or other machine learning features on its devices. The company described the project as "a contribution to the research community."
The controversy around OpenELM is not new. In 2024, a report claimed that part of its training data included video subtitle text from YouTube.
In a related development, AI startup Anthropic revealed on Friday that it will pay $1.5 billion (roughly Rs. 13,200 crore) to settle a separate lawsuit brought by a group of authors, who accused the company of training its Claude AI models on their copyrighted works without consent.
Anthropic did not admit liability as part of the settlement.