Aries - Apple Hit with Copyright Lawsuit Over AI Training Data

Introduction

Apple now faces a high-profile copyright lawsuit alleging that its AI development relied on pirated content. Two authors say Apple used the Books3 dataset to train Apple Intelligence and OpenELM without permission, drawing material from shadow libraries such as Bibliotik. The suit is a test case for authors' rights, licensing and the rules that will govern generative AI going forward.

Background

Modern language models need vast amounts of text to learn. The Books3 dataset, which contains about 196,000 works, has become infamous for including copyrighted ebooks obtained from shadow libraries. Researchers and developers have used Books3 to train models, and several companies have been vague about whether their training data included such sources.

Key allegations

Dataset used: Plaintiffs claim Apple trained commercial models using the Books3 dataset, which includes allegedly pirated ebooks.
Access method: The complaint contends that Apple's crawlers ingested copyrighted works from shadow libraries like Bibliotik without consent or payment.
Commercial use: The authors say Apple used that content to build Apple Intelligence and OpenELM, commercial systems that they allege benefit from unlicensed material.
Named works: The filing names specific books by the plaintiffs as part of the corpus allegedly used in training.

Why this matters

The case could set a major legal precedent on whether unlicensed copyrighted material may be used for commercial AI training. If the court rules for the authors, AI developers may be required to rebuild models using licensed content and to pay creators for training data. That would affect development costs, compliance and licensing markets for creative works.

Implications for stakeholders

For AI companies: Expect pressure to prove training data provenance and to negotiate licenses for books, music, artwork and code used in model training.
For creators: A successful suit could strengthen authors' rights and create new revenue streams from licensing training data.
For the legal landscape: Courts may clarify how fair use applies to model training and whether prior settlements, such as the recent Anthropic settlement, influence outcomes.

Takeaways

This lawsuit highlights the need for transparency about training datasets and for robust licensing practices in generative AI. Companies that build or buy AI should ask vendors for clear documentation about data sources and licensing. Creators and rights holders are increasingly asserting control over how their work is used in AI development.

Learn more

Read the full legal breakdown and follow updates as the case unfolds. Stay informed about developments in AI training data, Books3, shadow libraries and authors' rights.
