Apple now faces a high-profile copyright lawsuit alleging that its AI development relied on pirated content. Two authors say Apple used the Books3 dataset, which draws material from shadow libraries such as Bibliotik, to train Apple Intelligence and OpenELM without permission. The suit is a test case for authors' rights, licensing and the rules that will govern generative AI going forward.
Modern language models need vast amounts of text to learn. The Books3 dataset, which contains about 196,000 works, has become infamous for including copyrighted ebooks obtained from shadow libraries. Researchers and developers have used Books3 to train models, and several companies have been vague about whether their training data included such sources.
The case could set a major legal precedent on whether unlicensed copyrighted material may be used for commercial AI training. If the court rules for the authors, AI developers may be required to rebuild models using licensed content and to pay creators for training data. That would affect development costs, compliance and licensing markets for creative works.
This lawsuit highlights the need for transparency about training datasets and for robust licensing practices in generative AI. Companies that build or buy AI should ask vendors for clear documentation about data sources and licensing. Creators and rights holders are increasingly asserting control over how their work is used in AI development.
Read the full legal breakdown and follow updates as the case unfolds. Stay informed about developments in AI training data, Books3, shadow libraries and authors' rights.