Authors Sue Anthropic Over Pirated Books in AI Training

Three authors, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, have filed a class-action lawsuit against Anthropic, alleging that the company used pirated copies of their copyrighted books to train its AI models.

The lawsuit focuses on Anthropic’s use of “The Pile,” a dataset that allegedly includes hundreds of thousands of books sourced from the pirate site Bibliotik. The authors argue that Anthropic’s Claude chatbot, which the company promotes as superior to OpenAI’s model, was trained on this pirated content, allowing the company to profit without compensating the authors.

Anthropic, which has raised around $6 billion in funding from major companies such as Google and Amazon, is projected to earn $850 million in revenue in 2024. The authors contend that this success is built on unpaid creative work.

The suit follows similar legal action against Nvidia over its use of the same dataset and comes amid growing concern from YouTubers whose content was also included in The Pile without permission. It underscores the ongoing tension between AI companies and creators over the use of copyrighted material in training datasets.