Over the past two years, a thriving market for licensing copyrighted material to train artificial intelligence systems has emerged. OpenAI was the first to strike deals with publishers such as Axel Springer, News Corp and the Associated Press. A few others in the field followed.
Such agreements weren’t in place when AI firms first started to face litigation accusing them of widespread infringement. Now, lawsuits are increasingly targeting the existence of this licensing market to argue that AI companies are illegally pilfering creators’ works.
Authors, in a proposed class action filed on Monday evening, accused Anthropic of illegally downloading and copying their books to power its AI chatbot Claude. The lawsuit alleges that the Amazon-backed company “usurped a licensing market for copyright owners.”
Without intervention from Congress, the legality of using copyrighted works in training datasets will be decided by the courts. The question will likely be answered in part on fair use, which provides protection for the use of copyrighted material to make a secondary work as long as it’s “transformative.” It remains among the primary battlegrounds for the mainstream adoption of the tech.
The authors’ lawsuit nods to AI firms’ position that their conduct is covered by the legal doctrine. By refusing to license the content it used to build Claude, Anthropic is undercutting an existing market that other AI companies have established, the complaint says.
The allegations could be aimed at undermining a fair use defense, which was effectively reined in when the Supreme Court issued its decision in Andy Warhol Foundation for the Visual Arts v. Goldsmith. In that case, the majority said that an analysis of whether an allegedly infringing work was sufficiently transformed must be balanced against the “commercial nature of the use.” The authors, and other similarly situated creators, are looking to leverage that ruling to establish that Anthropic could’ve simply licensed the material it infringed upon, and that its conduct illegally harms their ability to profit from their work by interfering with potential deals.
“Instead of sourcing training material from pirated troves of copyrighted books from this modern-day Napster, Anthropic could have sought and obtained a license to make copies of them,” the lawsuit states. “It instead made the deliberate decision to cut corners and rely on stolen materials to train their models.”
Absent Anthropic’s alleged infringement, the authors say blanket licensing would be possible through clearinghouses like the Copyright Clearance Center, which recently launched a collective licensing mechanism. Record labels, publications and other authors suing AI companies have advanced similar arguments in other litigation.
The Authors Guild is exploring a model in which its members could opt in to the offering of a blanket license to AI companies. Early discussions have involved a fee to use works as training materials and a prohibition on outputs that borrow too much from existing material.
“We have to be proactive because generative AI is here to stay,” Mary Rasenberger, chief executive of the organization, told The Hollywood Reporter in January. “They need high-quality books. Our position is that there’s nothing wrong with the tech, but it has to be legal and licensed.”
The proposed class action was filed shortly before Tuesday’s announcement that OpenAI had reached a deal with Condé Nast to display content from its brands, including Vogue, The New Yorker and GQ, within ChatGPT and a search tool prototype. It was filed on behalf of Andrea Bartz (The Lost Night: A Novel, The Herd), Charles Graeber (The Good Nurse: A True Story of Medicine, Madness, and Murder) and Kirk Wallace Johnson (The Fisherman and the Dragon: Fear, Greed, and a Fight for Justice on the Gulf Coast). The lawsuit, which only alleges copyright infringement, seeks to represent other authors whose books were used as training data and asks for a court order blocking further infringement.
The authors also argue that Anthropic is depriving authors of book sales by facilitating the creation of rip-offs. When Kara Swisher released Burn Book earlier this year, Amazon was flooded with AI-generated copycats, according to the complaint. In another instance, author Jane Friedman discovered a “cache of garbage books” written under her name.
According to the lawsuit, users have turned to Claude to generate “cheap book content,” and the complaint cites one individual who created dozens of books in a short period of time to make its case.
The authors claim that Anthropic used a dataset called “The Pile,” which incorporates nearly 200,000 books from a shadow library site, to train Claude. In July, Anthropic confirmed the use of the dataset to various publications, according to the lawsuit.
Anthropic didn’t immediately respond to a request for comment.
Aug. 23, 9 a.m. Updated to revise a paragraph within this story as well as include more detail from the complaint and remove an incorrect reference to author Tim Boucher.