Meta Argues BitTorrent Uploads of Pirated Books Are Fair Use in AI Training Lawsuit

A Novel Fair Use Defense Emerges in AI Copyright Battle

In a landmark copyright lawsuit, Meta has advanced a novel and controversial legal argument: uploading pirated books to strangers via the BitTorrent protocol qualifies as fair use. This defense, detailed in a recent court filing, is Meta's latest move in a class-action lawsuit originally filed in 2023 by authors including Richard Kadrey, Sarah Silverman, and Christopher Golden.

The lawsuit centers on Meta’s use of copyrighted books sourced from shadow libraries like Anna’s Archive to train its Llama large language model (LLM). While a federal court ruled last summer that the *training itself* constituted fair use, it left open Meta’s potential liability for downloading and distributing the books via BitTorrent.

Now, Meta seeks to extend that fair use shield to cover the distribution element. The company argues that using BitTorrent was the only practical way to obtain the massive datasets needed and that uploading was an inherent, unavoidable part of the download process. This defense could set a significant precedent for how AI companies source training data.

BitTorrent as a 'Necessary' Tool for AI Development

Meta’s legal team laid out its reasoning in a supplemental interrogatory response filed in a California federal court. The core of the argument is technical necessity. Meta states that the datasets from Anna’s Archive were “only available in bulk through torrent downloads,” making BitTorrent “a more efficient and reliable means of obtaining the datasets.”

The protocol’s peer-to-peer nature means downloading also involves uploading pieces of files to other users. Meta contends this incidental sharing was “part-and-parcel of the download” and served the same “transformative fair use purpose” as the ultimate training activity. In essence, the company claims that the end (training found to be fair use) justifies the means (uploading that would otherwise be infringing distribution).

This argument directly challenges the plaintiffs' claim that Meta engaged in “widespread and direct copyright infringement” by seeding torrents. The outcome will hinge on whether the court accepts that a technical requirement for access can legitimize an otherwise infringing act.

Legal Wrangling Over Discovery and Defense Timing

The authors’ legal team has sharply contested Meta’s late-stage introduction of this defense. In a letter to Judge Vince Chhabria, they accused Meta of making an “improper end-run around the discovery deadline.” The plaintiffs note that Meta has been aware of the uploading claims since November 2024 but never previously raised this specific fair use argument.

Meta swiftly responded, filing its own letter the next day to assert the defense was not new. The company pointed to a joint case management statement from December 2025 where it had flagged the issue, and noted the authors’ attorney had addressed it in a subsequent hearing.

This procedural skirmish underscores the high stakes. If Judge Chhabria allows the defense to proceed, it will become a central battlefield. If he rejects it as untimely, Meta's position on the remaining distribution claims could be severely weakened.

Authors’ Testimony and the Question of Harm

Meta’s filing also leverages deposition testimony from the plaintiff authors to bolster its position. The company highlights that every named author admitted they were unaware of any Meta model output that directly replicated content from their books.

Notably, author Sarah Silverman testified that whether Meta’s models output language from her book “doesn’t matter at all.” Meta argues these admissions undercut any claim of market harm, a critical factor in fair use analysis. If the authors cannot demonstrate infringing outputs or lost sales, the lawsuit appears more focused on the training process itself, which the court has already found to be fair use.

This strategic use of the plaintiffs’ own words mirrors Meta’s successful argument in the earlier fair use ruling on training. It frames the litigation as an attempt to control the AI training process rather than to remedy a specific economic injury.

The Broader Context: Shadow Libraries and AI's Data Hunger

Meta’s case is not isolated. The use of shadow libraries like Anna’s Archive is a widespread, open secret in AI development. According to a separate lawsuit filed by publishers, Anna’s Archive “publicly claims to have given ‘high-speed access’ to its illegal collection of more than 140 million copyrighted texts to companies in China, Russia, and elsewhere, many of them LLMs.”

That complaint alleges that Anna’s Archive offered premium data access for AI training for $200,000 and suggested payment in cryptocurrency. This highlights the commercial dimension of the shadow library ecosystem and the intense demand for high-quality text data.

Meta subtly references this competitive landscape in its filing, stressing that its AI investment “helped the U.S. to establish U.S. global leadership,” positioning it ahead of geopolitical rivals. The implication is that restrictive copyright interpretations could hinder American tech dominance.

Why This Case Matters for the Future of AI

The legal theories being tested here will resonate far beyond Meta’s courtroom. If uploading via BitTorrent is deemed fair use, it could create a significant loophole for AI companies to ingest copyrighted material from peer-to-peer sources. Conversely, a ruling against Meta would reinforce the legal risks of using shadow libraries.

This dispute is also the final stage of the original lawsuit. The BitTorrent distribution claims are “the last live piece” of litigation from the 2023 filing, so the judge’s decision on whether to allow the new defense will shape the final chapter.

The outcome will also influence the broader industry dialogue around ethical data sourcing. As noted in other publishing industry reports, literary agents are increasingly urging writers to avoid AI due to a perceived “change in nature of submissions,” highlighting growing creator anxiety.

Ultimately, this case is about more than piracy. It’s a foundational clash between the insatiable data needs of generative AI and the traditional rights of copyright holders, with the legal concept of fair use as the contested middle ground.