OpenAI’s Legal Battles Over AI Copyright Claims: Implications for AI Development and Content Use

With the rapid ascent of artificial intelligence (AI), the legal boundaries between AI and copyright law are shifting quickly. At the center of this shift is OpenAI’s language model, ChatGPT, which has faced a surge of copyright infringement claims over the past year. Nonetheless, OpenAI recently celebrated a legal victory when a federal judge largely dismissed copyright claims brought against the company, emphasizing that merely alleging that ChatGPT was trained on copyrighted material falls short of proving vicarious infringement. The ruling is another instance in which a court has scrutinized and questioned the fundamental liability theories presented by creators in their battle against AI.

Claims against OpenAI

The consolidated cases, which were brought against OpenAI by authors Paul Tremblay, Sarah Silverman, Christopher Golden, and Richard Kadrey, seek to represent a broad class of U.S. copyright owners. In the suits, the plaintiffs claim their works were improperly used as part of the training data for OpenAI’s language models. Specifically, they allege that OpenAI’s ChatGPT was trained using datasets sourced from unauthorized “shadow library” websites, such as Bibliotik, Library Genesis, and Z-Library, where their books were accessible en masse via torrents.

The plaintiffs also pointed to ChatGPT’s ability to produce summaries of their works, citing Silverman’s “Bedwetter,” Golden’s “Ararat,” and Kadrey’s “Sandman Slim” as examples, and argued that these outputs infringe their copyrights. They further alleged that the AI failed to retain any of the copyright management information that accompanied their published works.

Asserting unauthorized use of their copyrighted materials in OpenAI’s training regimen, the authors brought six claims, including copyright infringement, negligence, unjust enrichment, and unfair competition. They seek statutory and actual damages, permanent injunctive relief, and coverage of legal fees.

Judge’s Ruling

In her ruling, U.S. District Judge Araceli Martínez-Olguín dismissed the assertion that every output from OpenAI’s ChatGPT constitutes an infringing derivative work sourced exclusively from copyrighted sources. She noted that the plaintiffs failed to explain what the outputs entail or demonstrate that any specific output bears substantial — or any — similarity to their copyrighted books.

Moreover, Judge Martínez-Olguín rejected the authors’ allegation that OpenAI removed copyright management information, a claim they made under the Digital Millennium Copyright Act (DMCA), 17 U.S.C. § 1202(b). She clarified in her decision that, according to the DMCA’s explicit wording, liability necessitates the distribution of the original “works” or “copies thereof.” Since the plaintiffs did not assert that OpenAI had distributed their books or their copies, such claims were deemed inadequate for establishing a DMCA violation.

The judge further dismissed, in part, the plaintiffs’ allegation that OpenAI violated California’s Unfair Competition Law (UCL), as outlined in Cal. Bus. & Prof. Code § 17200 et seq. This dismissal was based on the determination that the claim hinged on the previously dismissed DMCA claim and posited only a “speculative” harm. Regarding the “fraudulent” category, she found the UCL claim to be inadequately substantiated, being overly dependent on the unsuccessful DMCA allegations.

Finally, Judge Martínez-Olguín dismissed the plaintiffs’ negligence claim, concluding that they did not successfully establish a duty owed by OpenAI nor a pertinent relationship between the parties to underpin such a claim. Additionally, she rejected their claim of unjust enrichment, stating, “Since the Plaintiffs have not demonstrated that OpenAI unjustly accrued benefits from the Plaintiffs’ copyrighted works via fraud, mistake, coercion, or request, this claim is without merit.”

The suit was not dismissed in its entirety, however. Judge Martínez-Olguín chose not to dismiss the UCL claim under the “unfair” prong, stating, “Given the assumed veracity of the Plaintiffs’ allegations — that the Defendants employed the Plaintiffs’ copyrighted works to train their language models for commercial gain — the Court finds that such conduct might represent an unfair practice.” Furthermore, in a footnote, the judge remarked, “OpenAI has not contested preemption… [T]he Court acknowledges the potential that, should the UCL claim replicate the copyright infringement allegations, it could be subject to preemption by the Copyright Act.”

The authors were granted leave to amend, providing them an opportunity to refile their lawsuit. Additionally, their claim alleging a violation of California’s unfair competition law was allowed to proceed, based on the argument that the company’s utilization of copyrighted works to train its AI model for commercial gain amounts to an unfair business practice. Interestingly, OpenAI chose not to seek dismissal of the direct copyright infringement claim.

Similar Lawsuits

This ruling echoes the sentiments of two other judges in the Northern District of California, who have raised doubts about whether creators can prove their core claims without showing that the AI tools create outputs closely resembling the allegedly infringed works. In a pertinent case involving artists and AI art generators, U.S. District Judge William Orrick criticized the allegations as “defective in numerous respects.”

A crucial point of this debate is the necessity for creators to demonstrate “substantial similarity”—a standard test in copyright law to assess if one work has infringed upon another by comparing the content. Creators argue that this standard should not apply to them, asserting that companies like OpenAI and Meta directly utilize their work to develop their AI technology.

This series of legal battles could set a precedent for whether AI companies must obtain licenses for the content they use to train their systems and might influence the future of such technologies. Some lawsuits are even demanding an injunction to compel these firms to dismantle their AI models.

Conclusion

The legal confrontations brought by creators against AI companies mark a pivotal moment in defining the interplay between artificial intelligence and copyright law. The cases brought by authors Paul Tremblay, Sarah Silverman, Christopher Golden, and Richard Kadrey against OpenAI illuminate critical concerns regarding the use of copyrighted materials in AI training processes. Despite a partial victory for OpenAI, the door remains open for further legal scrutiny, particularly concerning the fairness of AI’s use of copyrighted content for commercial gain. This and similar cases across the U.S. underscore the broader legal and ethical debates surrounding AI development. They question the adequacy of existing copyright frameworks to address AI’s unique challenges and may herald significant changes in how AI technologies are developed, utilized, and regulated, potentially driving the necessity for licensing agreements or even more transformative legal reforms.
