Judge Orders OpenAI to Reveal Data Deletion Secrets
2 Dec
Summary
- OpenAI must disclose lawyer communications about deleted book datasets.
- Datasets 'Books 1' and 'Books 2' were allegedly sourced from the shadow library LibGen (Library Genesis).
- Judge cites OpenAI's shifting claims as the reason for ordering disclosure of communications it had claimed were privileged.

A federal judge has ordered OpenAI to produce internal communications with its lawyers concerning the deletion of datasets "Books 1" and "Books 2." These datasets, allegedly compiled from the shadow library Library Genesis, are critical to an ongoing class-action lawsuit filed by authors who claim their copyrighted works were used without permission to train OpenAI's ChatGPT. The court's decision stems from OpenAI's apparent contradictions regarding the reasons for deleting these datasets.
Judge Ona Wang cited OpenAI's shifting stance on whether the datasets' "non-use" was a reason for their deletion. Initially, OpenAI claimed "non-use" was a factor, then sought to shield related discussions under attorney-client privilege. This vacillation led the judge to rule that OpenAI could not claim privilege to avoid discovery, especially after the datasets had been on file for over a year. OpenAI must now provide these communications by December 8.
The authors believe that revealing OpenAI's rationale for deleting the datasets could bolster their claims of willful copyright infringement. The judge's order also noted that many communications within a Slack channel named "excise-libgen" were not privileged because they contained no requests for legal advice. OpenAI stated that it disagrees with the ruling and intends to appeal.




