Courts Are Starting To Define What “AI Discovery” Means
Here’s What You Need to Know
As companies increasingly rely on generative AI tools to assist with research, writing, coding, and other business functions, opposing parties in litigation have begun to seek AI-related data in discovery. Lawyers are now grappling with new questions: Are AI prompts and outputs discoverable? What about logs, settings, or other data showing how an AI tool was used? A recent court decision sheds light on these emerging issues.
On September 19, 2025, the U.S. District Court for the Southern District of New York issued an important ruling in one of the most closely watched AI-related cases in the country: In re OpenAI Inc. Copyright Infringement Litigation, brought by The New York Times against OpenAI and Microsoft. The case, part of a broader wave of copyright lawsuits against AI companies, raises questions about how these technologies use copyrighted material to train and generate content.
The SDNY’s September ruling confirms something important for all litigants: Ordinary discovery rules still apply to AI data. The court refused to compel The New York Times to produce prompts and outputs from its internal AI tool, finding the request not relevant to the core copyright issues and, even if it were, not proportional given the review burden (tens of thousands of entries and significant privilege review time and cost).
Earlier, in May 2025, the court had ordered OpenAI to preserve all output logs going forward, raising concerns that companies would need to keep every AI interaction indefinitely. The September ruling clarified that preservation obligations must still be targeted and defensible rather than a mandate to retain all AI data without limit, and later proceedings ended the broad, going-forward preservation requirement.
What AI Content Might Be Discoverable
Requests may seek prompts (what custodians typed or uploaded) and outputs (what the tool returned) as well as AI-generated summaries, transcripts, or drafts when they relate to the dispute. Limited metadata or logs (timestamps, tool/model, request IDs) and administrative or audit information (who had access, what settings applied) may also be relevant. By contrast, system-wide logs or training data are typically outside the scope unless they directly bear on the claims and are within the company’s control. In some cases, contract terms with AI vendors may also shape what information can be accessed or produced, particularly if the agreement addresses data ownership, confidentiality, or notice requirements.
How To Approach Preservation Practically
From a preservation standpoint, the focus should be on identifying which custodians with relevant information used AI tools, which tools were involved, what kinds of data were entered, and where that information resides. Preservation should be targeted — limited to prompts, outputs (including AI-generated summaries, transcripts, or drafts), and minimal logs that relate to the issues in dispute. The goal is a defensible, proportional approach grounded in documentation and reasonableness — not a wholesale capture of all AI activity across the organization. As AI becomes more embedded in daily workflows, preservation decisions are also becoming more complex. Understanding how and why AI content was created can help determine whether it needs to be preserved.
Negotiating Discovery on AI
AI-related discovery is highly fact-dependent: in some cases it will be appropriate; in many others it will not. Often, parties can object to production of AI-related material where prompts, outputs, or logs do not directly relate to the issues in dispute. In those situations, objections grounded in relevance and proportionality are entirely consistent with the rules and with emerging case law.
When AI discovery is appropriate, discussions typically focus on scope and burden: narrowing requests through custodian, topic, and date limits, or agreeing to metadata exchange or sampling before any broad content review. When responding would require reviewing large volumes of entries with privilege implications, quantifying that burden has proven persuasive to courts. Some organizations are also finding that employees occasionally use workplace AI tools for personal purposes, which can raise privacy issues and add review burden; that is another reason to limit requests to clearly relevant data.
Takeaways
When it comes to AI-related data, it will often be appropriate to object to production unless the prompts, outputs, or logs are directly tied to the claims or defenses. But when AI content is connected to the issues and proportional to the case, it should be produced just like any other category of ESI. The key is to draw the line thoughtfully, limiting production to what truly matters and defending against unnecessary overreach.
Preservation remains a separate obligation. Even if production objections are expected, organizations should still identify relevant custodians and sources if AI discovery is in scope, and take reasonable steps to preserve potentially relevant data. Courts focus on reasonableness and documentation, not perfection.
© Arnold & Porter Kaye Scholer LLP 2025 All Rights Reserved. This Blog post is intended to be a general summary of the law and does not constitute legal advice. You should consult with counsel to determine applicable legal requirements in a specific fact situation.