At the end of last year came the bombshell news that The New York Times, one of the most widely read and iconic newspaper brands in the world, was suing ChatGPT maker OpenAI and its backer Microsoft over copyright infringement. Today, OpenAI hit back publicly with a blog post arguing the suit is “without merit.”

“We support journalism, partner with news organizations, and believe The New York Times lawsuit is without merit,” the post from OpenAI begins.

The post goes on to make three broad claims:

1. We collaborate with news organizations and are creating new opportunities

2. Training is fair use, but we provide an opt-out because it’s the right thing to do

3. “Regurgitation” is a rare bug that we are working to drive to zero

Each claim is further elaborated upon in the post.

The big headline (pun intended) is OpenAI’s attempt to square its recent content licensing deals with rival news outlets and publishers — including Axel Springer (publisher of Politico and Business Insider) and the Associated Press (AP) — with its prior position that it could, and can continue to, lawfully scrape any public website for data on which to train its AI models, including the GPT-3.5 and GPT-4 models powering ChatGPT.

Since its DevDay developer conference in November 2023, OpenAI has offered indemnification — that is, legal protections paid out of its own pocket — to organizations and subscribers using its AI products.

How did we get here?

The NYT originally filed the suit in late December 2023 in the famed U.S. District Court for the Southern District of New York (which oversees Manhattan). It not only accused OpenAI of training on its copyrighted articles without proper permission or compensation, but also provided examples of ChatGPT generating text nearly identical to previously published NYT articles, which it says constitutes direct copyright infringement through “unauthorized reproductions and derivatives” of NYT works.

The suit was filed after what were reportedly months of failed negotiations between OpenAI and NYT representatives over a content licensing deal.

In today’s blog post, OpenAI says that it believes “using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents” but notes that it provides “a simple opt-out process for publishers (which The New York Times adopted in August 2023) to prevent our tools from accessing their sites.”

Yet OpenAI does not explain that it provided this opt-out only after the launch of ChatGPT in November 2022, so neither the NYT nor any other publisher had much of a chance to stop its data from being scraped before then.

However, the implication is that now that OpenAI has provided this mechanism and some organizations have taken advantage of it, the deals with other publishers are a way of keeping them from using the opt-out to block OpenAI from training on their material.

OpenAI accuses NYT of ‘intentional manipulation’

Also of note: OpenAI accuses the NYT of “intentionally manipulating prompts” to produce the examples of article reproduction it submitted as evidence in its case, in violation of OpenAI’s Terms of Service.

“Interestingly, the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites. It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate. Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.

Despite their claims, this misuse is not typical or allowed user activity, and is not a substitute for The New York Times. Regardless, we are continually making our systems more resistant to adversarial attacks to regurgitate training data, and have already made much progress in our recent models.”

That claim essentially boils down to the idea that the NYT prompted ChatGPT specifically in ways designed to produce responses close to its articles, then selectively presented those responses out of many possible ones to make its case — behavior OpenAI argues is not acceptable use and that it is working technically to prevent.

In response, a spokesperson from Trident DMG, a communications firm that says it represents NYT, provided a statement from an NYT lawyer to TechForgePulse via email:

“The blog concedes that OpenAI used The Times’s work, along with the work of many others, to build ChatGPT. As The Times’s complaint states, ‘Through Microsoft’s Bing Chat (recently rebranded as ‘Copilot’) and OpenAI’s ChatGPT, Defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.’ That’s not fair use by any measure,” said Susman Godfrey partner Ian Crosby, lead counsel for The New York Times.

OpenAI and the NYT will square off before Federal District Court Judge Sidney H. Stein, though our review of the case docket did not show a date for an initial hearing. Nor does the docket show that this blog post has been entered as argument or evidence, though some version of its arguments will likely appear there in a filing calling for dismissal.

With mounting examples of AI services reproducing copyrighted material, 2024 will most likely be a defining year for the technology and the legality of its controversial training data sources. AI image generator Midjourney, for one, has already been sued by artists and was taken to task — complete with examples — by an artist and AI entrepreneur Gary Marcus in a recent guest article published by IEEE Spectrum.
