While publishers’ responses to OpenAI scraping their content may seem fragmented (The New York Times is suing OpenAI, while Axel Springer has struck a content deal with it), it is all part of the natural friction that occurs around any substantial technological innovation.
While a “unified industry voice” seems ideal (leverage!) for addressing Big Tech, at a practical level it is nearly impossible, given the range of market forces, regulatory bodies, and jurisdictions at play in questions such as AI and LLM training.
Last week, I discussed The New York Times lawsuit, deals announced, and the plight of creators and smaller publishers with UK solicitor JJ Shaw. JJ, by way of a quick introduction, is a Managing Associate in the Digital, Commerce and Creative team at the top-tier, London-based law firm Lewis Silkin.
From the conversation, it quickly became apparent that developing a “fair value model” for scraping publisher content would be almost impossible. In the UK, for example, conversations about payment of “fair license fees” for using others’ data or content creations have not come to fruition.
According to JJ, finding formulas that work across multiple sectors has proved impossible. “It is almost like the government is saying, ‘This is too complicated; we are not going to be able to find this line in the sand between AI companies and publishers.’ So they’re almost giving it to [market forces and] litigation to determine that value.”
It also makes perfect practical sense, just as existing media-to-media international content licensing deals differ from brand to brand and channel to channel.
It’s all Mark Zuckerberg’s fault
The New York Times suing OpenAI and Microsoft is all Mark Zuckerberg’s fault: his “move fast and break things” mantra was adopted by everyone in tech with stars in their eyes.
No, not really, but it helps contextualise what is happening now.
It was inevitable that the friction between LLM trainers and creative content producers would lead to a giant clash.
According to JJ, in the spirit of the “move fast and break things” mantra (which encourages bold, quick decisions even at the risk of unintended consequences), OpenAI “most certainly did not get it right from the off with rights, permissions, and clearances to scrape data on the scale it does” to train the GPT-3.5 and GPT-4 models.
However, as we all know, once the genie is out of the bottle, there is no putting it back.
This is especially true when technological advancement generates quick consumer adoption. I saw this first hand in Kenya circa 2007/8 when the leading mobile phone operator launched M-Pesa, a peer-to-peer money transfer app. Before that, Kenyans often sent cash to extended families outside cities such as Nairobi.
There was rapid consumer adoption of the app, and soon, consumers (and businesses) found all means of using M-Pesa to facilitate cashless transactions in a society where credit card penetration was low.
As I recall, it all happened in a flash before the Central Bank of Kenya had time to regulate its use. As we see with AI again, the regulator had to adapt after the fact rather than setting the ground rules at the start.
Independent creators and smaller publishers
While The New York Times’ lawsuit against OpenAI is not the first of its kind, it is arguably the most high-profile case of a media company suing over the use of its content in LLM training.
What makes it more interesting is that while The Times has gone this route, Axel Springer (and Associated Press earlier) recently announced an agreement with OpenAI. The Times, of course, says in the lawsuit that it had also been in conversations with OpenAI but could not reach a deal.
Moreover, according to The Times (sub required), others such as Gannett, News Corp, IAC and the News/Media Alliance in the US are in talks with OpenAI.
Individual content creators and smaller publishers will require alliances or collective bargaining on their behalf (such as the News/Media Alliance example mentioned above). This is hard for the same reasons of markets, regulators and jurisdictions mentioned earlier.
The Times’ case could help.
While JJ can only speak from the perspective of English law, he believes that The New York Times has strong legal grounds to bring its claim. Had OpenAI’s scraping of data been purely for research purposes, the picture might look different; instead, it has commercialised the ChatGPT model (e.g. through $20/month subscriptions), which puts the case in a different realm.
Should this go to trial (an open question) and The New York Times win, as suggested, it would set a precedent. It is not a solution for independent creators and smaller publishers, but it would help establish some “ground rules” early on.
Besides some of what we touched on above, there are several other questions to be answered. One of the interesting aspects for me is, for example, the “input” versus the “output” side of LLMs.
LLMs are changing the way we interface with content. Use search as an example: if (or when) we become accustomed to summaries in response to our prompts, would search results with a bunch of blue links still matter?
And can we say those summaries are materially so different from the data used at the input phase that they cannot be considered copyright infringement?
The Times’ suit lists examples of summaries containing near-verbatim quotes from its articles. That is not “materially new”, but would a generic user prompt deliver the same result?
There are ways to block content from being scraped, and this, one would argue, is particularly important for content that is paywalled and only available to paid subscribers. How does that factor into the case?
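For context on the blocking mechanisms mentioned above: OpenAI publishes a named web crawler, GPTBot, which publishers can disallow via a standard robots.txt directive. A minimal sketch (the `/premium/` path is a hypothetical example of a paywalled section, not a real site’s layout) might look like this:

```
# Block OpenAI's crawler from the entire site
User-agent: GPTBot
Disallow: /

# Or, alternatively, block only a paywalled section (hypothetical path)
# User-agent: GPTBot
# Disallow: /premium/
```

Note that robots.txt is a voluntary convention: it relies on the crawler honouring it and does nothing to retrieve content that was scraped before the rule was added.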
A final thought
All said and done, there is plenty of innovation, experimentation and a growing number of use cases around the effective use of AI, even among small content-creator businesses.
Focusing on defence alone is not a strategy.
The glass remains at least half full.
Download the special report from our Mx3 AI mini-summit held in London in December 2023 for thoughts and ideas from across the spectrum. It’s available for free here.
Join us for more in Barcelona, Spain:
AI will feature in a number of sessions at our Mx3 Barcelona summit on 12-13 March, which puts the spotlight on innovation in consumer and B2B media across media verticals.
This includes the closing keynote by Prof Lucy Kueng, board member, researcher and author, on ‘strategic smarts and how top media leaders are navigating the AI revolution’. There is more here.