Charles Benaiah is the publisher of unCharles, a nugget of Substack gold. In this feature, specially syndicated for Mx3’s Collectif, he questions whether publishers can successfully block AI from scraping their content to train LLMs. Spoiler alert: No…it’s game theory 101.
My friend Rob P (not a subscriber) was one of many people who sent me a story reporting that the New York Times and other big-name publishers will block OpenAI from accessing their stories. I said "a story" because Rob's version came from CNN. Nearly identical stories also showed up on The Verge, The Guardian, NPR, Ars Technica, Adweek, The Wrap…
I'll stop there to save digital trees. Because: carbon footprint. More on that nonsense another time. Just know, I could have listed a few hundred dozen more. And that's before the aggregators regurgitate it.
All the notes my friends sent were some version of, “This will put the brakes on training AI.”
It won’t. Because: Game Theory. This is not just my crackpot notion. John Nash won a Nobel in economics for this. Russell Crowe played him in A Beautiful Mind.
If you're thinking AI is trying to get laid, you're right. The New York Times and the other blondes of publishing expect AI to make a mad rush for them. The blondes are playing checkers, trying to block AI. AI is playing three-dimensional synergized hypercube game theory.
AI doesn't need the New York Times. There are hundreds of dozens of other publishers producing training-quality content. Those other publishers won't block AI. Because: they don't care.
As usual, the smart folks move first. While the NYT took Paul Lynde in the center square to block, AP got millions from OpenAI to let it access 170 years' worth of old stories. Who won?
There are so many reasons and workarounds it’s not funny. The sagacious societal folks who run erudite editorial for the cultured class ought to know this. I don’t even need to say it louder. Keeners like that sit up front.
One: media is highly derivative. For any given story, dozens of outlets cover it nearly verbatim. Toss a bunch of unblocked ones on the floor. AI will $ grep -i "words" file_pattern to grok knowledge in the patterns, the way a leprechaun would use Excel to =count(grains). This type of training is about quantity first. Your LLM is about enriched quality.
Which brings us to number two. To misquote Tina Turner, "And any old talking will do." OK, so you can't read Wall Street Journal stories for input. But the WSJ will happily read you its stories. Literally: an MP3 file that's unblocked. Add a simple speech-to-text converter and you're all set. OK, it still won't come to that. But what tech bro would ignore a loophole that big?
We're on to number three. 347 billion emails are sent each day. 85% of them are spam. I'd guess another 13% are newsletters. All AI would have to do is get a Gmail account and give it to the Gap for 10% off chinos. Moments later, the inbox would be full of training materials. Just read the crappy sales pitches, the schlocky offers on garbage I don't need (not yours, Sherry; yours are gold), the reminders to get a reservation at that place you ate at that one time, and four hundred million newsletters.
And, finally, four. AI has billions of dollars and a lust for words. They could send people to every library, scan every book, be done by lunch, and have money left over to pop over to DraftKings and place an NFL parlay. If they ask nicely, maybe Google could give them a Lyft in those cars Google uses to take pictures of the world.
Before you pooh-pooh it, Google did the book thing and the courts sided with it. Margaret Atwood is already complaining about how that could lead to a dystopian society.
I'm going to close with something akin to what Karen emailed me, which I choose to take as a compliment: "Charles, you don't write like AI." I'd add, "Or like the 17 trillion inputs it was trained to mimic." Does AI really need more of the same?
Charles Benaiah is the CEO of Watzan, a techy company for medical media. When he's not running a media company, he reads about media, thinks about it, pulls out what's left of his hair dealing with it, and then writes about it over on unCharles. Charles is a member of Media Makers Meet – Mx3 Collectif.