Cloudflare wants to save the internet from the risks of “zero-click” content. Will it succeed?


By Anabelle Nicoud, Staff Writer, IBM

Can the internet as we know it survive the AI age? Cloudflare believes it can, at least when it comes to protecting content creators. The tech giant, which helps manage and secure traffic for 20% of the web, announced this week that it will be the first internet infrastructure provider to block AI crawlers that scrape sites without compensation or permission.

The move, which was welcomed by media giants like The Atlantic, Fortune, TIME and The Associated Press, as well as tech companies like Pinterest and Reddit, is the first step toward building a “pay per crawl” marketplace, wrote Cloudflare Cofounder and CEO Matthew Prince.

“Cloudflare, along with a majority of the world’s leading publishers and AI companies, is changing the default to block AI crawlers unless they pay creators for their content,” he wrote. “That content is the fuel that powers AI engines, and so it’s only fair that content creators are compensated directly for it.”

“But that’s just the beginning. Next, we’ll work on a marketplace where content creators and AI companies, large and small, can come together. Traffic was always a poor proxy for value. We think we can do better.”

The rise of the bots

With the rise of generative AI and AI search powered by Anthropic, OpenAI, Meta and Perplexity, the web is seeing a new type of visitor: bot scrapers. This shift affects not only news publishers, which rely on referral traffic to monetize their journalism, but also content creators and large tech platforms. In one instance, Reddit recently filed a suit against Anthropic, claiming the AI company's bots scraped its content, an allegation Anthropic denies.

“Tech companies are affected by AI crawlers too,” said Will Allen, Head of AI Control, Privacy and Media Products at Cloudflare in an interview with IBM Think. “Pinterest, Quora and Reddit are some of the most popular user-generated content tech sites that have signed on in support of our permission-based approach to AI crawlers, along with companies in the AI space like ProRata AI and Hyperscience.”

Bots are used for training, but also for retrieval augmented generation (RAG), which connects generative AI models to external knowledge bases, such as publicly available content on the internet. According to a report released last month by tech company TollBit, RAG bot traffic observed on its partners' sites grew 49%, nearly 2.5 times the 18% growth rate of training bot traffic. Among the top 12 bots crawling websites in the first quarter of 2025, TollBit found that ChatGPT, Meta and Perplexity were the most active, together accounting for about 70% of average monthly scrapes by AI bots.
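The RAG pattern TollBit tracks can be illustrated with a minimal sketch: retrieve the most relevant documents from an external corpus, then prepend them to the model prompt as grounding context. The corpus, the keyword-overlap scoring and the prompt format below are illustrative assumptions, not any vendor's implementation; production systems use vector embeddings and a live crawl rather than a hard-coded list.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# Retrieval here is naive keyword overlap; real systems use
# embeddings, a vector index and a language model.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share."""
    words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Ground the model's answer in the retrieved context."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Cloudflare blocks AI crawlers by default for new customers.",
    "RAG connects generative models to external knowledge bases.",
    "Publishers rely on referral traffic to monetize journalism.",
]
print(build_prompt("How does RAG connect models to knowledge bases?", corpus))
```

Because retrieval happens at answer time, a RAG bot may fetch a publisher's page on every relevant user query, which is why TollBit sees RAG traffic growing faster than one-off training crawls.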

This new traffic takes a toll on servers and drives growing costs on publisher infrastructure. In April, Wikimedia, the nonprofit behind Wikipedia, noted that 65% of its most expensive traffic came from bots. “Our content is free, our infrastructure is not,” the organization said in a blog post.

The data-hungry bots have also impacted click-through rates on the search engine results page, or SERP, which have declined sharply in recent months. Take Google's AI Overviews: a recent study by marketing company Ahrefs shows that the feature, which the search giant rolled out to every user last May, reduced clicks by 34.5%. And while AI Overviews continue to grow, up 116% since last March, the sites served up on the SERP take the hit.

“What that means is that if you’re making money through subscriptions, through advertising, [through] any of the things that content creators are doing today, visitors aren’t going to be seeing those ads,” said Cloudflare’s Prince during a recent interview on CNBC. “They’re not going to be buying those subscriptions anymore. And that means it’s going to be much, much harder for you to be a content creator.”


Good bot, bad bot

But not all bots are equal: with the rise of AI crawling bots comes a parallel rise in well-meaning bots, and in unknown ones.

Miso Technologies Cofounder and CEO Lucky Gunasekara leads Project Sentinel, which monitors more than 8,300 news and academic sites from leading publishers around the world, including Newsweek, The Guardian, USA Today and the BBC. According to figures collected for the project, more than 1,700 bots are now on the radar of some 7,000 publishers, Gunasekara told IBM Think, a figure that has grown 35% since February, even though most publishers target only 17 bots.

“We talk to a lot of publishers, and the question mark is how we know that this is working when it comes to small, bad actors,” he said in an interview. Among the biggest bots he monitored, he found several that can’t be tied to a major AI company. “What do we do when a bad actor purchased 100,000 IP addresses that are just a bunch of bots?” he asked.

Allen also distinguishes “well-intentioned operators of crawlers, bots and agents” that want a clear way to identify their bots to site owners from bad actors. “Our proposals and support for WebAuthn [web authentication] continue to receive a lot of support and collaboration across the tech ecosystem,” he said.

“When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we’re able to fingerprint. We use Cloudflare’s network of over 57 million requests per second on average to understand how much we should trust the fingerprint,” he added. “We compute global aggregates across many signals, and based on these signals, our models are able to consistently and appropriately flag traffic from evasive AI bots.”

A partial solution?

Cloudflare isn’t the first company to try to “negotiate” on behalf of content creators. The past year has seen companies such as ScalePost and TollBit emerge and propose solutions for publishers to monitor, sell or monetize data for AI companies.

But Cloudflare’s enviable market could make its move more impactful.

“If you were to describe a group that is best positioned, it would be Cloudflare,” said Gunasekara.

“It’s important that we are seeing one of the big first steps of publishers standing up to the companies. What’s tricky is we don’t know if the AI companies will circumvent it,” said Lily Ray, an SEO expert and Vice President at Amsive, in an interview with IBM Think. Many content creators might not necessarily grasp the impact of blocking by default—after all, not everyone wants to disappear from AI search. “It’s a bit dangerous for sites that don’t understand the implications,” she said.

Cloudflare says publishers can choose to allow crawlers to access their content for training, search or inference. Existing customers can block AI crawlers at any time with a single click in their Cloudflare dashboard.

“Customers can let Cloudflare create and manage a robots.txt file, which creates the appropriate entries to let crawlers know not to access their site for AI training,” Allen explained. “Customers can choose to block AI bots only on portions of their sites that are monetized through ads.”
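The robots.txt mechanism Allen describes can be exercised with Python's standard library. GPTBot (OpenAI), ClaudeBot (Anthropic) and CCBot (Common Crawl) are real, publicly documented crawler user agents; the specific blocking policy below is an illustrative example of the kind of file a provider might generate, not Cloudflare's actual output.

```python
from urllib import robotparser

# Example robots.txt of the kind Cloudflare can manage on a customer's
# behalf: disallow known AI-training crawlers site-wide while leaving
# other bots alone. The user agents are real; the policy is illustrative.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A listed AI crawler is told to stay out; an unlisted bot is not.
print(parser.can_fetch("GPTBot", "/articles/story.html"))
print(parser.can_fetch("SomeOtherBot", "/articles/story.html"))
```

The caveat, as Ray and Gunasekara note above, is that robots.txt is advisory: it only stops crawlers that choose to honor it, which is why Cloudflare pairs it with network-level blocking and fingerprinting.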


Different tech, same dilemmas

The question of regulating the exchanges between AI companies and publishers might get a lot of coverage now as new AI labs emerge and investments flow. But it isn't a new one, observes Eric Goldman, a Professor of Law at Santa Clara University School of Law in Silicon Valley, who studied the "infomediary" model in the 1990s, as the commercial web took shape.

“The technology might be different or might have evolved, but what we’re talking about today is not new,” he told IBM Think.

“This issue has been discussed for decades, and no one has yet successfully built an infomediary model, though there were billions of dollars of easy money thrown at that problem in the 1990s. So, Cloudflare may have cracked the model; they may be able to make it work, but the historical track record in this field is not great.”

Goldman published "Generative AI is Doomed," a paper on the topic, last year. In it, he argues that the prevailing regulatory and legal responses to generative AI will limit or even negate its benefits.

The legal landscape still has to be shaped by outcomes of various lawsuits launched by authors and publishers against major AI companies in the US and around the world. “So far, we have reason to believe that the default rule is that training a generative AI model on copyrighted works is not infringement, but these issues are going to go up on appeal, all of them,” Goldman said. “Until we start to get appellate rulings, they’re just early data points.”
