Cloudflare accuses Perplexity of violating crawl rules — fair or not?

Miri · Aug 7, 2025

Cloudflare claims that Perplexity has been bypassing robots.txt directives using undeclared crawlers with rotating IPs and user-agents to avoid being blocked.

Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives

Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites.

blog.cloudflare.com
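
For context on what a "no-crawl directive" does in practice: robots.txt rules are matched against the user-agent name a crawler declares, and compliance is voluntary. Here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URL and the browser-style user-agent string are made up for illustration, and the PerplexityBot token is assumed to be the declared crawler name. It shows only why a rule aimed at a declared crawler does not match a request that presents a generic browser user agent. It does not show how any real crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one declared crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/page"  # placeholder URL

# A crawler that identifies itself is matched by the Disallow rule.
declared = is_allowed(ROBOTS_TXT, "PerplexityBot", URL)

# A request with a generic browser-style user agent (illustrative string)
# only matches the wildcard entry, so the same rule never applies to it.
generic = is_allowed(ROBOTS_TXT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", URL)

print(declared, generic)
```

In other words, robots.txt only works if the crawler both identifies itself honestly and chooses to check the file, which is why undeclared user agents and rotating IPs are the core of the accusation.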

Perplexity responded by saying the traffic likely came from a third-party partner (Browserbase), and emphasized that their system only accesses websites in response to direct user queries — not for autonomous scraping or training.

https://www.perplexity.ai/hub/blog/agents-or-bots-making-sense-of-ai-on-the-open-web

What’s your take? Is this a serious breach of web standards, or just how modern AI tools operate today?

MentaL · Aug 7, 2025

I trust Cloudflare.

ForumDevs · Aug 7, 2025

Miri said:
Cloudflare claims that Perplexity has been bypassing robots.txt directives using undeclared crawlers with rotating IPs and user-agents to avoid being blocked.

It's a serious accusation, and it could absolutely trigger legal action against Perplexity or Browserbase, depending on the contractual relationships.

Miri said:
Is this a serious breach of web standards, or just how modern AI tools operate

AI standards are only now being created, so this could definitely help establish some rights in the field of AI.

Miri · Aug 7, 2025

Modern AI assistants work fundamentally differently from traditional web crawling. When you ask Perplexity a question that requires current information—say, "What are the latest reviews for that new restaurant?"—the AI doesn't already have that information sitting in a database somewhere. Instead, it goes to the relevant websites, reads the content, and brings back a summary tailored to your specific question.

https://www.perplexity.ai/hub/blog/agents-or-bots-making-sense-of-ai-on-the-open-web#:~:text=Modern%20AI%20assistants,your%20specific%20question.

Personally, I always assumed that tools like Perplexity were storing the data they access in some form — maybe even using it to train their models. Something about their explanation still feels unclear to me.

smallwheels · Aug 10, 2025

Miri said:
Perplexity responded by saying the traffic likely came from a third-party partner (Browserbase)

Claiming that a failure happened at an independent "partner" is a lame but still common excuse. For ages, companies in every industry have formally delegated the dirtier (and sometimes illegal) parts of their business to third parties, precisely so they can play innocent while still profiting from those practices. I'd assume that in most cases the companies in question know very well what is going on, and tolerate or even encourage it. That is often the reason for going through a third party in the first place...
