Cloudflare accuses Perplexity of violating crawl rules — fair or not?

Miri

Cloudflare claims that Perplexity has been bypassing robots.txt directives using undeclared crawlers with rotating IPs and user-agents to avoid being blocked.
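Some background on why rotating user-agents matters here: robots.txt rules are grouped by `User-agent`, and a client only lands in the group whose token matches the name it announces. Below is a minimal Python sketch using the standard library's `urllib.robotparser`. It assumes a hypothetical robots.txt that blocks `PerplexityBot` by name (used purely as an example token) and allows everyone else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one declared crawler by name,
# allows every other user-agent via the "*" group.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    # can_fetch() takes the product name before the first "/" in user_agent,
    # looks for a matching named group, and otherwise falls back to "*".
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    p = build_parser(ROBOTS_TXT)
    url = "https://example.com/article"
    print(allowed(p, "PerplexityBot/1.0", url))  # False: matches the named group
    print(allowed(p, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)", url))  # True: only "*" applies
```

The same URL is refused under the declared bot name and permitted under a generic browser string. robots.txt is also purely advisory: it only binds clients that choose to check it and identify themselves honestly. That is presumably why the accusation is about undeclared crawlers and rotating IPs rather than about robots.txt alone.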


Perplexity responded that the traffic likely came from a third-party partner (Browserbase) and emphasized that its system accesses websites only in response to direct user queries, not for autonomous scraping or training.


What’s your take? Is this a serious breach of web standards, or just how modern AI tools operate today?
 
Cloudflare claims that Perplexity has been bypassing robots.txt directives using undeclared crawlers with rotating IPs and user-agents to avoid being blocked.
It's a serious accusation, and depending on the contractual relationships, it could absolutely trigger legal action against Perplexity or Browserbase.

Is this a serious breach of web standards, or just how modern AI tools operate
AI standards are still being written, so this case could definitely help establish some rights in the field of AI.
 
Modern AI assistants work fundamentally differently from traditional web crawling. When you ask Perplexity a question that requires current information—say, "What are the latest reviews for that new restaurant?"—the AI doesn't already have that information sitting in a database somewhere. Instead, it goes to the relevant websites, reads the content, and brings back a summary tailored to your specific question.

Personally, I always assumed that tools like Perplexity were storing the data they access in some form, maybe even using it to train their models. Something about their explanation still feels unclear to me.
 
Perplexity responded by saying the traffic likely came from a third-party partner (Browserbase)
It is a lame but still common "excuse" to claim that the failures happened at an independent "partner". For ages, companies in every industry have formally delegated the dirtier (and sometimes illegal) parts of their business to third parties, precisely so they can play innocent while still profiting from those practices. I'd assume that in most cases the companies in question know very well what is going on, and tolerate or even encourage it. That is often the reason to go through a third party in the first place...
 