Crazy amount of guests

LOL... I love finding that type, full of /24 ranges across all different countries. 100% dodgy actor, whether VPN or proxy.
Yes, there are quite a few remarkable patterns. There also seems to be an invisible network of actors cooperating. For one, it is often a group of certain ASNs that show up as the source of traffic at the same time, and sometimes there seems to be quite an overlap between them regarding the people and companies behind them. Then you have a connection between the recent scraping attacks and bogus crypto companies - a couple of weeks ago I had an ASN that had changed ownership from an obvious crypto company to a scraper/spam company, and possibly the same goes for some of the companies offering residential proxies. Plus another frequently seen pattern: an ASN registered to a company (or rather a P.O. box) in Panama or the Seychelles. It's really astonishing how easy it is to find patterns on the one hand, and how nothing seems to be done by anyone against bad actors of all kinds on the other.
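For anyone who wants to look for the "many /24 ranges across many countries" pattern in their own logs, here is a minimal sketch. It assumes you already have (IP, ASN, country) tuples from your own log parsing and GeoIP/ASN lookup; the function names and the thresholds are made up for illustration and are not taken from any tool mentioned in this thread.

```python
from collections import defaultdict
import ipaddress


def summarize_by_asn(hits):
    """Group log hits by ASN.

    hits: iterable of (ip, asn, country) tuples. The ASN and country are
    assumed to come from your own lookup; this sketch does not resolve them.
    """
    stats = defaultdict(lambda: {"requests": 0, "ranges": set(), "countries": set()})
    for ip, asn, country in hits:
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            continue  # sketch only handles IPv4 /24 ranges
        # strict=False lets us pass a host address and get its enclosing /24
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        entry = stats[asn]
        entry["requests"] += 1
        entry["ranges"].add(str(net))
        entry["countries"].add(country)
    return dict(stats)


def flag_asns(stats, min_ranges=10, min_countries=5):
    """Return ASNs whose traffic is spread over many /24s and many countries.

    The thresholds are hypothetical defaults; tune them against your own traffic.
    """
    return sorted(
        asn
        for asn, entry in stats.items()
        if len(entry["ranges"]) >= min_ranges
        and len(entry["countries"]) >= min_countries
    )
```

A flagged ASN is only a candidate for a closer look (who owns it, where it is registered), not proof of anything by itself.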
 
I have something to confess.
This giant distributed scraping wave is walking past some of my non-xenforo fail2ban setups which have a very high CPU load to hit ratio.
I moved them to cloudflare, and they are already complaining about some occasional outages in different regions and want my system back :(
Interestingly, fail2ban still catches plenty of stuff

My interest in finishing my prototype just went up. But business at the service company is starting to really kick up. If anyone is interested in some tag team action to build an alternative to Anubis in PHP, you have my ear.
 
I moved them to cloudflare (...) Interestingly, fail2ban still catches plenty of stuff

Oopsie, Cloudflare still lets AI bots in even with several preventive measures on the CF side.
It would be interesting to see if and how this changes depending on the Cloudflare product (free vs. paid) one is using. According to their pricing matrix


Bildschirmfoto 2026-02-25 um 08.15.43.webp
Bildschirmfoto 2026-02-25 um 08.17.35.webp
Bildschirmfoto 2026-02-25 um 08.18.52.webp

one can probably not expect too much from the free tier, as it only detects and stops simple bots. Judging from the description, even the "Super Bot Fight Mode" in the Pro and Business plans is marketing bling rather than a solution:

Bot Mitigation
Manage good and bad bots in real-time with speed and accuracy by harnessing the data from the millions of Internet properties on Cloudflare.

Content Scraping Protection
Protect all of your content including text, images and email addresses from web scrapers with Cloudflare's ScrapeShield™ service.


Free tier:
Bot Fight Mode
For an individual website. Challenge easy-to-detect bad bots from popular cloud providers.


Business and Pro tier (paid):
Super Bot Fight Mode
Block and challenge easy-to-detect bad bots from any source. Plus, bypass bot settings using WAF Custom Rules.


Only the enterprise tier offers more than that:
Bot Management
Manage AI crawlers and bot traffic to web and mobile apps without CAPTCHAs. Stop account abuse, malicious botnets, credential and card stuffing, content scraping, and inventory hoarding.


So it possibly comes down to what they consider to be "easy-to-detect bots", and one can only hope that this is not limited to simple things like the transmitted user agent. In 2024 they introduced the BotShield against AI bots:


However, as people report here in the forum, the current scraping attacks are obviously not stopped by Cloudflare.
Searching for "resident proxies" on cloudflare.com shows mainly one result from 2024:


It refers to use within the product "Bot Management":

For existing Bot Management customers we recommend toggling “Auto-update machine learning model” to instantly gain the benefits of ML v8 and its residential proxy detection, and to stay up to date with our future ML model updates. If you’re not a Cloudflare Bot Management customer, contact our sales team to try out Bot Management.

and this again is included only in the highest paid plan (see table higher up in this post).


So there is no such thing as a free lunch with Cloudflare, it seems. As usual: if you are not paying, you are not the customer but the product. Cloudflare does offer free tiers with some basic benefits - but they need those customers to gather data and insights at scale, which are then used in their paid products (only). Not surprising, and in my eyes nothing to really complain about. It is just somewhat surprising that a lot of people on this forum keep falsely claiming that Cloudflare's free tier would solve the bot problem.

The interesting question is how well Cloudflare detects bot and scraping traffic. According to their pretty interesting Radar, bot traffic currently makes up slightly more than 30% of requests:

Bildschirmfoto 2026-02-25 um 07.48.28.webp

Interestingly, this went down a bit. If I remember correctly, it was up to 40% a couple of weeks ago. This includes all bots: legitimate ones like Googlebot as well as all sorts of shady ones. Judging from my own forums, they miss a fair bit of bot traffic - I see on average ~40% unwanted bot traffic (excluding bots like Googlebot or Bingbot, which alone visit countless times per day), and it goes above 70% on bad days - and I still do not catch all of them, mainly because I cannot reliably identify residential proxies from within central Europe. Also, as I block detected bots on their first request, the percentage of bot traffic on my forums is obviously somewhat lower than it would be if I let them through to perform as many requests as they like.
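To put a rough number on that last point, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that every blocked request stands for one bot that would otherwise have made a fixed number of requests; `requests_per_bot` is a hypothetical parameter, not something I measured.

```python
def share_if_not_blocked(observed_share, requests_per_bot):
    """Estimate the bot share of traffic if bots were not blocked on request one.

    observed_share:   fraction of all requests that are blocked bot requests (0..1)
    requests_per_bot: hypothetical average number of requests a bot would make
                      if it were let through (>= 1)
    """
    if not 0 <= observed_share <= 1:
        raise ValueError("observed_share must be between 0 and 1")
    if requests_per_bot < 1:
        raise ValueError("requests_per_bot must be at least 1")
    bot_requests = observed_share * requests_per_bot  # scaled-up bot volume
    other_requests = 1 - observed_share               # unchanged remaining traffic
    return bot_requests / (bot_requests + other_requests)
```

With an observed share of 40% and an assumed 10 requests per bot, this gives roughly 87%. Real bots retry after being blocked, so the true factor is unknown; the point is only that a share measured while blocking understates the pressure.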

Obviously I don't know how "average" or typical my forum is compared to Cloudflare's average but judging from my numbers Cloudflare seems to miss a fair bit of bot traffic.

Cloudflare maps the source of bot traffic like this:

Bildschirmfoto 2026-02-25 um 07.49.04.webp

Again a bit misleading, as it includes all bots, not just the bad ones. They also name the source ASNs:

Bildschirmfoto 2026-02-25 um 07.49.39.webp

These do not really fit the distribution I see on my forum. In contrast, they also map the share of bot traffic in all traffic per country, and there you can smell the amount of residential proxies, especially in developing or smaller countries (along with the countries hosting a lot of cloud datacenters and/or dodgy providers):


Bildschirmfoto 2026-02-25 um 08.00.06.webp

So overall it is somewhat unclear how comprehensively Cloudflare is able to detect bot traffic. It seems that they miss a fair bit on the one hand, and that bot protection that tackles the current bot waves is only available in the highest paid plan (Enterprise) anyway (and possibly as a paid add-on on lower-level plans). It remains somewhat vague what Cloudflare considers to be "simple bots" or "easy-to-detect bots", yet the level of protection offered by Cloudflare's free tier seems to be largely overrated when it comes to bots. All the more so as a lot of scraping bots lately claim to be able to overcome Cloudflare Turnstile and other protection mechanisms, like fingerprinting, that are used to identify bots.
 
Yer, the bot traffic has to be a waste of CF resources too. They either need to roll out more advanced tools to everyone OR step up AI Labyrinth to tackle much much more.
 
Yer, the bot traffic has to be a waste of CF resources too. They either need to roll out more advanced tools to everyone OR step up AI Labyrinth to tackle much much more.
I'm not so sure: the free tier mainly offers customers increased speed and DDoS protection, both pretty valuable features. Cloudflare, on the other hand, gathers huge amounts of data to develop their premium products and is able to charge premium prices for them. Surely, with features becoming a commodity, they could and probably will improve the features on the free plan - but, being relatively close to a monopoly position, they don't really have an incentive for that. On the contrary: if only a small percentage of all websites have premium protection through the paid plans and most are left helpless against bots, it is easy for Cloudflare to upsell into paid plans, and it is less likely that bad actors develop strategies to overcome Cloudflare's paid protection, as that would take huge effort but only yield a relatively small amount of additional content to scrape. So it is beneficial and pretty comfortable for Cloudflare as well as for its paying customers to leave the free tier as it is, without decently working protection against bots. Probably the benefits largely outweigh the cost for Cloudflare.
 
It is, btw., worth reading the Cloudflare blog from time to time. For one, to see what they found out and what they are up to, but also because there are often hidden gems mentioned, with links to interesting resources. For anyone tinkering with blocking bots using home-grown tools, these two, for example, could be of interest:

A curated list of AI bot user agents, ready to be used, in various formats:


A daily updated list of ~500,000 IPs of proxies used by bots and scrapers, at different levels of reliability, ready to be used in custom solutions. Not to be used to block blindly, but possibly very helpful:
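A minimal sketch of how two such lists could be combined in a home-grown tool without blocking blindly: both lists only add to a score, and the score decides between allow, challenge and block. The list format (one IP or CIDR per line, `#` comments), the example user agent tokens, the weights and the thresholds are all assumptions for illustration; the real lists may look different.

```python
import ipaddress


def load_networks(lines):
    """Parse lines with one IP or CIDR range each; '#' starts a comment."""
    networks = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            # A bare IP becomes a /32 (or /128) network
            networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def bot_score(ip, user_agent, ua_tokens, proxy_networks):
    """Score a request: +2 for a listed user agent token, +1 for a listed IP."""
    score = 0
    ua = user_agent.lower()
    if any(token.lower() in ua for token in ua_tokens):
        score += 2
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in proxy_networks):
        score += 1
    return score


def decide(score):
    """Map a score to an action (thresholds are hypothetical)."""
    if score >= 3:
        return "block"
    if score >= 1:
        return "challenge"
    return "allow"
```

A linear scan over ~500,000 entries per request would be far too slow in production; a real setup would load the list into a radix tree, an ipset or similar. The sketch only shows the scoring idea.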


At the same time, Cloudflare gives tips on how to make the contents of webpages easier and cheaper to access for AI bots by using Markdown:


Btw., this is an approach that I also found in the AI aficionado bubble. Which implicitly means: not using/offering Markdown makes it harder and more expensive for bots to scrape your content. Good to know if you don't want AI bots to make use of your content...

Other than that they proposed a cryptographic framework for authenticating bots:


In general, this sounds like an interesting idea - I had been thinking about such a concept as well some time ago. Surely, it won't solve the "bot pretends to be a human" problem, but it would make segmentation easy, e.g. to let Googlebot through reliably while blocking others or letting them see only limited content. It could also be the foundation for a system where bots actually pay for visiting your site (very much like the token system used today for AI models).
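To illustrate the general idea of "a bot proves who it is with a signature", here is a deliberately simplified sketch. It is not Cloudflare's proposal: that presumably relies on public-key signatures, while this sketch swaps in a shared-secret HMAC from the Python standard library to stay self-contained. The message layout, the function names and the `known_bots` registry are all hypothetical.

```python
import hashlib
import hmac
import time


def sign_request(bot_id, path, timestamp, secret):
    """Bot side: sign its identity, the requested path and a timestamp."""
    message = f"{bot_id}\n{path}\n{timestamp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(bot_id, path, timestamp, signature, known_bots,
                   now=None, max_age=300):
    """Site side: accept only known bots with a fresh, matching signature.

    known_bots: dict mapping bot_id -> shared secret (bytes), a stand-in
                for the public-key directory a real scheme would use.
    """
    secret = known_bots.get(bot_id)
    if secret is None:
        return False  # unknown bot: treat like any anonymous visitor
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_age:
        return False  # stale signature, possible replay
    expected = sign_request(bot_id, path, timestamp, secret)
    # Constant-time comparison to avoid leaking the signature byte by byte
    return hmac.compare_digest(expected.encode(), signature.encode())
```

A verified bot could then be routed to the "let through" segment, and everything else to the usual heuristics. The same check could gate a paid tier, since the site knows reliably which bot made how many requests.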

Interesting times...
 
FYI i'm on the pro plan and i'm getting effective protection after some tuning.
If they shove me into the business tier ( $200/mo ), i'm 100% building the alternative.
So far my 31 other servers don't need cloudflare, they are not running ultra slow magento :)
 
On the Pro plan, most of the time bots are blocked, and it's just a few times per week that bots are getting in for some reason.

7 days stats below: 5 million requests were successfully blocked.

1772056684950.webp
 
On the Pro plan, most of the time bots are blocked, and it's just a few times per week that bots are getting in for some reason.

7 days stats below: 5 million requests were successfully blocked.

View attachment 334371

Honestly, an amazing ratio of fraudulent traffic to legitimate traffic. I think this is not because Cloudflare is great, but because whatever forum this is, is great :)
 