Crazy amount of guests

LOL... I love finding those types, full of /24 ranges across all different countries. 100% dodgy actor, whether VPN or proxy.
Yes, there are quite a few remarkable patterns, including what looks like an invisible network of cooperating actors. For one, it is often a group of certain ASNs that show up as the source of traffic at the same time, and sometimes there seems to be quite an overlap between them regarding the people and companies behind them. Then there is a connection between the recent scraping attacks and bogus crypto companies: a couple of weeks ago I had an ASN that had changed ownership from an obvious crypto company to a scraper/spam company, and possibly the same goes for some of the companies offering residential proxies. Plus another frequently seen pattern: an ASN registered to a company (or rather a P.O. box) in Panama or the Seychelles. It's really astonishing how easy it is to find patterns on the one hand, and how nothing seems to be done by anyone against bad actors of all kinds on the other.
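A minimal sketch of how the "/24 ranges across all different countries" pattern can be surfaced from your own logs, assuming you already have (IP, ASN, country) tuples from some IP-to-ASN lookup (the lookup itself, the function names and the thresholds are hypothetical). It counts how many distinct /24 blocks and countries each ASN shows up with and flags the widely spread ones:

```python
import ipaddress
from collections import defaultdict


def asn_footprint(records):
    """records: iterable of (ip, asn, country) tuples.

    ASN and country are assumed to come from your own lookup source.
    Returns {asn: (distinct /24 networks, distinct countries)}.
    """
    nets = defaultdict(set)
    countries = defaultdict(set)
    for ip, asn, country in records:
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            continue  # this sketch only looks at IPv4 /24 blocks
        nets[asn].add(ipaddress.ip_network(f"{ip}/24", strict=False))
        countries[asn].add(country)
    return {asn: (len(nets[asn]), len(countries[asn])) for asn in nets}


def flag_spread_out_asns(records, min_nets=5, min_countries=3):
    """Flag ASNs whose traffic is spread over many /24s in many countries."""
    footprint = asn_footprint(records)
    return sorted(
        asn
        for asn, (n_nets, n_countries) in footprint.items()
        if n_nets >= min_nets and n_countries >= min_countries
    )
```

The thresholds are arbitrary starting points; a flagged ASN is a hint for manual review, not proof of abuse.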
 
I have something to confess.
This giant distributed scraping wave is walking past some of my non-xenforo fail2ban setups which have a very high CPU load to hit ratio.
I moved them to cloudflare, and they are already complaining about some occasional outages in different regions and want my system back :(
Interestingly, fail2ban still catches plenty of stuff

My interest in finishing my prototype just went up. But business at the service company is starting to really kick up. If anyone is interested in some tag team action to build an alternative to Anubis in PHP, you have my ear.
 
I moved them to cloudflare (...) Interestingly, fail2ban still catches plenty of stuff

Oopsie, Cloudflare still lets AI bots in even with several preventive measures on the CF side.
It would be interesting to see if and how this changes depending on the Cloudflare product (free vs. paid) one is using. Their pricing matrix:

[Screenshots: Cloudflare pricing matrix, 2026-02-25]

According to it, one can probably not expect too much from the free tier, as it only detects and stops simple bots. Judging from the description, even the "Super Bot Fight Mode" in the Pro and Business tiers is more marketing bling than a solution:

Bot Mitigation
Manage good and bad bots in real-time with speed and accuracy by harnessing the data from the millions of Internet properties on Cloudflare.

Content Scraping Protection
Protect all of your content including text, images and email addresses from web scrapers with Cloudflare's ScrapeShield™ service.


Free tier:
Bot Fight Mode
For an individual website. Challenge easy-to-detect bad bots from popular cloud providers.


Business and Pro tier (paid):
Super Bot Fight Mode
Block and challenge easy-to-detect bad bots from any source. Plus, bypass bot settings using WAF Custom Rules.


Only the enterprise tier offers more than that:
Bot Management
Manage AI crawlers and bot traffic to web and mobile apps without CAPTCHAs. Stop account abuse, malicious botnets, credential and card stuffing, content scraping, and inventory hoarding.


So it possibly comes down to what they consider to be "easy-to-detect bots", and one can only hope that this is not limited to simple things like the transmitted user agent. In 2024 they introduced the BotShield against AI bots:


However, as people report here in the forum, the current scraping attacks are obviously not stopped by Cloudflare.
Searching for "resident proxies" on cloudflare.com mainly shows one result from 2024:


It refers to this being used in the product "Bot Management":

For existing Bot Management customers we recommend toggling “Auto-update machine learning model” to instantly gain the benefits of ML v8 and its residential proxy detection, and to stay up to date with our future ML model updates. If you’re not a Cloudflare Bot Management customer, contact our sales team to try out Bot Management.

and this, again, is included only in the highest paid plan (see the table higher up in this post).


So there is no such thing as a free lunch with Cloudflare, it seems. As usual: if you are not paying, you are not the customer but the product. Cloudflare does offer free tiers with some basic benefits, but they need those customers to gain data and insights at scale, which are then used in their paid products (only). Not surprising, and in my eyes nothing to really complain about. Just somewhat surprising that a lot of people on this forum keep falsely claiming that Cloudflare's free tier would solve the bot problem.

The interesting question is how well Cloudflare detects bot and scraping traffic. According to their (pretty interesting) Radar, bot traffic currently makes up slightly more than 30% of requests:

[Screenshot: Cloudflare Radar, share of bot traffic in all requests, 2026-02-25]

Interestingly, this has gone down a bit; if I remember correctly, it was up to 40% a couple of weeks ago. This includes all bots: legitimate ones like Googlebot as well as all sorts of shady ones. Judging from my own forums, they do miss a fair bit of bot traffic: I see on average ~40% unwanted bot traffic (excluding bots like Googlebot or Bingbot, which alone visit countless times per day), and it goes above 70% on bad days. And I still don't catch all of them, mainly because I cannot reliably identify residential proxies from within central Europe. Also, since I block detected bots on their first request, the percentage of bot traffic on my forums is obviously somewhat lower than it would be if I let them through to make as many requests as they like.
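Excluding "good" crawlers like Googlebot or Bingbot from such a count only works if they are genuine, since their user agents are trivially spoofed. A minimal sketch of the reverse-then-forward DNS check that Google and Microsoft document for verifying their crawlers; the function name and the suffix list are assumptions for illustration, and the resolver functions are injectable so the logic can be tested offline:

```python
import socket

# Hostname suffixes the crawler operators document for reverse-DNS checks
# (Google: googlebot.com / google.com, Bing: search.msn.com).
# Assumed list: check the operators' current documentation before relying on it.
VERIFIED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the hostname suffix, then forward-resolve
    the hostname and confirm that it maps back to the same IP."""
    try:
        host = reverse(ip)[0]
    except OSError:  # no PTR record or lookup failure
        return False
    if not host.endswith(VERIFIED_SUFFIXES):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    # A spoofed PTR record fails here: the forward lookup won't return this IP.
    return ip in addresses
```

Note that `socket.gethostbyname_ex` only returns IPv4 addresses, so verifying an IPv6 crawler address would need `socket.getaddrinfo` instead.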

Obviously I don't know how "average" or typical my forum is compared to Cloudflare's average but judging from my numbers Cloudflare seems to miss a fair bit of bot traffic.

The source of bot traffic is mapped by Cloudflare like that:

[Screenshot: Cloudflare Radar, sources of bot traffic]

Again a bit misleading, as it includes all bots, not just the bad ones. They also name the source ASNs:

[Screenshot: Cloudflare Radar, top source ASNs of bot traffic]

These do not really match the distribution I see on my forum. On the other hand, they also map bot traffic as a percentage of all traffic per country, and there you can smell the amount of residential proxies, especially in developing or smaller countries (along with the countries offering a lot of cloud data centers and/or dodgy providers):


[Screenshot: Cloudflare Radar, bot share of traffic per country]

So overall it is somewhat unclear how comprehensively Cloudflare is able to detect bot traffic. It seems that they possibly miss a fair bit on the one hand, and on the other that bot protection that tackles the current bot waves is only available in the highest paid plan (Enterprise) anyway (and possibly as a paid add-on for lower-level plans). It remains somewhat vague what Cloudflare considers to be "simple bots" or "easy-to-detect bots", yet the level of protection offered by Cloudflare's free tier seems to be largely overrated when it comes to bots. All the more so as a lot of scraping bots lately claim to be able to get past Cloudflare Turnstile and other protection mechanisms, such as the fingerprinting used to identify bots.
 
Yer, the bot traffic has to be a waste of CF resources too. They either need to roll out more advanced tools to everyone OR step up AI Labyrinth to tackle much much more.
 
Yer, the bot traffic has to be a waste of CF resources too. They either need to roll out more advanced tools to everyone OR step up AI Labyrinth to tackle much much more.
I'm not so sure: the free tier mainly offers customers increased speed and DDoS protection, both pretty valuable features. Cloudflare, on the other hand, gathers huge amounts of data to develop their premium products and is able to charge premium prices for them. Surely, with features becoming a commodity, they could and probably will improve the free plan, but being relatively close to a monopoly position, they don't really have an incentive to. On the contrary: if only a small percentage of all websites have premium protection through the paid plans and most are left helpless against bots, it is easy for Cloudflare to upsell into paid plans, and less likely that bad actors develop strategies to overcome Cloudflare's paid protection, as that would take huge effort for relatively small amounts of additional content to scrape. So it is beneficial and pretty comfortable for Cloudflare as well as for its paying customers to leave the free tier as it is, without decently working protection against bots. The benefits probably largely outweigh the cost for Cloudflare.
 
By the way, it is worth reading the Cloudflare blog from time to time. For one, to see what they have found out and what they are up to, but also because there are often hidden gems mentioned, with links to interesting resources. For anyone tinkering with blocking bots using home-grown tools, these two could be of interest, for example:

A curated list of AI-bot user agents, ready to be used, in various formats:
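Since the list's exact file format isn't given here, a minimal sketch assuming a plain-text export with one user-agent token per line and `#` starting a comment (format and function names are assumptions). Keep in mind that this only catches bots that identify themselves honestly:

```python
def load_ua_tokens(text):
    """Parse an assumed plain-text list: one user-agent token per line,
    '#' starts a comment. Tokens are lower-cased for matching."""
    tokens = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            tokens.append(line.lower())
    return tokens


def matches_ai_bot(user_agent, tokens):
    """Case-insensitive substring match of any listed token in the UA string."""
    ua = user_agent.lower()
    return any(tok in ua for tok in tokens)
```

For example, with tokens like `GPTBot` or `ClaudeBot` loaded, a UA string containing `compatible; GPTBot/1.1` matches, while an ordinary Firefox UA does not.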


A daily updated list of ~500,000 IPs of proxies used by bots and scrapers, at different levels of reliability, ready to be used in custom solutions. Not to be used for blind blocking, but possibly very helpful:
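In the spirit of "not to be used for blind blocking", a minimal sketch that turns a list hit into a score contribution rather than a block, assuming one IP address or CIDR range per line (the format, function names and weight are assumptions). Single addresses go into a set so lookups stay fast even with ~500,000 entries:

```python
import ipaddress


def load_proxy_list(text):
    """Assumed format: one IP address or CIDR range per line, '#' lines are comments.
    Returns (set of single addresses, list of networks)."""
    singles, networks = set(), []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "/" in line:
            networks.append(ipaddress.ip_network(line, strict=False))
        else:
            singles.add(ipaddress.ip_address(line))
    return singles, networks


def proxy_score(ip, singles, networks, weight=50):
    """Return a risk-score contribution instead of a hard block decision."""
    addr = ipaddress.ip_address(ip)
    # Membership across IPv4/IPv6 simply evaluates to False, so mixed lists are fine.
    if addr in singles or any(addr in net for net in networks):
        return weight
    return 0
```

The score would then be combined with other signals (ASN, country, behaviour) before deciding to challenge or block.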


At the same time, Cloudflare gives tips on how to make the content of web pages easier and cheaper for AI bots to access by using Markdown:


By the way, I have also come across this approach in the AI-aficionado bubble. Which implicitly means: not using/offering Markdown makes it harder and more expensive for bots to scrape your content. Good to know if you don't want AI bots to make use of your content...

Other than that they proposed a cryptographic framework for authenticating bots:


In general, this sounds like an interesting idea; I had been thinking about such a concept myself some time ago. Surely, it won't solve the "bot pretends to be a human" problem, but it would make segmentation easy, e.g. to let Googlebot through reliably while blocking others or letting them see only limited content. It could also be the foundation for a system where bots actually pay for visiting your site (very much like the token-based billing used today for AI models).
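To illustrate the segmentation idea, a hypothetical policy table keyed by bot operators whose identity has already been verified upstream (by whatever signature scheme ends up being used; the signature check itself is not shown). Operator names, access levels and function names are made up for the example:

```python
from enum import Enum


class Access(Enum):
    FULL = "full"
    LIMITED = "limited"  # e.g. thread titles and first posts only
    DENY = "deny"


# Hypothetical per-operator policy; keys are operator identities that a
# cryptographic check has already verified before this function is called.
BOT_POLICY = {
    "googlebot": Access.FULL,
    "bingbot": Access.FULL,
    "some-ai-crawler": Access.LIMITED,
}


def access_for(verified_operator, claims_to_be_bot):
    """verified_operator: identity proven by a signature check, or None."""
    if verified_operator is not None:
        # Verified, but not on the list: no access by default.
        return BOT_POLICY.get(verified_operator, Access.DENY)
    if claims_to_be_bot:
        # Claims to be a bot but cannot prove who it is: treat as an impostor.
        return Access.DENY
    # Unsigned, human-looking traffic falls through to the normal bot
    # detection pipeline, which this scheme does not replace.
    return Access.FULL
```

The last branch is exactly the "bot pretends to be a human" gap: signatures only help sort the bots that are willing to identify themselves.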

Interesting times...
 
FYI, I'm on the Pro plan and I'm getting effective protection after some tuning.
If they shove me into the Business tier ($200/mo), I'm 100% building the alternative.
So far my 31 other servers don't need Cloudflare; they are not running ultra-slow Magento :)
 
On the Pro plan, most of the time bots are blocked, and it's just a few times per week that bots are getting in for some reason.

7-day stats below: 5 million requests were successfully blocked.

[Screenshot: Cloudflare security stats, last 7 days]
 
Honestly, an amazing ratio of fraudulent to legitimate traffic.
True, and that is actually a good point. I'd assume a bigger, better-known forum is more likely to be attacked by bots than a smaller one in the first place. But once your forum is known to the scrapers (which will happen sooner or later), you will be hit regardless of size. Their resources are not endless, though, so I'd assume there's an upper limit to it (as long as we are not talking about a dedicated DDoS but things like scraping or spamming). My forum is tiny: 3,000 registered users, 250 of whom visit per day, plus between the same number and three times as many legitimate guests. So about 1,000 visitors at most, typically fewer. Including scrapers, however, it is between 1,250 and 4,500 visitors per day, based on counting the number of distinct IPs. That results in a bot share of between 30% and somewhat over 70%.

In comparison, @rdn has about 900k visitors per week and 500 million requests with 5 million threats according to Cloudflare, so a threat rate of only about 1% of requests. Requests are obviously not directly comparable to visitor IPs, but it seems clear that the percentage of "bad" traffic on his forums is lower than on mine.
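The two ratios compared above, written out as a small sketch (function names are illustrative). Note that they measure different things: distinct IPs per day on one side, Cloudflare-flagged requests on the other:

```python
def bot_share(total, legitimate):
    """Share of visitors that are not legitimate: 1 - legitimate / total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return max(0.0, 1 - legitimate / total)


def threat_rate(threats, requests):
    """Share of requests flagged as threats."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return threats / requests
```

With the upper figures from the post (4,500 distinct IPs, at most ~1,000 legitimate visitors), `bot_share(4500, 1000)` gives about 0.78, i.e. "somewhat over 70%", while `threat_rate(5_000_000, 500_000_000)` gives 0.01.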

So possibly, beyond a certain size, bad actors do not or cannot scale up in proportion to the size of a forum, while on very small forums like mine their share is disproportionately high.

Yet @rdn says:

Oopsie, Cloudflare still lets AI bots in even with several preventive measures on the CF side.

so the number of bad requests probably looks better than it is in reality, and the question is how many baddies are currently not getting caught.

On the Pro plan, most of the time bots are blocked, and it's just a few times per week that bots are getting in for some reason.

How would you know that the bots are blocked (as opposed to, e.g., "some bots" or "many bots")? In my case I can be fairly sure of the amount, to a large degree thanks to the regional nature of my forum: most requests that do not come from my core region (and not from search engines) are bad actors, as are almost all that come from data centers. On top of that come the residential proxies from inside my core region that I cannot reliably identify at the moment. How do you, @rdn, measure that bots are blocked, other than by Cloudflare's success numbers, which may miss an unknown number of undetected bots? What I can see is a certain continuous level of bots as well as some spikes/waves per day, which look like this:

[Screenshot: proxycheck.io API checks over the day, 2026-02-24]

These are the IP checks against the proxycheck.io API, which cover part of my traffic, while some is already blocked or let through earlier by other means. What one can see here are waves in the middle of the night (so clearly not genuine traffic), and also that a certain number of them (green) are not identified by proxycheck.io (though most of these are caught later in the process). These spikes are pretty typical, though they happen at different times and with different frequencies and volumes.
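Night-time waves like these can be flagged automatically from hourly request counts. A minimal sketch comparing each hour against the day's median; the function name, the factor and the minimum count are illustrative assumptions, not tuned values:

```python
from statistics import median


def find_spikes(hourly_counts, factor=3.0, min_count=50):
    """Return the hours (list indices) whose count exceeds `factor` times
    the median hour.

    hourly_counts: 24 request counts, index = hour of day.
    min_count keeps tiny absolute numbers on quiet days from being flagged.
    """
    baseline = median(hourly_counts)
    threshold = max(factor * baseline, min_count)
    return [hour for hour, n in enumerate(hourly_counts) if n > threshold]
```

The median is used rather than the mean so that the spikes themselves don't inflate the baseline. For a regional forum, spikes in hours when the core audience is asleep are the most telling.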
 
I haven't dug deeper into comparing blocked requests with how many bad bots are actually blocked and how many aren't.
I do not block proxy or VPN outright but only present them with a "Managed Challenge" page.
 
I do not block proxy or VPN outright but only present them with a "Managed Challenge" page.
On my forum these are blocked for the most part, especially VPNs (apart from Apple's iCloud Private Relay). That said, logged-in members get through no matter which route they use. I would have preferred to have VPNs challenged rather than blocked (as you do), but the add-on I am using has a somewhat weird constraint: it blocks VPNs if I also want to be able to block ASNs (which I do), so I don't have the choice. What I've learned over the last weeks is that, contrary to what I expected, almost all requests over a VPN (or what proxycheck.io identifies as one) come from very dodgy ASNs, so they are clearly worth blocking. It seems that the scrapers (or the services they use) have built up a large network of VPN hosts in data centers all over the world (but mostly in developed countries), many of them in ASNs that are well known for malicious traffic, and on top of that some with normal cloud providers or smaller hosters, where they possibly burn through rented IPs and machines quickly until they get thrown out.
I barely see "normal" VPNs being blocked on my forum (though they would be, as stated before). The only ones I saw that were possibly normal, legitimate guest users were Proton VPN and the VPN that Opera offers as part of its web browser.
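The two setups discussed here (challenge vs. block for VPNs, members always passing, ASN and country blocks) can be sketched as a single decision function. The `Visitor` fields are assumed to come from an IP-intelligence lookup such as proxycheck.io; all names and the structure are hypothetical, not the add-on's actual logic:

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    logged_in: bool
    asn: int
    country: str
    is_vpn: bool    # assumed to come from an IP-intelligence lookup
    is_proxy: bool


def decide(v, blocked_asns, blocked_countries, allowed_vpn_asns=frozenset()):
    """Return 'allow', 'challenge' or 'block' for a single request.

    Members always pass, blocked ASNs and geo-blocked countries are refused,
    and remaining VPN/proxy traffic is challenged rather than blocked.
    allowed_vpn_asns can whitelist relays you trust (e.g. Apple's).
    """
    if v.logged_in:
        return "allow"
    if v.asn in blocked_asns or v.country in blocked_countries:
        return "block"
    if (v.is_vpn or v.is_proxy) and v.asn not in allowed_vpn_asns:
        return "challenge"  # return "block" here for the stricter setup
    return "allow"
```

Keeping the VPN branch separate from the ASN branch is what avoids the add-on constraint described above: blocking ASNs does not force blocking all VPNs.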

Possibly the trend towards using VPN hosts in dodgy data centers for scraping my forum is also a result of the fact that I've geo-blocked quite a few countries where they used residential proxies, e.g. Brazil and Argentina, but also a lot of other countries that have no relevant audience for me and had residential proxies showing up in my logs. The luxury of having a regional audience. Possibly they have realized that and therefore switched to other approaches; unsure.

So it could be worth having a closer look at the VPN situation on your forum. The good part about having a small forum is that, with its limited traffic, it acts as a bit of a laboratory where one can easily observe and spot things that would be lost in the noise of massive traffic on a forum the size of yours.
 
Just out of curiosity, I had a look at some requests that Cloudflare was sending to their AI Labyrinth.
There's nothing in the UA to indicate that they're AI scrapers, so presumably they're basing their "bad bot" status on either ASN, IP or some traffic analysis.
[Screenshot: requests sent to Cloudflare's AI Labyrinth]
I've noticed a lot of these bots are going direct to the reactions path for posts. Not sure what they hope to learn by scraping those, but I'm happy for CloudFlare to "help them" ;)
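To check how much of your own traffic goes straight for those pages, a minimal sketch that counts reaction-list hits per client IP in a standard "combined" access log. The URL pattern assumes XenForo-style `posts/<id>/reactions` routes (with or without friendly URLs); the regexes, function name and threshold are assumptions to adjust to your setup:

```python
import re
from collections import Counter

# Apache/nginx "combined" format: IP, ident, user, [time], "METHOD path HTTP/x"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+) HTTP/[\d.]+"')
# Matches /posts/123/reactions as well as /index.php?posts/123/reactions
REACTIONS_PATH = re.compile(r"(?:^|[/?])posts/\d+/reactions\b")


def reaction_scrapers(lines, min_hits=5):
    """Count reaction-list hits per client IP; return IPs at or above min_hits."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and REACTIONS_PATH.search(m.group(2)):
            hits[m.group(1)] += 1
    return {ip: n for ip, n in hits.items() if n >= min_hits}
```

Regular visitors rarely open the reaction list of many posts in a row, so a single IP hitting dozens of them is a reasonable signal to look at more closely.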
 