Crazy amount of guests

LOL... I love finding those types, full of /24 ranges across all sorts of different countries. 100% dodgy actor, whether VPN or proxy.
Yes, there are quite a few remarkable patterns. There also seems to be an invisible network of actors cooperating. For one, it is often a group of certain ASNs that show up as a source of traffic at the same time, and sometimes there seems to be quite an overlap between them regarding the persons and companies behind them. Then you have a connection between the recent scraping attacks and bogus crypto companies - a couple of weeks ago I had an ASN that had changed ownership from an obvious crypto company to a scraper/spam company, and possibly the same goes for some of those companies offering residential proxies. Plus another often-seen pattern: an ASN registered to a company (or rather a P.O. box) in Panama or the Seychelles. It's really astonishing how easy it is to find patterns on the one hand and how nothing seems to be done by anyone against bad actors of all kinds on the other.
 
I have something to confess.
This giant distributed scraping wave is walking past some of my non-xenforo fail2ban setups which have a very high CPU load to hit ratio.
I moved them to cloudflare, and they are already complaining about some occasional outages in different regions and want my system back :(
Interestingly, fail2ban still catches plenty of stuff

My interest in finishing my prototype just went up. But business at the service company is starting to really kick up. If anyone is interested in some tag team action to build an alternative to Anubis in PHP, you have my ear.
 
I moved them to cloudflare (...) Interestingly, fail2ban still catches plenty of stuff

Opsie, Cloudflare still lets AI bots enter even with several preventive measures on the CF side.
It would be interesting to see if and how this changes depending on the Cloudflare product (free vs. paid) one is using. According to their pricing matrix


Bildschirmfoto 2026-02-25 um 08.15.43.webp
Bildschirmfoto 2026-02-25 um 08.17.35.webp
Bildschirmfoto 2026-02-25 um 08.18.52.webp

one can probably not expect too much from the free tier, as it only detects and stops simple bots. Judging from the description, even the "Super Bot Fight Mode" in the Pro and Business tiers is marketing bling rather than a solution:

Bot Mitigation
Manage good and bad bots in real-time with speed and accuracy by harnessing the data from the millions of Internet properties on Cloudflare.

Content Scraping Protection
Protect all of your content including text, images and email addresses from web scrapers with Cloudflare's ScrapeShield™ service.


Free tier:
Bot Fight Mode
For an individual website. Challenge easy-to-detect bad bots from popular cloud providers.


Business and Pro tier (paid):
Super Bot Fight Mode
Block and challenge easy-to-detect bad bots from any source. Plus, bypass bot settings using WAF Custom Rules.


Only the enterprise tier offers more than that:
Bot Management
Manage AI crawlers and bot traffic to web and mobile apps without CAPTCHAs. Stop account abuse, malicious botnets, credential and card stuffing, content scraping, and inventory hoarding.


So it possibly comes down to what they consider to be "easy-to-detect bots", and one can only hope that this is not limited to simple things like the transmitted user agent. In 2024 they introduced the BotShield against AI bots:


However, as we can see, the current scraping attacks are obviously not stopped by Cloudflare, as people report here in the forum.
Searching for "resident proxies" on cloudflare.com shows mainly one result from 2024:


This refers to its use in the "Bot Management" product:

For existing Bot Management customers we recommend toggling “Auto-update machine learning model” to instantly gain the benefits of ML v8 and its residential proxy detection, and to stay up to date with our future ML model updates. If you’re not a Cloudflare Bot Management customer, contact our sales team to try out Bot Management.

and this again is included only in the highest paid plan (see table higher up in this post).


So there is no such thing as a free lunch with Cloudflare, it seems. As usual: if you are not paying, you are not the customer but the product. Cloudflare does offer free tiers with some basic benefits - but they need those customers to gain data and insights at scale, which are then used in their paid products (only). Not surprising, and in my eyes nothing to really complain about. It is just somewhat surprising that a lot of people on this forum keep falsely claiming that Cloudflare's free tier would solve the bot problem.

The interesting question is how well Cloudflare detects bot and scraping traffic. According to their rather interesting Radar, bot traffic currently makes up slightly more than 30% of requests:

Bildschirmfoto 2026-02-25 um 07.48.28.webp

Interestingly, this went down a bit. If I remember correctly, it was up to 40% a couple of weeks ago. This does include all bots, legitimate ones like Googlebot as well as all sorts of shady ones. Judging from my own forums, they miss a fair bit of bot traffic - I have on average a ~40% share of unwanted bot traffic (excluding bots like Googlebot or Bingbot, which alone visit countless times per day), and it goes above 70% on bad days - and still I do not catch all of them, mainly because I cannot reliably identify residential proxies from within central Europe. Also, as I block detected bots on their first request, the percentage of bot traffic on my forums is obviously somewhat lower than it would be if I let them through to perform as many requests as they like.

Obviously I don't know how "average" or typical my forum is compared to Cloudflare's average but judging from my numbers Cloudflare seems to miss a fair bit of bot traffic.

Cloudflare maps the source of bot traffic like this:

Bildschirmfoto 2026-02-25 um 07.49.04.webp

Again a bit misleading, as it includes all bots, not just the bad ones. They also name the source ASNs:

Bildschirmfoto 2026-02-25 um 07.49.39.webp

These do not really fit the distribution I see on my forum. In contrast, they also map the share of bot traffic in all traffic per country, and there you can smell the amount of residential proxies, especially in developing or smaller countries (along with the countries hosting a lot of cloud datacenters and/or dodgy providers):


Bildschirmfoto 2026-02-25 um 08.00.06.webp

So overall it is somewhat unclear how comprehensively Cloudflare is able to detect bot traffic - but it seems that, on the one hand, they possibly miss a fair bit and that, on the other, bot protection that tackles the current bot waves is only available in the highest paid plan (Enterprise) anyway (and possibly as a paid add-on product on lower-level plans). It remains somewhat vague what Cloudflare considers to be "simple bots" or "easy-to-detect bots", yet the level of protection offered by Cloudflare's free tier seems to be largely overrated when it comes to bots. All the more so as a lot of scraping bots lately claim to be able to overcome Cloudflare Turnstile and other protection mechanisms, such as the fingerprinting used to identify bots.
 
Yer, the bot traffic has to be a waste of CF resources too. They either need to roll out more advanced tools to everyone OR step up AI Labyrinth to tackle much much more.
 
Yer, the bot traffic has to be a waste of CF resources too. They either need to roll out more advanced tools to everyone OR step up AI Labyrinth to tackle much much more.
I'm not so sure: the free tier mainly offers customers increased speed and DDoS protection, both pretty valuable features. Cloudflare, on the other hand, gathers huge amounts of data to develop its premium products and is able to charge premium prices for them. Surely, with features becoming a commodity, they could and probably will improve the features on the free plan - but, being relatively close to a monopoly position, they don't really have an incentive to do so. On the contrary: if only a small percentage of all websites have premium protection through the paid plans and most are left helpless against bots, it is easy for Cloudflare to upsell into paid plans, and bad actors are less likely to develop strategies that overcome Cloudflare's paid protection, as that would take a huge effort but yield only a relatively small amount of additional content to scrape. So it is beneficial and pretty comfortable for Cloudflare, as well as for its paying customers, to leave the free tier as it is, without decently working protection against bots. Probably the benefits largely outweigh the cost for Cloudflare.
 
By the way, it is worth reading the Cloudflare blog from time to time. For one, to see what they have found out and what they are up to, but also because there are often hidden gems mentioned, with links to interesting resources. For anyone tinkering with blocking bots using home-grown tools, e.g. these two could be of interest:

A curated list of AI bot user agents, ready to be used, in various formats:


A daily updated list of ~500,000 IPs of proxies used by bots and scrapers, with different levels of reliability, ready to be used in custom solutions. Not to be used to block blindly, but possibly very helpful:
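A minimal sketch of how two such lists could be combined in a home-grown tool without blocking blindly. Everything in it is an assumption for illustration: the three user-agent tokens stand in for the curated list, the networks are documentation ranges, and the per-entry confidence value (0.0 to 1.0) is a hypothetical stand-in for the "levels of reliability" of the proxy list, whose real format may differ.

```python
import ipaddress

# Hypothetical sample data; a real setup would load the published lists instead.
AI_BOT_UA_TOKENS = ["GPTBot", "ClaudeBot", "CCBot"]

# Proxy network -> assumed confidence (0.0 to 1.0) that it is an active proxy.
PROXY_NETWORKS = {
    ipaddress.ip_network("192.0.2.0/24"): 0.9,
    ipaddress.ip_network("198.51.100.17/32"): 0.4,
}


def is_declared_ai_bot(user_agent: str) -> bool:
    """True if the user agent contains one of the listed bot tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_BOT_UA_TOKENS)


def proxy_confidence(ip: str) -> float:
    """Highest confidence of any listed network containing the IP, else 0.0."""
    addr = ipaddress.ip_address(ip)
    return max(
        (conf for net, conf in PROXY_NETWORKS.items() if addr in net),
        default=0.0,
    )


def decide(ip: str, user_agent: str,
           block_at: float = 0.8, challenge_at: float = 0.3) -> str:
    """Return 'block', 'challenge' or 'allow' for a single request."""
    if is_declared_ai_bot(user_agent):
        return "block"
    conf = proxy_confidence(ip)
    if conf >= block_at:
        return "block"
    if conf >= challenge_at:
        return "challenge"  # low-reliability entries only get challenged
    return "allow"
```

The two thresholds are arbitrary; the point is only that low-reliability entries lead to a challenge rather than a hard block.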


At the same time, Cloudflare gives tips on how to make the contents of web pages easier and cheaper to access for AI bots by using Markdown:


By the way, this is an approach that I have found in the AI aficionado bubble as well. Which implicitly means: not using/offering Markdown makes it harder and more expensive for bots to scrape your content. Good to know if you don't want AI bots to make use of your content...

Other than that they proposed a cryptographic framework for authenticating bots:


In general, this sounds like an interesting idea - I had been thinking about such a concept as well some time ago. Surely, it won't solve the "bot pretends to be a human" problem, but it would make segmentation easy, e.g. to let Googlebot through reliably while blocking others or letting them see only limited content. It could also be the foundation for a system where bots actually pay for visiting your site (very much like the token system used today for AI models).

Interesting times...
 
FYI i'm on the pro plan and i'm getting effective protection after some tuning.
If they shove me into the business tier ( $200/mo ), i'm 100% building the alternative.
So far my 31 other servers don't need cloudflare, they are not running ultra slow magento :)
 
On the Pro plan, most of the time bots are blocked, and it's just a few times per week that bots are getting in for some reason.

7 days stats below: 5 million requests were successfully blocked.

1772056684950.webp
 
Honestly, amazing ratio of fraudulent traffic to not.
True, and that is actually a good point. I'd assume it is more likely for a bigger and better-known forum to get attacked by bots than a smaller one in the first place. But once your forum is known to the scrapers (which will happen sooner or later) you will be hit, regardless of its size. But their resources are not endless, so I'd assume there's an upper limit to it (as long as we are not talking about a dedicated DDoS but things like scraping or spamming). My forum is tiny: 3,000 registered users, 250 of whom visit per day, plus between the same and three times that number of legitimate guests. So about 1,000 at maximum, typically less. However, including scrapers it is between 1,250 and 4,500 visitors per day, based on counting the number of different IPs. This results in a deflection rate of between 30% and somewhat over 70% because of them.

In comparison, @rdn has about 900k visitors per week and 500 million requests with 5 million threats according to Cloudflare, so a threat rate of just about 1% of the requests. Obviously requests are not directly comparable to visitor IPs, but it seems clear that the percentage of "bad" traffic on his forums is lower than on mine.

So possibly, from a certain forum size on, the bad actors do not or cannot scale up in proportion to the size of the forum, while on very small ones like mine their share is disproportionately high.
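As a rough back-of-the-envelope check, the two sets of numbers can be put side by side. This is only a sketch using the figures quoted in this thread; the helper name `share` is made up, and the 1,000 legitimate visitors per day is the stated upper bound, so the small-forum results are lower bounds for the unwanted share.

```python
def share(part: float, total: float) -> float:
    """Return part/total as a percentage."""
    return 100.0 * part / total


# Small forum: unique IPs per day. 1,000 legitimate visitors is the stated
# upper bound, so the unwanted share computed here is a lower bound.
LEGIT_MAX = 1_000
for total_ips in (1_250, 4_500):
    unwanted = total_ips - LEGIT_MAX
    print(f"{total_ips} IPs/day -> at least {share(unwanted, total_ips):.0f}% unwanted")

# Large forum: weekly Cloudflare numbers (threats vs. all requests).
print(f"threat rate: {share(5_000_000, 500_000_000):.0f}% of requests")
```

With the upper bound of 1,000 legitimate visitors this gives at least 20% on a quiet day and about 78% on a bad day, against 1% of requests on the large forum. As said, IPs and requests are different units, so this only indicates the order of magnitude.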

Yet @rdn says:

Opsie, Cloudflare still lets AI bots enter even with several preventive measures on the CF side.

so the number of bad requests probably looks better than it is in reality, and the question is how many baddies are currently not getting caught.

On the Pro plan, most of the time bots are blocked, and it's just a few times per week that bots are getting in for some reason.

How would you know that the bots are blocked (as opposed to, e.g., "some bots" or "many bots")? In my case I can be pretty sure of the amount to a large degree, due to the regional nature of my forum: most of the requests that do not come from my core region (and not from search engines) are bad actors, plus almost all that come from data centers. On top of that come the residential proxies from inside my core region, which I cannot identify reliably at the moment. How do you, @rdn, measure that bots are blocked, other than by Cloudflare's success number, which may miss an unknown number of undetected bots? What I can see is that there is a certain continuous amount of bots as well as some spikes/waves per day, which look like this:
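The regional rule of thumb described above could be sketched roughly like this. It is not the actual setup used on the forum: the `Request` fields, the country codes in `CORE_REGION` and the labels are assumptions for illustration, and the lookups that would fill those fields (GeoIP, data center detection, search engine verification) are left out.

```python
from dataclasses import dataclass

# Assumption: a core region given as ISO country codes (placeholder values).
CORE_REGION = {"DE", "AT", "CH"}


@dataclass
class Request:
    country: str            # ISO country code from a GeoIP lookup
    is_datacenter: bool     # IP belongs to a hosting/cloud provider
    is_search_engine: bool  # verified crawler such as Googlebot


def classify(req: Request) -> str:
    """Apply the regional rule of thumb to a single request."""
    if req.is_search_engine:
        return "search_engine"
    if req.is_datacenter:
        return "likely_bot"   # almost all data center traffic is unwanted
    if req.country not in CORE_REGION:
        return "likely_bot"   # outside the core region and not a search engine
    return "unknown"          # may be a human or a residential proxy


def likely_bot_share(requests: list) -> float:
    """Percentage of requests classified as likely bots (0.0 if empty)."""
    if not requests:
        return 0.0
    bots = sum(1 for r in requests if classify(r) == "likely_bot")
    return 100.0 * bots / len(requests)
```

The "unknown" bucket is exactly the blind spot mentioned: traffic from inside the core region that is not from a data center cannot be told apart from residential proxies by this rule alone.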

Bildschirmfoto 2026-02-24 um 11.53.57.webp

These are the IP checks against the proxycheck.io API, which cover part of my traffic, while some is already blocked or let through earlier by other means. What one can see here are waves that happened in the middle of the night (so clearly not genuine traffic) and also that a certain number of them (green) are not identified by proxycheck.io (but for the most part later in the process). These spikes are pretty typical, though they happen at different times and with varying frequency and size.
 
I haven't dug deeper into the comparison of block requests and how many bad bots are actually blocked and aren't.
I do not block proxy or VPN outright but only present them with a "Managed Challenge" page.
 
I do not block proxy or VPN outright but only present them with a "Managed Challenge" page.
On my forum these are blocked for the most part, especially the VPNs, apart from Apple's Private Relay. Apart from that, logged-in members do get through no matter which route they are using. I would have preferred to have the VPNs challenged rather than blocked (same as you do), but the add-on that I am using has a bit of a weird constraint and blocks VPNs if I also want to be able to block ASNs (which I do), so I don't have a choice. What I've learned over the last weeks is that - contrary to what I expected - almost all requests over a VPN (or what proxycheck.io identifies as such) come from very dodgy ASNs, so they are clearly worth blocking. It seems that the scrapers (or the services they are using) have built up a large network of VPN hosts in data centers all over the world (but mostly in developed countries), many of them in ASNs that are well known for malicious traffic, and on top of that some with normal cloud providers or smaller hosters, where they possibly burn through rented IPs and machines quickly until they get thrown out.
I barely see "normal" VPNs being blocked on my forum (though they would be as stated before). The only ones that I saw that were possibly normal legitimate guest users were the Proton VPN and the VPN that Opera offers as part of their web browser.

Possibly the trend towards using VPN hosts in dodgy data centers for scraping my forums is also a result of the fact that I've geo-blocked quite a few countries where they used residential proxies, e.g. Brazil and Argentina, but also a lot of other countries that have no relevant audience for me and had residential proxies showing up in my logs. The luxury of having a regional audience. Possibly they have realized that and therefore started to switch to other ways - unsure.

So it could be worth having a closer look at the VPN situation on your forum. The good part about having a small forum is that, with its limited traffic, it basically acts as a bit of a laboratory where one can easily observe and spot things that would be lost in the noise of the massive traffic of a forum the size of yours.
 
Just out of curiosity, I had a look at some requests which CloudFlare were sending to their AI Labyrinth.
There's nothing in the UA to indicate that they're AI scrapers, so presumably they're basing their "bad bot" status on either ASN, IP or some traffic analysis.
1772114467328.webp
I've noticed a lot of these bots are going direct to the reactions path for posts. Not sure what they hope to learn by scraping those, but I'm happy for CloudFlare to "help them" ;)
 
What is somewhat interesting, by the way: in total, roughly 120,000 ASNs currently exist.

Bildschirmfoto 2026-02-27 um 16.37.32.webp

There was a massive rise over the years and about a quarter of them are in the US:
Bildschirmfoto 2026-02-27 um 16.37.44.webp


Both graphics taken from here:


If I leave out the residential proxies, I have currently blocked ~300 ASNs, give or take, and barely any malicious traffic comes through any more. Of these 300, maybe 100 to 150 (a guesstimate) are really notorious, i.e. a proven constant source of malicious traffic. If I can identify those as a hobbyist running a small forum it should be possible for anyone. Seems like a bit of a mystery why they are not simply cut off - it is such a small number in comparison, even if it were three, four, five or ten times as many as I identified. A bit frustrating.
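For anyone who wants to try ASN-based blocking in a home-grown tool, the basic lookup can be sketched as below. This is an illustration under assumptions, not the add-on I am using: the prefixes are documentation ranges, the AS numbers are from the private-use range, and a real setup would load the prefix-to-ASN table from routing data or an IP-to-ASN database.

```python
import ipaddress
from typing import Optional

# Hypothetical routing data: documentation prefixes mapped to private-use ASNs.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64512,
    ipaddress.ip_network("198.51.100.0/24"): 64513,
    ipaddress.ip_network("198.51.100.128/25"): 64514,  # more specific route
}

# Hypothetical block list of notorious ASNs.
BLOCKED_ASNS = {64512, 64514}


def asn_for(ip: str) -> Optional[int]:
    """Return the ASN of the most specific prefix containing the IP, if any."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in PREFIX_TO_ASN if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return PREFIX_TO_ASN[best]


def is_blocked(ip: str) -> bool:
    """True if the IP belongs to an ASN on the block list."""
    return asn_for(ip) in BLOCKED_ASNS
```

The linear scan over all prefixes is fine for a sketch; with a full routing table one would use a prefix tree or a ready-made lookup database instead.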

Regarding the residential proxies: I've currently blocked ~115 ASNs of those, but most are locked out through geo-blocking. So the real number is unknown; it is way higher than the number in my block list. Cloudflare says in their blog:

We started testing v8 in shadow mode in March 2024. Every hour, v8 is classifying more than 17 million unique IPs that participate in residential proxy attacks. Figure 4 shows the geographic distribution of IPs with residential proxy activity belonging to more than 45 thousand ASNs in 237 countries/regions. Among the most commonly requested endpoints from residential proxies, we observe patterns of account takeover attempts, such as requests to /login, /auth/login, and /api/login.

This is the graphic they mention:

1772207902929.webp

They do have some issues identifying those reliably, but 45,000 ASNs is quite a number - that is more than a third of all ASNs, so clearly no easy thing to deal with. If I get this right, residential proxies seem to hover around 17% of all traffic that goes through Cloudflare.
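To put the numbers in proportion, here is a quick calculation using only the rounded figures from this post. Note the assumption that Cloudflare's 2024 count is compared against the current total of roughly 120,000 ASNs, so the last percentage is only approximate; the helper name `asn_share` is made up.

```python
TOTAL_ASNS = 120_000  # rough current total, as stated above


def asn_share(count: int) -> float:
    """Fraction of all ASNs that a given count represents."""
    return count / TOTAL_ASNS


figures = {
    "blocked, without residential proxies": 300,
    "blocked, including residential proxy ASNs": 300 + 115,
    "ASNs with residential proxy activity (Cloudflare, 2024)": 45_000,
}

for label, count in figures.items():
    print(f"{label}: {count} ASNs = {asn_share(count):.2%} of all ASNs")
```

So my whole block list is well under half a percent of all ASNs, while the residential proxies are spread over more than a third of them.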

Given that the blog post dates from June 2024, one can probably assume that the situation has gotten way worse since then - judging from the forums here, the majority seems to have suffered from these kinds of attacks on a massive scale only since autumn last year. The start, on the other hand, was earlier: according to Trend Micro, the first offerings for these appeared in 2014.
 
If I can identify those as a hobbyist running a small forum it should be possible for anyone. Seems like a bit of a mystery why they are not simply cut off - it is such a small number in comparison, even if it were three, four, five or ten times as many as I identified. A bit frustrating.
Totally with you on the frustration, but I'm afraid that's where having a non-centralised global network of networks comes into play.

As you've found, you are totally at liberty to not serve/accept traffic to/from those networks. Likewise, you can select the companies you choose to peer your network with based on who they themselves connect to. The same goes for the larger tier carriers you buy transit off: they are free to make those decisions. However, on the larger scale those carriers maintain their market positions by being neutral. Individual companies are perfectly at liberty to block those networks and some may choose to, but there isn't a central "Internet police" to do that, and I'm not quite sure there should be, because as ever something like that would be open to abuse. As ever, there is a balance between pros and cons to weigh up.

If you're not doing your own direct routing (i.e. BGP layer) and you have an upstream provider, I'd approach them and show them the data you've gathered, and explore whether they'd like to use it to improve the quality of their network for all their customers. Generally, once you manage to reach someone suitably knowledgeable, you can have a proper discussion about their setup. Granted, that is probably easier if it's a smaller ISP and not AWS or something :)

That all said I'm surprised services like cloudflare that seem targeted at the "clean up the crud" market are not unilaterally deciding to drop traffic, or offer to on their paid tiers (maybe they do?).

Tangentially, I also think larger providers could do more to allow for better reporting. Now, I'm guessing if I were to email the abuse contacts for one of the ASNs on your list I might not get that far, since the whole lot are probably dubious. However, when it comes to finding scam sites hosted on Azure, AWS, <insert other large "cloud" provider>, I think they could be more proactive about taking down customers who are very clearly breaking their own T&C. Providing better routes for reporting might help there, where the company you're reporting to are nominally "good guys". Right now I think they probably turn a blind eye to a lot of the crud they might be hosting because well they are paying the bills...
 
Right now I think they probably turn a blind eye to a lot of the crud they might be hosting because well they are paying the bills...
Absolutely. I've been dealing with abuse desks for more than 25 years, in the earlier parts of that as part of my job. Back then you had two types of providers: Black hat and white hat. Nothing in between.

The black hats (which were few and well known) would ignore you while all the others acted competently and quickly and it was easy to reach them as well. In the beginning you could sometimes just phone them as the number was in the whois database. Also, often the upstream of a filthy one was willing to act. It was a breeze, working with competent professionals with an attitude and the desire to keep their networks clean.

This has all changed long ago. Today I do not bother any more with informing abuse desks as it is a waste of time and most bigger companies have deliberately created processes that make reporting as cumbersome as possible - it is fruitless anyway.
 