Crazy amount of guests

Modern bots can slip past CDN-level checks. That’s exactly why the application-layer approach outlined by ES Dev Team is becoming increasingly important.
Totally agree. Were it not for the quantity and speed of the AI bots' "browsing", and the subsequent load, it wouldn't get flagged for the most part. Whilst annoying, at present I tend to only cull the visits that cause heavy load, whilst we (like ES Dev Team) ponder updating some of our code (also using ClickHouse, FWIW) to be a little more intelligent.

It does seem, for now at least, that they are still going for direct requests, so an analysis of traffic for a given "AI scraper" doesn't look like a normal user. However, given the photos you see online of automated mobile phone farms, and of course conventional headless browsers, I'd not be surprised if we saw more traffic that is indistinguishable from normal visitors (except maybe in the speed). Then again, making those extra requests for JS, images, CSS and so forth must add up on the scraper's side, so maybe they'll stick with what they have. It's actually been quite quiet on the scraping front for us the last week or two, just the normal, better-behaved bots.

It'd be quite interesting if XF internally had more of a sense of "usage checking" to distinguish real visitors from bots and so forth. I did start writing some statistical analysis code (outside of the XF codebase) at the start of the year to idly see if we might use it to highlight suspicious accounts, but alas "real work" got in the way and I've not gotten back to it yet.
 
I'd not be surprised if we saw more stuff that is indistinguishable from normal traffic (except maybe in the speed).
Depending on the audience of your forum/website, geography may also be an indicator. If, for example, I get a sudden rush of visits from countries that I normally get barely any visits from, that may indicate a bot wave; even more so if they come from the same IP range/ASN. I also look at the URLs they are visiting. If very old threads are suddenly visited by a bunch of guests at once (all visiting the same threads), and these come from unusual locations, I can be pretty sure they are bots. There are a lot of behavioral patterns in bot traffic that, aggregated, can be used to identify them. Unfortunately many of them are specific to the website/forum, so for the most part there is nothing Cloudflare could do automatically.
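A rough sketch of how a few of those signals could be aggregated into a single suspicion score (Python for brevity; the baseline country set, the 50%/40% thresholds, and the input shape are all illustrative assumptions, not a tested rule set):

```python
from collections import Counter

# Hypothetical per-site baseline: countries this forum normally sees.
BASELINE_COUNTRIES = {"DE", "AT", "CH", "US", "GB"}

def score_guest_wave(visits):
    """Score a batch of guest visits (dicts with 'country', 'asn', 'url').

    One point per heuristic from the post:
      - most traffic comes from countries outside the usual audience
      - visits are concentrated in a single ASN
      - many guests pile onto the same thread at once
    A score of 2+ suggests a coordinated bot wave rather than real users.
    """
    score = 0
    countries = Counter(v["country"] for v in visits)
    unusual = sum(n for c, n in countries.items() if c not in BASELINE_COUNTRIES)
    if unusual > len(visits) * 0.5:
        score += 1
    asns = Counter(v["asn"] for v in visits)
    if asns.most_common(1)[0][1] > len(visits) * 0.4:
        score += 1
    urls = Counter(v["url"] for v in visits)
    if urls.most_common(1)[0][1] > len(visits) * 0.4:
        score += 1
    return score
```

In practice each signal would be weighted per site, which is exactly why this is hard to do generically at the CDN level.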

It's actually been quite quiet on the scraping front for us at least the last week or two
Same here. A bunch of residential proxies in various countries at smaller scale, but nothing massive that I would have recognized in weeks.
 
I'm honestly a little terrified for the future of the internet when I see that so much of it has centralized on a single provider who, at the moment, seems to be slipping. I hope they get their act together, because half the internet is at risk if they don't.

Unfortunately my current solution requires someone with Linux skills to implement and tune.
Most people have not thought about this for a second, so the number of certified fail2ban-fu black belts is small.



However, with an optimal fail2ban tune you can do 10-20% better than Cloudflare, because your server has a little more information to think and act on than Cloudflare has. You will be missing a few deluxe features, but few people really need those anyway.
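To make the idea concrete, here is a minimal sketch of the kind of rule involved (the filter name, log path, and thresholds are illustrative placeholders, not an actual production tune): a filter that matches thread-page hits in the web server log, plus a jail that bans any IP requesting them faster than a human plausibly would.

```ini
# /etc/fail2ban/filter.d/thread-flood.conf  (hypothetical filter)
[Definition]
failregex = ^<HOST> .* "GET /threads/

# /etc/fail2ban/jail.local  (hypothetical jail section)
[thread-flood]
enabled  = true
port     = http,https
filter   = thread-flood
logpath  = /var/log/nginx/access.log
# ban any IP that requests more than 120 thread pages in 60 seconds
maxretry = 120
findtime = 60
bantime  = 3600
```

The real tuning work is in picking the findtime/maxretry ratios per URL class so that fast human readers never trip the jail.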

PM me if you are interested in obtaining a black belt in fail2ban-fu. I can provide:
  • a one-hour live training and demonstration of how the system works
  • a ~3-page document that explains everything in case you forget something from the training
  • traffic analysis scripts that help you tune fail2ban faster
  • a very good stock tune for a relatively big XenForo site

The long-term prospect for both my best fail2ban tune and Cloudflare is that both forms of protection are eventually going to hit a wall, within 1-2 years. Sophistication on the attackers' part is rising at a pace I've never seen before, and I project it to continue to go up over time.

In order to battle that sophistication, you need more information than fail2ban or Cloudflare can receive and act on. When you are in PHP land, you have that information at your fingertips and a reasonably fast programming language with which to make logical decisions. The challenge is writing and reading that data quickly enough not to slow down the app.
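The core of that application-layer idea can be sketched in a few lines (Python here for brevity, though in a XenForo context it would live in PHP; the window and threshold values are illustrative, not recommendations): keep per-client request history in memory so the decision costs microseconds instead of a database round trip.

```python
import time
from collections import deque, defaultdict

WINDOW = 10.0      # seconds of history to keep per client
THRESHOLD = 50     # requests per WINDOW before we flag the client

# In-memory per-IP timestamp queues; a real deployment would need
# shared storage across workers and periodic eviction of idle IPs.
_hits = defaultdict(deque)

def should_throttle(ip, now=None):
    """Record one request from `ip` and return True if it exceeds the budget."""
    now = time.monotonic() if now is None else now
    q = _hits[ip]
    q.append(now)
    # Drop timestamps that have aged out of the sliding window.
    while q and now - q[0] > WINDOW:
        q.popleft()
    return len(q) > THRESHOLD
```

The hard part the post alludes to is exactly the storage layer: making this state survive across PHP requests (and servers) without the read/write cost eating the benefit.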

I'm working on this database challenge as we speak. Exotic high-performance/high-scale databases have been very disappointing so far. I found two routes to making MySQL fast, which is great, because the system could run on shared hosting. I'll start another thread about this once I'm past the early concept stage.
 
CF has good products, but it is an issue when they take down half the internet that easily. Competition is good, and I am sure someone will come along one day and surpass them. The only way I can see Cloudflare having real trouble is if the US government interjects itself somewhere.
 
CF has good products, but it is an issue when they take down half the internet that easily. Competition is good, and I am sure someone will come along one day and surpass them. The only way I can see Cloudflare having real trouble is if the US government interjects itself somewhere.

Something almost as bad as that is already happening.

It was the right call to make the long bet that a service like this would lead to centralization, and that eventually a government would come along to abuse that trust. FSM-Hotline being based in the EU is worrying for free speech if they start having scope creep.

I've been concerned for a long time that Cloudflare's free plan is a "you are the product" situation, if not now, then later; they are a publicly traded corporation, after all, and you know how that goes with the "first one's free" business model.

I have a number of clients I manage software development and infrastructure for who don't accept the idea of sending all their traffic to a third party, so I was forced to investigate on-server solutions, and I continue to be surprised that my pile of configuration files and decades-old technologies performs slightly better.

But my secondary objective is to ensure there is a way to protect decentralized systems in a decentralized way. If we lose the ability to do that, we're on the road to losing the internet.

So go, text files!

 
Honestly, it's pretty bad out there!
I revisited that Anubis human-check project this evening, and someone recently posted an issue: https://github.com/TecharoHQ/anubis/issues/1313

They have some screenshots posted depicting what many of us are experiencing: massive amounts of guests / scrapers / LLM-ingesting bots. However, their attack scale is [was] vastly larger than what I've seen personally, in terms of IP addresses. In particular, this is alarming:
Yesterday we received a huge traffic wave that lasted 17 hours, from 14:00 to 06:00. We managed to mitigate the wave around 02:00.

[....]

Over a 17-hour time span, our server received 8 million queries originating from 680,000 different IPs, which represents 277 GB of Internet traffic. The log file size was 2.1 GB.

During those 17 hours, none of those IPs individually exceeded 800 queries.

We’ve tried to identify a couple of culprit AS numbers, though there is a tremendous number of ASes involved and none of them really stands out among the others.

No Tencent, no Alibaba, no Huawei, no Chinanet (not in the « top ASes », at least).

This sort of IP range being utilized, spanning multiple providers, is plainly alarming. Many of the top abusers on this person's AS list are among the same ASNs that I have also severely limited or flat-out blocked due to abusive behavior.
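This is also why pure per-IP limits miss such waves: at under 800 requests per IP, no single address looks abusive, but grouping log lines by ASN makes the aggregate visible. A minimal sketch of that aggregation (Python; assumes the log entries have already been enriched with ASN data, e.g. from an offline GeoIP/ASN database):

```python
from collections import Counter, defaultdict

def top_asns(requests, n=5):
    """requests: iterable of (ip, asn) pairs from an enriched access log.

    Returns the n busiest ASNs as (asn, request_count, distinct_ips),
    surfacing distributed waves that per-IP counters cannot see.
    """
    by_asn = Counter()
    ips_per_asn = defaultdict(set)
    for ip, asn in requests:
        by_asn[asn] += 1
        ips_per_asn[asn].add(ip)
    return [(asn, count, len(ips_per_asn[asn]))
            for asn, count in by_asn.most_common(n)]
```

A high request count spread over many distinct IPs within one ASN is exactly the signature worth rate-limiting or blocking at the ASN level.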
 