Crazy amount of guests

Modern bots can slip past CDN-level checks. That’s exactly why the application-layer approach outlined by ES Dev Team is becoming increasingly important.
Totally agree. Were it not for the quantity and speed of the AI bots' "browsing", and the subsequent load, it wouldn't get flagged for the most part. Whilst annoying, at present I tend to only cull the visits that cause heavy load, whilst we (like ES Dev Team) ponder updating some of our code (also using ClickHouse, FWIW) to be a little more intelligent.

It does seem, for now at least, that they are still going for direct requests, so an analysis of traffic for a given "AI scraper" doesn't look like a normal user. However, given the photos you see online of automated mobile phone farms, and of course conventional headless browsers, I'd not be surprised if we saw more traffic that is indistinguishable from normal visitors (except maybe in the speed). Then again, making those extra requests for JS, images, CSS and so forth must add up on the scraper's side, so maybe they'll stick with what they have. It's actually been quite quiet on the scraping front for us the last week or two, just the normal, better-behaved bots.

It'd be quite interesting if XF internally had more of a sense of "usage checking" to distinguish real visitors from bots and so forth. I did start writing some statistical analysis code (outside of the XF codebase) at the start of the year to idly see if we might use it to highlight suspicious accounts, but alas "real work" got in the way and I've not gotten back to it yet.
 
I'd not be surprised if we saw more stuff that is indistinguishable from normal traffic (except maybe in the speed).
Depending on the audience of your forum/website, geography may also be an indicator. If, for example, I get a sudden rush of visits from countries that I normally get barely any visits from, that may indicate a bot wave; even more so if they come from the same IP range/ASN. I also look at the URLs they are visiting. If very old threads are suddenly visited by a bunch of guests at once (all visiting the same threads), and these come from unusual locations, I can be pretty sure they are bots. There are a lot of behavioral patterns in bot traffic that, aggregated, can be used to identify them. Unfortunately many of them are specific to the website/forum, so for the most part there is nothing Cloudflare could do automatically.
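A rough sketch of how a few of those signals could be aggregated into a single suspicion score (Python for brevity; the baseline country set, the 50%/40% thresholds, and the input shape are all illustrative assumptions, not a tested rule set):

```python
from collections import Counter

# Hypothetical per-site baseline: countries this forum normally sees.
BASELINE_COUNTRIES = {"DE", "AT", "CH", "US", "GB"}

def score_guest_wave(visits):
    """Score a batch of guest visits (dicts with 'country', 'asn', 'url').

    One point per heuristic from the post:
      - most traffic comes from countries outside the usual audience
      - visits are concentrated in a single ASN
      - many guests pile onto the same thread at once
    A score of 2+ suggests a coordinated bot wave rather than real users.
    """
    score = 0
    countries = Counter(v["country"] for v in visits)
    unusual = sum(n for c, n in countries.items() if c not in BASELINE_COUNTRIES)
    if unusual > len(visits) * 0.5:
        score += 1
    asns = Counter(v["asn"] for v in visits)
    if asns.most_common(1)[0][1] > len(visits) * 0.4:
        score += 1
    urls = Counter(v["url"] for v in visits)
    if urls.most_common(1)[0][1] > len(visits) * 0.4:
        score += 1
    return score
```

In practice each signal would be weighted per site, which is exactly why this is hard to do generically at the CDN level.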

It's actually been quite quiet on the scraping front for us at least the last week or two
Same here. A bunch of residential proxies in various countries at smaller scale, but nothing massive that I would have recognized in weeks.
 
I'm honestly a little terrified for the future of the internet when I see that so much of it has centralized on a single provider who, at the moment, seems to be slipping. I hope they get their act together, because half the internet is at risk if they don't.

Unfortunately my current solution requires someone with Linux skills to implement and tune.
Most people have not thought about this for a second, so the number of certified fail2ban-fu black belts is small.



However, with an optimal fail2ban tune you can do 10-20% better than Cloudflare, because your server has a little more information to think and act on than Cloudflare has. You will be missing a few deluxe features, but few people really need those anyway.
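To make the idea concrete, here is a minimal sketch of the kind of rule involved (the filter name, log path, and thresholds are illustrative placeholders, not an actual production tune): a filter that matches thread-page hits in the web server log, plus a jail that bans any IP requesting them faster than a human plausibly would.

```ini
# /etc/fail2ban/filter.d/thread-flood.conf  (hypothetical filter)
[Definition]
failregex = ^<HOST> .* "GET /threads/

# /etc/fail2ban/jail.local  (hypothetical jail section)
[thread-flood]
enabled  = true
port     = http,https
filter   = thread-flood
logpath  = /var/log/nginx/access.log
# ban any IP that requests more than 120 thread pages in 60 seconds
maxretry = 120
findtime = 60
bantime  = 3600
```

The real tuning work is in picking the findtime/maxretry ratios per URL class so that fast human readers never trip the jail.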

PM me if you are interested in obtaining a black belt in fail2ban-fu. I can provide:
  • a one-hour live training and demonstration of how the system works
  • a ~3-page document that explains everything in case you forget something from the training
  • traffic analysis scripts that help you tune fail2ban faster
  • a very good stock tune for a relatively big XenForo site

The long-term prospect for both my best fail2ban tune and Cloudflare is that both forms of protection are eventually going to hit a wall, within 1-2 years. Sophistication on the attackers' part is rising at a pace I've never seen before, and I project it to continue to go up over time.

In order to battle that sophistication, you need more information than fail2ban or Cloudflare can receive and act on. When you are in PHP land, you have that information at your fingertips and a reasonably fast programming language with which to make logical decisions. The challenge is writing and reading that data quickly enough not to slow down the app.
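The core of that application-layer idea can be sketched in a few lines (Python here for brevity, though in a XenForo context it would live in PHP; the window and threshold values are illustrative, not recommendations): keep per-client request history in memory so the decision costs microseconds instead of a database round trip.

```python
import time
from collections import deque, defaultdict

WINDOW = 10.0      # seconds of history to keep per client
THRESHOLD = 50     # requests per WINDOW before we flag the client

# In-memory per-IP timestamp queues; a real deployment would need
# shared storage across workers and periodic eviction of idle IPs.
_hits = defaultdict(deque)

def should_throttle(ip, now=None):
    """Record one request from `ip` and return True if it exceeds the budget."""
    now = time.monotonic() if now is None else now
    q = _hits[ip]
    q.append(now)
    # Drop timestamps that have aged out of the sliding window.
    while q and now - q[0] > WINDOW:
        q.popleft()
    return len(q) > THRESHOLD
```

The hard part the post alludes to is exactly the storage layer: making this state survive across PHP requests (and servers) without the read/write cost eating the benefit.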

I'm working on this database challenge as we speak. Exotic high-performance/high-scale databases have been very disappointing so far. I found two routes to making MySQL fast, which is great, because the system could run on shared hosting. I'll start another thread about this once I'm past the early concept stage.
 
CF has good products, but it is an issue when they take down half the internet that easily. Competition is good, and I am sure someone will come along one day and surpass them. The only way I can see Cloudflare having real trouble is if the US government interjects itself somewhere.
 
CF has good products, but it is an issue when they take down half the internet that easily. Competition is good, and I am sure someone will come along one day and surpass them. The only way I can see Cloudflare having real trouble is if the US government interjects itself somewhere.

Something almost as bad as that is already happening.

It was the right call to make the long bet that a service like this would lead to centralization, and that eventually a government would come along to abuse that trust. FSM-Hotline being based in the EU is worrying for free speech if they start having scope creep.

I've been concerned for a long time that Cloudflare's free plan is a "you are the product" situation, if not now, then later; they are a publicly traded corporation, after all, and you know how that goes with the "first one's free" business model.

I have a number of clients I manage software development and infrastructure for who don't accept the idea of sending all their traffic to a third party, so I was forced to investigate on-server solutions, and I continue to be surprised that my pile of configuration files and decades-old technologies performs slightly better.

But my secondary objective is to ensure there is a way to protect decentralized systems in a decentralized way. If we lose the ability to do that, we're on the road to losing the internet.

So go, text files!

 
Honestly, it's pretty bad out there!
I revisited that Anubis human-check project this evening, and someone recently posted an issue: https://github.com/TecharoHQ/anubis/issues/1313

They have some screenshots posted depicting what many of us are experiencing: massive amounts of guests / scrapers / LLM-ingesting bots. However, their attack scale is [was] vastly larger than what I've seen personally, in terms of IP addresses. In particular, this is alarming:
Yesterday we received a huge traffic wave that lasted 17 hours, from 14:00 to 06:00. We managed to mitigate the wave around 02:00.

[....]

Over a 17-hour time span, our server received 8 million queries originating from 680,000 different IPs, which represents 277 GB of Internet traffic. The log file size was 2.1 GB.

During those 17 hours, none of those IPs individually exceeded 800 queries.

We’ve tried to identify a couple of culprit AS numbers, though there is a tremendous number of ASes involved and none of them really stands out among the others.

No Tencent, no Alibaba, no Huawei, no Chinanet (not in the « top ASes », at least).

This sort of IP range being utilized, spanning multiple providers, is plainly alarming. Many of the top abusers on this person's AS list are among the same ASNs that I have also severely limited or flat-out blocked due to abusive behavior.
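This is also why pure per-IP limits miss such waves: at under 800 requests per IP, no single address looks abusive, but grouping log lines by ASN makes the aggregate visible. A minimal sketch of that aggregation (Python; assumes the log entries have already been enriched with ASN data, e.g. from an offline GeoIP/ASN database):

```python
from collections import Counter, defaultdict

def top_asns(requests, n=5):
    """requests: iterable of (ip, asn) pairs from an enriched access log.

    Returns the n busiest ASNs as (asn, request_count, distinct_ips),
    surfacing distributed waves that per-IP counters cannot see.
    """
    by_asn = Counter()
    ips_per_asn = defaultdict(set)
    for ip, asn in requests:
        by_asn[asn] += 1
        ips_per_asn[asn].add(ip)
    return [(asn, count, len(ips_per_asn[asn]))
            for asn, count in by_asn.most_common(n)]
```

A high request count spread over many distinct IPs within one ASN is exactly the signature worth rate-limiting or blocking at the ASN level.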
 