Crazy amount of guests

Thankfully, using Anubis in a blanket-coverage configuration also nabs these residential proxy clients. Good clients also run through the check, which on average takes anywhere between 3 and 15 seconds (depending on machine age).
This is very interesting! If that's true, Anubis seems to be possibly the only tool currently able to filter them out. Could you explain a little more how that works? Does this mean a legitimate client will have to wait between 3 and 15 seconds before gaining access to the URL it requested?
 
Rather impossible for @CTS :


So as far as I know he doesn't have access to .htaccess (pun intended).
Oh, I seem to have skimmed over that when I was catching up on this thread. It's unfortunate that such a feature is not offered on their cloud platform, let alone the ability to apply more extensible rules via .htaccess or the like.


This is very interesting! If that's true, Anubis seems to be possibly the only tool currently able to filter them out. Could you explain a little more how that works? Does this mean a legitimate client will have to wait between 3 and 15 seconds before gaining access to the URL it requested?
It operates in a similar fashion to how Cloudflare's WAF check functions (just without the manual Turnstile UI bit). In the case of Anubis, it does not use any captcha; it operates on the basis of a challenge that has to be solved by the client's machine automatically. There is no user interaction at all, unless the user clicks the 'show more details' button.
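
To make the mechanism concrete, here's a rough Python sketch of the idea as I understand it: the server hands the browser a random token and a difficulty, and the client brute-forces a nonce until the hash of token+nonce has enough leading zeros. The token value and function name here are made up, and Anubis's real implementation is browser-side JavaScript; this is just the shape of the work.

```python
import hashlib
from itertools import count

def solve_challenge(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce so that SHA-256(challenge + nonce) starts with
    `difficulty` leading zero hex characters (the general Anubis-style idea)."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

# Hypothetical challenge token; in practice the server issues this value.
print(solve_challenge("example-challenge-token", difficulty=4))
```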

You have full control over who gets screened via advanced settings such as cookie verification (e.g. user is logged in --> bypass the check), how harshly to screen them (I want the IP range 10.0.0.0/24 to have a challenge level of 3 and 10.10.0.0/24 a level of 16), which sections of your website are to be screened (I want /threads/* to be screened by Anubis and nothing else), etc.
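
I'm not quoting Anubis's actual config syntax here, but the selection logic amounts to something like this sketch; the rule table and field names are hypothetical and just mirror the examples above.

```python
import ipaddress

# Hypothetical policy table; Anubis's real config format differs,
# this only illustrates the first-match selection logic.
RULES = [
    {"match": lambda req: req.get("logged_in_cookie"), "action": "bypass"},
    {"match": lambda req: ipaddress.ip_address(req["ip"]) in ipaddress.ip_network("10.0.0.0/24"),
     "action": "challenge", "difficulty": 3},
    {"match": lambda req: ipaddress.ip_address(req["ip"]) in ipaddress.ip_network("10.10.0.0/24"),
     "action": "challenge", "difficulty": 16},
    {"match": lambda req: req["path"].startswith("/threads/"),
     "action": "challenge", "difficulty": 4},
]

def decide(req: dict) -> dict:
    """Return the first matching rule, or allow by default."""
    for rule in RULES:
        if rule["match"](req):
            return rule
    return {"action": "allow"}

print(decide({"ip": "10.10.0.7", "path": "/threads/123", "logged_in_cookie": False}))
```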

And then you can set varying levels of challenge difficulty. Levels 1 to 3 are generally easily solvable by most bots and clients, including these AI scrapers, and take milliseconds to complete. Levels 4 to 5 begin to really slow down clients connecting to the service, though it ultimately depends on the CPU speed of the client machine. The majority of bots, including these AI/LLM scrapers, have yet to get past a level of 5. I've seen very few get past a level of 4 - it's very uncommon. Once you get into levels 6 and higher, a challenge can generally take up to a minute to solve - which is not ideal for legitimate clients. A level of 16 effectively becomes 'impossible' to solve - excellent to use as a shun-list.
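
To put rough numbers on that curve: if the challenge is "find a hash with N leading zero hex characters", every extra level multiplies the expected work by 16. The hash rate below is an assumed ballpark for single-threaded browser JavaScript, not a measurement.

```python
HASHES_PER_SEC = 100_000  # assumed ballpark for a browser; real clients vary a lot

for level in (3, 4, 5, 6, 16):
    expected = 16 ** level              # expected attempts for `level` leading zero hex chars
    seconds = expected / HASHES_PER_SEC
    print(f"level {level:>2}: ~{expected:.2e} attempts, ~{seconds:,.1f} s")
```

At that assumed rate, level 4 is well under a second, level 5 is around ten seconds, level 6 creeps into minutes, and level 16 works out to millions of years, which is why it behaves as a shun-list.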

My only worry is that if Anubis becomes too widely used, these AI/LLM botnets will be configured to wait it out and continue their ingestion hell.
 
Actually, this is the predicament our website is facing: too many AI crawlers are frantically scraping website content for training. We've also been under this kind of attack. We have only dozens of website visitors, but over 10,000 bots, and that number is still growing. Another website, with over 300,000 'visitors,' is simply impossible to deal with. In the end, we used AI to write a small altcha verification program, which successfully blocked these crawlers.
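
For anyone wondering what a "small verification program" involves: the server only has to compute one hash to check a solution, while the client has to grind through many, which is the whole point. This is a generic proof-of-work sketch, not Altcha's actual API or message format.

```python
import hashlib
import secrets

def issue_challenge(difficulty: int = 4) -> dict:
    # Random token the client must do proof-of-work against.
    return {"token": secrets.token_hex(16), "difficulty": difficulty}

def verify(token: str, difficulty: int, nonce: int) -> bool:
    # One hash for the server vs. ~16**difficulty hashes for the solver.
    digest = hashlib.sha256(f"{token}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

In practice you would also sign and expire the issued token server-side so a solved challenge can't be replayed.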
 
Yep, Cloudflare, fail2ban, or something equivalent is now a prerequisite for running a website.

Not a single one of the 32 servers I manage is free of this problem, on top of lots of people probing for vulnerabilities and trying to break passwords.
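
For anyone who hasn't set this up yet, the core of the fail2ban idea is just counting repeated failures per IP and then banning. A minimal sketch of that counting step is below; the log path and threshold are assumptions, and it omits the actual firewall action fail2ban would take.

```python
import re
from collections import Counter

LOG = "/var/log/auth.log"   # assumed path; varies by distro
PATTERN = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")
THRESHOLD = 5               # failures before an IP becomes a ban candidate

failures = Counter()
with open(LOG, encoding="utf-8", errors="ignore") as fh:
    for line in fh:
        match = PATTERN.search(line)
        if match:
            failures[match.group(1)] += 1

for ip, count in failures.most_common():
    if count >= THRESHOLD:
        print(f"ban candidate: {ip} ({count} failures)")
```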

So is it the case that everyone has to click a captcha?
I personally work hard to avoid having to do that, because it's not friendly.

I also think that in the future, if an AI can figure out how to generate a working captcha system, it can definitely figure out how to complete one.

What you probably want is more like a proof-of-work system such as Anubis. The idea is to increase the computational cost on the attacker's side to the point that they are deterred from scraping the site.
 
What you probably want is more like a proof-of-work system such as Anubis.
As a side note, someone on GitHub posted this as a proof of concept:


A way to trip up scraper bots that get past other filtering/blocking, using CSS rather than JavaScript (so it would still work in cases where a human visitor has JS disabled).

I found this linked under an issue posted for Anubis. Having read about what Anubis does, I like this for the bots doing an end run around our other filtering/blocking. I just wish I had a spare server I could test it on; I don't really want to try this on production servers. I also wonder how much of a load it would put on the system when there are typically 3,000+ legit users visiting (typically 33% logged in, the other 66% human guests).
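
In case it helps picture the trick, the gist is a link that CSS hides from human visitors but that naive scrapers still crawl, and any hit on it flags the client. This is my own minimal sketch using a hypothetical /trap path and Flask for brevity, not the actual proof of concept from that GitHub post.

```python
from flask import Flask, abort, request

app = Flask(__name__)
seen_bots = set()

HONEYPOT = """
<!-- Hidden from humans by CSS, but a naive scraper will still follow it. -->
<a href="/trap" style="position:absolute;left:-9999px" tabindex="-1"
   rel="nofollow" aria-hidden="true">do not follow</a>
"""

@app.route("/")
def index():
    if request.remote_addr in seen_bots:
        abort(403)
    return "<html><body>Normal page content" + HONEYPOT + "</body></html>"

@app.route("/trap")
def trap():
    # Anything requesting this URL ignored the CSS hint: flag it as a bot.
    seen_bots.add(request.remote_addr)
    abort(403)
```

You'd also want to disallow the trap path in robots.txt so well-behaved crawlers don't get caught by it.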
 
Interesting.
I think a lot of scrapers are using real browsers via something like chromedriver. The Chinese bot farms tend to use mobile phones because they're cheap. I wonder how effective it would be against those.
 
Not sure if this one was already identified, but I found it in my logs previously. This mob, Bucklog, was one of the offenders, literally hitting me thousands of times. They don't hit your domain, they hit your server IP directly, so Cloudflare or AS blocking won't work. You have to block the CIDR ranges for their two servers in your server firewall, so it drops those IPs immediately and doesn't consume your PHP / DB resources.

170.39.217.0/24
185.177.72.0/24

It took me a while, because I had the AS blocked at Cloudflare, but they were still hitting the server. So then I thought they were routing through a non-proxied (grey-clouded) sub-domain on Cloudflare, but that wasn't it either. Basically, after sifting through lots of logs, I found they were hitting the server IP directly, and they are known for being nasty.
  • Owner/ASN: Bucklog SARL (AS211590), hosted in France (often listed as Paris area or Vélizy-Villacoublay).
  • Reputation: This entire /24 (185.177.72.0/24) is heavily flagged across threat intel sources (AbuseIPDB, CleanTalk, CrowdSec, SOCRadar, etc.) for spam, brute-force attacks, hacking attempts (e.g., probing /info.php.bak, common vuln paths), reconnaissance/scanning (e.g., Next.js metadata probing), and general malicious activity. It's been active in reports since mid-2025 and continues into 2026. High confidence of abuse—many sources treat the whole subnet as noisy/malicious background noise or bot/scanner traffic.
I had more than this doing it, but this was the main offender, showing thousands of visitors on my site at times. They were hitting me, then stopping, then hitting me, then stopping. The CPU and DB loads were going insane. Again, it wasn't just this one; I had others in the thousands as well, but some of them were coming via the domain, so I could block them at the edge in CF, and some were doing similar, direct to the server IP, and had to be blocked at the server firewall instead to drop them immediately.
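
If anyone wants to confirm the same pattern in their own logs before adding firewall rules, a quick check against those two ranges is enough; the log path and combined-log layout (first field = client IP) are assumptions here.

```python
import ipaddress

BLOCKED = [ipaddress.ip_network(c) for c in ("170.39.217.0/24", "185.177.72.0/24")]
LOG = "/var/log/nginx/access.log"  # assumed path and format

hits = 0
with open(LOG, encoding="utf-8", errors="ignore") as fh:
    for line in fh:
        try:
            ip = ipaddress.ip_address(line.split(" ", 1)[0])
        except ValueError:
            continue  # skip malformed lines
        if any(ip in net for net in BLOCKED):
            hits += 1

print(f"{hits} requests from the Bucklog ranges")
```

The actual drop still has to happen in the server firewall so those packets never reach PHP or the DB, which is the point above.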
 