Crazy amount of guests

Cloudflare has solved all the problems for us. It’s also filtering VPNs effectively, so even the spoof registrations have dropped significantly.
 
We have completely blocked Bytespider, ByteDance, and BytePlus. They're just a waste of bandwidth and server resources.

How do you do it?

I of course have these agent strings banned, but since last night the ByteDance bots are no longer identifying themselves as bots in the user-agent string.
And a few new IP address ranges appeared while I was sleeping, so I didn't catch it.

I use Apache .htaccess to ban those ranges, then fail2ban turns that into an iptables ban. It's not the most efficient approach, but it works, and it's nice to be able to just edit a ban file without having to restart any processes.

A big problem on my server is that they always hit http:// first, which redirects to https://.
That means two TCP/IP connections per hit, and this botnet is huge, so process limits etc. are easily overwhelmed.

I have some TCP/IP tuning that raises the available ports to 2048, but despite bumping up many Apache settings it still chokes at around 1000 connections in total.

I am using the default mod_php with mpm_prefork and might have to move to the slower FPM. I'm not happy about it, but...
The show must go on!
 
Cloudflare
Major AI Bots Blocked
GPTBot (OpenAI - ChatGPT)
ClaudeBot and Claude-Web (Anthropic - Claude)
CCBot (Common Crawl - used by many AI models)
Meta-ExternalAgent and Meta-ExternalFetcher (Meta - AI training)
Bytespider (ByteDance - TikTok/Doubao)
Amazonbot (Amazon)
Applebot (Apple)
Anthropic-AI
Google-Extended (Used for Bard/Gemini training)
Google-CloudVertexBot (Google Vertex AI)
PerplexityBot and Perplexity-User (Perplexity AI)
DuckAssistBot (DuckDuckGo)
TikTokSpider (ByteDance)
ImagesiftBot
 
You won't like it - Cloudflare
Just wondering how much they may have improved. A couple of weeks ago people reported here on the forum that Cloudflare bot protection does not work against the scrapers' latest strategy (residential proxies). The huge waves of massive requests from single IP blocks that happened last year seem to be gone now for the most part. At least on my forum I haven't seen anything like that in a while. What I do see, however, is a permanent stream of requests from residential proxies all around the world. I can identify them easily, as 99% of my normal audience is from Germany, Austria and Switzerland. A couple of other countries show up as well, but not the countries I see now and not in the amounts I see. So in my case 99% of the requests that don't come from the DACH region are scrapers.

The second pattern is that each IP typically makes only one single request, usually targeting an older thread or a post within it, an attachment or a user profile. A normal visitor generates a couple of requests for each page he visits, so it is easy to identify the bots, but only in hindsight. Cloudflare would have the means to do this better, but to do it really well they would have to know about the content they act as a proxy for, which they hopefully don't.
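The "one request per IP" pattern can be checked after the fact with a few lines of Python. This is only a minimal sketch: it assumes an access log in the common/combined format with the client IP as the first field, and the function name and sample lines are made up for illustration (the IPs are documentation ranges, not real traffic).

```python
from collections import Counter


def single_request_ips(log_lines):
    """Return the set of client IPs that appear exactly once in the log.

    Assumption: the client IP is the first whitespace-separated field,
    as in Apache's common/combined log format.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return {ip for ip, n in counts.items() if n == 1}


# Made-up sample: one IP fetches a single old thread, the other behaves
# like a browser and also loads the page's CSS and JS.
sample = [
    '203.0.113.7 - - [01/Jan/2025:10:00:00 +0000] "GET /threads/old.123/ HTTP/1.1" 200 5120',
    '198.51.100.4 - - [01/Jan/2025:10:00:01 +0000] "GET /threads/new.456/ HTTP/1.1" 200 7300',
    '198.51.100.4 - - [01/Jan/2025:10:00:01 +0000] "GET /css/style.css HTTP/1.1" 200 900',
    '198.51.100.4 - - [01/Jan/2025:10:00:02 +0000] "GET /js/app.js HTTP/1.1" 200 1500',
]
suspects = single_request_ips(sample)
print(suspects)
```

A single hit is only a hint, not proof: cached assets, feed readers and link previews can look the same, so this is better suited to spotting ranges worth a closer look than to banning automatically.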

So when you write
Cloudflare has solved all the problems for us.
I wonder if that is really the case, or if you just don't see the scrapers any more because they now act like a swarm of mosquitoes and no longer like an elephant.
It’s also filtering VPNs effectively, so even the spoof registrations have dropped significantly.
I have zero spam issues on my forum, despite not using Cloudflare. I've been blocking malicious IP ranges and ASNs pretty radically for quite a while now, and it seems that most of the automated spam registration attempts come from Russia, directly or indirectly. It appears to be only a relatively small number of different actors, but they use IPs from all over the world, including a lot of hosting providers that also seem to trace back to Russia in one way or another. The manual attempts seem a bit more widely spread, but are often from India or Pakistan.

Ozzys Spaminator catches the bots reliably, a couple of countries are not allowed to register anyway, and the occasional bad guy who gets around that gets caught by Xons Registration and Multiaccount Blocker. It doesn't have much to do, however: maybe one or two over the last six months.

The residential proxies, however, are indeed a problem, as they use normal dial-up connections and the computers of regular home users who often won't know about it. They even use mobile phones, and in effect act like the botnets that were used for DDoS 20 years ago. Each request comes from a different machine, and somewhere in the middle there's a spider that orchestrates this distributed scraping. That is pretty hard to detect if you have a very international forum, and pretty hard to get rid of if you don't want to block private client networks / dial-ups to a massive extent, causing massive collateral damage. About half of the requests from residential proxies on my forum come from the US, by the way, from all major consumer ISPs as well as from a lot of smaller ones.

So when you say Cloudflare solved your bot-problem I ask: How do you know that it is solved?
 
Cloudflare
Major AI Bots Blocked
GPTBot (OpenAI - ChatGPT)
ClaudeBot and Claude-Web (Anthropic - Claude)
CCBot (Common Crawl - used by many AI models)
Meta-ExternalAgent and Meta-ExternalFetcher (Meta - AI training)
Bytespider (ByteDance - TikTok/Doubao)
Amazonbot (Amazon)
Applebot (Apple)
Anthropic-AI
Google-Extended (Used for Bard/Gemini training)
Google-CloudVertexBot (Google Vertex AI)
PerplexityBot and Perplexity-User (Perplexity AI)
DuckAssistBot (DuckDuckGo)
TikTokSpider (ByteDance)
ImagesiftBot
This is worth absolutely nothing. These are all bots that don't hide. You can simply block them yourself via .htaccess (and most of them even via robots.txt) within minutes. No need for Cloudflare here. Those are for sure not the problem.
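As a rough illustration of how little is involved, here is a sketch that turns a list of bot names into robots.txt rules. The helper name is made up, and the list is just a subset of the names quoted above. Whether a given crawler actually honors robots.txt is up to the crawler, so this only helps against the well-behaved ones.

```python
def robots_txt(user_agents):
    """Build robots.txt rules that disallow the whole site per user agent."""
    blocks = [f"User-agent: {ua}\nDisallow: /" for ua in user_agents]
    return "\n\n".join(blocks) + "\n"


# Subset of the bot names from the quoted list
bots = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider"]
rules = robots_txt(bots)
print(rules)
```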
 
I of course have these agent strings banned, but since last night the ByteDance bots are no longer identifying themselves as bots in the user-agent string.
And a few new IP address ranges appeared while I was sleeping, so I didn't catch it.
I would block them proactively. Those are server ranges, so they should not visit your forum anyway. The IPs you posted belong to AS150436, which belongs to Byteplus Pte. Ltd. (basically ByteDance). If you want to block by IP range instead of simply by ASN, you can look here:


and end up with:

Deny from 45.78.192.0/18
Deny from 69.5.0.0/20
Deny from 69.5.16.0/21
Deny from 69.5.24.0/23
Deny from 69.5.26.0/23
Deny from 69.5.28.0/23
Deny from 69.5.30.0/23
Deny from 71.18.227.0/24
Deny from 98.96.226.0/24
Deny from 98.98.103.0/24
Deny from 101.45.255.0/24
Deny from 101.47.0.0/19
Deny from 101.47.32.0/24
Deny from 101.47.33.0/24
Deny from 101.47.34.0/23
Deny from 101.47.36.0/22
Deny from 101.47.40.0/21
Deny from 101.47.48.0/20
Deny from 101.47.64.0/20
Deny from 101.47.80.0/21
Deny from 101.47.88.0/22
Deny from 101.47.92.0/23
Deny from 101.47.95.0/24
Deny from 101.47.96.0/23
Deny from 101.47.98.0/24
Deny from 101.47.128.0/18
Deny from 128.1.127.0/24
Deny from 128.1.169.0/24
Deny from 128.1.235.0/24
Deny from 129.227.102.0/24
Deny from 145.223.128.0/18
Deny from 150.5.128.0/17
Deny from 156.59.33.0/24
Deny from 163.7.0.0/17
Deny from 163.7.160.0/20
Deny from 163.7.176.0/20
Deny from 163.7.192.0/18
Deny from 187.42.0.0/17
Deny from 202.52.224.0/21
Deny from 202.52.252.0/22
Deny from 207.166.160.0/19
Deny from 216.19.0.0/18
Deny from 2401:4c20::/38
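Several of these ranges are adjacent and can be merged into fewer, larger blocks. Here is a small sketch using Python's standard `ipaddress` module. The helper name is made up, and only a subset of the list above is used to keep it short.

```python
import ipaddress


def collapse_deny_lines(cidrs):
    """Merge adjacent/overlapping CIDR ranges and emit 'Deny from' lines.

    IPv4 and IPv6 are collapsed separately because
    ipaddress.collapse_addresses() rejects mixed versions.
    """
    nets = [ipaddress.ip_network(c) for c in cidrs]
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    merged = list(ipaddress.collapse_addresses(v4))
    merged += list(ipaddress.collapse_addresses(v6))
    return [f"Deny from {n}" for n in merged]


# Subset of the ranges listed above
subset = [
    "69.5.0.0/20", "69.5.16.0/21", "69.5.24.0/23", "69.5.26.0/23",
    "69.5.28.0/23", "69.5.30.0/23", "71.18.227.0/24", "2401:4c20::/38",
]
lines = collapse_deny_lines(subset)
print("\n".join(lines))
# Deny from 69.5.0.0/19
# Deny from 71.18.227.0/24
# Deny from 2401:4c20::/38
```

The six 69.5.x.x entries together cover 69.5.0.0 through 69.5.31.255, which is exactly 69.5.0.0/19, so nothing extra gets blocked by merging them.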

To be sure, a quick check with bgp.tools does not hurt:


You'll see that it probably won't hurt to simply block the whole thing.
 