#bytedance bots - unlabeled 01/25/2026 - DS
Deny From 45.78.192.0/18
Deny From 101.47.0.0/18
Deny from 101.47.112.0/20
Deny from 101.47.128.0/17
We have completely blocked Bytespider, ByteDance, and BytePlus. They're just a waste of bandwidth and server resources.
You won't like it - Cloudflare
How do you do it?
Just wondering how much they may have improved. A couple of weeks ago people reported here on the forum that Cloudflare bot protection would not work against the latest strategies of the scrapers (residential proxies). The huge waves of massive requests from single IP blocks that happened last year seem to be gone now for the most part - at least on my forum I haven't seen anything like that in a while. What I do see, however, is a permanent stream of requests from residential proxies from all around the world. I can identify them easily, as my normal audience is 99% from Germany, Austria and Switzerland. A couple of other countries occur as well - but not the countries I see and not in the amounts I see. So 99% of the requests that don't come from the DACH region are scrapers in my case.
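A rough sketch of that kind of check in Python, assuming you already have a country code per request. The GeoIP lookup itself is not shown, and `HOME_REGION`, `share_outside_region` and the sample data are made up for illustration:

```python
from collections import Counter

# Assumption: country codes come from some GeoIP lookup done elsewhere.
# The set mirrors the DACH audience described in the post.
HOME_REGION = {"DE", "AT", "CH"}

def share_outside_region(country_codes, home=HOME_REGION):
    """Return the fraction of requests whose country is not in the home region."""
    counts = Counter(country_codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    outside = sum(n for country, n in counts.items() if country not in home)
    return outside / total

# Hypothetical sample: 6 requests from the home region, 4 from elsewhere.
sample = ["DE", "DE", "AT", "CH", "DE", "DE", "VN", "BR", "US", "VN"]
print(share_outside_region(sample))
```

This only tells you how much traffic is unusual for your own forum. It cannot say which individual request is a scraper, so legitimate members abroad still need separate handling.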
I wonder if that is really the case or if you just don't see the scrapers any more as they now act like a swarm of mosquitoes and no longer like an elephant.
Cloudflare has solved all the problems for us.
I have zero spam issues on my forum, despite not using Cloudflare. I've been blocking malicious IP ranges and ASNs pretty radically for quite a while now, and it seems that most of the automated spam registration attempts in fact come from Russia, directly or indirectly. It seems to be only a relatively small number of different actors, but they use IPs from all over the world, including a lot of hosters that also seem to trace back to Russia in one way or another. The manual attempts seem a bit more widely spread, but are often from India or Pakistan.
It's also filtering VPNs effectively, so even the spoof registrations have dropped significantly.
This is worth absolutely nothing. These are all bots that don't hide - you can simply block them yourself via .htaccess (and most of them even via robots.txt) within minutes. No need for Cloudflare here. Those are for sure not the problem.
Cloudflare
Major AI Bots Blocked
GPTBot (OpenAI - ChatGPT)
ClaudeBot and Claude-Web (Anthropic - Claude)
CCBot (Common Crawl - used by many AI models)
Meta-ExternalAgent and Meta-ExternalFetcher (Meta - AI training)
Bytespider (ByteDance - TikTok/Doubao)
Amazonbot (Amazon)
Applebot (Apple)
Anthropic-AI
Google-Extended (Used for Bard/Gemini training)
Google-CloudVertexBot (Google Vertex AI)
PerplexityBot and Perplexity-User (Perplexity AI)
DuckAssistBot (DuckDuckGo)
TikTokSpider (ByteDance)
ImagesiftBot
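For bots that do announce themselves, the check is nothing more than a substring match on the user agent. A minimal Python sketch using the tokens from the list above; `AI_BOT_TOKENS` and `matching_bot_token` are names made up for illustration, and this does nothing against bots that send a normal browser string:

```python
# Tokens taken from the list above. Matching is a plain case-insensitive
# substring test, so it only catches bots that identify themselves.
AI_BOT_TOKENS = [
    "GPTBot", "ClaudeBot", "Claude-Web", "CCBot", "Meta-ExternalAgent",
    "Meta-ExternalFetcher", "Bytespider", "Amazonbot", "Applebot",
    "Anthropic-AI", "Google-Extended", "Google-CloudVertexBot",
    "PerplexityBot", "Perplexity-User", "DuckAssistBot", "TikTokSpider",
    "ImagesiftBot",
]

def matching_bot_token(user_agent):
    """Return the first listed token found in the user agent, or None."""
    ua = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in ua:
            return token
    return None
```

The same token list can be reused to build user agent rules for .htaccess or entries for robots.txt.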
I of course have these agent strings banned, but since last night the ByteDance bots are not identifying themselves as bots in the browser string.
And a few new IP address ranges appeared while I was sleeping, so I didn't catch them.
I am using the default mod_php and mpm_prefork and might have to go to the slower FPM, and I'm not happy about it, but...
The show must go on!
A big problem on my server is that they always hit http:// first and this redirects to https://
This means 2 TCP/IP connections per hit, and this botnet is huge, so process limits etc. are easily overwhelmed.
I would block them proactively. Those are server ranges, so they should not visit your forum anyway. The IPs you posted belong to AS150436 and this belongs to Byteplus Pte. Ltd. (which is basically Bytedance). If you want to block the whole block based on IPs instead of simply the ASN you can look here:
and end up with:
Deny from 45.78.192.0/18
Deny from 69.5.0.0/20
Deny from 69.5.16.0/21
Deny from 69.5.24.0/23
Deny from 69.5.26.0/23
Deny from 69.5.28.0/23
Deny from 69.5.30.0/23
Deny from 71.18.227.0/24
Deny from 98.96.226.0/24
Deny from 98.98.103.0/24
Deny from 101.45.255.0/24
Deny from 101.47.0.0/19
Deny from 101.47.32.0/24
Deny from 101.47.33.0/24
Deny from 101.47.34.0/23
Deny from 101.47.36.0/22
Deny from 101.47.40.0/21
Deny from 101.47.48.0/20
Deny from 101.47.64.0/20
Deny from 101.47.80.0/21
Deny from 101.47.88.0/22
Deny from 101.47.92.0/23
Deny from 101.47.95.0/24
Deny from 101.47.96.0/23
Deny from 101.47.98.0/24
Deny from 101.47.128.0/18
Deny from 128.1.127.0/24
Deny from 128.1.169.0/24
Deny from 128.1.235.0/24
Deny from 129.227.102.0/24
Deny from 145.223.128.0/18
Deny from 150.5.128.0/17
Deny from 156.59.33.0/24
Deny from 163.7.0.0/17
Deny from 163.7.160.0/20
Deny from 163.7.176.0/20
Deny from 163.7.192.0/18
Deny from 187.42.0.0/17
Deny from 202.52.224.0/21
Deny from 202.52.252.0/22
Deny from 207.166.160.0/19
Deny from 216.19.0.0/18
Deny from 2401:4c20::/38
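A long list like this can often be shortened before it goes into .htaccess, because adjacent ranges merge into larger ones. A minimal sketch using Python's standard `ipaddress` module, with a few of the ranges above as sample input (`collapse_ranges`, `deny_lines` and `SAMPLE` are made-up names for illustration):

```python
import ipaddress

def collapse_ranges(cidrs):
    """Merge adjacent or overlapping CIDR ranges into the fewest equivalent networks."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    merged = []
    # collapse_addresses() rejects mixed IPv4/IPv6 input, so handle each family separately.
    for version in (4, 6):
        family = [n for n in networks if n.version == version]
        merged.extend(ipaddress.collapse_addresses(family))
    return [str(n) for n in merged]

def deny_lines(cidrs):
    """Format the collapsed ranges as Deny directives like the ones above."""
    return ["Deny from " + c for c in collapse_ranges(cidrs)]

# A few of the ranges from the list above.
SAMPLE = [
    "69.5.0.0/20", "69.5.16.0/21", "69.5.24.0/23", "69.5.26.0/23",
    "69.5.28.0/23", "69.5.30.0/23", "2401:4c20::/38",
]

for line in deny_lines(SAMPLE):
    print(line)
```

The six 69.5.x.x entries cover one contiguous block, so they collapse into a single line. The result blocks exactly the same addresses, just with fewer rules.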
To be sure a quick check with bgp.tools does not hurt:
AS150436 Byteplus Pte. Ltd. - bgp.tools
AS150436 (Byteplus Pte. Ltd.) is a 3 year old BGP network that is peering with 75 other networks and has 9 upstream carriers.
You'll see that it probably won't hurt to simply block the whole thing.
Could it be worth using a combination of Apache for your secure stuff and general serving and then something a little more lightweight such as Nginx to just sit and do 301/302 redirects on port 80 and nothing more? Granted that's not getting rid of unwanted traffic, but it might avoid tying up those (larger because of mod_php) Apache processes on really mundane stuff.
Alas it's now an easy-to-purchase "service"; there are several such as this around now. I would assume (given even I've thought about doing it) that Cloudflare have subscriptions to these services and use those subscriptions to identify the IP addresses in use and at least weight that in their analysis. I can't see another really easy way, except for large-scale access pattern analysis (as you've already mooted), to identify compromised "home" IP addresses (well, not really compromised, since I assume these are either paid-for lines or ones where the home user is being paid for their use - probably against the T&C). Alas I doubt it'll get better as more of the world hooks up to faster home connections.
That definitely helped me, thanks to your info.
They ignore robots.txt. Use .htaccess.
Post in thread 'How to block Robot ByteDance' https://xenforo.com/community/threads/how-to-block-robot-bytedance.231581/post-1749652
It certainly raises an eyebrow, doesn't it? Well, there are half a dozen companies offering proxy services. The one I linked to actually uses https://pawns.app/ to supply the end user client devices and bandwidth - in essence end-users get $0.20 per GB of bandwidth they supply. "Hell, if you're on an unlimited line, why not?" I can hear many people saying (although I wonder if the T&C for residential lines might strictly speaking prohibit that). The others I'm aware of are https://www.nimbleway.com/pricing, https://netnut.io/static-residential-proxies/ and https://asocks.com/en/ourproxy/, but I imagine there are plenty of others. The whole idea of using end-user connections is an interesting one and certainly has its legitimate uses. I use Global Ping for instance to debug routing issues a couple of times a year.
Ugh, awful. How is that even legal?
FreeBSD on our own hardware here split over a couple of DCs.
Ubuntu server on AWS
165.22.177.180 - - [03/Feb/2026:09:42:48 +0000] "GET /forums/proxy.php?image=https%3A%2F%2Fassets.rebelmouse.io%2FeyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbWFnZSI6Imh0dHBzOi8vYXNzZXRzLnJibC5tcy8yMTE2MTg0Ny9vcmlnaW4uanBnIiwiZXhwaXJlc19hdCI6MTYyMTM1MDk1MH0.lVlxgvI6iHOD1y2TY0TvgL8hPZgkujy1HCwpSoA1DxQ%2Fimg.jpg%3Fwidth%3D1200%26coordinates%3D0%252C40%252C0%252C40%26height%3D600&hash=736cce0227fae27ea58e52c7297641b9&return_error=1 HTTP/2.0" 404 5 "https://www.ourdomain.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
68.183.128.243 - - [03/Feb/2026:09:42:48 +0000] "GET /forums/proxy.php?image=https%3A%2F%2Fsistemaplastics.com%2F%2Fimages%2Fresizer_cache%2Fassets%2Fproducts%2FMICROWAVE%2FCOLOURED_Microwave%2F21117_EasyEggs_MicrowaveColoured_Purple_Wrap_Vent_258_350_90.jpg&hash=89adef1c16fc02b4de0170b25fd1e9b2&return_error=1 HTTP/2.0" 404 5 "https://www.ourdomain.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
157.245.134.15 - - [03/Feb/2026:09:42:48 +0000] "GET /forums/proxy.php?image=https%3A%2F%2Fblog.datawrapper.de%2Fcow-milk-and-vegan-milk-alternatives%2F..%2Fimg%2Ffavicon.ico&hash=7fe76782378136df9e33a8a3366f698e&return_error=1 HTTP/2.0" 404 5 "https://www.ourdomain.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
5.36.154.175 - - [03/Feb/2026:09:42:48 +0000] "GET /forums/proxy.php?image=https%3A%2F%2Fwww.independent.ie%2Fbusiness%2Fbrexit%2Ff4511%2F39838816.ece%2FAUTOCROP%2Fw1240h700%2FTim_Cullinan&hash=232e0ff6fa4db2f841f63f9d9cb2747d&return_error=1 HTTP/2.0" 404 5 "https://www.ourdomain.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
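Log lines like these are easy to tally per client. A small Python sketch, assuming the combined log format shown above; `LOG_PATTERN`, `parse_line` and `failed_proxy_hits` are made-up names, and the `/forums/proxy.php` prefix is taken from the sample lines:

```python
import re
from collections import Counter

# Assumption: logs are in the combined format shown above.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None

def failed_proxy_hits(lines, prefix="/forums/proxy.php"):
    """Count 404 responses for the image proxy, grouped by client IP."""
    hits = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["path"].startswith(prefix) and entry["status"] == "404":
            hits[entry["ip"]] += 1
    return hits
```

Feeding it an open log file (`failed_proxy_hits(open(path))`) gives a per-IP count. With residential proxies each IP may only show up a handful of times, so the totals per path or per user agent are probably more telling than the per-IP numbers.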
Yep, can confirm that in our traffic patterns and load over the last week or two. Evidently time for an AI refresh or something!
A member:guest ratio of 1:400 seems pretty insane for a forum like this. So there seems to be another big wave in progress at the moment.
There seems to be not much you can do effectively against them if you can't block those providers, because you would then block your regular users as well.
Yep. Given there are about 140M IP addresses currently associated with Germany, having 10% of those compromised like this seems off, especially when you look, as you rightly point out, at the number of households (the population is about double that). It seems too boastful.
One of the sellers of residential proxies claims to have 14 million residential proxies available in Germany alone, which seems way too high and not plausible at all, given that there are only about 41 million private households within Germany.
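The plausibility check is quick to redo. A small Python sketch with the figures quoted in this thread (14 million claimed proxies, about 41 million households, about 140M IP addresses); the variable names are just for illustration:

```python
# Figures as quoted in the thread, not independently verified.
claimed_proxies = 14_000_000
households = 41_000_000
german_ips = 140_000_000

share_of_households = claimed_proxies / households
share_of_ips = claimed_proxies / german_ips

print(f"{share_of_households:.1%} of households")
print(f"{share_of_ips:.1%} of IP addresses")
```

That works out to roughly one in three households, or one in ten German IP addresses, which is why the claim looks inflated.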
While I absolutely agree that behavioural pattern matching is probably the most promising way to go, I think this is pretty tough (and in most cases too tough) for a hobby admin. But it could be worth checking whether there are already one or more open source projects around the topic. One could even use AI for that.
Realistically there are only three options that spring to mind:
- More detailed user profiling - ie a "normal" user will request HTML, JS, Images, send cookies and so forth that make sense in a normal user journey. Direct scrapers may fall down on this, but given those mobile phone farms you see in action it's clear some of this is going to be genuine browsers probably doing all that anyway. If it's a real browser being driven then it'll be nearly impossible I think to identify if it's a human or good simulation driving that browser. I think on this front we just have to hope that for them it's not worth the effort of doing a "really good job" and we can spot the mistakes.
I know of at least one IP reputation service that does that and would assume that others do that, too. However, as I am using said service I can safely say that until now they only identify a fraction of a fraction of the residential proxies. Barely noticeable.
- Signing up to some of the proxy services and sending "test" traffic through them to identify the IP addresses and publishing them. Not that I really want to sign up with any of these rather dodgy seeming companies, but I don't really see a way of identifying the IPs on their books otherwise. Someone may already do this and I really should properly look.
Could be - information sharing in an automated way could be interesting. It is however challenging, as most of the hosts are short lived. And a huge project to implement.
- Data sharing (ala stopforumspam) type services, but this feels quite a nuanced issue and I'd worry about high false positives. For instance this week we've had a load of traffic from Vietnam, but I know we do have two or three legitimate members there. So evidently any blacklisting for me would need to have holes punched for their IPs, which may well change. So it's a bit of a moving target. Still an "abnormal for my forum" type data feed is a possibility I guess.
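The first option (user profiling) can be sketched very roughly as "who requests pages but never any assets". A minimal Python example under heavy assumptions: `ASSET_SUFFIXES`, `pages_only_clients` and the `min_pages` threshold are made up for illustration, and as noted above a real browser being driven by a scraper would pass this check:

```python
from collections import defaultdict

# Assumption: these suffixes stand in for "assets a real browser would fetch".
ASSET_SUFFIXES = (".js", ".css", ".png", ".jpg", ".gif", ".svg", ".woff2", ".ico")

def pages_only_clients(requests, min_pages=5):
    """Return sorted IPs that fetched at least min_pages pages but no assets.

    requests is an iterable of (ip, path) pairs, e.g. taken from an access log.
    """
    pages = defaultdict(int)
    assets = defaultdict(int)
    for ip, path in requests:
        # Ignore the query string so "/js/app.js?v=2" still counts as an asset.
        bare_path = path.split("?", 1)[0].lower()
        if bare_path.endswith(ASSET_SUFFIXES):
            assets[ip] += 1
        else:
            pages[ip] += 1
    return sorted(ip for ip, n in pages.items() if n >= min_pages and assets[ip] == 0)
```

Returning visitors with a warm browser cache may also fetch few assets, so this is a signal to weigh against others rather than a block rule on its own.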