Crazy amount of guests

For anyone who wants to stop all proxies, without issue or further concern: https://xenforo.com/community/resources/xtr-ip-threat-monitor.10134/
I am using it too and like it. It is solid and has worked very well since the latest versions. However, I would not sign this:

blocking all proxies and VPN's that are nasty
By default it misses most residential proxies. It does catch data-center-based traffic to a high degree. To block residential proxies you have to block ASNs or countries, with all the collateral damage that this may cause.

Also, it gets the data on what to block from the proxycheck.io API, and using this costs money (free up to 1,000 queries/day, but even my small forum has more than that). The number of queries depends a bit on the settings of the add-on; however, if you want to block ASNs you have to query every IP.

While I really like the add-on, there are some things that I would like to see, especially better analytics. Let's hope that development continues as fast as it has in the past.
 
I've seen tens to hundreds of thousands of unique IP addresses at times, so yeah, that wouldn't work for me.

I'm not surprised that residential proxies walk through it!
 
The pricing of proxycheck.io is affordable in my opinion.


The add-on builds up a blocking list, and countries are checked via a local MaxMind database in the meantime. After a couple of weeks, and with the current version of the add-on, roughly 2/3 of the IPs visiting my forum are checked against the API.
The initial idea of the add-on was flood protection: loads of requests from one IP within a very short time lead first to a captcha and then to a block. So it initially targeted the classic scraper and not residential proxies at all. At least in my forum, the classic scraping with loads of requests from a single IP does not happen any more.
ASN blocking was added after a feature request from my side, and it is a life saver for me.
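
To illustrate the flood-protection idea described above (many requests from one IP in a short time lead first to a captcha, then to a block), here is a minimal Python sketch. It is not the add-on's actual code: the class name `FloodGuard`, the window length and both thresholds are assumptions made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical settings; the add-on's real thresholds are not given in this thread.
WINDOW_SECONDS = 10
CAPTCHA_THRESHOLD = 20
BLOCK_THRESHOLD = 50


class FloodGuard:
    """Sliding-window request counter per IP (illustrative only)."""

    def __init__(self, window=WINDOW_SECONDS,
                 captcha_at=CAPTCHA_THRESHOLD, block_at=BLOCK_THRESHOLD):
        self.window = window
        self.captcha_at = captcha_at
        self.block_at = block_at
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()

    def check(self, ip, now):
        """Record a request at time `now` (seconds); return 'allow', 'captcha' or 'block'."""
        if ip in self.blocked:
            return "block"
        recent = self.hits[ip]
        recent.append(now)
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        if len(recent) >= self.block_at:
            self.blocked.add(ip)
            return "block"
        if len(recent) >= self.captcha_at:
            return "captcha"
        return "allow"
```

A counter like this only ever sees one request per address when the scraper rotates through residential IPs, which is why this kind of protection catches the classic single-IP scraper but not rotating proxies.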
 
I did a bit of reading on the residential proxy issue, and it seems CF does have tools in place to identify them and place them into its Labyrinth. I guess any IP running a constant stream of activity, rate limited or not, would be identifiable. Unless you can randomise the activity from it, limit its daily time in use and randomise that too, CF could certainly find patterns and send the IPs into neverland. Just another tick for using CF.
 
I guess any IP running a constant stream of activity, rate limited or not, would be identifiable.
I think if I was a huge wealthy company like CF I'd just create some natty little companies that signed up with the proxy companies and plonked traffic through "their" networks to identify a good quantity of the addresses. You might even then not actually do much about them, but you'd be able to look at the traffic patterns and see if there were any "tells" to use more generally.
 
The issue with residential proxies is that they rotate very fast. Typically, a single IP makes only one request (and that one call leaves out elements of the webpage like JS, tracking or pictures, depending on how the scraper is configured). One can often identify them in hindsight by that and by other patterns, for which one needs knowledge about the content and structure of the webpage. For example, there are often requests that directly target a user profile, a single picture, or an older, inactive thread, and often two or more IPs request the same unusual target in parallel.

So there are possible ways to identify residential proxies by behaviour, but the number of requests, the user agent or classical fingerprinting are often useless.
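
One of the behavioural patterns mentioned above (several one-shot IPs hitting the same unusual target almost in parallel) could be looked for in hindsight roughly like this. This is only a sketch under assumptions: the function name, the 60-second window and the input format of `(timestamp, ip, path)` tuples are made up for illustration.

```python
from collections import Counter, defaultdict


def suspicious_targets(requests, window=60, min_ips=2):
    """Find paths hit by several single-request IPs within `window` seconds.

    `requests` is an iterable of (timestamp_seconds, ip, path) tuples.
    Returns {path: largest number of one-shot IPs seen within one window}.
    """
    requests = list(requests)
    per_ip = Counter(ip for _, ip, _ in requests)

    # Keep only requests from IPs that appear exactly once in the data.
    by_path = defaultdict(list)
    for ts, ip, path in requests:
        if per_ip[ip] == 1:
            by_path[path].append(ts)

    flagged = {}
    for path, stamps in by_path.items():
        stamps.sort()
        start = 0
        best = 0
        # Slide a window over the sorted timestamps for this path.
        for end, ts in enumerate(stamps):
            while ts - stamps[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_ips:
            flagged[path] = best
    return flagged
```

On its own this would also flag popular pages that many genuine one-time visitors land on, so it only becomes useful combined with knowledge about which targets are unusual for the site.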

CF does not have knowledge about the content and structure of the website or about typical behavior, so they lack a lot of options to identify patterns. That's why CF - as has been written in this thread for months - often fails to identify those proxies reliably. CF has the advantage of high numbers, so they can identify suspicious IPs by, e.g., the behaviour of sending single requests to vastly different websites within a short time - but again this works only after some time.

They may (and hopefully will) have improved their abilities over the last months - but as has been written in this thread before, it is not reliable and, within the product range of CF, the free tier will probably not offer protection. The proxy providers advertise a success rate of 99+%, and science says only 10% of residential proxies get identified.

I'd assume most website operators do not even recognize this traffic, all the more if it comes from countries where they have their normal audience. And those who do will have a hard time finding out who is legitimate and who is a scraper before serving content.
 
I think if I was a huge wealthy company like CF I'd just create some natty little companies that signed up with the proxy companies and plonked traffic through "their" networks to identify a good quantity of the addresses.
As written further up in this thread: this is what some companies (e.g. proxycheck.io) do, and it can probably be assumed that CF does this as well. No doubt it helps in many ways with diagnosis. However, the IPs change frequently, there are a ton of these proxy providers, they change their methods and behavior frequently, and there is a broad range of tools they use and requests they send. So it is a bit of a Hydra with many heads.

On a side note: we are starting to repeat as "new" what has already been mentioned and discussed further up the thread. So it seems the discussion is grinding to a halt, getting into a loop and repeating itself.
 
Although CF "out-of-the-box" doesn't do anything particularly 'intelligent' with requests from residential proxies, by looking at the web server logs for stuff that is getting through, it's possible to add additional mitigation to Cloudflare rules, even on the free level.

At the moment, a lot of the traffic I'm seeing from residential proxies is extremely simple - in some cases to the point of being a bit bizarre.
For example, looking at our XF "guests" page, a large majority of the guests are "Viewing an error page" - from the web logs, these are typically 404s for requesting non-existent things in the image/link proxy. Similarly, I see lots of requests to attachments, goto/post and reactions with either no referrer or one which cannot be correct (e.g. root URL as referrer when our forum is not in the root). No genuine web browser would make these requests, and generally they're not interesting to genuine search indexers.
Use CF security rules to issue a "managed challenge" to these requests and it takes quite a lot of bot traffic out, with a completion rate of 0.01%, e.g. over the last 24 hours:

[Screenshot: 1773314835659.webp]
This is probably more effective than trying to whack-a-mole on individual residential proxy IPs - we get 300,000+ "Unique Visitors" per day.
CF also makes it easy to block/challenge countries and ASNs, so the traffic from "bad actor" data centres is also massively reduced.
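
The "no referrer or impossible referrer" test described above can be sketched locally in Python, for example to check web server logs before writing a rule. This is not Cloudflare's rule syntax, and the host name, forum directory and path markers below are placeholders assumed for illustration.

```python
from urllib.parse import urlsplit

# Assumptions for illustration: the forum lives under /forum/ on example.com,
# and these markers stand in for the attachment, goto/post and reaction routes.
FORUM_HOST = "example.com"
FORUM_BASE = "/forum/"
DEEP_MARKERS = ("/attachments/", "/goto/post", "/reactions")


def should_challenge(path, referer):
    """Return True for a deep-link request with a missing or impossible referrer."""
    if not any(marker in path for marker in DEEP_MARKERS):
        return False
    if not referer:
        return True  # a browser following a link to these routes would send one
    ref = urlsplit(referer)
    if ref.hostname != FORUM_HOST:
        return False  # external referrers are left alone in this sketch
    # A same-site referrer outside the forum directory (e.g. the root URL)
    # cannot be the page that linked to this request.
    return not ref.path.startswith(FORUM_BASE)
```

For example, `should_challenge("/forum/attachments/pic.123/", "https://example.com/")` is true, while the same path with a thread URL under `/forum/` as referrer is not.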

Even then, we still hit our largest number of "visitors online" (15,000+) in the past week.

There's not going to be one solution, it's probably going to take a combination, plus an acceptance that it's not actually a battle we can win - we put content on the web, people will steal it - it's always been the case. Hiding more content behind registration/login will impact SEO and lead to a drop in positive activity.
 
Maybe XF needs to add @Xon's sign up and abuse add on to their software.
Or maybe not. When you travel down this path, you end up where VB suffered their largest loss - they tried to do everything for everyone. It failed spectacularly.

Software has a purpose, a priority. The system has add-on capability, which is what Xon's add-on is for. If you need it, use it; otherwise, don't.
 
A bit of a field report: when I routinely looked into the dashboard of IP Threat Monitor this morning, something was unusual. Over the last couple of days it had been quiet on the scraping front, and the massive number of blocked countries and ASNs did their part to let my server live an easy life. However, last night things changed a bit:

[Screenshot: Bildschirmfoto 2026-03-18 um 08.48.16.webp]

The number of IPs visiting in 24h had gone up again - lately it had been around 1,200, at most maybe 2,000. Typically around 700 get through, the rest is blacklisted. Over the last 30 days the statistics look like this, with a deflection rate of more than 80%:

[Screenshot: Bildschirmfoto 2026-03-18 um 08.49.18.webp]

However, now I was confronted with a 34% rate and 2,400 visits between midnight and six in the morning, when only very few genuine visitors come to my forum. There are some, but few.

A check at the proxycheck.io dashboard showed a massive (for my environment) peak of different IPs showing up between one and two o'clock in the morning:

[Screenshot: Bildschirmfoto 2026-03-18 um 08.50.10.webp]

Clearly not normal traffic. Yet proxycheck.io had identified only a fraction of them as bad, and even my country and ASN blocking had let them through. Strange. No peak in registered visitors, so it had to be guests. No peak in the Matomo statistics, so these visitors did not trigger the tracking. Time to dive into the rabbit hole.

I SSHed into the server and analyzed the web.log:

Code:
 $ grep "18/Mar/2026:01:" web.log | grep " 200" | wc -l

gave me 5662 entries with status code 200 between 1 and 2 in the morning. Not good. The hour before it was 1631 (and that includes the remaining genuine users before going to bed); the hour after gave me 287. So clearly, I had found the time window.

I broke it down further into ten-minute windows:

1:00 - 1:10 280
1:10 - 1:20 1751
1:20 - 1:30 1767
1:30 - 1:40 1630
1:40 - 1:50 39
1:50 - 2:00 195

So a time window of 30 minutes with way higher traffic than usual. In a bigger forum, or one with an international audience spread across time zones, probably no one would have noticed. Again the advantage of running a small local forum in laboratory mode. ;)
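
The ten-minute breakdown above can also be produced in one pass with a short script. The following Python sketch assumes the log is in the common "combined" access log format (timestamp in square brackets, status code right after the quoted request line); the function name and the regular expression are my own and would need adjusting for other formats.

```python
import re
from collections import Counter

# Assumed combined log format, e.g.:
# 203.0.113.7 - - [18/Mar/2026:01:12:34 +0100] "GET /forum/ HTTP/1.1" 200 5120 "-" "UA"
LOG_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}):(\d{2}):(\d{2}):\d{2} [^\]]*\] "[^"]*" (\d{3}) '
)


def hits_per_window(lines, day, hour, status="200", minutes=10):
    """Count requests with the given status per `minutes`-sized window of one hour."""
    buckets = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        d, h, mi, st = m.groups()
        if d == day and int(h) == hour and st == status:
            start = int(mi) // minutes * minutes  # e.g. minute 12 -> window 10
            buckets[f"{hour}:{start:02d}"] += 1
    return dict(sorted(buckets.items()))
```

It would be called as `hits_per_window(open("web.log"), "18/Mar/2026", 1)`. Matching the status field by position also avoids counting lines where " 200" only appears elsewhere, for instance as a response size.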

My 5662 entries came from 1373 different IP addresses. 1156 of them had just a single entry in the log file, so they made one single call, and another 70 had two entries - clearly not your genuine visitors.
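
Counting how many IPs made exactly one, two or more requests can be done like this. It is a minimal Python sketch, assuming the client IP is the first whitespace-separated field of each log line (as in the usual access log formats); the function name is made up.

```python
from collections import Counter


def request_distribution(lines):
    """Return (number of distinct IPs, Counter mapping requests-per-IP -> number of IPs)."""
    # Assumes the client IP is the first field of every non-empty line.
    per_ip = Counter(line.split(maxsplit=1)[0] for line in lines if line.strip())
    distribution = Counter(per_ip.values())
    return len(per_ip), distribution
```

Fed with the already filtered lines of the suspicious hour, the first return value corresponds to the number of different IPs, and `distribution[1]` and `distribution[2]` to the IPs with one and two log entries.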

I could already see from the hostnames in the log that most of them came from German DSL providers for private users - clearly residential proxies. Finally they got me: while I successfully block residential proxies from a lot of countries by country or ASN blocking, because I don't have regular visitors from there, I cannot do that within Germany, as my core audience comes from there.

Bearing in mind the claims of residential proxy network providers about millions of residential proxies within Germany, I was curious which providers they were coming from. The admin's Swiss army knife, the combo of grep, awk, dig, netcat and a little shell scripting, let me feed the IPs into the fabulous free service of Team Cymru to get the ASNs for the IPs in question and aggregate them. It turned out there were not too many surprises: almost all of the requesting IPs were from private DSL connections while their respective owners enjoyed their sleep. The ASNs, sorted by number of different requesting IPs during the timeframe:

355: AS3209 (Vodafone)
232: AS3320 (DeTAG Deutsche Telekom)
226: AS8881 (Versatel)
173: AS6805 (Telefonica Germany)
132: AS3133 (Kabel-Deutschland)
94: AS7922 (COMCAST) - an outlier from the US, nicknamed Spamcast for more than 20 years
92: AS60294 (DE-DGW Deutsche Glasfaser Wholesale)
41: AS46375 (Sonic Telecom LLC, US)
39: AS42184 (TKRZ Stadtwerke GmbH) - a small local provider that I never heard of before
27: AS202208 (teutel GmbH) - another small local provider
22: AS8374 (Plusnet) from Poland
14: AS207790 (SWN Stadtwerke Neumuenster GmbH) - another small local provider

There were a couple with fewer than ten IPs as well, and these were small ISPs. The ordered list pretty much reflects market share among German DSL/cable providers.
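
For those who prefer a script over the grep/awk/netcat combo, the ASN lookup and aggregation described above could look roughly like this in Python. It is a sketch, not what I actually ran: it assumes Team Cymru's bulk whois interface on whois.cymru.com port 43 (a list of IPs wrapped in `begin`/`verbose`/`end`) and its pipe-separated verbose output with the ASN first and the AS name last. The function names are made up, and the format should be checked against Team Cymru's documentation before relying on it.

```python
import socket
from collections import Counter


def build_bulk_query(ips):
    """Build the text for a bulk request (assumed begin/verbose/end framing)."""
    return "begin\nverbose\n" + "\n".join(ips) + "\nend\n"


def parse_bulk_response(text):
    """Parse lines of the assumed form 'ASN | IP | prefix | CC | registry | date | AS name'."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|", 6)]
        # Skips the banner line and rows without a single numeric ASN (e.g. "NA").
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        rows.append((fields[1], "AS" + fields[0], fields[6]))
    return rows


def count_ips_per_asn(rows):
    """Count distinct IPs per (ASN, name), most frequent first."""
    unique = set(rows)
    return Counter((asn, name) for _, asn, name in unique).most_common()


def lookup(ips, host="whois.cymru.com", port=43):
    """Send the bulk query over TCP and return the parsed rows."""
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(build_bulk_query(ips).encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_bulk_response(b"".join(chunks).decode("utf-8", "replace"))
```

With the distinct IPs from the log in a list, `count_ips_per_asn(lookup(ips))` gives a list ordered like the one above.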

So the bad news is: there are indeed residential proxies in Germany, and there are many. And I currently have no tool to keep them out of my forum. Time to get creative.

The good news is: as I limited guest viewing massively a couple of weeks ago, they can scrape a bit, but not very much.

Given that all of that came out of nowhere and peaked massively through distributed requests from loads of IPs, it is pretty safe to assume that this was one single player using zombie hosts to scrape my forum. I still can barely believe that those people knowingly rented out their internet connection as zombies.
 