Crazy amount of guests

For anyone who wants to stop all proxies, without issue or further concern: https://xenforo.com/community/resources/xtr-ip-threat-monitor.10134/
I am using it too and like it. It is solid and has worked very well since the latest versions. However, I would not endorse this claim:

blocking all proxies and VPN's that are nasty
By default it misses most residential proxies. It does catch data-center-based traffic to a high degree. To block residential proxies you have to block ASNs or countries, with all the collateral damage that may cause.

Also, it gets the data on what to block from the proxycheck.io API, and using this costs money (free up to 1,000 queries/day, but even my small forum needs more than that). The number of queries depends somewhat on the add-on's settings; however, if you want to block ASNs you have to query every IP.
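For context, a lookup against the proxycheck.io API is just one HTTP GET per IP. The sketch below shows the general shape; the endpoint, query flags and response fields are my reading of the public proxycheck.io docs, not taken from the add-on, so verify them before relying on this:

```python
# Hedged sketch: one proxycheck.io v2 lookup per IP (endpoint and response
# fields assumed from the public docs; the add-on may do this differently).
import json
from urllib.request import urlopen

def check_ip(ip: str, api_key: str) -> dict:
    # vpn=1 also flags VPNs, asn=1 includes ASN data in the response
    url = f"https://proxycheck.io/v2/{ip}?key={api_key}&vpn=1&asn=1"
    with urlopen(url) as resp:
        # Response is keyed by the queried IP, e.g. {"status": "ok", ip: {...}}
        return json.loads(resp.read())[ip]

def is_blockable(result: dict) -> bool:
    # proxycheck.io marks proxies/VPNs with "proxy": "yes"
    return result.get("proxy") == "yes"

# Example response shape (abbreviated, illustrative values):
sample = {"proxy": "yes", "type": "VPN", "asn": "AS12345"}
print(is_blockable(sample))  # True
```

Each visitor IP that is not already on the local blocklist would need one such query, which is why the free tier's 1,000 queries/day runs out quickly.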

While I really like the add-on, there are some things I would still like to see, especially better analytics. Let's hope development continues as quickly as it has in the past.
 
I've seen tens of thousands to hundreds of thousands of unique IP addresses at times - yeah, that wouldn't work for me.

I'm not surprised that residential proxies walk through it!
 
I've seen tens of thousands to hundreds of thousands of unique IP addresses at times - yeah, that wouldn't work for me.
The pricing of proxycheck.io is affordable in my opinion:


The add-on builds up a blocking list, and countries are now checked via a local MaxMind database. After a couple of weeks, and with the current version of the add-on, roughly two-thirds of the IPs visiting my forum are checked against the API.
I'm not surprised that residential proxies walk through it!
The initial idea of the add-on was flood protection: loads of requests from one IP within a very short time first lead to a captcha and then to a block. So it initially targeted the classic scraper and not residential proxies at all. At least on my forum, classic scraping with loads of requests from a single IP does not happen any more.
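The flood-protection idea described above can be sketched roughly as a sliding-window counter per IP. The thresholds and window below are invented for illustration; the add-on's actual values and mechanism may differ:

```python
# Minimal sketch of per-IP flood protection: count requests per IP in a
# sliding window, escalate to a captcha, then to a block.
# CAPTCHA_AFTER / BLOCK_AFTER / WINDOW are illustrative, not the add-on's.
import time
from collections import defaultdict, deque

CAPTCHA_AFTER = 30   # requests per window before showing a captcha
BLOCK_AFTER = 60     # requests per window before blocking outright
WINDOW = 60.0        # window length in seconds

hits = defaultdict(deque)  # ip -> timestamps of recent requests

def classify(ip: str, now=None) -> str:
    now = time.monotonic() if now is None else now
    q = hits[ip]
    q.append(now)
    # Drop timestamps that have fallen out of the window
    while q and now - q[0] > WINDOW:
        q.popleft()
    if len(q) > BLOCK_AFTER:
        return "block"
    if len(q) > CAPTCHA_AFTER:
        return "captcha"
    return "allow"

# A burst of 31 requests within a fraction of a second escalates to a captcha:
for i in range(31):
    verdict = classify("203.0.113.9", now=float(i) / 100)
print(verdict)  # "captcha"
```

As noted, this kind of check catches the classic one-IP scraper but is useless against fast-rotating residential proxies, where each IP sends only a single request.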
ASN blocking was added after a feature request from my side, and it is a lifesaver for me.
 
I did a bit of reading on the residential proxy issue, and it seems CF does have tools in place to identify them and feed them into Labyrinth. I guess any IP running a constant stream of activity, rate limited or not, would be identifiable. Unless you can randomise the activity from it, and also limit and randomise the daily usage time, CF could certainly find patterns and send the IPs into neverland. Just another tick for using CF.
 
I guess any IP running a constant stream of activity, rate limited or not, would be identifiable.
I think if I were a huge, wealthy company like CF, I'd just create some natty little companies that signed up with the proxy companies and plonked traffic through "their" networks to identify a good quantity of the addresses. You might then not even do much about them, but you'd be able to look at the traffic patterns and see if there were any "tells" to use more generally.
 
I guess any IP running a constant stream of activity, rate limited or not, would be identifiable. Unless you can randomise the activity from it, and also limit and randomise the daily usage time, CF could certainly find patterns and send the IPs into neverland. Just another tick for using CF.
The issue with residential proxies is that they rotate very fast. Typically, a single IP makes only one request (and that one request leaves out elements of the webpage such as JS, tracking or pictures, depending on how the scraper is configured). One can often identify them in hindsight by that and by other patterns, for which one needs knowledge of the content and structure of the webpage. For instance, there are often requests that directly target a user profile, a single picture, or an older, inactive thread, and often two or more IPs request the same unusual target in parallel.

So there are possible ways to identify residential proxies by behaviour, but the number of requests, the user agent or classical fingerprinting are often useless.
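To make that concrete, here is a rough, hypothetical sketch of the kind of hindsight log analysis described above: flag one-shot IPs that hit the same unusual deep target within a short window. The log format, field names and threshold are all invented for illustration, not from any real tool:

```python
# Hedged sketch: flag IPs that made exactly one request and hit the same
# target as another one-shot IP within `window` seconds - the parallel-access
# pattern described above. Purely illustrative thresholds and log format.
from collections import defaultdict

def suspicious_ips(log, window=300):
    """log: iterable of (timestamp, ip, path) tuples."""
    by_ip = defaultdict(list)
    for ts, ip, path in log:
        by_ip[ip].append((ts, path))
    # Collect the targets hit by one-shot IPs (a single request, ever)
    target_hits = defaultdict(list)
    for ip, reqs in by_ip.items():
        if len(reqs) == 1:
            ts, path = reqs[0]
            target_hits[path].append((ts, ip))
    # Flag pairs of one-shot IPs hitting the same target close together
    flagged = set()
    for path, entries in target_hits.items():
        entries.sort()
        for (t1, ip1), (t2, ip2) in zip(entries, entries[1:]):
            if t2 - t1 <= window:
                flagged.update({ip1, ip2})
    return flagged

log = [
    (0,  "198.51.100.1", "/threads/old-inactive-thread.123/"),
    (40, "198.51.100.2", "/threads/old-inactive-thread.123/"),
    (10, "203.0.113.7", "/forum/"),
    (20, "203.0.113.7", "/threads/new.456/"),  # normal multi-request visitor
]
print(sorted(suspicious_ips(log)))  # ['198.51.100.1', '198.51.100.2']
```

The catch, as noted, is that this only works in hindsight and requires site-specific knowledge of what an "unusual" target is.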

CF does not have knowledge of the content and structure of the website or of typical visitor behaviour, so they lack many options for identifying patterns. That's why CF - as has been written in this thread for months - often fails to identify those proxies reliably. CF has the advantage of high numbers, so they can identify suspicious IPs by, for instance, the behaviour of sending single requests to vastly different websites within a short time - but again, this only works after some time.

They may (and hopefully will) have improved their abilities over the last months - but as has been written in this thread before, it is not reliable, and within CF's product range the free tier will probably not offer protection. The proxy providers advertise a success rate of 99+%, and research suggests only about 10% of residential proxies get identified.

I'd assume most website operators do not even recognise this traffic, even less so if it comes from countries where they have their normal audience. And those who do will have a hard time telling who is legitimate and who is a scraper before serving content.
 
I think if I were a huge, wealthy company like CF, I'd just create some natty little companies that signed up with the proxy companies and plonked traffic through "their" networks to identify a good quantity of the addresses.
As written further up in this thread: this is what some companies (like proxycheck.io) do, and it can probably be assumed that CF does this as well. No doubt it helps in many ways with diagnosis. However, the IPs change frequently, there are a ton of these proxy providers, they change their methods and behaviour frequently, and there is a broad range of different tools they use and requests they send. So it is a bit of a Hydra with many heads.

On a side note: we are starting to repeat as "new" everything that has already been mentioned and discussed further up the thread. So it seems the discussion is starting to grind to a halt, get into a loop and repeat itself.
 
CF does not have knowledge of the content and structure of the website or of typical visitor behaviour, so they lack many options for identifying patterns. That's why CF - as has been written in this thread for months - often fails to identify those proxies reliably. CF has the advantage of high numbers, so they can identify suspicious IPs by, for instance, the behaviour of sending single requests to vastly different websites within a short time - but again, this only works after some time.

They may (and hopefully will) have improved their abilities over the last months - but as has been written in this thread before, it is not reliable, and within CF's product range the free tier will probably not offer protection. The proxy providers advertise a success rate of 99+%, and research suggests only about 10% of residential proxies get identified.
Although CF "out-of-the-box" doesn't do anything particularly 'intelligent' with requests from residential proxies, by looking at the web server logs for the traffic that is getting through, it's possible to add additional mitigation via Cloudflare rules, even on the free tier.

At the moment, a lot of the traffic I'm seeing from residential proxies is extremely simple - in some cases to the point of being a bit bizarre.
For example, looking at our XF "guests" page, a large majority of the guests are "Viewing an error page" - from the web logs, these are typically 404s for requests to non-existent things in the image/link proxy. Similarly, I see lots of requests to attachments, goto/post and reactions with either no referrer or one which cannot be correct (e.g. the root URL as referrer when our forum is not in the root). No genuine web browser would make these requests, and generally they're not interesting to genuine search indexers.
Using CF security rules to issue a "managed challenge" to these requests takes quite a lot of bot traffic out, with a challenge completion rate of 0.01%.
e.g. last 24 hours:

[attached screenshot: Cloudflare security events]
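For illustration, a rule along those lines might look something like this in Cloudflare's rule expression language (field names as I understand them from the CF rules documentation; the paths and hostname are placeholders - adapt them to your own installation):

```
(http.request.uri.path contains "/attachments/" and http.referer eq "")
or (http.request.uri.path contains "/goto/post"
    and not http.referer contains "example-forum.com")
```

with the rule's action set to "Managed Challenge". Genuine browsers complete the challenge invisibly; the simple scrapers described above mostly don't.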
This is probably more effective than playing whack-a-mole with individual residential proxy IPs - we get 300,000+ "Unique Visitors" per day.
CF also makes it easy to block/challenge countries and ASNs, so the traffic from "bad actor" data centres is also massively reduced.

Even then, we still hit our largest number of "visitors online" (15,000+) in the past week.

There's not going to be one solution; it will probably take a combination of measures, plus an acceptance that this is not actually a battle we can win - we put content on the web, people will steal it; it's always been the case. Hiding more content behind registration/login would impact SEO and lead to a drop in positive activity.
 