Crazy number of guests

I was just hit again recently, after about 45 days of nothing. They threw about 150k extra daily uniques at me, but with everything locked down to CF IPs only, /search/ behind a managed challenge (the failure rate went up with their attempts), and XTR Threat Monitor running, they gave up after 8 days. Unprotected, that used to be a million+; it can't get through any longer. All I had to do was upgrade my proxycheck.io account during the attack to make sure I had enough API calls, and that was it. XTR Threat Monitor completely covered the residential proxy attack, banning close to all of the additional attempted traffic coming from VPNs or proxies. Either way, the load on the server didn't change at all; after error page upon error page, their software got the message to quit and move along.

I just don't think this is that difficult to stop. I'm not saying it won't get harder in the future, but right now my minimal, unobtrusive setup, which allows the world access (just a handful of the worst ASNs, all servers rather than ISPs, on a managed challenge), works against this. Not a single country is blocked or managed challenged. I am not a server tech; I use AI to guide me through the logical steps of identifying the issue and then stopping it in a way that doesn't mess real people or legitimate bots around.

Just pages back in this discussion I was getting 800k-1M daily uniques; below I have just over 100k daily now (real traffic), so about 85% of the traffic was garbage. I listened to the advice given here about locking down the server to CF, did that, and it fixed a large part of the problem, thank you. But the biggest win was /search/: combined with proxycheck.io, which is nailing its accuracy for blocking residential proxies, that did it. /search/ was the majority of the load on my server that I was fighting.

(http.request.uri.path contains "/search/" and not http.cookie contains "xf_user=") on a managed challenge is priceless. That removed the massive load of a single action having to run the search, keyword refinement, sorting, and result ranking before the page can even be built and returned, and it's worse on a DB than on ES. While I use ES, which has already mapped every word to a document and outperforms a DB lookup, it's still a load when compounded across every search query bots throw at it.
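For anyone who wants the same gate outside Cloudflare, the rule's logic is simple enough to sketch in a few lines of Python (the function name and request shape here are illustrative, not any real framework's API; xf_user is the cookie XenForo sets for logged-in members):

```python
# Sketch of the same gate: challenge guests hitting /search/, let
# logged-in members (identified by the xf_user cookie) straight through.
# Illustrative only; not Cloudflare's or XenForo's actual code.

def should_challenge(path: str, cookies: dict) -> bool:
    """Mirrors: http.request.uri.path contains "/search/"
    and not http.cookie contains "xf_user="."""
    return "/search/" in path and "xf_user" not in cookies

# Guests hitting search get challenged; members and other pages pass.
assert should_challenge("/search/?q=widgets", {}) is True
assert should_challenge("/search/?q=widgets", {"xf_user": "1,abc"}) is False
assert should_challenge("/forums/", {}) is False
```

The point of the cookie check is that only anonymous traffic pays the challenge toll; members never see it.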

Oh... I also fixed my caching in CF, taking it from about 35% to a 60%+ average now.

My server is otherwise open to the entire world beyond these minimal measures, so no ISP user is getting messed around accessing the site, and no cranky users. Pages back I started in a bad place; nearly two months on, the server is tight against this problem, with no real concern about anything, as it hums along nicely and doesn't cause issues for any site or user on it. Backups are typically the biggest spike in any given day.

My proxycheck account was running in the business plans at one point, which is when I'd had enough and ChatGPT and Claude were engaged to sort the issue out without affecting users. I could give them a log file or snapshot to assess, and they identified issues in seconds.

This most recent attack attempt confirmed the actions taken were correct. I was wondering how long they would keep at it, and today's result is back to 100k uniques (not shown below). After 8 days their software gave up, or someone looked at a flagged issue, cancelled my site, and moved on. This is the confirmation I needed that my server is now in a good place and my users aren't being affected by all these AI scrapers and other nonsense.

Oh, and in CF I have every AI bot on allow; I'm not blocking any legitimate bot, they just don't access the site much.

My point is: this really is a fixable issue. This thread helped me fix my server. I don't have to manage ASNs or CIDRs daily.


Screenshot 2026-05-07 152129.webp

Screenshot 2026-05-08 065628.webp
 
Update, today's analytics are in, just to demonstrate I'm back to 100k now, and I've added my proxycheck.io stats to show that my new normal is around 50k uniques daily. During the attempted attack I had to upgrade to the next tier (160k) to cover the overhead, and now I'm back on the 80k plan at $10 a month. Before locking down /search/ I was at $50 a month for proxycheck, which was ridiculous.

Screenshot 2026-05-08 130604.webp

Screenshot 2026-05-08 130749.webp
 
Are you caching the proxycheck responses?

Could make things a lot cheaper for you, but it would require keeping a database busy at times.
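The idea in a nutshell: remember each IP's verdict locally for a while so repeat visitors don't burn API calls. A minimal sketch (the lookup function is a stand-in for the real proxycheck.io API call; the TTL is an arbitrary example value):

```python
import time

# TTL cache for per-IP verdicts. The `lookup` callable is a stand-in;
# in practice it would call the proxycheck.io API.

CACHE_TTL = 6 * 3600  # example: re-check an IP after six hours
_cache: dict[str, tuple[float, bool]] = {}  # ip -> (expiry, is_bad)

def check_ip(ip: str, lookup) -> bool:
    now = time.time()
    hit = _cache.get(ip)
    if hit and hit[0] > now:
        return hit[1]            # cached verdict, no API call spent
    is_bad = lookup(ip)          # one paid API call
    _cache[ip] = (now + CACHE_TTL, is_bad)
    return is_bad

calls = []
fake_lookup = lambda ip: (calls.append(ip), False)[1]
check_ip("203.0.113.9", fake_lookup)
check_ip("203.0.113.9", fake_lookup)   # second check served from cache
assert calls == ["203.0.113.9"]        # only one API call was made
```

A real deployment would persist this in Redis or a DB table rather than an in-process dict, which is the "keep a database busy" part.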
 
Update, today's analytics are in, just to demonstrate I'm back to 100k now, and I've added my proxycheck.io stats to show that my new normal is around 50k uniques daily. During the attempted attack I had to upgrade to the next tier (160k) to cover the overhead, and now I'm back on the 80k plan at $10 a month. Before locking down /search/ I was at $50 a month for proxycheck, which was ridiculous.
If I get this right, you now get around 100k unique visitors normally (as the CF logs show), and only about half of them make it to the server (as the proxycheck logs show), because the other half is either served by CF completely or deflected by CF?

XTR Threat Monitor completely covered the residential proxy attack, banning close to all of the additional attempted traffic coming from VPNs or proxies.
Hmm, judging from the proxycheck logs over the last few days (i.e. during the scraping attack), if we assume the rise in visitors (or rather visiting IPs) is bots and not genuine visitors:

  • CF let quite a bunch of them through, as the proxycheck logs show a rise in visitors from ~50k to 120-140k
  • also, the additional visitors are tagged "green" in the proxycheck logs, which indicates that proxycheck did not block them either
So, in contrast to what you say, it seems that neither CF nor proxycheck blocked the additional traffic successfully? So I don't see that:

This is the confirmation I needed that my server is now in a good place and my users aren't being affected by all these AI scrapers and other nonsense.

From my own proxycheck logs, I get the impression that proxycheck is relatively good at blocking proxies and VPNs located in datacenters, but I barely see any residential proxies on DSL/cable networks being blocked. I do see the same rise in "green" traffic from time to time, but I block quite a bunch of it through ASN blocking (so it's green on proxycheck but still gets blocked), the ever-growing IP blacklist, and a lot via country blocking (which is done via the local IP database, so it doesn't show up in the proxycheck logs at all).

Just pages back in this discussion I was getting 800k-1M daily uniques; below I have just over 100k daily now (real traffic), so about 85% of the traffic was garbage.
I deflect between 40% and 85% of incoming IPs via proxycheck and IP Threat Monitor (plus an unknown, relatively small amount that is deflected earlier by .htaccess before it hits IP Threat Monitor). The 85% occurs during scraping waves; the average on normal days has now settled around a 60-70% deflection rate. However, as written before, my absolute numbers are at a much smaller scale than yours.

I do still have a detection gap with residential proxies / scrapers on client networks within Central Europe, and I see that they pass through proxycheck undetected.

So while your situation is no doubt way better than a couple of weeks ago, I am not so sure that you have achieved protection from scraping and bots.

Apart from that, there is a conceptual issue in IP Threat Monitor that could become a problem if it did detect proxies in client networks successfully: I add blocked IPs to the blacklist automatically, and this list never gets cleaned out. That saves on API calls, but if it happened with client-network IPs it would create an issue, as those are reassigned frequently, so over time one would block legitimate users.
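One common fix for that problem is to let blacklist entries age out, so a reassigned residential IP doesn't stay blocked forever. A rough sketch of the idea (class name and TTL are my own, not IP Threat Monitor's actual mechanism):

```python
import time

# Blacklist whose entries expire, so a residential IP that gets
# reassigned to a real user is eventually unblocked. Illustrative only.

class ExpiringBlacklist:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries: dict[str, float] = {}  # ip -> time added

    def add(self, ip: str) -> None:
        self._entries[ip] = time.time()

    def is_blocked(self, ip: str) -> bool:
        added = self._entries.get(ip)
        if added is None:
            return False
        if time.time() - added > self.ttl:
            del self._entries[ip]   # entry aged out: forget the IP
            return False
        return True

bl = ExpiringBlacklist(ttl_seconds=0.05)   # tiny TTL just for the demo
bl.add("198.51.100.7")
assert bl.is_blocked("198.51.100.7")
time.sleep(0.1)
assert not bl.is_blocked("198.51.100.7")   # expired, user unblocked
```

The trade-off is that an expired entry costs one fresh API call when the IP returns, so the TTL is a dial between API spend and false positives on reassigned addresses.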
 
If I get this right, you now get around 100k unique visitors normally (as the CF logs show), and only about half of them make it to the server (as the proxycheck logs show), because the other half is either served by CF completely or deflected by CF?
With around a 200k average on days of attack, and proxycheck showing around 110k active in a 24-hour period, that is pretty solid. It got a good amount of them covered.
So, in contrast to what you say, it seems that neither CF nor proxycheck blocked the additional traffic successfully? So I don't see that:
I'm not saying XTR is bulletproof, but it does a pretty good job of sorting the rubbish from the good, for the most part. What you're missing on those graphs is what XTR picks up and blocks because they exceed rate limiting. Proxycheck says good; rate limiting at the site says bad and blocks them. That doesn't have a graph, and it works effectively.
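That invisible layer boils down to the standard sliding-window idea: allow at most N requests per IP per window, block the rest. A generic sketch (this is the common technique, not XTR's actual code; limits are arbitrary examples):

```python
from collections import deque

# Generic sliding-window rate limiter: at most `limit` requests per
# `window` seconds per IP. Not XTR's implementation, just the usual idea.

class RateLimiter:
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self._hits: dict[str, deque] = {}

    def allow(self, ip: str, now: float) -> bool:
        q = self._hits.setdefault(ip, deque())
        while q and now - q[0] > self.window:
            q.popleft()              # drop hits that fell out of the window
        if len(q) >= self.limit:
            return False             # over the limit: block or ban
        q.append(now)
        return True

rl = RateLimiter(limit=3, window=60.0)
assert all(rl.allow("10.0.0.1", t) for t in (0, 1, 2))
assert rl.allow("10.0.0.1", 3) is False      # 4th hit inside a minute
assert rl.allow("10.0.0.1", 120) is True     # window has rolled over
```

This is exactly the case proxycheck can't see: each request looks clean in isolation, and only the per-IP request rate gives the bot away.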

After I implemented the /search/ block and watched everything fall off a cliff, I disabled it to see what would happen. It took a week, but traffic slowly increased again once the bots were getting 200 OK on /search/. I enabled it again and they went away. That was probably my single biggest lever for cutting the rubbish, as I had 700k+ bots hitting /search/ to grab all the content they could as quickly as possible. Bing was actually one of the worst.
 
What you're missing on those graphs is what XTR picks up and blocks because they exceed rate limiting.
This barely happens on my forum, as the client-based residential proxies typically make just a single request. What triggers the rate limiting is, for the most part, pretty weird series of requests for the graphic "apple-touch-icon.png", often with a user agent hinting at WhatsApp sharing (which may be spoofed).

I did have a spike yesterday as well, which got completely blocked:

Bildschirmfoto 2026-05-08 um 11.32.25.webp
At its peak there were more than 900 parallel requests within a single minute. Somewhat unusually, they came from datacenters in various countries (for the most part Germany and France), with the IPs all belonging to

AS51167 Contabo GmbH

Proxycheck.io discovered quite a bunch of them, but by far not all. As I have had the ASN blocked for quite some time due to past scraping attempts, they hit locked doors:

Bildschirmfoto 2026-05-08 um 11.29.57.webp
Bildschirmfoto 2026-05-08 um 11.30.13.webp


Maybe I'll give the Contabo abuse desk a try - these were thousands of IPs, and Contabo is a German company, so it will be interesting to hear what they have to say...
 
At sagen.info we have a mechanical HDD in the server, which begins to suffer at about 1,000 guests and suffers hard above 2,500 guests. Usually we have about 10 human guests during the day.
Together with ChatGPT we created a simple script that enables UAM at 800 guests for two hours and then switches it off again.
I could post it if anyone is interested.
Wolfgang
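For anyone curious before Wolfgang posts his version, the gist of such a toggle can be sketched against Cloudflare's zone-settings API, whose security_level setting has an "under_attack" value. This is not Wolfgang's actual script: zone ID and token are placeholders, the guest count would come from your forum's online-user stat, and instead of the two-hour timer it uses simple count-based hysteresis:

```python
import json
import urllib.request

API = "https://api.cloudflare.com/client/v4/zones/{zone}/settings/security_level"

def set_security_level(zone_id: str, token: str, level: str) -> None:
    """PATCH the zone's security level, e.g. "under_attack" or "medium"."""
    req = urllib.request.Request(
        API.format(zone=zone_id),
        data=json.dumps({"value": level}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )
    urllib.request.urlopen(req)  # would be called from a cron job

def decide(guests: int, uam_on: bool, on_at: int = 800, off_at: int = 400) -> bool:
    """Hysteresis: switch UAM on above on_at, off again only below off_at,
    so the mode doesn't flap when guests hover around the threshold."""
    if not uam_on and guests >= on_at:
        return True
    if uam_on and guests <= off_at:
        return False
    return uam_on

assert decide(900, uam_on=False) is True    # spike: turn UAM on
assert decide(600, uam_on=True) is True     # still elevated: stay on
assert decide(300, uam_on=True) is False    # calmed down: turn off
```

Run it every few minutes from cron, feed decide() the current guest count, and call set_security_level() only when the decision changes.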
 
Yeah, I have that ASN challenged, from memory. Servers. I originally started challenging CIDR blocks from it, but I found they just shifted, so I moved to challenging the whole ASN. Just opening my managed challenge list: 10k challenged, with a solve rate of 13. I have been really careful with what I add to it, hence there's not much in it, but what is there is effective. My /search/ challenge: of the 3k on it, 10 solved it. The /search/ challenge events are scattered with random countries constantly testing to see if it's still there, along with primarily the US.
 
Curious about something... who here still has port 80 (HTTP) open with a redirect to port 443 (HTTPS) for their websites? I ask because I came across a potentially interesting finding due to a misconfiguration booboo on my end this week. I consolidated some configuration files, and apparently listen 800 was obviously not listen 80. :D

Many of these ResiProxies and Chinese AI/LLM scrapers using forged UAs were hammering port 80 - not port 443. I only noticed it after my upstream servers started throwing connection-refused errors on a couple of near-dead domain names that I manage. When I turned port 80 back on for those websites, Anubis and the nginx bad-bot blocker had a heyday, and it all ended up with ConfigServer rules triggering en masse.

If anyone else would like to experiment, feel free to give port 80 filtering a whirl against abusive scraper bots trying to make their way in.
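The observation above suggests a cheap heuristic: remember which port each IP touches first, since browsers following HTTPS links and HSTS-aware clients mostly arrive on 443. A sketch (illustrative, not any particular tool's logic; and note it is a weak signal on its own, since first-time human visitors typing a bare domain also land on 80):

```python
# Flag IPs whose very first contact is plain HTTP on port 80.
# Use as one scoring input, not a ban by itself.

first_seen_port: dict[str, int] = {}

def observe(ip: str, port: int) -> bool:
    """Record the first port each IP touches; True means suspicious."""
    port0 = first_seen_port.setdefault(ip, port)
    return port0 == 80

assert observe("192.0.2.10", 80) is True    # bot-ish: port 80 first
assert observe("192.0.2.10", 443) is True   # still flagged by first contact
assert observe("192.0.2.20", 443) is False  # normal HTTPS visitor
```

In practice you would feed this from the access log and combine it with rate limiting or UA checks before acting on it.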
 
Huge numbers of our scrapers hit port 80 first; to be honest, it's one of the ways I spot them. I've no idea why they hit that first. Still, given each IP only makes maybe 4 to 20 requests in a 24-hour period, unless we're having a particularly heavy time of it I don't bother playing whack-a-mole much.
 
I just got hit again and my site is inaccessible. I need to upgrade to a VPS sooner rather than later I think.
 
Curious about something... who here still has port 80 (HTTP) open with a redirect to port 443 (HTTPS) for their websites? I ask because I came across a potentially interesting finding due to a misconfiguration booboo on my end this week. I consolidated some configuration files, and apparently listen 800 was obviously not listen 80. :D
I asked Claude, interesting question.

Yes, you still need port 80 open, and here's why:

The redirect itself has to come from somewhere. When a browser first hits your domain, it sends an unencrypted HTTP request to port 80. Your server needs to be listening on port 80 to receive that request and respond with the 301/302 redirect pointing to HTTPS on port 443. If port 80 is closed, the connection is refused before any redirect can happen — the browser just gets an error.

The typical flow is:
  1. Browser requests http://yourdomain.com → port 80
  2. Server (listening on 80) responds: "301 Moved Permanently → https://yourdomain.com"
  3. Browser follows redirect to port 443
  4. Encrypted connection established
The only scenario where you could close port 80 is if all your users are accessing via HTTPS directly, and you've implemented HSTS (HTTP Strict Transport Security) with a long max-age. Once a browser has seen your HSTS header, it will automatically go straight to 443 on future visits without touching port 80. But:
  • First-time visitors still need port 80 for the initial redirect
  • Any user who hasn't visited before, or has cleared their browser data, will hit port 80
  • Monitoring tools, bots, and various services often probe port 80
Bottom line: Keep port 80 open, have it redirect to 443, and layer HSTS on top for returning visitors. It's the standard setup for good reason.
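As a tiny illustration of steps 1-2 plus the HSTS layer, the two sides of the setup reduce to this (function names are mine, and the values are typical defaults, not prescriptive):

```python
# The port-80 side only ever answers with a redirect; the 443 side adds
# the HSTS header so returning browsers skip port 80 entirely.

def port80_response(host: str, path: str) -> tuple[int, dict]:
    """Step 2: the server listening on 80 answers with a 301 to HTTPS."""
    return 301, {"Location": f"https://{host}{path}"}

def hsts_header(max_age: int = 31536000) -> str:
    """Strict-Transport-Security value sent on the HTTPS side (1 year)."""
    return f"max-age={max_age}; includeSubDomains"

status, headers = port80_response("yourdomain.com", "/forums/")
assert status == 301
assert headers["Location"] == "https://yourdomain.com/forums/"
assert hsts_header() == "max-age=31536000; includeSubDomains"
```

Which also explains the earlier observation in this thread: well-behaved clients touch port 80 at most once, so traffic that keeps hammering it stands out.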
 