"Poison the content. Send identified scrapers into a huge mess of false, half-true, or completely made-up information and let them scrape it to poison the AI models."
There have been a few Markov-chain-type babblers that generate realistic junk pages to tarpit scrapers and so forth. You could investigate and see whether any are more than idle experiments. I recall reading about a few some time back.
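For anyone curious what such a babbler looks like, here is a minimal sketch in Python. It is not taken from any of the existing projects; the function names `build_chain` and `babble` and the word-level, order-2 chain are my own assumptions, and a real tarpit would also need page templating and rate limiting around it.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def babble(chain, length=50, seed=None):
    """Random-walk the chain and return `length` words of junk text.

    Assumes `chain` is non-empty, i.e. the source text had more than
    `order` words.
    """
    rng = random.Random(seed)
    states = sorted(chain)  # sorted so a fixed seed gives repeatable output
    state = rng.choice(states)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (state only seen at the end of the text): restart
            # from a random state and keep going.
            state = rng.choice(states)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out[:length])
```

Feed `build_chain` a large dump of plausible text (old public posts, for example) and call `babble(chain, length=300)` per junk page. The output reads locally like language but carries no real information, which is the point.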
"I have a small forum, usually a few hundred guests at most. Over the last few days the number of guests has been unusually high, around 3000 and climbing. We now have over 4800 guests. That's not normal. Many are 'Viewing unknown page' or 'Latest content', with a warning sign. IP addresses are from all corners of the world. Many are from South America, mostly Brazil. I also saw quite a number from Vietnam of all places, but from everywhere else too.
"We're on Cloud and I don't know what to do. Could we be under attack? We have been in the past, a few times I think. For reasons I cannot possibly fathom; we're a small photography forum, for crying out loud."
VN has been quite a bugger lately lol.
"That's what I'm experiencing too. But it's slower to tail off.
"It might be because my 'members online timeout' is set higher than theirs.
"Wonder if other Cloudflare users also experience this periodic blast."
I have. But mostly mitigated by Cloudflare directly.
"I have. But mostly mitigated by Cloudflare directly."
I'd say you have an unknown-unknown problem: you see what you have blocked, but you don't see the bots within the traffic that you haven't blocked. Four blocked countries will only block part of the bot traffic and, as has been said numerous times in this thread, Cloudflare's bot detection only detects a very small fraction of the residential proxies.
"I do look at the pass/fails, and blocking certain rogue countries has been very effective in combatting these particularly useless bots."
Well, it blocks those that come from these countries, but not those that come from other countries. Geoblocking is indeed very effective, probably the most effective thing one can do for the moment, provided it does not hurt genuine traffic. However, locking the front door while leaving the back door and the windows open does not create safety. So you feel (falsely) safe because something is blocked, while in fact your doors are still wide open.
"Just to laugh it off, I look at my 404s to try to see why...
"And so many of my 404 requests have all kinds of WordPress directories and files in the request string, looking for hits on wp files.
"I am on a cloud at XF. There is no WordPress on my site."
This is a different kind of issue. This kind of scanning for WordPress usually comes via big cloud providers; in my forum it was for the most part coming from Microsoft Azure. These requests do no harm, but they eat your resources and fill the 404 logs with garbage, which makes proper 404 management basically impossible. You could get rid of them by filtering out these requests via some regular expressions in .htaccess (which you can't, as you are on XF Cloud) or by blocking the ASNs those requests come from (which I think is possible via Cloudflare). This could bring those things down considerably.
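If you can export the 404 log, something along these lines could at least separate the WordPress probes from the 404s worth looking at. This is only a sketch in Python: the pattern list is my own guess at typical probe targets, and `split_404s` is a made-up helper name, not anything XF or Cloudflare provides. The same regular expression idea would carry over to a server or firewall rule where you have access to one.

```python
import re

# Assumed patterns for typical WordPress probes; extend to match your own logs.
WP_PROBE = re.compile(
    r"""
    (?:^|/)(?:wp-admin|wp-content|wp-includes)(?:/|$)   # core WP directories
    | (?:^|/)(?:wp-login|wp-config|xmlrpc)\.php         # commonly probed scripts
    | (?:^|/)wlwmanifest\.xml                           # Live Writer manifest
    """,
    re.IGNORECASE | re.VERBOSE,
)


def split_404s(paths):
    """Split requested paths into (WordPress probes, everything else)."""
    probes, others = [], []
    for path in paths:
        (probes if WP_PROBE.search(path) else others).append(path)
    return probes, others
```

Calling `split_404s` on the list of requested paths gives back the probe noise and the remaining 404s as two lists, so the second one can be reviewed for genuinely broken links.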
"You see what you have blocked - but you don't see the bots within the traffic that you haven't blocked."
Of course. That's why I used the word mitigated, and not the word eliminated. In my attempts to reduce the number of useless bots, which are proving successful, I only focus on the worst offenders. I am not isolating every single bot I don't want. It would be futile.
"but you don't see the bots within the traffic that you haven't blocked."
And those bots get captured in a 404 on my XF instance, and I see them in my logs. But again, they get blocked locally on the server through XF. (The only unblocked indexers I reject are the likes of ByteDance, etc., using robots.txt. All others flow through.)
"Four blocked countries will only block parts of the bot traffic and - as has been said numerous times in this thread - Cloudflare's bot detection does only detect a very small fraction of the residential proxies."
I agree. Like I said, 4 blocked countries representing a significant portion of requests, from countries that I do not market to anyway. It is mitigation. And my formerly useless visitor count has now been trimmed substantially and is more real, and I do not mind it being a more representative number (which is the OP's topic point).
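On the robots.txt rejection mentioned above: a quick way to sanity-check such rules before relying on them is Python's standard `urllib.robotparser`. The rules below are an illustrative guess, not anyone's actual file; I'm assuming `Bytespider` is the user-agent token ByteDance's crawler announces, and `allowed` is just a helper name for this sketch. Keep in mind robots.txt is advisory, so this only shows what a well-behaved crawler would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: reject one named crawler, let all others through.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a crawler honoring robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `allowed("Bytespider", "/threads/some-thread.1/")` is False while `allowed("Googlebot", "/threads/some-thread.1/")` is True. Pass the bare crawler token rather than a full browser-style user-agent string, since the parser only looks at the part before the first slash.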
"And those bots get captured in a 404 on my XF instance,"
Absolutely not. 404s are requests for something that isn't there. What you see as 404s are denied requests for content that only logged-in members would see, e.g. full-size images if you have set the permissions that way. Personally I think these should rather be a 403, and I even opened a bug for that, but was told that 404 is totally fine and the better way.
"and I see them in my logs."
See: Maybe. Identify: Hard. Deal with them: Even harder. And very time-consuming.
"But again, they get blocked locally on the server through XF. (The only unblocked indexers I reject are the likes of ByteDance, etc., using robots.txt. All others flow through.)"
They only get blocked if your content is behind the registration wall. So the only "safe" way against the AI scrapers at the moment is to put everything behind the login, which will then affect Google ranking and annoy legitimate visitors (but bring a bunch of additional registrations in the short term). You cannot, however, be sure that modern AI has not already registered in your forum.