Crazy amount of guests

For anyone still fighting this issue, and for the non-believers: fixing it at the source, as I outlined on 31 Mar (https://xenforo.com/community/threads/crazy-amount-of-guests.233649/page-17#post-1777817) and as above, solved the problem.

See the results below: rubbish gone, requests down, caching up and now steady at around 65% daily, served data down. IT WORKS! The server is isolated to Cloudflare IPs only, so everything has to go through Cloudflare, with no direct IP access. That took my traffic from 1M daily uniques down to my real traffic, now around 110k - 140k daily. Very steady, no chasing IPs or ASNs to ban, etc. Tuned server, locked to CF, /search/ behind a managed challenge.
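
A minimal sketch of the "locked to CF" idea, assuming the check is done in application code (the post does not say where it is actually enforced; a firewall rule is the more usual place). The two ranges are only an illustrative subset and should be verified against the list Cloudflare publishes; the name `is_cloudflare_ip` is hypothetical.

```python
import ipaddress

# Illustrative subset only (assumption): load the full, current list
# from Cloudflare's published IP ranges before relying on this.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network("173.245.48.0/20"),
    ipaddress.ip_network("104.16.0.0/13"),
]


def is_cloudflare_ip(remote_addr: str) -> bool:
    """Return True if the connecting address falls inside an allowed range."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Anything that does not parse as an IP address is rejected.
        return False
    return any(addr in net for net in CLOUDFLARE_RANGES)
```

Anything for which this returns False would be dropped before it reaches the forum, which is what removes the direct-to-IP traffic described above.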

That is a huge improvement in my eyes.

View attachment 336528
This is the way. Welcome back to the ways of sanity!

The sooner people wise up to this mass AI-scraping garbage, the better. I think the best way to put it is like this: It's your data, on your servers, it shouldn't be a smorgasbord for abusive AI botnets that may/will profit off a community you've shaped and built over the years, driving away future growth.

I'm all for legitimate indexing of content. However, I'm not for AI entities mass-ingesting content and completely ignoring rate limits through what is effectively a botnet. Never mind the fact that these AI networks don't even give a source or link back to where the content was ingested from.



In other news, it looks like these forums are also being bombarded with scrapers tonight...
 
I think the best way to put it is like this: It's your data, on your servers, it shouldn't be a smorgasbord for abusive AI botnets that may/will profit off a community you've shaped and built over the years driving away future growth.
The thing is: in the current situation, even as a forum user it is worth thinking about whether a forum is at least somewhat protected against AI scrapers and whether you want your posts to become part of a commercial AI system. You may not want that, either for reasons of privacy or because you do not want to support commercial systems (that will charge you to use them) with your knowledge for free. So one might limit what one posts on an unprotected forum, or possibly stop posting on such a forum altogether.

So basically, protection against AI scraping bots has become part of responsible forum administration today, out of respect for the users and to protect their privacy and content. I think any forum admin should take this into consideration, independently of performance or other reasons. It is even something that could be used as a marketing point for one's own forum towards its users.

Even a forum like the XF forums here contains loads of more or less private data that people post over time. I am not comfortable with the indifference that XF shows towards the scraping bots in this forum.
 
You're creating problems you can't overcome!
If your forum is public, bots will scan it and use it to train their AI, and there's nothing you can do about it, because it's exactly like a real user learning something from your forum and then adding it to their own knowledge.
 
If your forum is public, bots will scan it and use it to train their AI, and there's nothing you can do about it
Obviously wrong - which you would know if you had read the thread that you are posting in. On top of that, you can limit the amount of content that guests are able to see with various methods. Plus - only effective in theory or hindsight - you can add to your TOS that scraping and using the forum content in AI models is not allowed.
because it's exactly like a real user learning something from your forum and then adding it to their own knowledge.
I've not had real users coming to my forum in thousands of parallel sessions, hiding their identity behind residential proxies, scraping my forum at scale, then offering the knowledge they gained from my forum publicly and commercially at scale while exploiting the privacy of its users. So no, it is clearly not the same thing.
 
The specific expression for search is: (http.request.uri.path contains "/search/" and not http.cookie contains "xf_user=")
Maybe it is a dumb question or goes over the top: the "xf_user" cookie is used by every XenForo forum, so do you specifically check that it is valid for your forum domain (and possibly what its content is), or just that it exists? Otherwise, at least in theory, one could simply create a cookie with that name and get past any restrictions, or anyone who is logged in to any XenForo forum would pass the check.

As it seems to work for you everything should be fine, at least for the moment - I am just wondering how easy it would be to bypass or trick the checks.
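
To make the concern concrete, here is a small sketch that mirrors the logic of the quoted expression in plain Python. It is not Cloudflare's implementation, and the function name `should_challenge` is hypothetical. It only assumes what the expression itself states: the path must contain "/search/" and the Cookie header must not contain the substring "xf_user=".

```python
def should_challenge(path: str, cookie_header: str) -> bool:
    """Mirror the quoted rule: challenge when the path contains "/search/"
    and the Cookie header does not contain the substring "xf_user="."""
    return "/search/" in path and "xf_user=" not in cookie_header


# A guest with no cookies hitting search is challenged.
guest = should_challenge("/search/", "")

# A client that simply invents a cookie with that name is not challenged,
# because only the presence of the substring is checked, never the value.
forged = should_challenge("/search/", "xf_user=anything")

print(guest, forged)  # True False
```

As written, the rule tests presence only, so a scripted client that sends its own Cookie header would pass. A browser only sends cookies set for the matching domain, so being logged in to a different XenForo forum would not satisfy the check by itself.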
 