Crazy amount of guests

There are a few options, but check out https://iplists.firehol.org/, which can help you integrate blocklists into a custom solution. It is almost always just scrapers and people training AI models. As others have already said, Cloudflare is very good at handling automated traffic, but there are some things you can do manually if you have your own VPS or root access.
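
For anyone going the custom route, here is a rough sketch of what "integrating a blocklist" can look like. It assumes the list is plain text with one IP or CIDR per line and `#` for comments. The sample entries are documentation ranges, not real blocklist data, and `parse_netset` and `is_blocked` are just names I made up for the example.

```python
import ipaddress

# Hypothetical excerpt in the one-entry-per-line style (documentation ranges only).
SAMPLE_NETSET = """\
# comment lines start with '#'
192.0.2.0/24
198.51.100.7
"""

def parse_netset(text):
    """Turn a one-entry-per-line IP/CIDR list into network objects."""
    networks = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if entry:
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks

def is_blocked(ip, networks):
    """Return True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

if __name__ == "__main__":
    nets = parse_netset(SAMPLE_NETSET)
    for ip in ("192.0.2.55", "203.0.113.1"):
        print(ip, "blocked" if is_blocked(ip, nets) else "ok")
```

A linear scan like this is fine for a few thousand entries. For the big lists you would probably want the firewall (ipset or nftables sets) to do the matching instead of application code.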
 
One common trend we've seen so far is that all the problems come from members using Safari on their iPhones. Nobody else has a problem.

They are probably using Apple's iCloud Private Relay, which is basically a VPN but limited to Safari on Apple devices. I see a lot of my users using it as well. IP Threat Monitor has an option to let those pass and, since the IP lists are publicly available, it should be possible to integrate something like that in CF, too.
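
A rough sketch of the "let relay users pass" idea: check the allowlist first and only then the blocklist. I am assuming the relay ranges come as CSV with the CIDR in the first column; that column layout, the sample rows (documentation ranges) and the function names are all assumptions for illustration.

```python
import csv
import io
import ipaddress

# Hypothetical sample rows; assumed layout: CIDR first, location columns after.
SAMPLE_RELAY_CSV = """\
192.0.2.0/25,US,US-CA,Example City,
2001:db8:1::/48,DE,DE-BE,Example City,
"""

def parse_relay_csv(text):
    """Read the first column of each CSV row as a network."""
    networks = []
    for row in csv.reader(io.StringIO(text)):
        if row and row[0].strip():
            networks.append(ipaddress.ip_network(row[0].strip(), strict=False))
    return networks

def decide(ip, relay_nets, blocked_nets):
    """Allowlist wins over blocklist; everything else gets default handling."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in relay_nets):
        return "allow"
    if any(addr in net for net in blocked_nets):
        return "block"
    return "default"

if __name__ == "__main__":
    relay = parse_relay_csv(SAMPLE_RELAY_CSV)
    blocked = [ipaddress.ip_network("192.0.2.0/24")]
    for ip in ("192.0.2.10", "192.0.2.200", "203.0.113.9"):
        print(ip, decide(ip, relay, blocked))
```

The order of the two checks is the whole point: a relay address that also sits inside a blocked range still gets through.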
 
I think one of the biggest issues not being discussed here is, how much traffic does AI send a site nowadays? If you block the known entities from providing your site as a recommendation because it can't access it, then are you losing out?

Just putting this out there, which is why I have all AI bots on CF as allow, because a lot of people are now turning to AI in place of Google. I get better answers asking Claude or ChatGPT than I often do asking Google. You ask them a specific question and there are no nonsense listings in return, just what fits your question to them.
 
I think one of the biggest issues not being discussed here is, how much traffic does AI send a site nowadays? If you block the known entities from providing your site as a recommendation because it can't access it, then are you losing out?
Cloudflare gathers stats on this as crawl-to-refer ratios. Here are their numbers from the past week:

[Attached image: Cloudflare crawl-to-refer ratios by AI provider, past week]

So Anthropic and OpenAI are pretty horrendous...
 
I think one of the biggest issues not being discussed here is, how much traffic does AI send a site nowadays? If you block the known entities from providing your site as a recommendation because it can't access it, then are you losing out?

Just putting this out there, which is why I have all AI bots on CF as allow, because a lot of people are now turning to AI in place of Google. I get better answers asking Claude or ChatGPT than I often do asking Google. You ask them a specific question and there are no nonsense listings in return, just what fits your question to them.
Surely, the best strategy depends on how exclusive and how high quality your forum's content is, and on how eager you are for growth or visitors. In general AI providers grab your content and give nothing back in exchange. Some are worse than others but overall you will lose: If AI has grabbed your content and serves it there is no need to visit your forum for people asking questions, even if there is a backlink.

Personally I do not care too much about new users, and I clearly want to protect my forum content, as it is pretty high quality and a lot of it is exclusive and cannot be found anywhere else. How silly would it be to give that advantage away, and for free at that? On top of that, it would enable the AIs to grab all kinds of personal information that forum users may post, with all the potentially negative effects this may have.

So I do block scrapers and most AI agents. Depending on the AI and how well structured its bots are, it is sometimes possible to let the search bot through while blocking the training bot. When in doubt, I would rather block completely. Furthermore, I've set hard limits on visibility for guests: they were never able to see some parts of the forum, but most of it was freely accessible. I changed this a while ago, and now a guest can only see the first post of a thread, and on top of that even more areas of the forum are not accessible to guests.
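
To illustrate the search bot versus training bot split, here is a minimal sketch that sorts requests by user-agent token. The token lists are my assumption about what the vendors currently document (GPTBot, ClaudeBot and CCBot as crawlers that collect training data, OAI-SearchBot as a search crawler); they can change at any time, so check the vendors' own pages. `classify` is a made-up helper name.

```python
# Assumed token lists; verify against each vendor's crawler documentation.
SEARCH_TOKENS = ("OAI-SearchBot",)
TRAINING_TOKENS = ("GPTBot", "ClaudeBot", "CCBot")

def classify(user_agent):
    """Return 'allow' for search bots, 'block' for training bots, else 'default'."""
    ua = user_agent.lower()
    if any(token.lower() in ua for token in SEARCH_TOKENS):
        return "allow"
    if any(token.lower() in ua for token in TRAINING_TOKENS):
        return "block"
    return "default"

if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
        "Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Safari/604.1",
    ]
    for ua in samples:
        print(classify(ua), "<-", ua)
```

Keep in mind that a user agent is trivially spoofed, so this only sorts the bots that identify themselves honestly. The scrapers that pretend to be a browser are exactly the ones it will not catch, which is why "when in doubt, block" still needs IP-based rules behind it.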

So far this has tremendously boosted registrations and, at least in the first months, had no negative impact on search engine ranking. I have not checked this recently because I don't care too much. I have a working community and I gain new users. What else could I want?

Surely, running a non-commercial forum helps, but I am part of the "lock them out as good as possible" cohort.
 
Apart from cloudfare (cost effective) is there any other way, someone came across blocking AI guests Spam ?
Did you read the thread you are posting to? There are loads of options mentioned including experiences with them.

Funnily enough, not too long ago you claimed in this very thread that CF would easily solve the bot issue:

Go for cloudfare free dns and setup your site with Security Rules.
Give a managed challenge for all unverified bots (90% of your issue will be resolved)

and you were advised to read the thread first, to understand the issue and the options available:

Maybe you should have read more than just the opening post of the thread, namely the 11 pages following it so far. Then you'd have realized that your ill-advised "advice" does not work at all.

It seems you still have not done so (plus, obviously, "SPAM" means active messaging, something AI scraping bots don't do, so if you have a guest SPAM problem you could simply disallow guest posting on your forums).

So did your advice not work out as you claimed earlier?
 
In general AI providers grab your content and give nothing back in exchange. Some are worse than others but overall you will lose: If AI has grabbed your content and serves it there is no need to visit your forum for people asking questions, even if there is a backlink.
I agree with all of this.

Side note: I did see that Cloudflare is adding a service (I believe it's in a closed beta) where you can offer your data to AI scrapers for a payment. I guess one of the HTTP return codes, 402 Payment Required, is the mechanism they use, and from there they've found a way to implement payment.

But I agree about the moral implications of AI. They are essentially stealing all of our members' content, without permission, without license, without payment, to fuel their AI arms race, which essentially pads the profits of mega-corporations and keeps shareholders happy. The benefits are for them, not for society or end users like us.

In the fields I'm interested in, I have yet to see any AI answer be accurate. Some are so wildly off base that they are a joke. I refuse to use AI. I removed any AI apps on my devices. I turn it off in software where I can. I'm a grown-ass adult who learned how to think, write, research, do my own work, etc. on my own, without needing assistance from a machine that is inherently faulty. I'll do things by hand rather than rely on AI crutches.

Not only that, feeding AI scrapers posts from forums, Reddit, etc. is such a fatally flawed idea that garbage in/garbage out applies here in full force. We don't see misinformation or inaccuracies as often here in XF's support forums, but go out to the general interest forums. Someone asks a question. They might get ten answers to their question...most of them wild-assed guesses or completely incorrect information. One post might get it right. (Automotive forums are a good example of this, especially when I have the same question and get so many of the same inaccurate responses on completely different forums.) AI has no critical thinking skills, no common sense, no decision-making ability--it's a machine that spits out the garbage that it's fed. Why should I trust it when so much garbage is being fed into it?

So yeah...I'm blocking AI scrapers on forums. Any of them that I can. And if I can keep the content out of the hands of companies doing this through questionable and dishonest means, I'm going to try anything possible to stop them. Ideally I would require everyone to pass a CF challenge, but whitelist them once they log in. But CF itself is flawed to the point that even when I put such things in place, CF finds some other way, beyond my control, to block innocent legitimate users. Which is what is happening right now.
 