XF 2.3 AI crawlers causing xf_session_activity table to reach limit

bottiger · Aug 31, 2025

My forum is now getting hammered by AI crawlers, which fill the xf_session_activity table and stop the forum from loading. I've seen over 10,000 unique IPs.

Changing the table type to InnoDB is just a temporary band-aid.

I already have Cloudflare's AI crawler blocking enabled, but it doesn't work. The user agents are all spoofing real browsers and the IPs are all residential proxies.

The only thing that can stop them is to enable JavaScript verification for all users, but this greatly annoys users.

Does anyone have a solution?

xml · Sep 1, 2025

Do you have root access? How much RAM do you have on your machine?

smallwheels · Sep 1, 2025

bottiger said:
I already have Cloudflare's AI crawler blocking enabled, but it doesn't work. The user agents are all spoofing real browsers and the IPs are all residential proxies.

How do you know that these are AI crawlers in the first place (and not something different)? Not saying it ain't AI crawlers - but what makes you so sure?

bottiger said:
Does anyone have a solution?

Did you bother to read the forum or use the forum search? There have been lots of reports about AI bots on the rise, causing all sorts of problems, over the last couple of months here on the forums. Within these threads various ways of dealing with it have been described and also the pitfalls that may occur.

What exactly makes your situation so different that nothing discussed and described in the existing threads fits it, so that you have to start another thread?

And if it is completely different (so different, that you don't even need to mention the existing ways of dealing with it) - back to the first question: If whatever affects your forum is totally different from the patterns that AI crawlers have shown until now - what makes you sure that it is AI crawlers?

bottiger · Sep 2, 2025

smallwheels said:
How do you know that these are AI crawlers in the first place (and not something different)? Not saying it ain't AI crawlers - but what makes you so sure?

Because the guest count went from 10,000 to 20 when I enabled JavaScript checking.

Alvin63 · Sep 2, 2025

Could you just block a few main ones in .htaccess? Sometimes a lot of them are all from the same place, so blocking the main ones can reduce it a lot. I blocked bytedance and bytespider, although I had nothing like the amount you have. They are bots, yes, but probably not all AI bots.

bottiger · Sep 2, 2025

Alvin63 said:
Could you just block a few main ones in .htaccess? Sometimes a lot of them are all from the same place, so blocking the main ones can reduce it a lot. I blocked bytedance and bytespider, although I had nothing like the amount you have. They are bots, yes, but probably not all AI bots.

There are no "main ones". They are using a rotating residential proxy, so they get a new IP for each request, and they are all spoofing real user agents.

ES Dev Team · Sep 2, 2025

I use fail2ban to watch apache logs and apply rules.

Since XenForo currently doesn't include this kind of protection against extreme numbers of bots, you need to use either Cloudflare or fail2ban (protection running on your own machine).

It's interesting to know that Cloudflare isn't holding up.
The scraper bots are very sneaky lately and very covert.
The only thing that works against them is specialized rate limiting and filtering based on what URLs they are repeatedly hitting.
Not sure whether you can implement that in Cloudflare. I won't use Cloudflare because it often inconveniences users, and I don't like giving all this information about my web traffic away for free when I don't have to.
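
To make "rate limiting based on what URLs they are repeatedly hitting" a bit more concrete, here is a minimal Python sketch of a sliding-window counter that only counts hits on watched URL patterns. It is not the fail2ban setup described above; the patterns in WATCHED, the class name and the thresholds are all assumptions for illustration, and in practice the decision would be fed to whatever does the actual blocking.

```python
import re
from collections import defaultdict, deque

# Hypothetical URL patterns that scrapers might hit repeatedly.
# Replace these with what your own access logs show.
WATCHED = [
    re.compile(r"^/search/"),
    re.compile(r"^/members/"),
    re.compile(r"^/whats-new/"),
]


class UrlRateLimiter:
    """Counts hits per IP on watched URLs inside a sliding time window."""

    def __init__(self, max_hits=10, window_seconds=60):
        self.max_hits = max_hits
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of watched hits

    def should_ban(self, ip, path, now):
        """Return True if this request pushes the IP over the limit."""
        if not any(p.search(path) for p in WATCHED):
            return False  # unwatched URLs never count
        q = self.hits[ip]
        q.append(now)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q) > self.max_hits
```

Note that a per-IP counter like this only helps against clients that reuse an address; against proxies that rotate on every request it has to be combined with other signals.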

For a temporary expansion of your sessions table, put this in your MySQL configuration:
/etc/mysql/mysql.conf.d/mysqld.cnf:
#xenforo uses in-memory tables; default is 16M and these tend to fill up quick, so we need to boost their size. - DS
tmp_table_size = 64M
max_heap_table_size = 64M

Alvin63 · Sep 2, 2025

bottiger said:
There are no "main ones". They are using a rotating residential proxy, so they get a new IP for each request, and they are all spoofing real user agents.

There could well be a main one or two behind it all - block those and the rest don't work. Just my two pennorth. It could be worth a try and see what happens.

I used the code from this thread. A lot of "guests" can be part of one main bad bot. Block one and loads of others disappear.

Thread 'How to block Robot ByteDance'

Jun 14, 2025

I've mentioned this on a couple of other threads but Robot ByteDance is hogging my site and appearing multiple times every 60 seconds or so.

It ignores robots.txt
It also ignored Cloudflare rules so it must be bypassing Cloudflare
I've been unable to whitelist Cloudflare IPs in my shared hosting (it won't accept the IP ranges, only individual IPs, and there are thousands)

Any other suggestions? Block it in htaccess?

ES Dev Team · Sep 2, 2025

About 90% of the bots that hit my site don't identify themselves and pretend to be a normal browser.

AndyB · Sep 3, 2025

I have found that most of these bots are coming from Brazil. So I block the entire country of Brazil and the bad AI bot issue is resolved.

I use two add-ons:

The first one is Access log v2.1.

https://xenforo.com/community/resources/access-log.8787/

This allows me to see which AI bots are hammering the forum. I then use this add-on to block the offending country:

https://xenforo.com/community/resources/block-country.10042/

ES Dev Team · Sep 3, 2025

Wish I had the option to do that; our forum is super international. But if you can do it, it's a slam dunk.

Alvin63 · Sep 3, 2025

Also just to add - Cloudflare can't always solve everything, because some of them just bypass Cloudflare, and if they can do that they may know your origin IP address.

Jja · Sep 3, 2025

Alvin63 said:
Also just to add - Cloudflare can't always solve everything, because some of them just bypass Cloudflare, and if they can do that they may know your origin IP address.

Not a problem if you have server access - you can configure it to allow only the Cloudflare IP list.
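
As a rough illustration of the "allow only the Cloudflare IP list" idea, the Python sketch below checks whether a connecting address falls inside a set of allowed CIDR ranges. The ranges shown are placeholder documentation blocks, not Cloudflare's real ranges (use the list Cloudflare publishes), and the function names are hypothetical. On a real server this check would normally live in the firewall or web server configuration rather than in application code.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real Cloudflare ranges.
# Substitute the IPv4/IPv6 list that Cloudflare publishes.
ALLOWED_RANGES = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]


def build_allowlist(cidrs):
    """Parse CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(c) for c in cidrs]


def is_allowed(remote_addr, networks):
    """True if the connecting address is inside any allowed range."""
    addr = ipaddress.ip_address(remote_addr)
    return any(
        addr.version == net.version and addr in net
        for net in networks
    )
```

Anything that connects to the origin directly from an address outside the list would then be rejected, which closes the "bypass Cloudflare via the origin IP" hole mentioned above.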

Alvin63 · Sep 3, 2025

Jja said:
Not a problem if you have server access - you can configure it to allow only the Cloudflare IP list.

I wasn't able to do that on shared hosting, so the only option was to block in .htaccess.

nodle · Sep 3, 2025

Just use Cloudflare's AI blocking tools. You should be able to use them on shared hosting as well, since everything is managed through Cloudflare's dashboard. They have a new blocking method that modifies your robots.txt file for you as well.

Control content use for AI training with Cloudflare’s managed robots.txt and blocking for monetized content

Cloudflare is making it easier for publishers and content creators of all sizes to prevent their content from being scraped for AI training by managing robots.txt on their behalf, and allowing targeted blocking of AI crawling on sites that serve ads.

blog.cloudflare.com

pegasus · Sep 4, 2025

There were 2 main issues I found causing the AI bots (mainly from Brazil) to bring my site to a halt:
#1 xf_session needed to be changed to InnoDB

This prevented the site from crashing at around 25k concurrent bots.
But it also allowed more bots to come. Within hours, the site crashed at around 60k bots.

#2 There were multiple code bottlenecks on the site that were not noticeable under normal use (maybe a page loads in 0.4 seconds instead of 0.1 seconds). By reviewing the URLs the bots were hitting most often and comparing them to the slow query logs, I was able to find what the majority of bots were doing, and either rewrite the code to be more efficient or block guests from doing it entirely.
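
A rough sketch of the "review the URLs the bots hit most" step, assuming an Apache/nginx combined-format access log. The function names and the normalization rule (query strings dropped, digits collapsed to N so that IDs and page numbers group together) are illustrative assumptions, not necessarily what was used above.

```python
import re
from collections import Counter

# Matches the request part of a combined-format log line, e.g. "GET /path HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def normalize(path):
    """Drop the query string and collapse digit runs so similar URLs group together."""
    path = path.split("?", 1)[0]
    return re.sub(r"\d+", "N", path)


def top_paths(lines, limit=10):
    """Return the most frequently requested normalized paths with their counts."""
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            counts[normalize(m.group(1))] += 1
    return counts.most_common(limit)
```

Called as `top_paths(open("access.log"))` (the file name is a placeholder), the result can be compared by hand against the slow query log to see which hot URLs are also expensive.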

This prevented the site from crashing at around 60k concurrent bots.
It also allowed more bots to come. The bots spiked to over 120k but could no longer crash the site, so they left after a few hours and have not been back since.

ES Dev Team · Sep 4, 2025

Thanks for the technical report.

Over here, the fail2ban setup, with a combination of standard and very specific custom rate limiters, has consistently reduced my bot count down to 2023 levels. I consider this a success because it is an 'admin sleeps well at night' solution. It just requires some occasional tuning to account for new attacks.

Most of my recent karate chops to bots involve:

A hits-per-day rule, where the threshold is more than 5x the count of the biggest legitimate user, designed to fend off massively distributed scraping
An instant ban on certain pages that attract bots (honeypot)
A couple dozen user agent strings that are instantly permabanned
Entire ranges of AWS Singapore permabanned
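
The first three rules can be sketched in Python as below. This is only an illustration of the logic, not the actual fail2ban filters, which are not shared in this thread. The honeypot path, the user agent substring and the default daily limit are assumptions; "bytespider" is used only because that bot was named earlier in the thread.

```python
from collections import Counter

HONEYPOT_PATHS = {"/trap/do-not-follow"}   # hypothetical bait URL, never linked for humans
BANNED_UA_SUBSTRINGS = ("bytespider",)     # example entry; real lists are longer


def find_bans(requests, daily_limit=5000):
    """requests: iterable of (ip, path, user_agent) tuples for one day.

    Returns a dict mapping each banned IP to the first rule that caught it.
    """
    bans = {}
    per_ip = Counter()
    for ip, path, user_agent in requests:
        per_ip[ip] += 1
        if ip in bans:
            continue
        if path in HONEYPOT_PATHS:
            bans[ip] = "honeypot"
        elif any(s in user_agent.lower() for s in BANNED_UA_SUBSTRINGS):
            bans[ip] = "user-agent"
    # Hits-per-day rule: applied after the whole day has been counted.
    for ip, hits in per_ip.items():
        if hits > daily_limit:
            bans.setdefault(ip, "daily-limit")
    return bans
```

The fourth rule (banning whole hosting ranges) would be a CIDR membership check on the IP rather than anything derived from the request itself.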

I'm considering building a next-generation version of fail2ban because in the future you will need a more sophisticated defense that thinks about the behavioral characteristics of an IP address and also has an easy way to connect to the app, so the app can send hints to the security mechanism.

I feel like most general things you can slap onto a XenForo install are going to be inadequate soon, and something you can tailor to XenForo better will eventually become a mandatory technology. This high volume of bot garbage threatens the internet's existence and all you can do is up your defense budget and sophistication.

We have considered moving to a Hetzner dedicated server so that we have a huge amount of computational overhead against this at a very low cost. This means that we need to re-engineer some things from AWS, which makes it a pain in the buttocks.

smallwheels · Sep 4, 2025

ebikes.com said:
I feel like most general things you can slap onto a XenForo install are going to be inadequate soon, and something you can tailor to XenForo better will eventually become a mandatory technology. This high volume of bot garbage threatens the internet's existence and all you can do is up your defense budget and sophistication.

Fully agree and, to put this in a different perspective: defense against AI bots will become a major feature for forum software and could therefore also be a business opportunity for XenForo. As this cannot be a turnkey solution but needs constant adjustment, they could (in theory) offer a well-working solution as part of their cloud plans, which would be a huge argument for using the XF cloud and therefore a massive business case. In practice, sadly, XF very probably lacks the human and financial resources to implement something like that. On a positive note, that leaves me self-hosted and saves money.

Btw.: XF itself seems also to suffer from bots and have no working way against them (or they simply don't care), judging from the ratio between guests and users on the forum and the massive rise in guests in that statistic over the last months. Pic from early morning today:

[Screenshot attachment: Bildschirmfoto 2025-09-04 um 08.45.47.webp]

ebikes.com said:
entire ranges of AWS Singapore permabanned

ebikes.com said:
moving to a Hetzner

ebikes.com said:
we need to re-engineer some things from AWS, which makes it a pain in the buttocks.

Doesn't it feel a bit strange to pay money to the very same company for hosting that you have to fight against b/c they do provide the infrastructure for those that are about to destroy your forum and earn money from them? Hetzner is in the same boat btw: Loads of scraping attempts coming from them and a totally useless abuse desk.

There should be a white hat provider available - I see no benefit in feeding a company that bites you.

ES Dev Team · Sep 6, 2025

Yes, I very much don't like the fact that I'm buying hosting from a company that's not regulating its own network. We expect AWS to be a little more classy than that.

smallwheels said:
Btw.: XF itself seems also to suffer from bots and have no working way against them (or they simply don't care), judging from the ratio between guests and users on the forum and the massive rise in guests in that statistic over the last months. Pic from early morning today:

Yeah. My forum ( https://endless-sphere.com ) is within 1% of the post count of XenForo's own forum, so it's a good comparison.

Here's whatever XenForo is using versus my set of fail2ban rules:

My number tends to be in the 2,000s lately, but it was 20,000+ a month ago before the last round of tuning.

Someone really should sell a bot protection package. Or at least describe a working Cloudflare setup.
Something that can give non-expert server administrators a head start.

I have had two companies pay me to advise on fail2ban, so on my end, I'd be willing to sell the IP and advice to anyone for a very reasonable price.
I'd love to give out my recipe online, but I don't want AI bots training on it, which would make it useless and bring forward the time when I have to develop a more advanced security mechanism.

smallwheels · Sep 6, 2025

ebikes.com said:
Yeah. My forum ( https://endless-sphere.com ) is within 1% of the post count of XenForo's own forum, so it's a good comparison.

Here's whatever XenForo is using versus my set of fail2ban rules:

My number tends to be in the 2,000s lately, but it was 20,000+ a month ago before the last round of tuning.

My forum is a tiny baby in comparison; the most guests I ever had was 1,800 back in April this year. This was because of a scraping attack coming from Hetzner. In March this year I started a quite intensive journey of investigations, experiments and countermeasures against scrapers, costing endless hours. As I am on shared hosting and do not use Cloudflare (and do not want to), my points of leverage are pretty limited. But successful, as it seems. Typically my statistics now look like this:

[Screenshot attachment: Bildschirmfoto 2025-09-06 um 11.25.33.webp]

I hardly ever have more than 80 guests and/or at best 15 robots at peak and the ratio members/guests is barely ever beyond 1:5. Before that I had a couple of hundreds of guests quite regularly along with 40-100 robots.

It was a lot of work over the first weeks and months and still needs a limited level of continuous maintenance and adjustment, but overall it is pretty stable now. I've described the earlier parts of my journey here:

Thread 'Learnings: Identifying and getting rid of unwanted traffic'

Apr 6, 2025

I've recently spent some time getting rid of unwanted traffic on my forums and thought maybe the learnings might have value for someone else, so I am writing them up. This is not intended as a tutorial or even advice - it is just a couple of findings that you may find useful or not. Also, many roads lead to Rome, depending on your situation, needs and abilities. So take it with a grain of salt.

Important constraints for my actions: My forum is pretty small (currently ~2,000 registered users), runs on shared hosting (which limits my possibilities in terms of configuration), I...

A lot has happened since then and an update is overdue. However, as a rough conclusion: while it is far from perfect and takes a lot of time, it is not all that difficult to bring bots down massively. My approach clearly has limits the more cleverly the bots behave, so a certain share of them surely go undetected already.

However: having had the luck to start my efforts early enough, I do not see on my forum anything like the massive rise in guests/scrapers/AI bots that has been reported on the forums here since about early July (and has not stopped since). It may be just luck, but it may also be partly due to the measures I implemented.

ebikes.com said:
Someone really should sell a bot protection package. Or at least describe a working Cloudflare setup.
Something that can give non-expert server administrators a head start.

Absolutely. Too many forum administrators still focus on registrations, and so do most of the add-ons. In the meantime, guests aka scraping bots have become the bigger threat in my opinion. It should be relatively easy to create an add-on that can be configured to block access to the forum (and not just to the registration) on a country, ASN or IP level (and possibly other criteria as well), along with the option of proper logging and filtering of the logs. This would help a lot already without any need for intelligence in the add-on - it would dramatically simplify what I did on the shell with traditional Unix tools.

Obviously, there are many ways for a more fancy and more intelligent solution, but XF does lack even the plain basics, so having an easy to access toolkit for them would already improve the situation dramatically.
