XF 2.3 AI crawlers causing xf_session_activity table to reach limit

bottiger

Active member
So my forum is now getting hammered by AI crawlers, which causes xf_session_activity to fill up and stops the forum from loading. I've seen over 10,000 unique IPs.

Changing the table engine to InnoDB is just a temporary band-aid.

I already have Cloudflare's AI crawler blocking enabled, but it doesn't work. The user agents all spoof real browsers and the IPs are all residential proxies.

The only thing that can stop them is to enable JavaScript verification for all users, but this greatly annoys users.

Does anyone have a solution?
 
I already have Cloudflare's AI crawler blocking enabled, but it doesn't work. The user agents all spoof real browsers and the IPs are all residential proxies.
How do you know that these are AI crawlers in the first place (and not something different)? Not saying it ain't AI crawlers - but what makes you so sure?

Does anyone have a solution?
Did you bother to read the forum or use the forum search? There have been lots of reports about AI bots on the rise, causing all sorts of problems, over the last couple of months here on the forums. Within these threads various ways of dealing with it have been described and also the pitfalls that may occur.

What exactly makes your situation so different from everyone else's that none of what has been discussed and described in the existing threads fits it, so that you have to start yet another thread?

And if it is completely different (so different, that you don't even need to mention the existing ways of dealing with it) - back to the first question: If whatever affects your forum is totally different from the patterns that AI crawlers have shown until now - what makes you sure that it is AI crawlers?
 
Could you just block a few main ones in .htaccess? Sometimes a lot of them all come from the same place, so blocking the main ones can reduce the load a lot. I blocked Bytedance and Bytespider, although I had nothing like the numbers you have. They are bots, yes, but probably not all AI bots.
 
Could you just block a few main ones in .htaccess? Sometimes a lot of them all come from the same place, so blocking the main ones can reduce the load a lot. I blocked Bytedance and Bytespider, although I had nothing like the numbers you have. They are bots, yes, but probably not all AI bots.

There are no "main ones". They are using rotating residential proxies, so they get a new IP on each request, and they are all spoofing real user agents.
 
I use fail2ban to watch apache logs and apply rules.

Since XenForo doesn't currently include this kind of protection against extreme numbers of bots, you need to use either Cloudflare or fail2ban (protection running on your own machine).

It's interesting to know that Cloudflare isn't holding up.
The scraper bots have been very sneaky and covert lately.
The only thing that works against them is specialized rate limiting and filtering based on which URLs they are repeatedly hitting.
Not sure whether you can implement that in Cloudflare. I won't use Cloudflare because it often inconveniences users, and I don't like giving away all this information about my web traffic for free when I don't have to.
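To illustrate what URL-based rate limiting could look like, here is a minimal Python sketch, not an actual fail2ban or Cloudflare rule. It keys a sliding window on the route being hit rather than on the IP, since the IPs rotate on every request. The names `route_bucket` and `RouteRateLimiter`, and the limit and window values, are assumptions made up for the example.

```python
from collections import defaultdict, deque


def route_bucket(path: str) -> str:
    """Collapse a request path to its first segment, e.g. /search/123/?q=x -> /search."""
    path = path.split("?", 1)[0]
    first = path.strip("/").split("/", 1)[0]
    return "/" + first


class RouteRateLimiter:
    """Sliding-window limiter keyed by route instead of by client IP."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit              # max guest hits per route inside the window
        self.window = window_seconds
        self.hits = defaultdict(deque)  # route -> timestamps of recent hits

    def allow(self, path: str, now: float) -> bool:
        recent = self.hits[route_bucket(path)]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False                # route is over budget: challenge or reject
        recent.append(now)
        return True


limiter = RouteRateLimiter(limit=3, window_seconds=10.0)
results = [limiter.allow("/search/1/?q=a", t) for t in (0.0, 1.0, 2.0, 3.0)]
```

In a real setup this check would presumably apply to guests only, and an over-budget route would trigger a challenge rather than a hard block, so logged-in members are not affected.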

For a temporary expansion of your sessions table, put this in your MySQL configuration:
/etc/mysql/mysql.conf.d/mysqld.cnf:
# XenForo uses in-memory tables; the default is 16M and these tend to fill up quickly, so we need to boost their size. - DS
tmp_table_size = 64M
max_heap_table_size = 64M
 
There are no "main ones". They are using rotating residential proxies, so they get a new IP on each request, and they are all spoofing real user agents.
There could well be a main one or two behind it all - block those and the rest don't work. Just my two pennorth. It could be worth a try and see what happens.

I used the code from this thread. A lot of "guests" can be part of one main bad bot. Block one and loads of others disappear.

 
Just use Cloudflare's AI blocking tools. You should be able to use them on shared hosting as well, since everything is managed through Cloudflare's dashboard. They also have a new blocking method that modifies your robots.txt file for you.

 
There were 2 main issues I found causing the AI bots (mainly from Brazil) to bring my site to a halt:
#1 xf_session needed to be changed to InnoDB
  • This prevented the site from crashing at ~25k concurrent bots.
  • But it also allowed more bots to come. Within hours, the site crashed at ~60k bots.

#2 There were multiple code bottlenecks on the site, not noticeable under normal use (maybe the page loads in 0.4 seconds instead of 0.1 seconds). By reviewing the URLs the bots were mostly repeatedly hitting, and comparing to slow query logs, I was able to find what a majority of bots were doing, and rewrite the code to be more efficient or block guests from doing it entirely.
  • This prevented the site from crashing at ~60k concurrent bots.
  • It also allowed more bots to come. The bots spiked to over ~120k but could no longer crash the site, so they left after a few hours and have not been back since.
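For anyone wanting to repeat the "which URLs are the bots hitting" step, a rough Python sketch is below. It assumes an Apache/nginx combined-format access log and groups requests by their first path segment; the function name `top_routes` and the sample log lines are invented for illustration and are not taken from my actual setup.

```python
import re
from collections import Counter

# Matches the request part of a combined-format log line: "GET /path HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def top_routes(log_lines, n=5):
    """Count hits per first path segment and return the n busiest routes."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that do not look like requests
        path = match.group(1).split("?", 1)[0]
        segments = [s for s in path.split("/") if s]
        route = "/" + (segments[0] if segments else "")
        counts[route] += 1
    return counts.most_common(n)


# Hypothetical sample lines (documentation IP ranges, made-up paths).
sample = [
    '203.0.113.5 - - [04/Sep/2025:08:45:47 +0000] "GET /search/1/?q=a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.9 - - [04/Sep/2025:08:45:48 +0000] "GET /search/2/?q=b HTTP/1.1" 200 498 "-" "Mozilla/5.0"',
    '192.0.2.44 - - [04/Sep/2025:08:45:49 +0000] "GET /threads/foo.1/ HTTP/2.0" 200 9000 "-" "Mozilla/5.0"',
]
busiest = top_routes(sample)
```

The routes at the top of that list are the ones worth checking against the slow query log first.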
 
Thanks for the technical report.

Over here, the fail2ban setup, with a combination of standard and very specific custom rate limiters, has consistently reduced my bot count to 2023 levels. I consider this a success because it is an 'admin sleeps well at night' solution. It just requires some occasional tuning to account for new attacks.

Most of my recent karate chops to bots involve:
  • a hits-per-day rule, where the threshold is more than 5x the hits of the biggest real user, designed to fend off massively distributed scraping
  • an instant ban on certain pages that attract bots (honeypot)
  • a couple dozen user agent strings that are instantly permabanned
  • entire ranges of AWS Singapore permabanned
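The four rules above can be sketched as a single decision function. This is plain Python for illustration, not fail2ban filter or jail syntax, and every concrete value in it is an assumption: the honeypot path, the banned range (a documentation range, not a real AWS one), the user agent list, and the biggest user's daily hit count.

```python
import ipaddress

# All values below are hypothetical placeholders for the example.
HONEYPOT_PATHS = {"/trap-page/"}
BANNED_AGENT_SUBSTRINGS = ("bytespider",)
BANNED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BIGGEST_USER_HITS_PER_DAY = 2000
DAILY_LIMIT = 5 * BIGGEST_USER_HITS_PER_DAY  # the ">5x the biggest user" rule


def ban_reason(ip: str, user_agent: str, path: str, hits_today: int):
    """Return the name of the first rule that matches, or None if none do."""
    if path in HONEYPOT_PATHS:
        return "honeypot"
    agent = user_agent.lower()
    if any(fragment in agent for fragment in BANNED_AGENT_SUBSTRINGS):
        return "user-agent"
    address = ipaddress.ip_address(ip)
    if any(address in network for network in BANNED_NETWORKS):
        return "ip-range"
    if hits_today > DAILY_LIMIT:
        return "daily-hits"
    return None
```

For example, `ban_reason("198.51.100.1", "Mozilla/5.0", "/trap-page/", 1)` returns `"honeypot"`, while the same client requesting `/` with a normal hit count returns `None`.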

I'm considering building a next-generation version of fail2ban, because in the future you will need a more sophisticated defense that looks at the behavioral characteristics of an IP address and also has an easy way to connect to the app, so the app can send hints to the security mechanism.

I feel like most general-purpose things you can slap onto a XenForo install are going to be inadequate soon, and something tailored more closely to XenForo will eventually become a mandatory technology. This high volume of bot garbage threatens the internet's existence, and all you can do is up your defense budget and sophistication.

We have considered moving to a Hetzner dedicated server so that we have a huge amount of computational headroom against this at a very low cost. This means that we need to re-engineer some things from AWS, which makes it a pain in the buttocks.
 
I feel like most general-purpose things you can slap onto a XenForo install are going to be inadequate soon, and something tailored more closely to XenForo will eventually become a mandatory technology. This high volume of bot garbage threatens the internet's existence, and all you can do is up your defense budget and sophistication.
Fully agree and, to put this in a different perspective: defense against AI bots will become a major feature for forum software and could therefore also be a business opportunity for XenForo. As this cannot be a turnkey solution but needs constant adjustment, they could (in theory) offer a well-working solution as part of their cloud plans, which would be a very strong argument for using the XF cloud and therefore a massive business case. In practice, sadly, XF very probably lacks the human and financial resources to implement something like that. On a positive note, that leaves me self-hosted and saves money. :)

Btw.: XF itself also seems to suffer from bots and to have no working defense against them (or they simply don't care), judging from the ratio between guests and users on the forum and the massive rise in guests in that statistic over the last months. Pic from early morning today:

Bildschirmfoto 2025-09-04 um 08.45.47.webp


  • entire ranges of AWS Singapore permabanned

moving to a Hetzner

we need to re-engineer some things from AWS, which makes it a pain in the buttocks.
Doesn't it feel a bit strange to pay for hosting from the very same company you have to fight, because it provides the infrastructure for those who are about to destroy your forum and earns money from them? Hetzner is in the same boat, btw: loads of scraping attempts coming from them and a totally useless abuse desk.

There should be a white hat provider available - I see no benefit in feeding a company that bites you.
 