XF 2.3 AI crawlers causing xf_session_activity table to reach limit

bottiger

Active member
So my forum is now getting hammered by AI crawlers, causing xf_session_activity to fill up and stop the forum from loading. I've seen over 10,000 unique IPs.

Changing the table type to InnoDB is just a temporary band-aid.
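For anyone following along, the change in question is a single statement (assuming the stock XenForo schema, where xf_session_activity uses the MEMORY engine by default; take a backup first):

ALTER TABLE xf_session_activity ENGINE = InnoDB;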

I already have Cloudflare's AI crawler blocking enabled, but it doesn't work. The user agents all spoof real browsers and the IPs are all residential proxies.

The only thing that stops them is enabling JavaScript verification for all visitors, but that greatly annoys users.

Does anyone have a solution?
 
I already have Cloudflare's AI crawler blocking enabled, but it doesn't work. The user agents all spoof real browsers and the IPs are all residential proxies.
How do you know that these are AI crawlers in the first place (and not something else)? Not saying it isn't AI crawlers, but what makes you so sure?

Does anyone have a solution?
Did you bother to read the forum or use the forum search? There have been lots of reports here over the last couple of months about AI bots on the rise causing all sorts of problems. Those threads describe various ways of dealing with it, as well as the pitfalls that may occur.

What exactly makes your situation so different from everyone else's that nothing discussed and described in the existing threads fits it and you have to start yet another thread?

And if it really is completely different (so different that you don't even need to mention the existing countermeasures), back to the first question: if whatever is hitting your forum behaves nothing like the patterns AI crawlers have shown so far, what makes you sure it is AI crawlers?
 
Could you just block a few main ones in .htaccess? Sometimes a lot of them come from the same place, so blocking the main ones can reduce it a lot. I blocked Bytedance and Bytespider, although I had nothing like the amount you have. They are bots, yes, but probably not all AI bots.
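A rough sketch of that kind of user-agent block, assuming Apache with mod_rewrite (the agent list here is just the two named above; extend it as needed):

<IfModule mod_rewrite.c>
    RewriteEngine On
    # refuse requests whose user agent matches the listed bots (case-insensitive)
    RewriteCond %{HTTP_USER_AGENT} (bytespider|bytedance) [NC]
    RewriteRule .* - [F,L]
</IfModule>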
 
Could you just block a few main ones in .htaccess? Sometimes a lot of them come from the same place, so blocking the main ones can reduce it a lot. I blocked Bytedance and Bytespider, although I had nothing like the amount you have. They are bots, yes, but probably not all AI bots.

There are no "main ones". They are using a rotating residential proxy so they get a new ip each request and they are all spoofing real user-agents.
 
I use fail2ban to watch the Apache logs and apply rules.

Since XenForo doesn't include this kind of protection against extreme volumes of bots, today you either need to use Cloudflare or fail2ban (protection running on your own machine).

It's interesting to hear that Cloudflare isn't holding up.
The scraper bots have become very sneaky and covert lately.
The only thing that works against them is specialized rate limiting and filtering based on which URLs they are repeatedly hitting.
Not sure whether you can implement that in Cloudflare. I won't use Cloudflare because it often inconveniences users, and I don't like giving away all that information about my web traffic for free when I don't have to.

For a temporary expansion of your sessions table, put this in your MySQL configuration:
/etc/mysql/mysql.conf.d/mysqld.cnf:
# XenForo uses in-memory tables; the default is 16M and these tend to fill up quickly, so we boost their size. - DS
tmp_table_size = 64M
max_heap_table_size = 64M
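
And on the fail2ban side, a minimal sketch of a jail watching the Apache access log; the paths and thresholds below are placeholders to tune against your own traffic, not my production recipe:

# /etc/fail2ban/jail.d/apache-scrapers.local
[apache-scrapers]
enabled  = true
port     = http,https
filter   = apache-scrapers
logpath  = /var/log/apache2/access.log
findtime = 600
maxretry = 300
bantime  = 86400

# /etc/fail2ban/filter.d/apache-scrapers.conf
[Definition]
# ban any host that hits these example paths more than maxretry times within findtime
failregex = ^<HOST> .* "GET /(whats-new|search)/
ignoreregex =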
 
There are no "main ones". They are using a rotating residential proxy so they get a new ip each request and they are all spoofing real user-agents.
There could well be a main one or two behind it all: block those and the rest stop working. Just my two pennorth; it could be worth a try to see what happens.

I used the code from this thread. A lot of "guests" can be part of one main bad bot. Block one and loads of others disappear.

 
Also, just to add: Cloudflare can't always solve everything, because some of these bots simply bypass Cloudflare entirely, which they can do if they know your origin IP address.
 
Just use Cloudflare's AI blocking tools; you should be able to use them on shared hosting as well, since everything is managed through Cloudflare's dashboard. They also have a new blocking method that modifies your robots.txt file for you.
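For reference, the kind of entries such a managed robots.txt adds look roughly like this (a hand-written equivalent with a few well-known AI crawler agents; Cloudflare's own list is longer):

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

Of course, robots.txt only deters crawlers that honor it, which the ones described in this thread apparently don't.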

 
There were two main issues I found that let the AI bots (mainly from Brazil) bring my site to a halt:
#1 xf_session needed to be changed to InnoDB
  • This stopped the site from crashing at around ~25k concurrent bots.
  • But it also allowed more bots to come. Within hours, the site crashed at around ~60k bots.

#2 There were multiple code bottlenecks on the site, not noticeable under normal use (maybe a page loads in 0.4 seconds instead of 0.1 seconds). By reviewing the URLs the bots were repeatedly hitting and comparing them against the slow query logs (a minimal slow-log config is sketched after this list), I was able to find what the majority of the bots were doing, and then rewrite that code to be more efficient or block guests from doing it entirely.
  • This stopped the site from crashing at around ~60k concurrent bots.
  • It also allowed more bots to come. They spiked to over ~120k but could no longer crash the site, so after a few hours they left and have not been back since.
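
If anyone wants to reproduce the slow-query-log part, here is a minimal MySQL config sketch (standard option names; the 0.5-second threshold is just an example to tune):

# my.cnf / mysqld.cnf
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 0.5
log_queries_not_using_indexes = 1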
 
Thanks for the technical report.

Over here, the fail2ban setup, with a combination of standard and very specific custom rate limiters, has consistently reduced my bot count to 2023 levels. I consider this a success because it is an 'admin sleeps well at night' solution. It just requires some occasional tuning to account for new attacks.

Most of my recent karate chops to bots involve (a rough sketch of the range bans follows the list):
  • a hits-per-day rule, with a threshold more than 5x what the biggest legitimate user generates, designed to fend off massively distributed scraping
  • an instant ban on certain pages that attract bots (honeypot)
  • a couple dozen user-agent strings that are instantly permabanned
  • entire AWS Singapore ranges permabanned
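
One way to implement range bans like that outside fail2ban is plain ipset/iptables; a sketch with a placeholder prefix (pull the real ap-southeast-1 ranges from AWS's published ip-ranges.json before using anything like this):

# create a set of banned networks and drop anything coming from them
ipset create scraper-ranges hash:net
ipset add scraper-ranges 192.0.2.0/24
iptables -I INPUT -m set --match-set scraper-ranges src -j DROP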

I'm considering building a next-generation version of fail2ban, because in the future you will need a more sophisticated defense that reasons about the behavioral characteristics of an IP address and has an easy way to connect to the application, so the app can send hints to the security mechanism.

I feel like most generic things you can slap onto a XenForo install are going to be inadequate soon, and something tailored more closely to XenForo will eventually become mandatory technology. This high volume of bot garbage threatens the internet's existence, and all you can do is up your defense budget and sophistication.

We have considered moving to a Hetzner dedicated server so that we have a huge amount of computational headroom against this at a very low cost. This means that we need to re-engineer some things away from AWS, which makes it a pain in the buttocks.
 
I feel like most generic things you can slap onto a XenForo install are going to be inadequate soon, and something tailored more closely to XenForo will eventually become mandatory technology. This high volume of bot garbage threatens the internet's existence, and all you can do is up your defense budget and sophistication.
Fully agree, and to put this in a different perspective: defense against AI bots will become a major feature for forum software and could therefore also be a business opportunity for XenForo. As this cannot be a turnkey solution but needs constant adjustment, they could (in theory) offer a well-working solution as part of their cloud plans, which would be a huge argument for using XF Cloud and therefore a massive business case. In practice, sadly, XF most likely lacks the human and financial resources to implement something like that. On a positive note, that leaves me self-hosted and saves money. :)

Btw: XF itself also seems to suffer from bots and to have no working defense against them (or they simply don't care), judging from the ratio of guests to users on this forum and the massive rise in guests in that statistic over the last few months. Pic from early this morning:

Bildschirmfoto 2025-09-04 um 08.45.47.webp


  • entire AWS Singapore ranges permabanned

moving to a Hetzner

we need to re-engineer some things away from AWS, which makes it a pain in the buttocks.
Doesn't it feel a bit strange to pay hosting money to the very company you then have to fight, because they provide the infrastructure for the ones about to destroy your forum and earn money from them? Hetzner is in the same boat, btw: loads of scraping attempts come from there, and their abuse desk is totally useless.

There should be a white-hat provider available; I see no benefit in feeding a company that bites you.
 
Yes, I very much dislike the fact that I'm buying hosting from a company that doesn't regulate its own network. We expect AWS to be a little classier than that.

Btw: XF itself also seems to suffer from bots and to have no working defense against them (or they simply don't care), judging from the ratio of guests to users on this forum and the massive rise in guests in that statistic over the last few months. Pic from early this morning:

Yeah. My forum (https://endless-sphere.com) is within 1% of XenForo's post count, so it's a good comparison.

Here's whatever XenForo is using versus my set of fail2ban rules:

1757112909409.webp

My number tends to be in the 2,000s lately, but it was 20,000+ a month ago, before the last round of tuning.

Someone really should sell a bot protection package, or at least describe a working Cloudflare setup.
Something that can give non-expert server administrators a head start.

I've had two companies pay me to advise on fail2ban, so on my end, I'd be willing to sell the IP plus advisory time to anyone for a very reasonable price.
I'd love to publish my recipe online, but I don't want AI bots training on it, rendering it useless and fast-forwarding to the time when I have to develop a more advanced security mechanism.
 
Yeah. My forum (https://endless-sphere.com) is within 1% of XenForo's post count, so it's a good comparison.

Here's whatever XenForo is using versus my set of fail2ban rules:

1757112909409.webp


My number tends to be in the 2,000s lately, but it was 20,000+ a month ago, before the last round of tuning.
My forum is a tiny baby in comparison; the most guests I ever had was 1,800, back in April this year, due to a scraping attack coming from Hetzner. In March this year I started a quite intensive journey of investigation, experiments and countermeasures against scrapers, costing endless hours. As I am on shared hosting and do not use Cloudflare (and do not want to), my points of leverage are pretty limited. But successful, it seems. Typically my statistics now look like this:


Bildschirmfoto 2025-09-06 um 11.25.33.webp

I hardly ever have more than 80 guests and at most 15 robots at peak, and the members-to-guests ratio is hardly ever beyond 1:5. Before that I quite regularly had a couple of hundred guests along with 40-100 robots.

It was a lot of work over the first weeks and months and still needs a limited level of continuous maintenance and adjustment, but overall it is pretty stable now. I've described the earlier parts of my journey here:


A lot has happened since then and an update is overdue. As a rough conclusion: while far from perfect and quite time-consuming, it is not all that difficult to bring bot numbers down massively. There is clearly a limit to my approach the more cleverly the bots behave, so a certain share of them surely go undetected already.

However, having been lucky enough to start my efforts early, before the wave of reports about a massive rise in guests/scrapers/AI bots began here around early July (and it has not stopped since), I do not see anything like that on my forum. It may just be luck, but maybe it is also partly down to the measures I implemented.

Someone really should sell a bot protection package, or at least describe a working Cloudflare setup.
Something that can give non-expert server administrators a head start.

Absolutely. Too many forum administrators still focus on registrations, and so do most of the add-ons. Meanwhile, guests, a.k.a. scraping bots, have become the bigger threat in my opinion. It should be relatively easy to create an add-on that can be configured to block access to the forum (and not just to registration) at the country, ASN or IP level (and possibly by other criteria as well), along with proper logging and log filtering. That alone would help a lot without any need for intelligence in the add-on; it would dramatically ease what I did on the shell with traditional Unix tools.
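
For context, the kind of shell work such an add-on would replace is roughly along these lines (the classic building blocks, not my exact pipeline; "access.log" stands in for whatever log your host exposes):

# top talkers: requests per IP, busiest first
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20

The ranges and ASNs that float to the top can then be denied in .htaccess or whatever blocking mechanism the shared host offers.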

Obviously, there are many ways to build a fancier and more intelligent solution, but XF lacks even the plain basics, so an easy-to-access toolkit for those would already improve the situation dramatically.
 