XF 2.3 AI crawlers causing xf_session_activity table to reach limit

bottiger

Active member
So my forum is now getting hammered by AI crawlers, causing xf_session_activity to fill up and stop the forum from loading. I've seen over 10,000 unique IPs.

Changing the table type to InnoDB is just a temporary band-aid.
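For reference, that workaround is a single statement (assuming the stock XenForo schema, where xf_session_activity uses the MEMORY engine by default):

```sql
-- Check the current engine and row count first (run in the forum's database):
SHOW TABLE STATUS LIKE 'xf_session_activity';

-- Convert the table to InnoDB so it is no longer capped by max_heap_table_size.
-- Trade-off: session-activity writes hit disk instead of memory.
ALTER TABLE xf_session_activity ENGINE = InnoDB;
```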

I already have Cloudflare's AI crawler blocking enabled, but it doesn't work. The user agents are all spoofing real browsers and the IPs are all residential proxies.

The only thing that can stop them is enabling JavaScript verification for all users, but this greatly annoys users.

Does anyone have a solution?
 
I already have Cloudflare's AI crawler blocking enabled, but it doesn't work. The user agents are all spoofing real browsers and the IPs are all residential proxies.
How do you know that these are AI crawlers in the first place (and not something else)? Not saying it isn't AI crawlers - but what makes you so sure?

Does anyone have a solution?
Did you bother to read the forum or use the forum search? There have been lots of reports here over the last couple of months about AI bots on the rise causing all sorts of problems. Those threads describe various ways of dealing with it, as well as the pitfalls that may occur.

What exactly makes your situation so different from everyone else's that nothing discussed in the existing threads fits it, requiring you to start another thread?

And if it is completely different (so different that you don't even need to mention the existing approaches) - back to the first question: if whatever is affecting your forum is totally different from the patterns AI crawlers have shown until now, what makes you sure that it is AI crawlers?
 
Could you just block a few main ones in .htaccess? Sometimes a lot of them come from the same place, so blocking the main ones can reduce it a lot. I blocked Bytedance and Bytespider. Although I had nothing like the amount you have. They are bots, yes, but probably not all AI bots.
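A minimal sketch of that kind of .htaccess block, assuming Apache with mod_rewrite enabled (the user-agent list is illustrative - add whatever actually shows up in your logs):

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
# Return 403 Forbidden to self-identifying crawlers. Note this does
# nothing against bots that spoof real browser user agents.
RewriteCond %{HTTP_USER_AGENT} (Bytespider|Bytedance|GPTBot|CCBot|ClaudeBot) [NC]
RewriteRule .* - [F,L]
</IfModule>
```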
 
Could you just block a few main ones in .htaccess? Sometimes a lot of them come from the same place, so blocking the main ones can reduce it a lot. I blocked Bytedance and Bytespider. Although I had nothing like the amount you have. They are bots, yes, but probably not all AI bots.

There are no "main ones". They are using rotating residential proxies, so they get a new IP with each request, and they are all spoofing real user agents.
 
I use fail2ban to watch the Apache logs and apply rules.

Since XenForo doesn't include this kind of protection against extreme numbers of bots, you currently either need to use Cloudflare or fail2ban (protection running on your own machine).
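As a rough sketch of what such a fail2ban setup might look like (file path, jail name, and thresholds are all illustrative - tune them to your own traffic):

```ini
# Hypothetical /etc/fail2ban/jail.d/apache-flood.local
[apache-flood]
enabled  = true
port     = http,https
filter   = apache-flood       ; matching filter needed in filter.d/apache-flood.conf
logpath  = /var/log/apache2/access.log
findtime = 60                 ; look at the last 60 seconds of the log
maxretry = 300                ; ban any IP making more than 300 requests in that window
bantime  = 3600               ; ban for one hour
```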

It's interesting to know that Cloudflare isn't holding up.
The scraper bots have been very sneaky and covert lately.
The only thing that works against them is specialized rate limiting and filtering based on which URLs they are repeatedly hitting.
Not sure whether you can implement that in Cloudflare. I won't use Cloudflare because it often inconveniences users, and I don't like giving away all this information about my web traffic for free when I don't have to.

For a temporary expansion of your sessions table, put this in your MySQL configuration (/etc/mysql/mysql.conf.d/mysqld.cnf):
#xenforo uses in-memory tables; the default is 16M and these tend to fill up quickly, so we need to boost their size. - DS
tmp_table_size = 64M
max_heap_table_size = 64M
 
There are no "main ones". They are using rotating residential proxies, so they get a new IP with each request, and they are all spoofing real user agents.
There could well be a main one or two behind it all - block those and the rest don't work. Just my two pennorth. It could be worth a try to see what happens.

I used the code from this thread. A lot of "guests" can be part of one main bad bot. Block one and loads of others disappear.

 
Just use Cloudflare's AI blocking tools. You should be able to use them on shared hosting as well, since everything is managed through Cloudflare's dashboard. They also have a new blocking method that modifies your robots.txt file for you.

 