XF 2.3 site suddenly VERY slow. Massive updates didn't help.

Codelizard

Member
Licensed customer
So, I've been running an XF 1.5 site FOREVER (10+ years). Yeah, I should have upgraded it sooner, I know. But life happens.

So, it was running on a dedicated server with an Intel Xeon E3-1230 v3 CPU (4 cores, 8 threads) and 32 GB of RAM. The forum ran there from 2013 with zero issues, with a load average never even approaching 1.

Then about a month ago, that changed. The pages wouldn't load, and the load average on the server was peaking over 900. The server never went into swap.

So, my 13 years of bliss were running Apache with mod_php, with Elasticsearch and MariaDB all on the same server. So I made changes:

Got a new server just for the DB and had a private network installed between them.
Moved Apache away from prefork and switched to php-fpm.
Installed Redis and added front-end caching.
Installed OPcache.

Each time I did one of these things, the site would become more responsive and the load would drop. But then, within 24 hours, it would be completely dead again.

So I switched tactics. I moved to a much faster dedicated server: AMD Ryzen 9 with 24 cores, 128 GB RAM, and NVMe SSD. I moved everything over but didn't switch DNS yet. I upgraded to XenForo 2.3, and also updated the Resource Manager and XenPorta. I "closed" the site and pointed the Cloudflare DNS at the new server (I hide the server's true IP behind Cloudflare). I'm now running nginx, MariaDB, PHP 8.3 FPM, etc.

For 3 days, it ran perfectly. It's closed, so only admins could see the whole site, but it was crisp, responsive, everything was nice. Then I got home today and that had changed. The site is still closed, but even without being logged in, it takes 20-30 seconds to load the "We are closed" screen. It's so slow that I can't even log in now.

In the meantime, the db server is FULL of the same query:

| 7130081 | xfuser | 10.0.0.5:59560 | x_forum | Execute | 0 | Writing to net | SELECT data_key, data_value
FROM xf_data_registry
WHERE data_key IN ('addOns', 'addOnsComposer | 0.000 |

Literally hundreds of the same query constantly.
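
To put numbers on this instead of eyeballing the process list, here is a minimal Python sketch that tallies rows by client host and by the start of the query text. It assumes the rows have already been fetched (for example from SHOW FULL PROCESSLIST) into dictionaries with Host and Info keys. The function name tally_queries and the sample rows are made up for illustration, and the IN list is left as a placeholder because the original query is truncated.

```python
from collections import Counter

def tally_queries(rows, width=60):
    """Count processlist rows by client host and by the first `width` chars of the query."""
    by_host = Counter()
    by_query = Counter()
    for row in rows:
        info = row.get("Info")
        if not info:                              # idle connections carry no query text
            continue
        host = row["Host"].rsplit(":", 1)[0]      # drop the client port
        normalized = " ".join(info.split())[:width]  # collapse whitespace, then truncate
        by_host[host] += 1
        by_query[normalized] += 1
    return by_host, by_query

# Hypothetical rows; only the general shape matches the process list above.
sample_rows = [
    {"Host": "10.0.0.5:59560",
     "Info": "SELECT data_key, data_value\nFROM xf_data_registry\nWHERE data_key IN (...)"},
    {"Host": "10.0.0.5:59561",
     "Info": "SELECT data_key, data_value FROM xf_data_registry WHERE data_key IN (...)"},
    {"Host": "10.0.0.5:59562", "Info": None},
]

by_host, by_query = tally_queries(sample_rows)
print(by_host.most_common(5))
print(by_query.most_common(5))
```

If one host and one query shape dominate the counts, the pressure is coming from the web tier's request volume rather than from a slow query on the database side.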

The DB server is another dedicated server: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz (4 cores, 8 threads), with 64 GB RAM and NVMe disk. Its load is at .05.

The new web server sits at .3 load average.

Plenty of unused ram on both machines, iostat shows disk at 98% idle.

I upped php-fpm to 300 workers, just to see if it would use them all, and it did immediately.

So, I'm at a loss. I went from a 13-year happy forum on a quaint dedicated server to a beefy server with the DB offloaded to another host, and the forum is DEAD, even while "closed", after 3 days of running blissfully.

Help?
 
I think you're possibly fixing the wrong things - throwing more hardware at it may not help.

I'd say you most likely have a bot issue - lots of sites are being hammered by bots in recent months and increasing resources doesn't really help the issue - it just lets the bots crawl your site more quickly.

You seem to be using Cloudflare - do you have bot mitigation turned on? I'd look at your analytics first and identify where all your traffic is coming from.

It could also be that your traffic patterns have changed (again, likely due to bots) and your configuration is no longer appropriate. It could be your database response time is too slow - or it could be php-fpm which is struggling.

I've had the same thing on one of my sites running XF 1.5 - I've spent a lot of time recently looking at low-level analytics for my server and fine-tuning my DB layer and my PHP-FPM to try and cope with the load generated by bots.

If you don't have in-depth analytics for your server, perhaps look at setting that up first. I'm using Zabbix to monitor my sites - but there are other options that do the same thing.
 
I think Sim is onto something.

Are you behind Cloudflare? If you are, look at the Cloudflare stats and see who's connecting to your server, from where, how many, etc.

If not, run a command like this:

netstat -anp |grep 'tcp\|udp' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n

That should tell you which IPs are connected to your server and how many connections each IP has. If you see more than about 50 on any one IP address, it's likely bots hammering your site. Do an nslookup on the IPs and see where they come from, whether they are a bot, etc., and start null routing or blocking them. You might be getting hammered by a single IP address.

You need a lot more info before you can solve this problem well. If you don't have a ton of technical knowledge, I'd start by putting the site behind Cloudflare. You will need to pass the real IP to XF1 though, as it doesn't have built-in Cloudflare detection and you won't get the users' real IPs without changes there.
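
If the shell pipeline is awkward to tweak, the same count can be sketched in Python. This is only an illustration, assuming Linux-style `netstat -an` text where the fifth column is the foreign address; the function name connections_per_ip, the default threshold of 50, and the sample addresses (documentation ranges) are assumptions, not anything from a real server.

```python
from collections import Counter

def connections_per_ip(netstat_output, threshold=50):
    """Return (remote_ip, count) pairs that have more than `threshold` connections."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Socket rows look like: proto, recv-q, send-q, local, foreign, [state]
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        foreign = fields[4]
        if foreign.endswith(":*"):               # listening socket, no remote peer
            continue
        remote_ip = foreign.rsplit(":", 1)[0]    # strip the port, keep IPv6 addresses whole
        counts[remote_ip] += 1
    return [(ip, n) for ip, n in counts.most_common() if n > threshold]

# Made-up sample output for demonstration.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:443            203.0.113.7:51234       ESTABLISHED
tcp        0      0 10.0.0.5:443            203.0.113.7:51235       ESTABLISHED
tcp        0      0 10.0.0.5:443            203.0.113.7:51236       TIME_WAIT
tcp        0      0 10.0.0.5:443            198.51.100.9:40000      ESTABLISHED
"""

print(connections_per_ip(sample, threshold=2))
```

Note that when a proxy such as Cloudflare sits in front of the server, the addresses seen here are the proxy's, so the per-visitor picture has to come from the proxy's own logs.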
 
Were there any recent modifications or changes made to either the forum or the server before the issues started happening?

On a side note, have you considered XenForo's Cloud hosting?

So first, no. No changes to the forum, or the server it was hosted on. It went from being responsive to 20+ second load times seemingly overnight. Load averages grew in the beginning from < 1 to 300+. Over time, they got higher, surpassing 900. Someone else asked if I had turned on bot mitigation in Cloudflare. I did, and that helped temporarily.

And no, I haven't considered the cloud hosting. We had a bunch of customizations in 1.5 which we want to add back in 2.3. Plus, we run a site that could be considered controversial, so I'd rather not crawl down that rabbit hole.
 
If you are behind Cloudflare, enable all the AI Bot Blocking, including their robots.txt additions. Essentially, tell Cloudflare to block all AI scrapers. You can leave the regular search engines with access.

Then, start looking at access to your site by country. Most of the really bad bots for us are coming from China, Brazil, Vietnam, Singapore, and a few other countries. We just block them entirely as we are primarily US/Canada based traffic. You can set those up to do a CAPTCHA if you don't want to block them entirely.

That will likely fix the issue and it's ******** traffic that won't monetize anyway.
 
Yup, I did that a few weeks ago (the AI bot setting). And last night, I saw that a high volume of my traffic was coming from Singapore, so I blocked it completely. Didn't help much.
 
OK, I paid for Cloudflare logs and found the issue. I was being hammered by an IP range in APNIC, with all of the requests having no referrer and really odd URLs (to me). They looked like parts of several other URLs stitched together, sometimes repeating. Like:

/ps3-news/1662/tags/henkaku/tags/homebrew/posts/11551/posts/12449/members/jayglass.110318/
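
A rough way to spot paths like that in a log is to count how many route keywords show up in a single URL. The sketch below is a heuristic only: the keyword set ROUTE_WORDS, the cutoff of three, and the name looks_stitched are all assumptions chosen to match the example above, not anything XenForo or Cloudflare provides.

```python
from collections import Counter

# Assumed set of route keywords; adjust to the routes your forum actually uses.
ROUTE_WORDS = {"tags", "posts", "members", "threads"}

def looks_stitched(path, min_hits=3):
    """Return True when a path contains at least `min_hits` route keywords in total."""
    segments = [s for s in path.split("/") if s]   # ignore empty segments from slashes
    hits = Counter(s for s in segments if s in ROUTE_WORDS)
    return sum(hits.values()) >= min_hits

odd = "/ps3-news/1662/tags/henkaku/tags/homebrew/posts/11551/posts/12449/members/jayglass.110318/"
print(looks_stitched(odd))
print(looks_stitched("/members/jayglass.110318/"))
```

A normal forum URL usually contains one route keyword, so several in one path is a cheap signal worth checking before blocking anything.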

Anyway, I blocked the range and I now have 3 DB connections (instead of hundreds), and the site is quick and stable.

Thanks to everyone who responded.
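
For anyone hunting for a similar range in their own logs, here is a minimal Python sketch that groups referrer-less requests by network. It assumes the log has already been parsed into dictionaries with ip and referer keys (hypothetical field names), buckets IPv4 addresses into /24 networks, and uses documentation address ranges as stand-in data since the actual range is not given here.

```python
import ipaddress
from collections import Counter

def top_networks(records, prefix=24):
    """Count requests without a referrer, grouped by the network containing the client IP."""
    counts = Counter()
    for rec in records:
        if rec.get("referer"):                   # only look at referrer-less requests
            continue
        # strict=False lets a host address stand in for its enclosing network
        net = ipaddress.ip_network(f"{rec['ip']}/{prefix}", strict=False)
        counts[str(net)] += 1
    return counts.most_common()

# Placeholder records; real ones would come from the exported request logs.
sample_records = [
    {"ip": "203.0.113.10", "referer": ""},
    {"ip": "203.0.113.99", "referer": None},
    {"ip": "198.51.100.4", "referer": "https://example.com/"},
    {"ip": "198.51.100.7", "referer": ""},
]

print(top_networks(sample_records))
```

A wider prefix (for example 16) can be passed when the traffic is spread across a larger allocation.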
 