Higher than normal server load due to increased HTTP requests - how to discover what/who?


Formerly CyclingTribe
I don't think Google realises I'm not on a fancy shmancy server ... I've turned the crawl rate down after seeing this:


I shouldn't really complain if Google wants to index us at an average of 55 pages per minute - but my five-year-old server isn't the most powerful little box in the server farm, and I'd prefer my members to have a good, fast experience over having the latest posts indexed in Google within minutes.

We'll see how it goes over the next couple of days - there may be other things going on too. (y)

lol ... and I'll keep saving up for a new server. :D


Well, things are a little better this morning - loads are lower and my sites are loading faster. However, I'm still seeing spikes in Apache processes being spawned, so I don't think Googlebot is the real problem, just a symptom of it.

I shall keep investigating. (y)


I think I've found the source, and if I'm correct (I'm leaving it to soak for a couple of days) it's internal - which is why it wasn't showing up when I was checking all the incoming (external) connections! ;)


What was it? (Could you PM if you don't want to say publicly?)
I don't mind saying publicly - it's not an XF issue BTW ;) - but I want to let it soak for a while, let people's caches expire/refresh, monitor the CycleChat logs etc. just so I can be sure I've hit the nail on the head.

I think I have - I made the changes late last night and so far today the load is much lower, connections are greatly reduced, and when I "tail" the access_log I'm not seeing the multitude of requests that I was - but my busiest time is around 9 pm, so I want to see what happens tonight before I call it "confirmed". :D


Yes. (y)

I spent weeks looking at external requests/connections: blacklisting selected IPs; finding crawlers I didn't even know existed (which belonged to companies who sell data on my site that they gather using my bandwidth - well, not anymore - they made the blacklist in double-quick time ... lol); turning Google's crawler down and down; and trying any number of netstat combinations to break down the top requesting IP addresses, then individually checking each one to see whether it was from a legit source or not (surprisingly, they all were). I was basically getting nowhere - I just couldn't see "who" was making all the requests.
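For anyone curious, the netstat juggling was along these lines - a sketch only, not my exact commands, and the awk column number assumes the classic `netstat -ntu` output layout (field 5 is the foreign address):

```shell
#!/bin/sh
# Count connections per remote IP. Field 5 of "netstat -ntu" output is the
# foreign address ("ip:port"); skip the two header lines, strip the port,
# then tally and sort so the busiest IPs float to the top.
top_ips() {
  awk 'NR > 2 { split($5, a, ":"); if (a[1] != "") print a[1] }' \
    | sort | uniq -c | sort -rn | head -20
}

# Live usage:
#   netstat -ntu | top_ips
```

Each of the top IPs then got a manual whois/reverse-lookup check to decide whether it was legit or blacklist fodder.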

I then turned to Apache. I hadn't done much of anything with Apache for five years (back when my server hosted a handful of very small sites and had just 2GB RAM). It now has 8GB and is much busier, and when I looked at the Apache processes I found a lot were being spawned at quite a rate. So I adjusted StartServers and the spare-server settings and bumped them up by quite a large amount (5 to 40, and 5 to 20 respectively) so that there were plenty of Apache processes available. This helped and lowered the load a bit. I also adjusted KeepAliveTimeout from 1 second to 3 seconds as a good compromise between spawning lots of connections at a high rate and keeping them alive a little longer for when people were navigating the site (on the assumption that when you're navigating a familiar site you do so fairly quickly, but not as quickly as 1 second per click).
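In httpd.conf terms that works out roughly like this - a sketch, not my exact config. The actual directive names in Apache's prefork MPM are MinSpareServers/MaxSpareServers, and the min/max split below is my assumption from the figures above:

```apache
<IfModule mpm_prefork_module>
    StartServers         40
    MinSpareServers      20
    MaxSpareServers      40
</IfModule>

# Hold connections open a little longer between clicks.
KeepAlive            On
KeepAliveTimeout     3
```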

It didn't solve the problem though; I was still seeing quite high loads, lots of running/spawning Apache processes, and a higher than normal total process count (which still had me thinking it was extra external requests). So I next looked at MySQL, on the basis that a bottleneck there might be causing Apache to back up and leave lots of open/waiting processes hanging around. I used the usual tools to check how it was doing (mysqltuner and tuning-primer) and couldn't see anything radically out of place, but decided to assess each setting, do a bit of research, and tweak a few things; the main ones being an increase in log file size from 512M to 1GB and a reduction in the size of some of the per-thread buffers.
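For reference, the my.cnf changes were along these lines - illustrative only. The 512M-to-1GB log file change maps to innodb_log_file_size if you're on InnoDB, and the per-thread buffer values below are example figures, not my actual ones:

```ini
[mysqld]
# Larger redo log: 512M -> 1G. (On MySQL versions of that era this needs a
# clean shutdown and removal of the old ib_logfile* files before restarting.)
innodb_log_file_size = 1G

# Trimmed per-thread buffers (example values only). These are allocated per
# connection, so generous values multiply quickly when Apache is busy.
sort_buffer_size     = 2M
read_buffer_size     = 256K
join_buffer_size     = 256K
```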

Again, this seemed to help a little, but it only alleviated the problem rather than fixing it.

I was really drawing a blank. I knew that there were lots more processes running. I knew that the extra ones were mainly Apache. I knew, sadly, that pretty much all of the incoming (external) traffic was genuine, so it was really driving me nuts.

The one thing I didn't think about, because it never occurred to me to check, was that the extra requests might be coming from internal connections. By chance I remembered that my server had Analog installed, so I checked the previous week's report and found that one particular file had been requested 2.5 million times!! WHAT?!!!
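If you don't have Analog handy, you can pull the same headline number straight from the raw log - a sketch assuming the standard combined log format, where field 7 is the request path:

```shell
#!/bin/sh
# Tally requests per URL from a combined-format access log.
# Field 7 is the path in: ip - - [date] "GET /path HTTP/1.1" status bytes ...
top_urls() {
  awk '{ print $7 }' | sort | uniq -c | sort -rn | head -10
}

# Live usage:
#   top_urls < /var/log/httpd/access_log
```

A runaway file like mine sits at the top of that list by an embarrassing margin.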

So I tailed the CycleChat access_log with a grep for the file in question and lo and behold there they were - hundreds of requests per minute - for a CometChat file!!!! It was bloody internal. Arrgghhh!!! How could I not have spotted it??!!!
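To put a number on "hundreds of requests per minute", you can bucket the hits by minute - again a sketch assuming combined log format (field 4 is the timestamp, e.g. "[10/Oct/2011:13:55:36"), and the grep pattern is whatever file you're chasing:

```shell
#!/bin/sh
# Count hits per minute for one file: filter the log for the path, then keep
# the timestamp down to the minute (chars 2-18 of field 4) and tally.
hits_per_minute() {
  grep "$1" | awk '{ print substr($4, 2, 17) }' | sort | uniq -c
}

# Live usage (path is a placeholder, not the actual CometChat file):
#   hits_per_minute '/path/to/suspect/file' < access_log
```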

I'd installed CometChat for a few members who'd previously enjoyed the "live" chat we have with IP.Board. As part of the setup for live chat I'd added a usergroup promotion so that anyone who had been registered for more than three months was added to the chat usergroup (2,200 members). CometChat was therefore loading the chat toolbar and polling the chat service for every request made by the hundreds of regulars who visit CycleChat each day and at peak periods there were masses of these polling requests adding an overhead to the server load.

When I looked at the CometChat logs I found that only 64 people had actually tried it, and even fewer still were using it regularly, so I flipped it around. I removed everyone from the chat usergroup, told all the members what was happening and why, and then asked the people who wanted to use it to let me know and subscribed just those few - so far only 35 people.

Overnight the load reduced a little, but I was still seeing quite a lot of requests in the logs. However, over the following days, as people's caches expired, requests for the file dropped to the point where the server load, even at our busiest time, was below 1.00 - YES!!! Finally!!!

So it was pretty much an oversight on my part: subscribing so many members to CometChat, on top of everything else the server was doing (it runs a number of other sites too), created an overhead of wasted connections (98%+) that took me a while to find.

Load is even better than before now with the Apache and MySQL tweaks, so I'm happy. :D

Shaun :D