XF 1.5 Forum getting hammered

Codelizard

Hi people,

So my XF isn't too big: about 250k posts and 20k threads. It's been running well for a while with no complaints.

Starting a couple of weeks ago, I was told that the forum was running REALLY slow. Each time I heard this, I took a peek, and everything seemed to be running OK, so I didn't worry about it.

Today I logged in and it's slower than dirt, so I started investigating. I already have the forum behind Cloudflare. I don't know how effective that is nowadays, but they used to do a decent job of mitigating issues even on the free accounts.

My server (which is overkill: 24 processors, 24 GB of RAM) is running at a load of 40, and apparently has been for days. I checked my disk I/O, and I'm only at 2% utilization.

Looking in top, I can see that the major CPU hog is MySQL at about 60% CPU. Apache is using CPU on several processes, but not as much as MySQL.

Now, I did find that my xf_search table was RIDICULOUS. It was (no exaggeration) 40 GB in size. Clearing that was an adventure in itself. Whatever is doing this is filling that table faster than the daily cron can clean it out. I ended up stopping Apache and truncating the table directly under no load.

While that freed up disk, it made no difference. As soon as Apache started, load picked right back up.

So then I installed apachetop, and I see this:

REQS REQ/S KB KB/S URL
12 0.80 96.2 6.4*/css.php
5 0.38 0.1 0.0 /deferred.php
3 0.33 0.2 0.0 /login/csrf-token-refresh
2 0.33 0.1 0.0 /forum/cometchat/cometchat_receive.php
2 0.25 4.6 0.6 /data/avatars/m/22/22544.jpg
2 0.20 24.4 2.4 /threads/stuck-in-formatting-loop.30285/
2 0.67 34.4 11.5 /forum/archive/index.php/f-5.html
1 0.07 0.0 0.0 /find-new/124893312/posts
1 0.07 16.4 1.1 /find-new/738719/posts
1 0.07 15.8 1.1 /find-new/738673/profile-posts
1 0.07 16.4 1.1 /find-new/738710/posts
1 0.07 0.0 0.0 /find-new/28028413/posts
1 0.07 10.4 0.7 /members/survivalinstinct.34459/
1 0.07 0.0 0.0 /find-new/22207552/posts
1 0.07 15.8 1.1 /find-new/738447/profile-posts
1 0.07 0.0 0.0 /find-new/77705426/posts
1 0.07 16.4 1.1 /find-new/738749/posts
1 0.07 16.4 1.1 /find-new/738734/posts
1 0.07 0.0 0.0 /find-new/22518003/profile-posts
1 0.07 0.0 0.0 /find-new/98576/profile-posts
1 0.07 0.0 0.0 /find-new/23597314/profile-posts
1 0.07 16.4 1.1 /find-new/738752/posts
1 0.07 0.0 0.0 /find-new/116226179/posts
1 0.07 16.4 1.1 /find-new/738743/posts
1 0.07 0.0 0.0 /find-new/10969598/profile-posts
1 0.07 0.0 0.0 /find-new/26424799/posts
1 0.07 16.4 1.1 /find-new/738711/posts
1 0.07 16.4 1.1 /find-new/738709/posts
1 0.07 0.0 0.0 /find-new/70344158/posts
1 0.07 0.0 0.0 /find-new/123710024/posts
1 0.07 0.0 0.0 /find-new/71623961/posts
1 0.07 0.0 0.0 /find-new/77265739/posts
1 0.07 16.4 1.2 /find-new/738722/posts
1 0.07 16.4 1.2 /find-new/738689/posts
1 0.07 0.0 0.0 /find-new/11461655/posts
1 0.07 0.0 0.0 /find-new/90225/profile-posts
1 0.07 16.4 1.2 /find-new/738625/posts
1 0.07 0.0 0.0 /find-new/70258774/profile-posts
1 0.07 16.4 1.2 /find-new/738803/posts
1 0.07 0.0 0.0 /find-new/26533205/posts
1 0.07 16.4 1.2 /find-new/738781/posts
1 0.07 16.4 1.2 /find-new/738723/posts
1 0.07 0.0 0.0 /find-new/73366087/posts
1 0.07 1.5 0.1 /data/avatars/s/26/26317.jpg
1 0.07 16.4 1.1 /find-new/738740/posts
1 0.07 15.8 1.1 /find-new/738759/profile-posts
1 0.07 16.4 1.1 /find-new/738733/posts
1 0.07 0.0 0.0 /find-new/125368850/posts
1 0.07 0.0 0.0 /find-new/115415279/posts
1 0.07 16.4 1.2 /find-new/738762/posts
1 0.07 16.4 1.2 /find-new/738758/posts
1 0.07 16.4 1.2 /find-new/738537/posts
1 0.07 16.4 1.2 /find-new/738725/posts
1 0.07 15.8 1.1 /find-new/738665/profile-posts

MOST of the requests hitting me constantly are /find-new/ requests, and I'm getting thousands of them. My 'xf_search' table is growing by a thousand rows every few seconds.
So I figured, hey, let's pay for the ES plugin and offload the searches from MySQL to Elasticsearch.

So I did that. Dropped the coin, installed the plugin, installed ES, and built the index.

No. Difference. At. All.

So, a little more data:

TOP
top - 19:27:32 up 396 days, 22:32, 2 users, load average: 33.56, 33.48, 35.07
Tasks: 745 total, 29 running, 716 sleeping, 0 stopped, 0 zombie
Cpu(s): 81.1%us, 3.5%sy, 0.0%ni, 15.2%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 24596152k total, 24182356k used, 413796k free, 179196k buffers
Swap: 10296316k total, 25268k used, 10271048k free, 13947244k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20964 mysql 20 0 25.5g 3.7g 6212 S 61.5 15.9 137:11.81 mysqld
22726 apache 20 0 345m 17m 4544 S 18.0 0.1 7:55.09 httpd
22445 apache 20 0 349m 20m 5548 S 17.7 0.1 8:03.49 httpd
22665 apache 20 0 345m 17m 4784 S 17.3 0.1 8:04.45 httpd
23021 apache 20 0 345m 17m 4768 S 15.7 0.1 8:00.84 httpd
22751 apache 20 0 356m 27m 5448 S 15.0 0.1 7:58.32 httpd

Just shows my cpu usage.

Apachetop
last hit: 23:18:10 atop runtime: 0 days, 00:00:15 23:18:11
All: 1619 reqs ( 107.9/sec) 11.6M ( 789.3K/sec) 7487.9B/req
2xx: 700 (43.2%) 3xx: 917 (56.6%) 4xx: 2 ( 0.1%) 5xx: 0 ( 0.0%)
R ( 15s): 1619 reqs ( 107.9/sec) 11.6M ( 789.3K/sec) 7487.9B/req
2xx: 700 (43.2%) 3xx: 917 (56.6%) 4xx: 2 ( 0.1%) 5xx: 0 ( 0.0%)
The above shows I'm consistently getting about 107 requests per second.

I have checked, and ALL of these are coming through Cloudflare as they should.

I went to Cloudflare and switched on "I'm Under Attack" mode, which shows users a JavaScript challenge page to prove that they're human before letting them through to the site. It did not change anything. Actually, the load on the server went up 5 points when I did this. Probably coincidence, but it did not help.

I'm currently using:

XenForo 1.5.15
MySQL 5.5.49
Elasticsearch 6.8.10
PHP 5.5.36
CentOS 6.8
Apache 2.2.15

So, I know some of those versions are behind, and I do plan on upgrading to XF 2.0 in the very near future, but I'm looking for some way to mitigate the issue I'm having now. Any assistance would be appreciated :)
 
Try disabling search for guests to see if that makes any difference.

Other than that you may need to use iptables to block at the server level.
 
Thanks, I'll try disabling search.

I'm already running iptables scripts that only allow http/https connections from Cloudflare, so I'm not sure what else to do there.
 
Ok, so disabling the search for visitors did nothing. Take a look at this:

mysql> select distinct(search_type), count(1) from xf_search group by search_type;
+-------------------+----------+
| search_type       | count(1) |
+-------------------+----------+
|                   |        3 |
| new-profile-posts |    16449 |
| recent-posts      |    82486 |
| user              |        5 |
+-------------------+----------+
4 rows in set (0.50 sec)

That's my xf_search table after 20 minutes of activity. My forum is not THAT popular.

Could it be because of the "new posts" sidebar that I have on forum home?
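
For scale, here is a quick back-of-the-envelope calculation using the row counts from the query output above. It is only a sketch: it assumes the table was empty at the start and that the 20-minute window quoted in the post is accurate.

```python
# Row counts copied from the xf_search query output in the post.
rows_by_type = {
    "": 3,
    "new-profile-posts": 16449,
    "recent-posts": 82486,
    "user": 5,
}

WINDOW_SECONDS = 20 * 60  # "after 20 minutes of activity" (taken at face value)

total_rows = sum(rows_by_type.values())
rows_per_second = total_rows / WINDOW_SECONDS

print(f"{total_rows} rows in {WINDOW_SECONDS} s = {rows_per_second:.1f} rows/s")
```

That is roughly 82 new search rows per second, which is in the same ballpark as the ~108 requests per second reported by apachetop.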
 
You should look at your logs, and see what's hitting you. If it is whats-new that's being hit so often, you can see if it's a single ip, or a bot, or something like that.

In Cloudflare you can deny access to that url for now to see if that fixes it.
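
A minimal sketch of that kind of log check, assuming Apache's "combined" log format (the regex, the `tally` helper and the sample lines are illustrative, not taken from the poster's server). Note that behind Cloudflare the logged address is a Cloudflare edge IP unless the real client IP is restored, for example with mod_cloudflare.

```python
import re
from collections import Counter

# Assumption: Apache "combined" LogFormat. Adjust the pattern if yours differs.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally(lines):
    """Count requests per client IP, per first path segment, and per user agent."""
    by_ip, by_prefix, by_agent = Counter(), Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        by_ip[match.group("ip")] += 1
        # Drop the query string, then keep only the first path segment,
        # so /find-new/738711/posts and /find-new/90225/profile-posts group together.
        path = match.group("path").split("?", 1)[0]
        by_prefix["/" + path.strip("/").split("/", 1)[0]] += 1
        by_agent[match.group("agent")] += 1
    return by_ip, by_prefix, by_agent


# Made-up sample lines for illustration only.
sample = [
    '66.249.65.125 - - [01/Jan/2020:00:00:01 +0000] "GET /find-new/738711/posts HTTP/1.1" '
    '200 16400 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.65.125 - - [01/Jan/2020:00:00:02 +0000] "GET /find-new/90225/profile-posts HTTP/1.1" '
    '302 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [01/Jan/2020:00:00:03 +0000] "GET /css.php?css=xenforo HTTP/1.1" '
    '200 96200 "-" "Mozilla/5.0"',
]

by_ip, by_prefix, by_agent = tally(sample)
for name, counter in (("IP", by_ip), ("prefix", by_prefix), ("agent", by_agent)):
    print(name, counter.most_common(3))
```

On a real server you would feed it the access log instead of `sample`, for example `tally(open(path_to_access_log))`, where the log path depends on your setup.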

arn
 
If you look at my first post, it's not just one URL, it's these:

1 0.07 0.0 0.0 /find-new/10969598/profile-posts
1 0.07 0.0 0.0 /find-new/26424799/posts
1 0.07 16.4 1.1 /find-new/738711/posts
1 0.07 16.4 1.1 /find-new/738709/posts
1 0.07 0.0 0.0 /find-new/70344158/posts
1 0.07 0.0 0.0 /find-new/123710024/posts
1 0.07 0.0 0.0 /find-new/71623961/posts
1 0.07 0.0 0.0 /find-new/77265739/posts
1 0.07 16.4 1.2 /find-new/738722/posts
1 0.07 16.4 1.2 /find-new/738689/posts
1 0.07 0.0 0.0 /find-new/11461655/posts
1 0.07 0.0 0.0 /find-new/90225/profile-posts

And they're all coming through Cloudflare. I can't tell them from legit traffic yet.
 
So, I figured it out. Finally. I installed mod_cloudflare so I could identify exactly which hosts were hitting us. And... I found the culprit and blocked it in Cloudflare, and the load on my server is now down to less than 1. I blocked the range 66.249.0.0/16 in Cloudflare.

So the problem was Googlebot. But why Googlebot is hitting us 100,000 times in 20 minutes, constantly and repeatedly for days on end, is the mystery.

It almost seems like someone has figured out how to weaponize Googlebot. Which is a scary thought.

So good news:
- Issue is mitigated.

Bad news:
- I'm blocking Googlebot, and Google is one of our best sources of traffic.

So I'm wondering how I can solve the issue from Google's end, so Googlebot works correctly.

Thoughts, anyone?
 
There can be fake Googlebots. You can use the Cloudflare WAF (Pro plan or higher) and CF Firewall Rules to tell them apart from the real one, via the CF threat score, the CF known-bot field, or (on the Enterprise plan) Bot Management scores.
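
Independently of Cloudflare, a claimed Googlebot can be checked with a reverse DNS lookup followed by a forward lookup, which is the approach Google describes for verifying its crawlers. Below is a rough sketch of that check. The function name and the injectable resolver parameters are made up for illustration, the suffix list is an assumption you should confirm against Google's current guidance, and the default forward lookup is IPv4-only.

```python
import socket

# Assumption: genuine Googlebot hosts reverse-resolve under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip.

    The two lookup parameters are hypothetical hooks so the logic can be
    exercised without network access; by default real DNS is used.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
    except OSError:  # no PTR record or lookup failure
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # The forward lookup guards against a spoofed PTR record.
        return ip in forward_lookup(host)
    except OSError:
        return False


# Offline demonstration with stub resolvers (hostnames here are illustrative).
genuine = is_verified_googlebot(
    "66.249.65.125",
    reverse_lookup=lambda addr: "crawl-66-249-65-125.googlebot.com",
    forward_lookup=lambda host: ["66.249.65.125"],
)
impostor = is_verified_googlebot(
    "203.0.113.7",
    reverse_lookup=lambda addr: "crawler.example.com",
    forward_lookup=lambda host: ["203.0.113.7"],
)
print(genuine, impostor)
```

The user agent string alone proves nothing, since anyone can send it; the forward-confirmed DNS check is what separates the real crawler from an imitation.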


How does Firewall Rules handle traffic from known bots?

Caution about potentially blocking bots

When you create a firewall rule with a Block, Challenge (Captcha), or JS Challenge action, you might unintentionally block traffic from known bots. Specifically, this might affect search engine optimization (SEO) and website monitoring when trying to enforce a mitigation action based on URI, path, host, ASN, or country.

See How do I create an exception to exclude certain requests from being blocked or challenged?

Bots currently detected

The table below lists known bots that Firewall Rules currently detects. When traffic comes from any of these bots, the cf.client.bot field is set to true.
 
If it is googlebot, you should look at webmaster tools to see if it's complaining about anything.

If it is Googlebot, blocking /find-new/ in robots.txt would keep it from hitting you so hard while you figure it out.
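
As a rough illustration of that suggestion, the snippet below checks a candidate robots.txt rule with Python's standard `urllib.robotparser` before deploying it. The rule text is an assumption about what such a block could look like, and the test URLs are taken from the apachetop listing earlier in the thread.

```python
from urllib.robotparser import RobotFileParser

# Candidate rule (an assumption, not the poster's actual robots.txt).
ROBOTS_TXT = """\
User-agent: *
Disallow: /find-new/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The flood URLs should be disallowed, normal thread pages still allowed.
find_new_allowed = parser.can_fetch("Googlebot", "/find-new/738711/posts")
thread_allowed = parser.can_fetch("Googlebot", "/threads/stuck-in-formatting-loop.30285/")

print("find-new allowed:", find_new_allowed)
print("thread allowed:", thread_allowed)
```

Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but only after they re-fetch the file, so the effect is not immediate.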
 
Are you sure it's googlebot?
66.249.0.0 shows as
WHOIS Source: ARIN
IP Address: 66.249.0.0
Country: US (USA - Massachusetts)
Network Name: BIZLAND-FC01
Owner Name: The Endurance International Group, Inc.
CIDR: 66.249.0.0/19
From IP: 66.249.0.0
To IP: 66.249.31.255
Allocated: Yes
Contact Name: The Endurance International Group, Inc.
Address: 10 Corporate Drive, Suite 300, Burlington
Email: *******@endurance.com
Abuse Email: eig-abuse@endurance.com
Phone: +1-877-659-6181
 
Actual IPs were close to:

66.249.65.125 ... When I did a lookup:

Source Registry: ARIN
Net Range: 66.249.64.0 - 66.249.95.255
CIDR: 66.249.64.0/19
Name: GOOGLE

So I suppose I could have used a more accurate CIDR; in the middle of the night the /16 just looked fine.

I'm going to update to the above.
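
The difference between the ranges discussed above can be checked with Python's standard `ipaddress` module. This is just a sketch using the addresses and CIDRs quoted in the thread; the variable names are illustrative.

```python
import ipaddress

observed = ipaddress.ip_address("66.249.65.125")       # IP seen hitting the forum
broad_block = ipaddress.ip_network("66.249.0.0/16")    # range originally blocked
google_net = ipaddress.ip_network("66.249.64.0/19")    # ARIN: GOOGLE
endurance_net = ipaddress.ip_network("66.249.0.0/19")  # ARIN: BIZLAND-FC01

in_google = observed in google_net
in_endurance = observed in endurance_net

# The /16 covers both allocations, which is why it also blocks unrelated hosts.
covers_both = google_net.subnet_of(broad_block) and endurance_net.subnet_of(broad_block)

print("observed IP in Google /19:", in_google)
print("observed IP in Endurance /19:", in_endurance)
print("/16 covers both /19 ranges:", covers_both)
print("addresses blocked:", broad_block.num_addresses, "vs", google_net.num_addresses)
```

The /19 covers 8,192 addresses instead of 65,536, so narrowing the rule removes the unrelated allocation from the block.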
 