XF 1.5 Googlebot eating bandwidth crawling /index.php?find-new/

ElgrandOC

Member
Guys,

I had my website account suspended yesterday due to excessive resource usage. I've just checked my cPanel account today: CPU usage is constantly over 80% and bandwidth consumed is 22.6GB in 6 days (yes, GB not MB).

When I've looked into it, it's all been consumed by Googlebot.

I've been into my "Recent Visitors" log to see which IP addresses have been accessing which pages on my server, and I'd say 95% of the hits look like this one: "/index.php?find-new/1915589/posts&page=6"

Can I use robots.txt to disallow /index.php?find-new/?

I've had to block the IPs of all Googlebots for now to stop them consuming my bandwidth or earning me another suspension, but I don't want to keep Google blocked forever.

If the above won't work, what can I do to stop it crawling the "new posts" URLs, as these are obviously going to be changing constantly?
 
Can I use robots.txt to disallow /index.php?find-new/?
Yes.
Code:
User-agent: *
Disallow: /account/
Disallow: /admin.php
Disallow: /attachments/
Disallow: /conversations*
Disallow: /cron.php
Disallow: /find-new/
Disallow: /goto/
Disallow: /login*
Disallow: /logout*
Disallow: /lost-password*
Disallow: /members/
Disallow: /online/
Disallow: /posts/
Disallow: /proxy.php*
Disallow: /resources/*/download
Disallow: /search*
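One thing to double-check: the rules above assume friendly URLs are enabled, so find-new lives at /find-new/. If your crawled URLs look like /index.php?find-new/... you may also want a Disallow: /index.php?find-new/ line. You can sanity-check prefix rules like these with Python's stdlib parser (a quick sketch; note urllib.robotparser treats * literally, so only the plain prefix rules are exercised here):

```python
# Sanity-check robots.txt Disallow rules with the stdlib parser.
# urllib.robotparser does not understand '*' wildcards (it treats them as
# literal characters), so this only tests plain prefix rules. The
# /index.php?find-new/ rule is an assumed addition for forums that do not
# have friendly URLs enabled.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /find-new/
Disallow: /index.php?find-new/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# The URL pattern from the access log should now be blocked:
print(rp.can_fetch("Googlebot", "/index.php?find-new/1915589/posts&page=6"))  # False
print(rp.can_fetch("Googlebot", "/find-new/1915589/posts"))                   # False
# An ordinary thread URL stays crawlable:
print(rp.can_fetch("Googlebot", "/threads/example.123/"))                     # True
```

Googlebot itself does support * wildcards in robots.txt, so the wildcard lines above are fine in production; this is just a quick local check of the prefix matching.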
 
I just wanted to add, we went from 5-7K pages crawled by Googlebot per day to 1.9M yesterday, on a forum with ~36K indexed pages. That was around 20GB of data transfer.

The other major issue was that the nginx logs were growing at about 20MB/hour. Unless you are on a low-end SSD, that write rate by itself is not a problem; the real risk is that the logs will fill the disk if you are not rotating them or shipping them to a remote syslog.

Once the disk fills, your forum will go down.
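For reference, a minimal logrotate config along these lines keeps the access log under control (paths assume the stock nginx layout on a Debian-style box; adjust for your setup):

```
/var/log/nginx/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    sharedscripts
    postrotate
        # Tell nginx to reopen its log files after rotation
        [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
    endscript
}
```

The USR1 signal makes nginx reopen its log files so it doesn't keep writing to the rotated (deleted) file handle.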

This seems to be new Googlebot behavior as of March/April 2017.
 