XF 1.4 Googlebot filling 'search' table with tons of data, leading to a database crash

mm55mm

Member
Googlebot has been aggressively accessing our Recent Posts page (multiple times per second). This puts a heavy load on our database, because XenForo adds a record to the 'search' table every time Googlebot hits the page.

[Screenshot attached: xenforo-search.png]

We migrated to a new server to cope with this load. However, Google has kept increasing its crawl rate, which has recently averaged more than 10 requests per second, and this created a new problem. When the 'Daily Clean Up' cron tries to purge old entries from the 'search' table, it runs a query that takes more than 100 seconds to complete. While that query runs, either the Elasticsearch service fails or, even worse, our whole database crashes.
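To put that crawl rate in perspective, here is a rough back-of-the-envelope sketch. It assumes, as described above, that every Googlebot hit adds exactly one row to the search table and that the rate stays constant over a whole day; the function name `search_rows_per_day` is hypothetical.

```python
# Seconds in one day: 24 h * 60 min * 60 s = 86,400.
SECONDS_PER_DAY = 24 * 60 * 60


def search_rows_per_day(hits_per_second: float) -> int:
    """Rough estimate of rows added to the search table in one day.

    Assumes one search record per crawler hit (as described in the post)
    and a constant crawl rate; both are simplifications.
    """
    return round(hits_per_second * SECONDS_PER_DAY)


if __name__ == "__main__":
    # The post reports "more than 10 times per second".
    print(search_rows_per_day(10))  # 864000
```

At 10 hits per second that is roughly 864,000 new rows per day, which gives a sense of how much data the daily cleanup query has to work through.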

How should we deal with this issue? Is it possible to stop new entries from being added to the 'search' table when the visitor is Googlebot?
 

Chris D

XenForo developer
Staff member
You'll want to add a robots.txt file.

Here's an example: https://xenforo.com/robots.txt

Note that we have the following rule:
Code:
Disallow: /community/find-new/
That should stop Googlebot from crawling the New Posts page.
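One way to sanity-check a rule like this before deploying it is Python's standard `urllib.robotparser` module. The sketch below assumes a minimal robots.txt containing only the rule quoted above under `User-agent: *`; the domain `example.com` and both URLs are hypothetical placeholders. This parser approximates, but does not replicate, Googlebot's own matching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical minimal robots.txt: only the find-new rule quoted above,
# applied to every crawler (the real xenforo.com file contains more rules).
ROBOTS_TXT = """\
User-agent: *
Disallow: /community/find-new/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The New Posts listing (hypothetical URL) is blocked for Googlebot...
blocked = parser.can_fetch("Googlebot", "https://example.com/community/find-new/posts")
# ...while an ordinary thread page (hypothetical URL) stays crawlable.
allowed = parser.can_fetch("Googlebot", "https://example.com/community/threads/example.123/")

print(blocked, allowed)
```

Because robots.txt rules are matched as prefixes of the URL path, `/community/find-new/` fits a forum installed under `/community/` as on xenforo.com; the path would need adjusting to wherever your own forum is installed.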

It's safe to empty (truncate) the xf_search table using phpMyAdmin to get around the cron problem.
 

mm55mm

Member
Thank you for the prompt answer!

We truncated the xf_search table, but it is growing quickly again.

Does disallowing the New Posts page in robots.txt negatively affect Google's indexing of the site in any way? Do you have any idea why Google crawls this page so aggressively? I wouldn't want to hurt our search rankings.
 

Chris D

XenForo developer
Staff member
It's not the first time I've seen this recently, but it's not something we've had reported or noticed with any regularity before. Something has changed, most likely on Google's end.

There's no logical reason for Google to crawl the New Posts / Recent Posts page, so disallowing the find-new link shouldn't have any adverse effects.
 

mm55mm

Member
A brief follow-up:
After we truncated the xf_search table, the 'Daily Clean Up' cron runs quickly. Thanks!
 