Crawl Budget Stats in Google Search Console

arn

Crawl budget has been a point of discussion for forum owners. Essentially, the question is how much of Google's crawl of your site is "wasted" on responses other than 200.

I'm curious what people's stats are for XenForo forums specifically.

If you go to Google Search Console and go to Settings -> Crawl Stats, you should see this breakdown. On our forums, 69% of all crawl requests return proper (200) results, whereas 14% are 301 redirects.

I don't believe 304 is a big deal.

Screen Shot 2020-11-30 at 4.52.06 PM.webp
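
For anyone who wants to cross-check the Search Console breakdown against their own server logs, here is a minimal sketch that tallies Googlebot requests by status code. It assumes the common "combined" access-log format; the function name and the regex are illustrative and not something from Search Console or XenForo, so adjust them to your server's log layout.

```python
import re
from collections import Counter

# Assumption: "combined" log format, i.e.
# ip - - [date] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'"\S+ \S+ [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_share(lines):
    """Return {status_code: percentage} for requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        # Matching on the user-agent string only; it does not verify the bot's IP.
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: round(100 * n / total, 1) for status, n in counts.items()}
```

Feeding it the lines of an access log (for example `googlebot_status_share(open("access.log"))`, where the file name is a placeholder) gives a rough per-status percentage to compare with the report.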
 
I have a ton of 301s due to the HTTPS switchover, as well as from using www in the past. So all those old internal links get redirected, sometimes twice :(

A query to update those links is pending, but I don't see it as being a massive detriment.
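
To illustrate the kind of link clean-up meant here, below is a rough sketch that rewrites an old internal URL straight to its final form, so the crawler does not have to follow an http -> https -> non-www chain. The host names are placeholders, and treating the non-www HTTPS host as canonical is an assumption; this is not the actual pending query.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hosts for illustration; replace with your own domain.
CANONICAL_HOST = "example.com"
LEGACY_HOSTS = {"example.com", "www.example.com"}


def canonicalize(url):
    """Rewrite an internal http:// or www URL to the canonical https, non-www form."""
    parts = urlsplit(url)
    if parts.hostname not in LEGACY_HOSTS:
        return url  # external link: leave untouched
    # Note: any explicit port or credentials in the old URL are dropped here.
    return urlunsplit(
        ("https", CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
```

Running old links through something like this before writing them back means each internal link resolves in one hop instead of two.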

crawlstats.webp.


I should also mention that I have a LOT of 301s where it looks like the bot is just trying IDs, e.g.

.com/threads/nnnnn/

hits which of course redirect to

.com/threads/title-here-nnnnnn/
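
A quick way to see how many of the 301s come from these ID-only hits is to filter the requested paths with a pattern like the one below. This is only a sketch: the regex assumes the `/threads/nnnnn/` shape shown above, and the function name is made up for illustration.

```python
import re

# Matches paths that contain only a numeric thread ID, e.g. /threads/12345/
ID_ONLY_THREAD = re.compile(r"^/threads/\d+/?$")


def is_id_only_thread(path):
    """Return True if the path is a bare-ID thread URL with no title slug."""
    return bool(ID_ONLY_THREAD.match(path))
```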
 
@arn has your crawl budget changed over time?

I'm trying to evaluate ours, which shows 20% 4XX errors... to me, 20% 4XX seems like a lot for XenForo alone. But maybe it isn't. Suggestions?

googlecrawl.webp
 

That's a lot, imo. Are you linking to private / member-only sections? Those would return permission errors. Maybe also make sure your sitemap isn't pointing to stuff that's not publicly crawlable.

Here's my latest:

Screen Shot 2021-04-18 at 3.12.11 PM.png

200 actually went down. Not sure why.

But 301 is better, which was intentional.

Not sure if there's anything to do about 304s.

arn
 
That's a lot, imo. Are you linking to private / member-only sections? Those would return permission errors. Maybe also make sure your sitemap isn't pointing to stuff that's not publicly crawlable.

OK, I'll check, thanks.

Not sure if there's anything to do about 304s.

304 means "Not Modified": the content has not changed, so Google can reuse its cached version. Nothing to worry about.

I saw you changed your sitemap. Did it help with the crawl stats in the end? And how did you do it?
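
For context, a 304 is the server's answer to a conditional request. The sketch below shows the basic decision, assuming the crawler sends an `If-Modified-Since` header and the server knows when the page last changed. The function name is made up, and real servers also handle `ETag` / `If-None-Match`, which is left out here.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def response_status(last_modified, if_modified_since):
    """Return 304 if the page is unchanged since the date the client sent, else 200.

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: raw header value (str) or None if the header is absent.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    return 304 if last_modified.replace(microsecond=0) <= since else 200
```

A 304 carries no body, which is why it costs far less than a full 200 response.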
 
I'm trying to evaluate ours, which shows 20% 4XX errors... to me, 20% 4XX seems like a lot for XenForo alone. But maybe it isn't. Suggestions?
I had about 10%, now it's 1%.
I tried that new "reply before registering" feature, then turned it off. But Google had already picked up the "reply" links, and the crawler received 4xx responses for them.
Also, if you have "no" set for "can see a member profile" for unregistered users, those pages would return 4xx to a crawler too.
 

I use the "reply before registering" feature. And yes, my member profiles are private by default.

It should be possible to optimize and clean up this report so I can see whether there are real problems on our site.
 
Replying to this thread to remind everyone to check their robots.txt file regularly and be sure that it's up to date.

While checking this report I noticed that Google was crawling literally millions of /forum/search/* pages on my site: a complete waste of resources and something that was clearly destroying my crawl budget. The problem came from an outdated robots.txt that I had taken for granted and had not updated since before I upgraded to XenForo 2.0. Such a stupid oversight on my part.
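
If you want to sanity-check a rule like this before deploying it, Python's standard-library robots.txt parser can do it offline. The sketch below is an assumption about what such a rule might look like, not the exact file used on my site. Note that the stdlib parser does plain prefix matching and does not understand `*` inside paths, so the rule is written as a prefix.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content blocking internal search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/search/
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Checking a handful of search URLs and a handful of thread URLs this way makes it easy to confirm that only the search pages are blocked.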

Since making the change a few hours ago, it has been interesting to monitor the server. At first, Googlebot requests dropped sharply; now they have steadily risen again to meet and even exceed the previous volume. Watching the logs shows that it is actually looking at threads, media files, and old WordPress articles, as it should (instead of worthless search results pages).

Anyway... here are my Crawl Requests. I will need to check back here in a few weeks to see how this changes as it re-crawls my site.

Screen Shot 2022-01-11 at 9.02.47 PM.webp
 