Decrease in Google indexed pages

Joe Blow

Active member
My forum always used to rank pretty well. However, things have changed in the last few months and I'm trying to figure out if it's just me or if it is more widespread.

I've experienced a dramatic decrease in the number of pages indexed by Google (as shown in Google Search Console) and I'm wondering if other XenForo forum owners have been experiencing similar declines?

There was a mid-December algorithm update that seemed to hit my forum fairly hard. I migrated from vBulletin 4 to XF in December 2016 and did all the 301 redirects correctly. The forum hasn't changed that much, so the decline seems a bit inexplicable. Google is a bit inexplicable itself so I understand I may just be a victim of an algorithm tweak.

I'd be interested in the experiences of other XenForo forum owners, as well as any advice about what may be causing this.

Thanks!
 

Joe Blow

Active member
Sounds like that sitemap bug
I thought that was fixed. I uploaded the patched file and Google Search Console then seemed to accept the sitemap. However, I must admit that the problems started when the sitemap problem first appeared (in August?) and things haven't gotten any better since then in spite of the patch.
 

423

Member
I can confirm this too. I only upgraded my production site within the past week, and was running the same version of XF 1.5 for most of last year. In August, I began a steep decline in indexed pages and I'm at next to nothing at the moment. Sitemaps are working fine and there are no errors to speak of. This is the case for both XF sites of mine.
 

jgaulard

Active member
Hi - I don't know what your robots.txt file looks like, but my sites have been suffering from a low page indexation rate since I installed XenForo (2019 - transitioning from another CMS). I had a hunch that it was because of Google crawling a whole lot of unnecessary pages and I think I was right. I have two articles for you to refer to:


Apparently, crawl budget is big these days, even on smaller websites.

Up until mid-November 2021, I didn't block much in the robots.txt file, but on November 9, 2021, I decided to give some blocking a try. I blocked the /members/ and /attachments/ pages. Those are very important to block. They used to give 403 errors (because I don't allow guests to see them), but now they aren't crawled at all. Other very important directories to block are /whats-new/, /search/, /goto/, and /posts/. I also blocked the /threads/*/post and /threads/*/latest URLs.

This is what I'm seeing so far. It's only been a month and some change.

Crawl rate is beginning to pick up. These are the 200 pages.

crawl-requests.gif

Crawled - Currently Not Indexed pages are going down, which is good.

currently-not-indexed.gif

And most importantly, valid pages are increasing.

valid-pages.gif

This same thing is occurring across eight of my forums. I did the same thing to all of them.

Check this one out. This is a different site that's been languishing for years. Take a look at the valid pages jump.

valid-pages-2.gif

If you're interested, this is what I blocked in my robots.txt files.

User-agent: *
Disallow: /forum/account/
Disallow: /forum/admin.php
Disallow: /forum/attachments/
Disallow: /forum/conversations/
Disallow: /forum/find-threads/
Disallow: /forum/forums/*/create-thread
Disallow: /forum/forums/*/post-thread
Disallow: /forum/goto/
Disallow: /forum/job.php
Disallow: /forum/login/
Disallow: /forum/logout/
Disallow: /forum/lost-password/
Disallow: /forum/members/
Disallow: /forum/misc/
Disallow: /forum/online/
Disallow: /forum/posts/
Disallow: /forum/profile-posts/
Disallow: /forum/register/
Disallow: /forum/search/
Disallow: /forum/search-forums/location
Disallow: /forum/threads/*/add-reply
Disallow: /forum/threads/*/approve
Disallow: /forum/threads/*/draft
Disallow: /forum/threads/*/latest
Disallow: /forum/threads/*/post
Disallow: /forum/threads/*/reply
Disallow: /forum/threads/*/unread
Disallow: /forum/whats-new/

My goal was to not allow Googlebot to crawl any page that I didn't want in the index.
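
For anyone who wants to check which URLs a list like this would block before deploying it, here is a minimal Python sketch. It assumes the wildcard behavior Google documents for robots.txt paths ('*' matches any run of characters, a trailing '$' anchors the end, everything else is a prefix match). It handles Disallow rules only, so there is no Allow precedence logic. The rule list is a subset of the file above; the function names and the sample thread and member URLs are made up for illustration.

```python
import re

# A subset of the Disallow rules from the robots.txt above.
DISALLOW_RULES = [
    "/forum/attachments/",
    "/forum/goto/",
    "/forum/members/",
    "/forum/posts/",
    "/forum/search/",
    "/forum/threads/*/latest",
    "/forum/threads/*/post",
    "/forum/whats-new/",
]


def rule_to_regex(rule):
    """Turn one robots.txt path rule into a compiled regex.

    '*' becomes '.*', a trailing '$' anchors the end of the URL,
    and anything else is matched literally as a prefix.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


COMPILED_RULES = [rule_to_regex(rule) for rule in DISALLOW_RULES]


def is_blocked(path):
    """Return True if any Disallow rule matches the start of the path."""
    return any(rx.match(path) for rx in COMPILED_RULES)


if __name__ == "__main__":
    # Hypothetical URLs, only here to exercise the rules.
    samples = [
        "/forum/threads/some-topic.123/",
        "/forum/threads/some-topic.123/page-2",
        "/forum/threads/some-topic.123/latest",
        "/forum/threads/some-topic.123/post-456",
        "/forum/members/someone.1/",
    ]
    for path in samples:
        print("blocked" if is_blocked(path) else "allowed", path)
```

The point of running something like this is to confirm that the thread pages themselves (and their page-2, page-3 URLs) stay crawlable while the /latest and /post-123 redirect URLs do not.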

I hope this information helps you. If you're using noindex on some pages and are allowing them to be crawled by Google, I can tell you that I've never had any luck doing that. I've always seen ranking drops when I allowed pages like that to be crawled. It's a waste of crawl budget. In my opinion, it's always better to block in the robots.txt file. In this case, I haven't seen any ranking increases, but I'm hopeful that will happen during the next few Google updates.
 

jgaulard

Active member
I just wanted to make an update to the above post. Google Search Console was refreshed a day or two ago and it seems as though the positive trend continues. Take a look at these graphics for one of my sites. Again, I blocked many directories that shouldn't be crawled and it appears that valid pages are being indexed at a rapid rate because of that. I can't be certain of any of this because Google is a strange beast, but there certainly does seem to be a relationship between blocking pages in robots.txt that don't need to be crawled and Google crawling and indexing the good pages. I'm beginning to think that crawl budget is a real thing.

Take a look at last week's Valid Pages graph.

valid-pages-2.gif

Now take a look at this week's.

valid-pages.gif

Take a look at the Blocked by Robots.txt graph.

blocked-by-robots.gif

And finally, check out the Discovered - Currently Not Indexed graph.

discovered-not-indexed.gif

I've long suspected that the more lousy pages Google crawls, the more pages Google tosses out of the index. So basically (in my opinion - based on my incessant testing), if you allow Google to crawl lots of internal 301 redirects, noindexed pages, and 403 pages, even if it does crawl your good pages, they won't be indexed because Google feels like your site isn't worth having those pages indexed. That's just my take on things. I may be completely off here, but it's hard to argue with these graphs.
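
One way to see how much of Googlebot's crawling actually lands on redirects and 403 pages is to tally status codes from the server's access log. The sketch below is only an illustration under stated assumptions: it expects the common "combined" log layout used by Apache and nginx, the sample log lines are invented, and it trusts the user-agent string, which can be spoofed. It cannot detect noindexed pages, since those return 200 like any other page.

```python
import re
from collections import Counter

# Assumed "combined" log layout:
# ip - - [time] "METHOD path HTTP/x" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) \S+ [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_counts(lines):
    """Count responses served to Googlebot, grouped by HTTP status code."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")] += 1
    return counts


# Invented log lines, for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LINES = [
    '66.249.66.1 - - [10/Dec/2021:10:00:00 +0000] '
    '"GET /forum/threads/some-topic.123/ HTTP/1.1" 200 5120 "-" "' + GOOGLEBOT_UA + '"',
    '66.249.66.1 - - [10/Dec/2021:10:00:05 +0000] '
    '"GET /forum/goto/post?id=456 HTTP/1.1" 301 0 "-" "' + GOOGLEBOT_UA + '"',
    '66.249.66.1 - - [10/Dec/2021:10:00:09 +0000] '
    '"GET /forum/members/someone.1/ HTTP/1.1" 403 312 "-" "' + GOOGLEBOT_UA + '"',
    '203.0.113.7 - - [10/Dec/2021:10:00:11 +0000] '
    '"GET /forum/ HTTP/1.1" 200 8000 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for status, count in sorted(googlebot_status_counts(SAMPLE_LINES).items()):
        print(status, count)
```

Comparing the counts before and after a robots.txt change would show whether the 301 and 403 requests really drop off, independent of what Search Console reports.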
 
Very good results, can we have your robots.txt ? :)
 

JoyFreak

Well-known member
He posted his robots.txt here.
 

jgaulard

Active member
Yeah I know, I asked in case he has changed something.
Nothing has changed. I'm just waiting for more Search Console updates to see what happens. As @JoyFreak mentioned, the robots.txt is above. You can ignore this line:

Disallow: /forum/search-forums/location

That's my own setup and isn't standard with XenForo.
 

MySiteGuy

Well-known member
Not so about the 301 redirects. Google treats those the same as if they were the actual new page. They've said numerous times in the past few years that 301 redirects have no penalty.
 

jgaulard

Active member
Like I said, I'm no expert. Far from it. I will tell you though that over the past two years, I've blocked the internal 301 redirects (threads/thread-name/post-123 & /goto/) on various sites and when I did that, my Valid Pages increased steadily in the Google Search Console. I distinctly recall one instance where I allowed those pages to be crawled again after about six months. Immediately following, my Valid Pages dropped a fair amount.

I am all about accepting the fact that everything I'm sharing may be pure coincidence. There are many folks who know more than I do. If for some reason my graphs go in reverse, I'll share that here too. I just thought it would be helpful for some people who are having trouble to see some new information. I have no idea why Google acts the way it does. Very frustrating.
 