[XF 2.0] Google will only index about 30% of XenForo URLs

Dan Blather

Member
The site: https://www.cyburbia.org

I started the site in 1994 (!), and the message board has been online since 1996. Change from vBulletin to XenForo: 5 January 2019.

Problem: since I switched the message board software from vBulletin 4 to XenForo 2.0, and from http to https, Google removed a lot of forum threads from its index. According to Google Search Console, about 14,000 https URLs are now indexed, with a decreasing number of new pages with every pass. About 28,000 URLs are "Discovered - currently not indexed / Status: Excluded". The vast majority of the excluded pages are content-rich -- they're not low-content post padding parties. Google still has about 8,000 old http vBulletin URLs in its index.

  • Canonical URLs = on.
  • Old vBulletin > new XenForo URL redirect is working.
  • Sitemaps = on. Google reads it every few days.
  • Anything that is visible to an unregistered visitor can be indexed. Noindex pages - member profiles.
  • PageSpeed Insights: mobile = 95, desktop = 99. Server load is consistently under 1.0.
  • Daily posting activity is down about 75% from its historic peak in the mid-2000s, but there's been a jump and slow rise since the changeover.
  • Mobile usability: no issues, but only about 800 valid URLs, and decreasing from a peak of about 1,500 a month ago.
In your experience, will Google eventually get to these "excluded" unindexed pages?

 
In your experience, will Google eventually get to these "excluded" unindexed pages?
Maybe, no one knows. It can take Google a long time to update a site's entire profile, especially if you've made a big change. I have a large forum and Google, for unknown reasons, decided to exclude nearly 10% of my pages.
 
Are you certain it's threads removed and not other miscellaneous VB URLs? VB has many canonical issues with URLs as well.
 
Same problem with my forum: more than 8 million URLs are excluded.

The other problem: when I search Google for site:mydomin.com

it shows about 205,000 results (0.24 seconds).

I lost a lot of pages and visitors, and I do not know how to overcome the problem.
myweb.gif
 
With millions of URLs, I can almost guarantee even a site with no canonical issues will be very lucky to have even half of them indexed, unless the site has a very high page rank.

Many pages they simply do not index due to thin content, irrelevancy (for instance political discussions on my automotive forums tend not to be indexed), etc. Member profile pages, for instance. The same goes for many thread URLs with "goto" post numbers on the end, since they are identical content to the URL without the post number.

Google assigns a crawl budget for every site based on its speed, rank, search traffic, etc. You want as little of that crawl budget as possible to be spent on things they aren't going to index, so make sure you have a robots.txt file in place.

And, also consider that this may not be 100% related to a software change. Google had a major search update rollout a few weeks ago, and many sites have seen large positive or negative changes in traffic. Three of my forums are seeing 30%-50% increases (and still climbing), one of them dropped in half, the others are the same.

I'd also check your Nginx or Apache server logs, look for 404 errors, determine why they are happening and resolve them.
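A minimal sketch of that log check in Python, assuming the default "combined" access log format that both Nginx and Apache can write. The function name and the sample lines are made up for illustration, and matching on the "Googlebot" user-agent string is only a rough filter since anyone can send that string.

```python
import re
from collections import Counter

# Pulls the request path and status code out of a "combined" format log line.
# Assumption: the server writes the default combined format.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

def count_404s(lines, bot_only=True):
    """Count 404 responses per path, optionally only for Googlebot requests."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or m.group("status") != "404":
            continue
        if bot_only and "Googlebot" not in line:
            continue
        hits[m.group("path")] += 1
    return hits

# Hypothetical sample lines, not taken from any real server.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'66.249.66.1 - - [10/Apr/2019:10:00:00 +0000] "GET /showthread.php?t=123 HTTP/1.1" 404 512 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Apr/2019:10:00:05 +0000] "GET /threads/example.123/ HTTP/1.1" 200 20480 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [10/Apr/2019:10:00:09 +0000] "GET /favicon.ico HTTP/1.1" 404 162 "-" "Mozilla/5.0"',
    f'66.249.66.1 - - [10/Apr/2019:10:01:00 +0000] "GET /showthread.php?t=123 HTTP/1.1" 404 512 "-" "{GOOGLEBOT_UA}"',
]

if __name__ == "__main__":
    # Most frequently requested missing paths first.
    for path, n in count_404s(SAMPLE).most_common(20):
        print(n, path)
```

To run it against a real log, pass an open file instead of SAMPLE, e.g. `count_404s(open("access.log"))` (the file name is a placeholder). Old vBulletin-style paths near the top of the list would suggest a redirect rule that isn't catching everything.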
 
^^^ what @MySiteGuy said

You can try contacting Google but this is not a Xenforo problem. This is a Google issue. As @MySiteGuy pointed out, we have already had several major Google updates in 2019. These happen all the time but there have been some big ones going back to the fall of 2018 and this has resulted in some major indexing and ranking changes for many sites.
 
For me, I've experienced this same change when I moved to https (Jan 2017). Used to have 800-900k pages indexed. After https, we are down to about 200k. I'm currently working on figuring out where the problem is, and it seems that for me Google is often getting 5xx errors, so I just migrated to a new server and changed from Sucuri to CF to see if things settle down.

RsioXYx.png


I have a buddy running different forum software and he's seen the same exact thing going on with his forum since he switched to https (same time I changed). So I think it's something with https somehow. I have the correct 301's in place. The only thing I know we have a problem with is the mixed site content since we couldn't get the image proxy to properly work and mask our server IP. Planning on working on this when we move to xf2.1 in combination with some CF settings/CDN. Hoping the timeline for that is ~1 month but we'll see.

I hide my general discussion section and exclude it from the sitemap. Everything that is in the sitemap (which you can see is about 1.5M above) is relevant to my site. I learned a long time ago that I wanted Google to fully understand what kind of content was on the site, so I didn't want the general discussion forum viewable to search engines (and guests).

But it is interesting to see others experiencing the same issues I'm seeing with my site. Will be focusing on figuring out these issues:
0C9Tqd0.png


I think with the few conversions I've made in the past, that many of those pages with redirect are ones that were posted on the old platforms linking to threads on the site. I've slowly gone through some old threads and removed the old URLs and added the current link but you can see it's been a straight line so I gotta bump those efforts up.
 
Same with our forum. We lost 60% of our traffic after converting from vBulletin to XenForo.
When was this? Depending on the timing, Google has really changed their algorithms and that has affected a lot of forums. And then if you changed to https, I think that's been a major contributor to changes in traffic.
I think with the few conversions I've made in the past, that many of those pages with redirect are ones that were posted on the old platforms linking to threads on the site.
If this was the issue for all my redirects, I didn't find any in the first couple pages. Seems to be all /posts/ or /goto/ or users changing their thread titles, which makes me wonder if that's why vbulletin always performed so well for us in the past, we didn't have to worry about the URL changing and Google making claims like the above and losing crawl budget upon each redirect.
 
We've had similar issues with this, as in Google rankings (we've not moved from another platform FWIW) and have spoken to some people about it. It's mostly about content just not being good enough by members. It's hard but when you do a site: Google search, it shows a lot.. but..
I recall going to a very long standing member here to ask their advice and they had been hit too.
Would love to know the answer to it. Our stats are up & down like a yo-yo. Huge drops and we've tried everything to remain on top of P.1 in Google for specific search terms etc; we do for so long then it drops again and traffic plummets.
Our Alexa rank last year was at a very high number.. (Which I can't really say tbh) This year we've plummeted. It's odd.


I've even contemplated adding the Wordpress bridge for extra content as WP is the King of SEO/SEM scraping.
 
Hmm I am wondering if robots.txt is being problematic. I have just re-checked ours and 87k+ pages have been left out even though they're available for indexing.
Anyone else seeing this too?
 
Highly unlikely. Post your robots.txt file here.
Yes, I thought not. Am using a pretty stripped back one now, following from a thread way back when.
Mine is literally;
Code:
User-agent: *
Disallow: /login/
Disallow: /admin.php
Disallow: /misc/style
Allow: /
After I stripped it back from this;
Code:
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Disallow: /misc/style
Allow: /

So I don't think this would be the issue, but Google states - not indexed but it can appear in Google. I have thousands of them and all are viewable when crawling and when viewing logged out/as guest.
Those of us who are having the same issues, can't be a fluke surely?
Thanks for the reply.. :)
 
Your original one was fine. So is the new one but the old one was better.

Also, if robots.txt was an issue your Google account would say "blocked by robots.txt".
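If you want to double-check what a given robots.txt actually blocks, here is a rough sketch using Python's standard-library parser on the longer file quoted above. The sample paths and the helper name are hypothetical. One caveat: Python's parser applies rules in file order, while Google documents a most-specific-match rule, so the two can disagree on some files; for this file they should give the same answer.

```python
from urllib.robotparser import RobotFileParser

# The longer robots.txt quoted earlier in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Disallow: /misc/style
Allow: /
"""

def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the paths that the given robots.txt disallows for this agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]

# Hypothetical XenForo-style paths, for illustration only.
PATHS = [
    "/threads/example-thread.123/",
    "/posts/456/",
    "/goto/post?id=456",
    "/forums/general.2/",
]

if __name__ == "__main__":
    # Only the /posts/ and /goto/ paths should be listed as blocked.
    print(blocked_paths(ROBOTS_TXT, PATHS))
```

Thread and forum URLs stay crawlable with either version of the file, which fits the point that robots.txt is not what is keeping those pages out of the index.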

And again, it's not a fluke. It's just Google adjusting what is indexed based on what people search for versus the content of most forum posts.
 
And again, it's not a fluke. It's just Google adjusting what is indexed based on what people search for versus the content of most forum posts.
Thank you and yes, you are probably right.
So, is it the fact we're simply forums (loosely said of course) and not being heavily content specific such as a site selling specific products?
Or is there perhaps a deeper issue (sorry) from within XF itself with SEO etc OOB?
That's the £1m question I suppose.
 
Pretty much. What Google wants is authoritative informational content corresponding to actual searches, not the sort of chit-chat that makes up most forum threads. Forums whose purpose is providing information and problem-solving (e.g., forums about technology, Windows, etc.) tend to do better.
 
View attachment 199621

I moved to XenForo in July 2013 and traffic dropped in August 2013.
Did you switch to https at the same time?
So I don't think this would be the issue, but Google states - not indexed but it can appear in Google. I have thousands of them and all are viewable when crawling and when viewing logged out/as guest.
Those of us who are having the same issues, can't be a fluke surely?
Thanks for the reply.. :)
Check the ones not indexed. I did that last night and found most of them were 0-1 comments in the thread (typically). So the big thing would be providing more content on the ones that haven't been indexed...if you want them indexed. I'm sure Google changed their algorithms to make sure that forum threads with little content, especially, don't get indexed.

I'm thinking one of the solutions would be to combine similar threads so there is more content in a single thread if they are all related. If your site is like mine, we get a lot of similar threads and questions posted that could be combined. My next step will be informing the moderators about that and to merge topics.
Pretty much. What Google wants is authoritative informational content corresponding to actual searches, not the sort of chit-chat that makes up most forum threads. Forums whose purpose is providing information and problem-solving (e.g., forums about technology, Windows, etc.) tend to do better.
We've done well with specific games on our site and, as I mentioned above, we don't allow Google to see the general chit chat (which would add an additional 100k threads if we did include it). I think some sites' problem is that they have a general forum and Google ranks blog posts/articles higher for the topics that users are probably talking about on certain forums. In a case such as the OP's, I would likely either "noindex" or hide this section from guests/bots. It dilutes the content and takes away from the crawl budget Google has for the site. And who wants to rank for an "intro" thread?
 
Check the ones not indexed. I did that last night and found most of them were 0-1 comments in the thread (typically). So the big thing would be providing more content on the ones that haven't been indexed...if you want them indexed. I'm sure Google changed their algorithms to make sure that forum threads with little content, especially, don't get indexed.

I'm thinking one of the solutions would be to combine similar threads so there is more content in a single thread if they are all related. If your site is like mine, we get a lot of similar threads and questions posted that could be combined. My next step will be informing the moderators about that and to merge topics.
Agreed and have done so, but they're all mostly very varied. I don't know what it is. Can't quite put my finger on it.
We all (mostly) know that content is king et al, but this is weird. Some of our content is not being indexed and registered even though it's open & without duplicates.
I think they're (Google) hitting forums for some reason possibly? Maybe it's the structure - mobile Vs /... who knows?
Just don't know why. When checking, everything comes back fine. But not indexed.
Grrr..! :D
Duplicate content could very well be an issue though.
I wonder if "abc.com/event/blah blah " would be seen as duplicate content?
I mean, would similar threads under one category, started the same way, be viewed as duplicate content? My initial thought would be yes, but the content would be slightly different, wouldn't it?
 