XF 1.3 Google indexing pages with showthread.php from vBulletin migration, from 2010?

cmeinck

Well-known member
I've been researching what Google is indexing and noticed a ton of indexed URLs that look like old vBulletin pages. It's incredibly odd because the source code on the page has the correct canonical URL and some of these pages were visited by Googlebot as recently as February 2014. I migrated from vBulletin to XenForo in 2010.

I'm not sure how many pages are in Google's index. In February, I migrated my forums to a subdomain, but Google has yet to pick up all the 301s. What's odd is that these pages with showthread.php should never have been in the index. I'm seeing Google index the new page and retain the old thread with showthread.php -- they are duplicates, other than the URL. Clicking on the older showthread link redirects to a 'showthread.php' link on the subdomain, which resolves to the correct URL. That's only 3 hops.

It shows me 93, but there could be more. My short-term fix is to block them using robots.txt and remove them using the URL removal tool. I don't think blocking them in robots.txt is a good move long-term, since I need Googlebot to flow through the older URLs to the new ones and hopefully pick those up.

Seeing this sort of stuff indexed really makes me question Googlebot. How after years of sitemaps and canonicals are they still showing old URLs? And who knows if this is creating duplicate content issues as well?
 
Got an example of an old URL? Make sure it's doing a 301 to the new URL.

I checked, all of them redirect. Use a redirect checker as well. Here's one:

Code:
http://www.everythingicafe.com/forum/showthread.php?t=50158

If you run a Google site: search with the first sentence of the thread, you'll see both URLs are indexed.
 
It's 301'ing properly. You are good.

Why does Google have both URLs at this point? I dunno.

Thanks. It's really disappointing that Google has not been respecting the 301. After doing some additional digging, I found a ton of vB /archive/ URLs.

In 2010, these would have resulted in 3 hops to the final canonical URL. After moving to the subdomain, it's 4 hops, still well within Googlebot's capabilities.

I'm wondering what other vB URLs I might find with creative searching.
 
During my vBulletin days, I ran vBSEO on my old forum. The URLs on the old site would have been different and would not have used showthread in thread URLs.

I'm finding tons of URLs that are indexed that are formed like this:

/forum/showthread.php?t=26083

and

/forum/showthread.php?t=23337&p=17984

While they seem to redirect to the correct page on my subdomain, I don't have any faith in Google picking up the 301. They had 3 years to pick them up originally. I've seen some of these URLs cached recently (Dec, January, February), showing a XenForo page with the vBulletin URL. Since these were never actual vBSEO URLs on my live vB site, at this point, I'd like to return a 410. Maybe Googlebot will get the hint and drop them.

Is there an easy way via .htaccess to 410 any URL with a prefix /forum/showthread.php?
 
I tried using a modified version of your code snippet provided for handling _debug errors, but it still 301 redirects. I tried this in /forum/ of my root domain. I tested with a few of the threads with showthread, but they all 301 redirect.

Code:
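RewriteEngine On
# Note: %{QUERY_STRING} only holds the part after "?" (e.g. t=50158).
# "showthread.php" is in the URL path, so this condition never matches.
RewriteCond %{QUERY_STRING} showthread.php
RewriteRule .* - [G,L]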

Another question: is there a way to simplify the 301 redirect so there are fewer hops? So if I have problematic URL A, can I set up a 301 that is one hop to the correct URL?
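
On the fewer-hops question, one possible approach is to send the old showthread.php URL straight to the subdomain in a single rule, rather than bouncing through the root domain first. This is only a rough sketch, not a tested config: forums.example.com is a placeholder hostname, and it assumes the import kept the vBulletin thread IDs so that /threads/<id>/ on the XenForo side finds the right thread. If the IDs changed during the import, this would point at the wrong threads and the existing redirect scripts are still needed.

Code:
RewriteEngine On

# Capture the thread ID from the query string, e.g. t=50158 or t=23337&p=17984.
RewriteCond %{QUERY_STRING} (?:^|&)t=(\d+)
# Match the script name in the path (relative to /forum/ in a per-directory .htaccess).
# %1 is the ID captured above; the trailing "?" drops the old query string.
RewriteRule ^showthread\.php$ http://forums.example.com/threads/%1/? [R=301,L]

This would go at the top of the .htaccess inside /forum/ on the root domain. XenForo may still add one more redirect from /threads/<id>/ to the full canonical URL with the title in it, so this shortens the chain rather than guaranteeing a single hop.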
 
It's possibly old links from other sites/pages/forums and Google just keeps picking them up again and again as it goes back and forth doing re-crawls.

Have you tried tracking down where your old page links are on other people's sites and asking them to remove or update them?
 
These were never visible URLs, which is what's odd. I searched my own forum just to be sure and there were no URLs within my site referenced with this URL format. These are formatted: /forum/showthread.php?t=###. I used vBSEO, so all of my URLs were SEO friendly and ended in .html. Granted, there were a fair amount of those as well.

The trouble is that Google's search results only show a fraction of the index. Clearly there is an abundance of these URLs from my site clogging their index, creating duplicate content issues. Subsequently, my site has suffered a massive loss in traffic over the past 12-18 months.

If I can implement 410s, I might have a better chance at them dropping out of the index naturally.

So essentially, there are three URLs.

1a. vB URL w/ showthread.php
1b. XenForo forum URL when site existed at root
2. Current XenForo URL at subdomain

I've looked at the redirects of 1a. and they go to 2. If I search Google for what would have been 1b., it's been dropped. So Google handled the 1b. to 2. move correctly. My problem now is dropping all of the URLs that meet the criteria of 1a. and doing this without destroying all of my older, existing 301 redirects from vBSEO'd pages to the new XF pages.

As a test, I put 5 of the 1a. URLs on my homepage. Kind of raising my hand and asking Googlebot to take notice. Since these were never public URLs, returning 410s could be a proper fix, since for whatever reason, it's not dropping them despite a 301 redirect in place.

It's scary to see this stuff and when you do, you want it gone from the index -- knowing that it's most likely the reason for your decline in traffic.
 
Can you give me an example of archive URLs being properly redirected? They probably have a structure like: http://www.everythingicafe.com/forum/archive/index.php/t-123.html

Since those are in a directory, I've removed those from Google's index.

My issue remains the threads with showthread.php.

Look at the bottom right of my homepage for a few sample links. You'll see they all redirect properly. The thing is, these have been around since I converted to XF years ago. They were never live pages. If you look at my site on the Wayback Machine, you can see the URL formats.

I think the strongest message I can send the crawlers would be to 410 any page with showthread.php. I'm not sure if this is possible without fouling up the existing redirects.
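
For what it's worth, a 410 scoped to just showthread.php should be possible without touching the other redirects, as long as the rule matches on the path and sits above the existing rules in the .htaccess inside /forum/. A minimal sketch, assuming mod_rewrite is enabled and that the current redirects are handled either by rules further down the same file or by the showthread.php script itself:

Code:
RewriteEngine On

# Match the script name in the path; the ?t=... query string is not part of what RewriteRule tests.
# [G] answers 410 Gone and implies [L], so later rules are not evaluated for this URL.
RewriteRule ^showthread\.php$ - [G]

Because the pattern is anchored to showthread.php, the vBSEO-style .html URLs would keep following whatever redirects already exist. If the move to the subdomain is done by a redirect in the main server config or a parent directory, that may still fire first, in which case the rule would need to live there instead.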
 
Maybe if the archive pages were active on your site, then these may have pointed to the related showthread.php pages. If that's where Google gets the showthread references from, then that could be the issue.
The only thing I can think of is to block showthread.php in robots.txt, but that would also kill incoming link juice from old links on other sites.
 
If you block showthread.php in robots.txt, it won't remove them from the index. I tried a few 410 directives in .htaccess, but the redirect seems to supersede any changes I make.
 
I've decided to remove all of my existing vBulletin redirect scripts. As a result, it will break all vB URLs, including, most likely, older backlinks which were valuable. I figure at some point, I could possibly reintroduce the redirects once this mess is cleaned up.

So here's my question. Right now, pages are still redirected and in turn, the header is responding with a 404. @Jake Bunce, how would I make any page that either ends in .html or includes showthread.php return a 410? Ideally, I'd like this to happen before the redirect to the subdomain. My thinking is a quick, swift 410 response to Googlebot is best at this point.

I'm curious to see the fallout in GWT -- as the lingering pages will show up in crawl errors -- as they should.
 
Add these rules to the top of the .htaccess file in whatever directory contained those URLs:

Code:
RewriteEngine On

RewriteRule (\.html|showthread\.php)$ - [G,L]

"G" returns HTTP 410.
 