cmeinck
Well-known member
I've been researching what Google is indexing and noticed a ton of indexed URLs that look like old vBulletin pages. It's incredibly odd, because the source code on each page has the correct canonical URL, and some of these pages were visited by Googlebot as recently as February 2014. I migrated from vBulletin to XenForo in 2010.
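For what it's worth, this is roughly the spot-check I've been running to confirm the canonical tag on those pages — a minimal Python sketch using the requests library (the URL is just a placeholder, and the regex assumes rel comes before href, which is how XenForo outputs the tag):

```python
import re
import requests

# Hypothetical old-style vBulletin URL -- substitute a real indexed URL.
url = "http://example.com/showthread.php?t=12345"

resp = requests.get(url, allow_redirects=True, timeout=10)

# Pull the canonical link tag out of the returned HTML
# (assumes rel="canonical" appears before href in the tag).
match = re.search(
    r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']',
    resp.text,
    re.IGNORECASE,
)
print("Final URL:", resp.url)
print("Canonical:", match.group(1) if match else "none found")
```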
I'm not sure how many pages are in Google's index. In February, I migrated my forums to a subdomain, but Google has yet to pick up all the 301s. What's odd is that these showthread.php pages should never have been in the index in the first place. I'm seeing Google index the new page and retain the old showthread.php thread -- they are duplicates, apart from the URL. Clicking the older showthread link redirects to a showthread.php URL on the subdomain, which then resolves to the correct URL. That's only 3 hops.
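Here's a minimal sketch of how I've been counting those hops, again with Python's requests library (the URL is a made-up example):

```python
import requests

# Hypothetical legacy URL -- swap in one of the indexed showthread links.
url = "http://example.com/showthread.php?t=12345"

resp = requests.get(url, allow_redirects=True, timeout=10)

# resp.history holds every intermediate redirect response, in order.
for i, hop in enumerate(resp.history, start=1):
    print(f"Hop {i}: {hop.status_code} {hop.url} -> {hop.headers.get('Location')}")
print(f"Landed on: {resp.status_code} {resp.url}")
```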
Google shows me 93, but there could be more. My short-term fix is to block them with robots.txt and remove them with the URL removal tool. I don't think blocking via robots.txt is a good move long-term, since I need Googlebot to flow through the older URLs to the new ones and hopefully pick those up.
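If it helps anyone, you can sanity-check a Disallow rule before deploying it with Python's standard urllib.robotparser — the domain and rule below are illustrative, not my actual config:

```python
from urllib import robotparser

# Illustrative robots.txt rule blocking the legacy vBulletin script.
rules = """
User-agent: *
Disallow: /showthread.php
""".strip().splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Verify the rule catches an old URL but leaves the new ones crawlable.
print(parser.can_fetch("Googlebot", "http://example.com/showthread.php?t=12345"))    # False
print(parser.can_fetch("Googlebot", "http://example.com/threads/some-thread.123/"))  # True
```

The catch, as I said, is that once Googlebot is blocked on those URLs it can no longer follow the 301s sitting on them, so the redirects never get credited.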
Seeing this sort of stuff indexed really makes me question Googlebot. How, after years of sitemaps and canonicals, are they still showing old URLs? And who knows if this is creating duplicate content issues as well?