Google Indexing /goto/ links

cmeinck

Well-known member
#1
I noticed a jump in my indexation last week. Despite having /goto/ blocked in robots.txt, Google indexed 20K URLs. I looked here at XenForo and the same has happened.

Searching Google with site:xenforo.com inurl:/goto/ reveals over 6K URLs.

You can remove these using directory removal, but your indexation numbers will be incorrect.

Thoughts?
 

AlexT

Well-known member
#3
cmeinck said:
I noticed a jump in my indexation last week. Despite having /goto/ blocked in robots.txt, Google indexed 20K URLs. I looked here at XenForo and the same has happened.

Searching Google with site:xenforo.com inurl:/goto/ reveals over 6K URLs.

Is it of practical relevance though? I don't think it matters in terms of SEO. Notice those /goto/ links were omitted by default in Google search, and if I click to see the omitted results, it says under each goto link that Google was blocked by robots.txt. So Google knows that there are links pointing at /goto/, but it doesn't index them.
 

cmeinck

Well-known member
#7
AlexT said:
Is it of practical relevance though? I don't think it matters in terms of SEO. Notice those /goto/ links were omitted by default in Google search, and if I click to see the omitted results, it says under each goto link that Google was blocked by robots.txt. So Google knows that there are links pointing at /goto/, but it doesn't index them.

If you monitor your indexation, the numbers reported in WMT can be clouded by these URLs. In theory, Google shouldn't index them, but they are indexing them. Does it affect your site from an SEO perspective? Probably not, but it certainly impacts your ability to correctly assess your indexation numbers. I'd prefer to have WMT reports be as close as possible to my actual indexation.
 

AlexT

Well-known member
#8
cmeinck said:
In theory, Google shouldn't index them, but they are indexing them.

How so? At least when I do the search you pointed to above, under each omitted Goto link in the Google results, it states, e.g.:

https://xenforo.com/community/goto/post?id=825496
A description for this result is not available because of this site's robots.txt – learn more.

From the Google robots.txt help page (emphasis mine):

While Google won't crawl or index the content blocked by robots.txt, we might still find and index information about disallowed URLs from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results. You can stop your URL from appearing in Search results completely by using your robots.txt in combination with other URL blocking methods, such as password-protecting the files on your server, or inserting meta tags into your HTML.
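
To make the crawl-versus-index distinction concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt rule and the thread URL below are assumptions for illustration, not a copy of what xenforo.com actually serves. The check only answers "may a crawler fetch this URL?"; it says nothing about whether the bare URL can still be listed in search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file on any given site may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /community/goto/
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The goto URL is the one quoted above; the thread URL is made up.
goto_url = "https://xenforo.com/community/goto/post?id=825496"
thread_url = "https://xenforo.com/community/threads/example-thread.1/"

print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", goto_url))
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", thread_url))
```

A disallowed result here means the crawler never sees the redirect or any meta tags behind the URL, which is consistent with Google listing the address with no description.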
 

cmeinck

Well-known member
#9
My emphasis:

While Google won't crawl or index the content blocked by robots.txt, we might still find and index information about disallowed URLs from other places on the web.

Point being, we control our sites and we should be able to control what's being indexed. They aren't finding these links from other places on the web. If someone were to blog and put a link to /goto/, then I could see it finding its way into the index. Blocking a directory from Googlebot should prevent widespread indexation.

I agree, these likely have little effect on your site's SEO. I'm just in favor of having an indexation number in WMT that is a true representation of your content.
 

Rudy

Well-known member
#10
I just noticed those yesterday. I'm unclear as to where XF uses goto links in its URLs--where are they coming from?
 

Mike

XenForo developer
Staff member
#11
The goto/ handler will just 301 redirect to the correct location (or return an appropriate error code if necessary). If you take it out of robots.txt, then you can let Google figure out what to do with them. It doesn't mean they'll disappear though; Google doesn't seem to instantly follow 301s or deindex pages with error statuses.
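
For anyone unsure what that behaviour looks like, here is a rough sketch of a redirect-only handler. This is not XenForo's code: the lookup table, the function name, the target URL, and the choice of 400/404 for the error cases are all assumptions for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical lookup: post id -> thread URL with a #post anchor.
POST_LOCATIONS = {
    825496: "/community/threads/example-thread.1/#post-825496",
}


def handle_goto(url):
    """Return (status_code, location) for a goto/post?id=N style request."""
    query = parse_qs(urlparse(url).query)
    try:
        post_id = int(query["id"][0])
    except (KeyError, ValueError):
        # Missing or non-numeric id: assumed here to be a client error.
        return 400, None
    location = POST_LOCATIONS.get(post_id)
    if location is None:
        # Unknown post: assumed here to be a 404.
        return 404, None
    # Known post: permanent redirect, never any content of its own.
    return 301, location


print(handle_goto("https://xenforo.com/community/goto/post?id=825496"))
```

The point of the sketch is that every outcome is either a redirect or an error, so a crawler that is allowed to request these URLs never receives a page to index at that address.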

Rudy said:
I just noticed those yesterday. I'm unclear as to where XF uses goto links in its URLs--where are they coming from?

See the arrow on this quote.
 

cmeinck

Well-known member
#13
Wouldn't it make sense to not block these in robots.txt? Technically, Googlebot should follow the 301 redirect to the correct URL (appended with #post). The canonical should prevent indexation of those URLs, correct?
 

Mike

XenForo developer
Staff member
#14
It'd be your call. XenForo doesn't ship with a robots.txt and I've never claimed that what we have in ours is "correct" or "ideal". We may run experiments at times as well.

If you don't block it, then it could lead to Google requesting the links unnecessarily, as you know it's never going to lead to actual content. (Note that even when a page 301s, it may still appear in Google temporarily before it follows it through; I've seen it happen with redirects. Thus, the URLs may still appear to be indexed anyway.)