Google Indexing /goto/ links

cmeinck · Oct 15, 2014

I noticed a jump in my indexation last week. Despite having /goto/ blocked in robots.txt, Google indexed 20K URLs. I looked here at XenForo and the same has happened.
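For anyone who wants to confirm that a rule like this really covers the /goto/ URLs, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `/community/goto/` path, the example.com URLs, and the `is_crawl_allowed` helper are all assumptions for illustration, not anyone's actual file. It only shows whether a crawler is allowed to fetch a URL; it says nothing about whether the URL can still be listed in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /community/goto/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /community/goto/
"""


def is_crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Illustrative URLs only.
goto_url = "https://example.com/community/goto/post?id=1"
thread_url = "https://example.com/community/threads/example.1/"

print(is_crawl_allowed(ROBOTS_TXT, goto_url))
print(is_crawl_allowed(ROBOTS_TXT, thread_url))
```

With this rule the goto URL is reported as blocked and the thread URL as allowed.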

Searching Google with site:xenforo.com inurl:/goto/ reveals over 6K URLs.

You can remove these using directory removal, but your indexation numbers will be incorrect.

Thoughts?

imthebest · Oct 19, 2014

Imho this is a big SEO problem that @Mike should address before 2.0

AlexT · Oct 21, 2014

cmeinck said:
I noticed a jump in my indexation last week. Despite having /goto/ blocked in robots.txt, Google indexed 20K URLs. I looked here at XenForo and the same has happened.

Searching Google with site:xenforo.com inurl:/goto/ reveals over 6K URLs.

Is it of practical relevance though? I don't think it matters in terms of SEO. Notice those /goto/ links were omitted by default in Google search, and if I click to see the omitted results, it says under each goto link that Google was blocked by robots.txt. So Google knows that there are links pointing at /goto/, but it doesn't index them.

dieketzer · Oct 21, 2014

this xf only has one indexed, despite minimal seo effort:
site:gotvirtual.net inurl:/goto/
i think you all overthink google.

DRaver · Oct 21, 2014

That is the reason why I have an add-on that displays only direct links for guests.

Mr Lucky · Oct 21, 2014

Pardon my ignorance, what is a goto link?

I suspect that Google does not always obey a robots.txt noindex, as opposed to a meta tag noindex.

See here:

http://www.cre8asiteforums.com/forums/topic/91129-google-ignoring-robotstxt/?p=339488

cmeinck · Oct 21, 2014

AlexT said:
Is it of practical relevance though? I don't think it matters in terms of SEO. Notice those /goto/ links were omitted by default in Google search, and if I click to see the omitted results, it says under each goto link that Google was blocked by robots.txt. So Google knows that there are links pointing at /goto/, but it doesn't index them.

If you monitor your indexation, the numbers reported in WMT can be clouded by these URLs. In theory, Google shouldn't index them, but they are indexing them. Does it affect your site from an SEO perspective? Probably not, but it certainly impacts your ability to correctly assess your indexation numbers. I'd prefer to have WMT reports be as close as possible to my actual indexation.

AlexT · Oct 21, 2014

cmeinck said:
In theory, Google shouldn't index them, but they are indexing them.

How so? At least when I do the search you pointed to above, under each omitted Goto link in the Google results, it states, e.g.:

https://xenforo.com/community/goto/post?id=825496
A description for this result is not available because of this site's robots.txt – learn more.

From the Google robots.txt help page (emphasis mine):

While Google won't crawl or index the content blocked by robots.txt, we might still find and index information about disallowed URLs from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results. You can stop your URL from appearing in Search results completely by using your robots.txt in combination with other URL blocking methods, such as password-protecting the files on your server, or inserting meta tags into your HTML.
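To make the distinction in that help text concrete, here is a rough Python sketch of a simplified model. It is not Google's actual logic. The function name, its parameters, and the three result labels are made up for illustration. The point it captures is that a crawler blocked by robots.txt never fetches the page, so it never sees a noindex tag on it, while links pointing at the URL can still get the bare URL listed.

```python
def expected_index_state(blocked_by_robots: bool,
                         has_noindex: bool,
                         has_inbound_links: bool) -> str:
    """Simplified model of how a URL might end up treated (assumption, not Google's algorithm)."""
    if blocked_by_robots:
        # The page is never fetched, so any noindex tag on it is never seen.
        if has_inbound_links:
            return "url-only listing possible"
        return "not indexed"
    if has_noindex:
        # The page is crawlable, so the noindex tag is seen and honored.
        return "not indexed"
    return "indexable"


# A /goto/ link blocked in robots.txt but linked from every post:
print(expected_index_state(True, False, True))
# The same URL left crawlable and marked noindex:
print(expected_index_state(False, True, True))
```

Under this model, combining a robots.txt block with a noindex tag does not help for a linked URL, because the tag sits behind the block.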

cmeinck · Oct 21, 2014

My emphasis:

While Google won't crawl or index the content blocked by robots.txt, we might still find and index information about disallowed URLs from other places on the web.

Point being, we control our sites and we should be able to control what's being indexed. They aren't finding these links from other places on the web. If someone were to blog and put a link to /goto/, then I could see it finding its way into the index. Blocking a directory from Googlebot should prevent widespread indexation.

I agree, these likely have little effect on your site's SEO. I'm just in favor of having an indexation number in WMT that is a true representation of your content.

Wildcat Media · Oct 21, 2014

I just noticed those yesterday. I'm unclear as to where XF uses goto links in its URLs--where are they coming from?

Mike · Oct 21, 2014

The goto/ handler will just 301 redirect to the correct location (or an appropriate error code if necessary). If you take it out of robots.txt, then you can let Google figure out what to do with them. It doesn't mean they'll disappear though; Google doesn't seem to instantly follow 301 or deindex pages with error statuses.
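As a rough illustration of the behavior described above, here is a Python sketch of a goto-style handler. This is not XenForo's code. The `POST_TO_THREAD` lookup table, the `resolve_goto` name, the example thread path, and the choice of 404 as the error status are all assumptions. It only shows the general pattern: a known post id yields a 301 to the thread URL with a `#post-` anchor, and anything else yields an error status.

```python
from typing import Optional, Tuple

# Hypothetical lookup table: post id -> thread path (not XenForo's data model).
POST_TO_THREAD = {
    825496: "/community/threads/example-thread.1/",
}


def resolve_goto(post_id: int) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a goto/post?id=... request."""
    thread_path = POST_TO_THREAD.get(post_id)
    if thread_path is None:
        # Unknown or deleted post: an error status, no redirect target.
        return 404, None
    # Known post: permanent redirect to the thread, anchored at the post.
    return 301, f"{thread_path}#post-{post_id}"


print(resolve_goto(825496))
print(resolve_goto(1))
```

Either way the goto URL itself never serves content, which is why crawling it adds nothing beyond what the thread URL already provides.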

Rudy said:
I just noticed those yesterday. I'm unclear as to where XF uses goto links in its URLs--where are they coming from?

See the arrow on this quote.

Wildcat Media · Oct 21, 2014

Mike said:
See the arrow on this quote.

OK, I see it now. Thanks!

cmeinck · Oct 26, 2014

Wouldn't it make sense to not block these in robots.txt? Technically, Googlebot should follow the 301 redirect to the correct URL (appended with #post). The canonical should prevent indexation of those URLs, correct?

Mike · Oct 26, 2014

It'd be your call. XenForo doesn't ship with a robots.txt and I've never claimed that what we have in ours is "correct" or "ideal". We may run experiments at times as well.

If you don't block it, then it could lead to Google requesting the links unnecessarily, as you know it's never going to lead to actual content. (Note that even when a page 301s, it may still appear in Google temporarily before it follows it through; I've seen it happen with redirects. Thus, the URLs may still appear to be indexed anyway.)

Tobias Honscha · Jun 23, 2015

But there is a permission bug in the /goto/ function. Google can't visit the page: http://www.emuenzen.de/forum/goto/post?id=474726
although guests can view this (the same!) page: http://www.emuenzen.de/forum/threads/frankreich-und-dann.42642/#post-474726
How can I fix this problem?

Mike · Jun 23, 2015

There's no post 474726 on that page.

Tobias Honscha · Jun 23, 2015

You are right

Thanks for the help - this is a deleted post.

jgaulard · Aug 18, 2020

Mike said:
If you don't block it, then it could lead to Google requesting the links unnecessarily, as you know it's never going to lead to actual content. (Note that even when a page 301s, it may still appear in Google temporarily before it follows it through; I've seen it happen with redirects. Thus, the URLs may still appear to be indexed anyway.)

Would this apply to links like this:

https://xenforo.com/community/threads/google-indexing-goto-links.85161/post-948960

Links with the "/post-xxxx" at the end of them?

Also with the "/latest" at the end of them?

I'm seeing a lot of these original 301 redirected pages stay in the Google index for years. It's like the redirect never actually canonicalizes in the index.

Thanks.

JoyFreak · Dec 21, 2021

Does anyone know if there's a way to bulk remove indexed goto URLs? Currently, they are blocked in our robots.txt, but Google Search Console is throwing an "Indexed, though blocked by robots.txt" error.

jgaulard · Dec 21, 2021

Unfortunately, there is no way to bulk remove anything. You can bulk remove the URLs from appearing in the search results, but you can't get them out of the index manually. If they're already blocked in your robots.txt file, they should slowly fall out of the index naturally. How long have they been blocked?
