Google Indexing /goto/ links

Discussion in 'General XenForo Discussion and Feedback' started by cmeinck, Oct 15, 2014.

  1. cmeinck

    cmeinck Well-Known Member

    I noticed a jump in my indexation last week. Despite having /goto/ blocked in robots.txt, Google indexed 20K URLs. I looked here at XenForo and the same has happened.

    Searching Google with site:xenforo.com inurl:/goto/ reveals over 6K URLs.

    You can remove these using directory removal, but your indexation numbers will be incorrect.

    imthebest and AlexT like this.
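
A minimal sketch of how one might confirm that a robots.txt rule really does cover the /goto/ path, using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` host, and the URL shapes are assumptions for illustration; the thread never shows the actual file. Note that this only checks whether a URL may be crawled, not whether the bare URL can still end up listed in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /goto/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` is allowed to fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs: one goto link and one ordinary thread page.
goto_ok = is_crawlable(ROBOTS_TXT, "https://example.com/goto/post?id=123")
thread_ok = is_crawlable(ROBOTS_TXT, "https://example.com/threads/some-thread.1/")

print("goto crawlable:", goto_ok)
print("thread crawlable:", thread_ok)
```

If the goto URL comes back as not crawlable while the site: search still lists it, that matches what is described above: the rule stops fetching, but the URL itself can still be recorded.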
  2. imthebest

    imthebest Formerly Super120

    IMHO this is a big SEO problem that @Mike should address before 2.0.
  3. AlexT

    AlexT Well-Known Member

    Is it of practical relevance though? I don't think it matters in terms of SEO. Notice those /goto/ links were omitted by default in Google search, and if I click to see the omitted results, it says under each goto link that Google was blocked by robots.txt. So Google knows that there are links pointing at /goto/, but it doesn't index them.
  4. dieketzer

    dieketzer Well-Known Member

    This XF site has only one indexed, despite minimal SEO effort:
    site:gotvirtual.net inurl:/goto/
    I think you all overthink Google.
    Steve F likes this.
  5. DRaver

    DRaver Active Member

    That is the reason why I have an add-on that shows guests only direct links.
  6. Mr Lucky

    Mr Lucky Well-Known Member

  7. cmeinck

    cmeinck Well-Known Member

    If you monitor your indexation, the numbers reported in WMT can be clouded by these URLs. In theory, Google shouldn't index them, but they are indexing them. Does it affect your site from an SEO perspective? Probably not, but it certainly impacts your ability to correctly assess your indexation numbers. I'd prefer to have WMT reports be as close as possible to my actual indexation.
  8. AlexT

    AlexT Well-Known Member

    How so? At least when I do the search you pointed to above, under each omitted Goto link in the Google results, it states, e.g.:

    A description for this result is not available because of this site's robots.txt – learn more.

    From the Google robots.txt help page (emphasis mine):

  9. cmeinck

    cmeinck Well-Known Member

    My emphasis:

    While Google won't crawl or index the content blocked by robots.txt, we might still find and index information about disallowed URLs from other places on the web.

    Point being, we control our sites and we should be able to control what's being indexed. They aren't finding these links from other places on the web. If someone were to blog and put a link to /goto/, then I could see it finding its way into the index. Blocking a directory from Googlebot should prevent widespread indexation.

    I agree, these likely have little effect on your site's SEO. I'm just in favor of having an indexation number in WMT that is a true representation of your content.
  10. Rudy

    Rudy Well-Known Member

    I just noticed those yesterday. I'm unclear as to where XF uses goto links in its URLs--where are they coming from?
  11. Mike

    Mike XenForo Developer Staff Member

    The goto/ handler will just 301 redirect to the correct location (or an appropriate error code if necessary). If you take it out of robots.txt, then you can let Google figure out what to do with them. It doesn't mean they'll disappear though; Google doesn't seem to instantly follow 301 or deindex pages with error statuses.

    See the arrow on this quote.
    Rudy, AlexT and Mr. Goodie2Shoes like this.
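
To see what a crawler would receive from a handler like this, one can request a goto URL without following the redirect and look at the status code and `Location` header. The sketch below is not XenForo code: it runs a tiny stand-in server on localhost that answers any /goto/ path with a 301, then probes it with `http.client`, which does not follow redirects on its own. The handler, the target URL, and the post id are all hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical target; stands in for the thread URL a goto link resolves to.
TARGET = "/threads/example-thread.1/#post-123"


class GotoHandler(BaseHTTPRequestHandler):
    """Stand-in handler: 301 for /goto/ paths, 404 for everything else."""

    def do_GET(self):
        if self.path.startswith("/goto/"):
            self.send_response(301)
            self.send_header("Location", TARGET)
        else:
            self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def probe(host, port, path):
    """Fetch `path` once, without following redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Port 0 lets the OS pick a free port for the throwaway server.
server = HTTPServer(("127.0.0.1", 0), GotoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

status, location = probe(host, port, "/goto/post?id=123")
print(status, location)

server.shutdown()
server.server_close()
```

Against a live site, the same `probe` function could be pointed at the real host and a real goto URL. A crawler that is blocked by robots.txt never makes this request, so it never sees the 301.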
  12. Rudy

    Rudy Well-Known Member

    OK, I see it now. Thanks!
  13. cmeinck

    cmeinck Well-Known Member

    Wouldn't it make sense to not block these in robots.txt? Technically, Googlebot should follow the 301 redirect to the correct URL (appended with #post). The canonical should prevent indexation of those URLs, correct?
  14. Mike

    Mike XenForo Developer Staff Member

    It'd be your call. XenForo doesn't ship with a robots.txt and I've never claimed that what we have in ours is "correct" or "ideal". We may run experiments at times as well.

    If you don't block it, then it could lead to Google requesting the links unnecessarily, as you know it's never going to lead to actual content. (Note that even when a page 301s, it may still appear in Google temporarily before it follows it through; I've seen it happen with redirects. Thus, the URLs may still appear to be indexed anyway.)
  15. Mike

    Mike XenForo Developer Staff Member

    There's no post 474726 on that page.
  16. You are right ;)
    Thanks for the help - this is a deleted post.