
Robots Disallow - Why Allow at End

Discussion in 'XenForo Questions and Support' started by XenBurger, Jun 17, 2016.

  1. XenBurger

    XenBurger Member

    I noticed that this website's robots.txt ends its disallow list with "Allow: /":

    User-agent: *
    Disallow: /community/find-new/
    Disallow: /community/account/
    Disallow: /community/attachments/
    Disallow: /community/goto/
    Disallow: /community/posts/
    Disallow: /community/login/
    Disallow: /community/admin.php
    Allow: /


    Is this a typical pattern for a disallow list? I know very little about all this, but I thought disallow lists simply ended with nothing after them.

    Yet I have seen this at the end of numerous XenForo robots.txt files.

    What's it there for?
     
  2. XenBurger

    XenBurger Member

    Question 2:

    You may have a disallow list, but if you also have a sitemap, the disallowed URLs may well be listed in it and therefore *will* get indexed. At least, that is what I have been told. I see that you link to your sitemap in your robots.txt. Does that mitigate the problem in any way? Probably not, but there really should be some way to add "noindex" tags to everything in the disallow list. The admin panel only lets you remove a couple of things from the sitemap, and the disallow list is fairly long.

    But I have no way to do what Google clearly tells us to do (put a noindex tag at the top of the page). So I am stressing out a bit: we are hours away from taking the new forums live, we have 600,000 posts, and I don't want a mass of unwanted content indexed, because it's very difficult to get it removed.
     
  3. Mike

    Mike XenForo Developer Staff Member

    See https://en.wikipedia.org/wiki/Robots_exclusion_standard#Allow_directive for how Allow works. It's generally superfluous here, though.
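To see why the trailing `Allow: /` is usually superfluous, here is a minimal Python sketch that evaluates the robots.txt from the first post with and without it. It assumes the precedence described in RFC 9309 and Google's robots.txt documentation (the longest matching rule wins, Allow wins a tie, and a path matching no rule is allowed); other crawlers may behave differently, and the `*`/`$` wildcards are ignored. The thread path `/community/threads/welcome.5/` is a hypothetical example.

```python
# Simplified sketch of RFC 9309-style precedence (assumed crawler behaviour):
# the longest matching path prefix wins, Allow wins a tie, and a path that
# matches no rule is allowed. The "*" and "$" wildcards are not handled.

DISALLOWED = [
    "/community/find-new/",
    "/community/account/",
    "/community/attachments/",
    "/community/goto/",
    "/community/posts/",
    "/community/login/",
    "/community/admin.php",
]

# Each rule is (is_allow, path_prefix).
RULES_WITH_ALLOW = [(False, p) for p in DISALLOWED] + [(True, "/")]
RULES_WITHOUT_ALLOW = [(False, p) for p in DISALLOWED]


def is_allowed(path: str, rules: list[tuple[bool, str]]) -> bool:
    """Return True if a longest-match crawler may fetch `path`."""
    best_len = -1
    best_allow = True  # default when no rule matches
    for allow, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_won_by_allow = len(prefix) == best_len and allow
        if longer or tie_won_by_allow:
            best_len = len(prefix)
            best_allow = allow
    return best_allow


if __name__ == "__main__":
    # Hypothetical paths; compare the verdict with and without "Allow: /".
    for path in ["/community/posts/123/",
                 "/community/threads/welcome.5/",
                 "/community/admin.php"]:
        print(path, is_allowed(path, RULES_WITH_ALLOW),
              is_allowed(path, RULES_WITHOUT_ALLOW))
```

Because `Allow: /` is the shortest possible rule, any Disallow that matches a path is longer and wins, and a path that no Disallow matches is allowed by default anyway, so both columns agree.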

    Our disallow list doesn't include anything in the sitemap. But if you do include something in the sitemap, I'd expect robots.txt or a page-level noindex directive to override that.
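To check the sitemap concern directly, a sketch like the following lists any sitemap URLs that the robots.txt rules would block, using Python's standard `urllib.robotparser`. The `example.com` URLs and the `blocked_sitemap_urls` helper are hypothetical; a real check would read the URLs from your own sitemap. Note that `urllib.robotparser` applies rules in file order (first match wins) rather than by longest match; for this file the result is the same because `Allow: /` comes last.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted in the first post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /community/find-new/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/admin.php
Allow: /
"""

# Hypothetical sitemap entries; replace with the URLs from your sitemap.
SITEMAP_URLS = [
    "https://example.com/community/threads/welcome.5/",
    "https://example.com/community/forums/general.2/",
    "https://example.com/community/posts/123/",
]


def blocked_sitemap_urls(robots_txt: str, urls: list[str]) -> list[str]:
    """Return the URLs that the "*" user agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch("*", url)]


if __name__ == "__main__":
    for url in blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_URLS):
        print("listed in sitemap but disallowed:", url)
```

An empty result means the sitemap and the disallow list do not overlap, which is the situation described above for the default setup.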

    I'm not sure what you're trying to noindex specifically, but you can do this directly in the requisite templates:
    Code:
    <xen:container var="$head.robots"><meta name="robots" content="noindex" /></xen:container>
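After adding the tag to a template, you could confirm that it appears in the rendered HTML with a small check built on Python's standard `html.parser`. The sample HTML strings stand in for a page you fetch yourself, and `RobotsMetaFinder` and `has_noindex` are hypothetical helper names. Keep in mind that a crawler can only read this tag on pages that robots.txt lets it fetch.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here through the
        # default handle_startendtag implementation.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            self.directives.append(attr_map.get("content", ""))


def has_noindex(html: str) -> bool:
    """Return True if any robots meta tag contains a noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return any(
        "noindex" in (token.strip().lower() for token in directive.split(","))
        for directive in finder.directives
    )


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex" /></head></html>'
    print(has_noindex(page))
```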
     
