
robots.txt question

Discussion in 'XenForo Questions and Support' started by Ryan Kent, May 20, 2011.

  1. Ryan Kent

    Ryan Kent Well-Known Member

    I have an RSS feed which automatically posts in a forum. This forum is used exclusively for this feed. I want to block this forum from being crawled. What is the best method?

    I tried adding the path to the node to my robots.txt file and that did not help. I realize the path to all threads is /threads. Can I use a wildcard in a robots.txt rule?

    What I mean is all the threads begin as:
    www.mysite.com/threads/tweet-from-

    Can I add www.mysite.com/threads/tweet-from-* to the robots.txt file?
     
  2. Steve F

    Steve F Well-Known Member

    Have you tried this?

    Disallow: /threads/tweet-from-

    Edit: Looking at your site, I see what you're talking about. Try:

    Disallow: /forums/twitter/
     
  3. Ryan Kent

    Ryan Kent Well-Known Member

    I had that code already. To the best of my knowledge, that would block the main forum page itself but none of the threads contained within the forum. I like the XF URL structure, but this is one of its drawbacks.

    Thread URLs are /threads/anything..., which is not contained within /forums/twitter.
     
  4. Steve F

    Steve F Well-Known Member

    That should ONLY block what's under "/forums/twitter/". This would block the forums completely: Disallow: /forums/

    For reference, check out the robots.txt here on the XF site.

    http://xenforo.com/robots.txt

    Code:
    User-agent: *
    Disallow: /community/find-new/
    Disallow: /community/forums/-/
    Disallow: /community/account/
    Disallow: /community/attachments/
    Disallow: /community/goto/
    Disallow: /community/posts/
    Disallow: /community/login/
    Disallow: /community/admin.php
    Allow: /
     
  5. Ryan Kent

    Ryan Kent Well-Known Member

    I understand all of those entries except /community/forums/-/. What is that designed to block?
     
  6. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    If you disallow access to that forum for guests then search engines won't have access:

    Admin CP -> Users -> Node Permission
     
  7. Brogan

    Brogan XenForo Moderator Staff Member

  8. Ryan Kent

    Ryan Kent Well-Known Member

    @Jake, I have taken your suggestion. Outside of the admin forums, our site is open to all. I don't wish to restrict public access to any sections if reasonably possible. So there isn't a way to use a wildcard in the thread title, like www.mysite.com/threads/tweet-from-*?

    @anyone, if I use the below, then all of my attachments would presumably be blocked. At times I do Google Image searches and see files there. I presume those images would then all be blocked for my site?

    Disallow: /community/attachments/
     
  9. Brogan

    Brogan XenForo Moderator Staff Member

    Including a directory in robots.txt doesn't mean the content is blocked as such; it just prevents crawlers from accessing it.

    However, only crawlers which abide by the convention will heed the robots.txt file.
    A lot of them don't.
     
  10. Ryan Kent

    Ryan Kent Well-Known Member

    That's fine. I really only care about Google + Bing. Together they account for most of the traffic. I presume (perhaps wrongly so) any good smaller search engine would follow the same standards. If a search engine decides to go rogue, they probably don't have all that much traffic anyway.
     
  11. Brogan

    Brogan XenForo Moderator Staff Member

    All of the well-known search engines should comply with the standards.
     
  12. Forsaken

    Forsaken Well-Known Member

    Robots.txt isn't a method of stopping them from accessing pages, but a way of telling them not to index them.
     
  13. Brogan

    Brogan XenForo Moderator Staff Member

    Yes, poor choice of words on my part.
     
  14. Forsaken

    Forsaken Well-Known Member

    I was just correcting it because most people believe it does stop bots from accessing pages.
     
  15. Ryan Kent

    Ryan Kent Well-Known Member

    Any reason not to block /misc/?

    I notice the /misc/quick-navigation-menu? URLs, but I am not sure whether there are other, more useful URLs under it.
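
The two rules discussed in this thread can be checked offline with a short sketch like the one below. It uses Python's standard-library urllib.robotparser, which treats each Disallow value as a plain path prefix, so no trailing * is needed for the tweet-from- rule. The helper name allowed and the example thread URLs are hypothetical and only for illustration. Real crawlers may interpret rules differently from this parser, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Rule that targets the forum node only.
NODE_RULE = "User-agent: *\nDisallow: /forums/twitter/\n"
# Rule that targets every thread whose URL starts with the feed prefix.
PREFIX_RULE = "User-agent: *\nDisallow: /threads/tweet-from-\n"

# Hypothetical URLs, following the structure described in the thread.
FORUM_PAGE = "http://www.mysite.com/forums/twitter/"
FEED_THREAD = "http://www.mysite.com/threads/tweet-from-example.123/"
OTHER_THREAD = "http://www.mysite.com/threads/welcome.1/"

print(allowed(NODE_RULE, FORUM_PAGE))     # False: the forum index is blocked
print(allowed(NODE_RULE, FEED_THREAD))    # True: threads live under /threads/
print(allowed(PREFIX_RULE, FEED_THREAD))  # False: the prefix matches the feed threads
print(allowed(PREFIX_RULE, OTHER_THREAD)) # True: other threads are unaffected
```

Under this parser, the node rule leaves the feed threads crawlable, which matches the drawback Ryan describes, while the prefix rule covers them without touching other threads.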
     
