
What's your robots.txt file look like? - XenForo

Discussion in 'Forum Management' started by tommydamic68, Jan 12, 2014.

  1. tommydamic68

    tommydamic68 Well-Known Member

    Just trying to get some ideas of what anyone is currently using for their robots.txt on their Xenforo site.

    Here is mine:

    Code:
    # robots.txt file for Sphynxlair
    # The Largest Sphynx Cat Community in the world!
    
    User-agent: Mediapartners-Google*
    Disallow:
    
    User-agent: *
    Disallow: /community/find-new/
    Disallow: /community/account/
    Disallow: /community/attachments/
    Disallow: /community/goto/
    Disallow: /community/register/
    Disallow: /community/posts/
    Disallow: /community/login/
    Disallow: /community/admin.php
    Disallow: /community/ajax/
    Disallow: /community/misc/contact/
    Disallow: /community/data/
    Disallow: /community/forums/-/
    Disallow: /community/forums/tweets/
    Disallow: /community/conversations/
    Disallow: /community/events/birthdays/
    Disallow: /community/events/monthly/
    Disallow: /community/events/weekly/
    Disallow: /community/help/
    Disallow: /community/internal_data/
    Disallow: /community/js/
    Disallow: /community/library/
    Disallow: /community/search/
    Disallow: /community/styles/
    Disallow: /community/lost-password/
    Disallow: /community/online/
    Disallow: /credits/
    Allow: /
    
    Sitemap: http://www.sphynxlair.com/community/sitemap/sitemap.xml.gz
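As a sanity check for rules like these (a sketch, not from the thread - the trimmed rule set and test URLs are just examples), Python's standard-library parser reports which paths a given set of rules blocks:

```python
from urllib import robotparser

# A trimmed version of the rules above, parsed straight from a string.
rules = """\
User-agent: *
Disallow: /community/find-new/
Disallow: /community/login/
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A disallowed path is refused; everything else falls through to Allow: /
print(rp.can_fetch("Googlebot", "http://www.sphynxlair.com/community/find-new/"))   # False
print(rp.can_fetch("Googlebot", "http://www.sphynxlair.com/community/threads/1/"))  # True
```

Note that `urllib.robotparser` implements only the original robots.txt rules; the path wildcards some engines support (e.g. `Disallow: /account*`) are not interpreted by it.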
    
     
    Last edited: Jan 12, 2014
  2. MattW

    MattW Well-Known Member

    I *borrowed* some of mine from @Brogan

    Code:
    User-agent: Baiduspider
    Disallow: /

    User-agent: *
    Disallow: /find-new/
    Disallow: /account/
    Disallow: /login/
    Disallow: /admin.php
    Disallow: /conversations/
    Allow: /
     
  3. Brogan

    Brogan XenForo Moderator Staff Member

    Thief! :D
     
  4. MattW

    MattW Well-Known Member

    I saw google hammering my Find New links, so had a look at what you were doing for them ;)
     
  5. Brogan

    Brogan XenForo Moderator Staff Member

    Mine needs an update, I'll do it in a bit.
     
  6. RoldanLT

    RoldanLT Well-Known Member

  7. nodle

    nodle Well-Known Member

    I was using the default XenForo one.

    Code:
    User-agent: *
    Disallow: /find-new/
    Disallow: /account/
    Disallow: /attachments/
    Disallow: /goto/
    Disallow: /posts/
    Disallow: /login/
    Disallow: /search/
    Disallow: /admin.php
    Allow: /
     
  8. kezako

    kezako Active Member

    Mine:
    Code:
    User-agent: Mediapartners-Google
    Disallow:
    
    User-agent: Baiduspider
    Disallow: /
    
    User-agent: Baiduspider-video
    Disallow: /
    
    User-agent: Baiduspider-image
    Disallow: /
    
    User-agent: Yandex
    Disallow: /
    
    User-agent: *
    Disallow: /account*
    Disallow: /help*
    Disallow: /misc/style*
    Disallow: /misc/quick-navigation-menu*
    Disallow: /login*
    Disallow: /logout*
    Disallow: /lost-password*
    Disallow: /register*
    Disallow: /reports*
    Disallow: /search*
    Disallow: /conversations*
    Disallow: /css.php
    Disallow: /cron.php
    Disallow: /admin.php
    Disallow: /js
    Disallow: /styles
    Disallow: /members/*
    Disallow: /profile-posts/*
    Disallow: /online/*
    Disallow: /recent-activity/*
    
    Sitemap: http://mywebsite.com/sitemap/sitemap.xml.gz
    I use the Sitemap add-on: http://xenforo.com/community/resources/sitemap-for-xenforo-1-2-compatible.67/
     
  9. tommydamic68

    tommydamic68 Well-Known Member

    I am getting crawl errors, for example: community/member/johndoe/234 - should I add /community/member/ to the disallows in my robots.txt?
     
  10. Martok

    Martok Well-Known Member

    Out of interest, why does XenForo's own robots.txt file, along with those of many other XenForo sites (including @Brogan's), disallow /find-new/ ?

    I must confess that I had based mine on these, but I've noticed that AdSense is giving me crawler errors for doing this.

    It's flagging it up because there are adverts on the find-new page (as there are on other people's sites).

    Removing the disallow from robots.txt would solve this but I don't want to do that if there's a good reason that it's in everyone else's.
     
  11. RoldanLT

    RoldanLT Well-Known Member

    Add this on top of your robots.txt
    Code:
    User-agent: Mediapartners-Google
    Disallow:
    Just like this.
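A sketch of why this works (not from the thread; `example.com` is a stand-in domain): `Mediapartners-Google` is the AdSense crawler, and an empty `Disallow:` means nothing is disallowed, so that group grants it full access while every other bot falls through to the `User-agent: *` rules:

```python
from urllib import robotparser

rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /find-new/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The AdSense crawler matches its own group, where nothing is disallowed
print(rp.can_fetch("Mediapartners-Google", "http://example.com/find-new/"))  # True
# Other crawlers fall through to the catch-all group and are blocked
print(rp.can_fetch("Googlebot", "http://example.com/find-new/"))             # False
```

Without its own group the AdSense crawler obeys the `User-agent: *` rules and gets blocked from the same pages - which is what triggers those AdSense crawler errors.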
     
  12. Martok

    Martok Well-Known Member

    Thanks! I didn't think to add this when I began with AdSense a little while back, even though I've seen it used on some sites. Anyway, it's now been added and should sort out this issue. :D
     
  13. tommydamic68

    tommydamic68 Well-Known Member

    I do have that @RoldanLT and I still get similar errors from Webmaster Tools. Member profiles alone give me thousands of errors - it's because member profiles can't be viewed unless logged in, I guess.
     
  14. Mouth

    Mouth Well-Known Member

    Code:
    User-agent: BoardReader
    User-agent: BoardTracker
    User-agent: Gigabot
    User-agent: Twiceler
    User-agent: dotbot
    User-agent: Baidu
    User-agent: Baiduspider
    User-agent: Baiduspider-video
    User-agent: Baiduspider-image
    User-agent: NaverBot
    User-agent: Sosospider
    User-agent: Yandex
    User-agent: YoudaoBot
    User-agent: Yeti
    Disallow: /

    User-agent: Mediapartners-Google*
    Disallow:

    User-agent: *
    Crawl-delay: 20
    Disallow: /admin.php
    Disallow: /account/
    Disallow: /attachments/
    Disallow: /conversations/
    Disallow: /find-new/
    Disallow: /goto/
    Disallow: /login/
    Disallow: /search/
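Two details in a layout like this are easy to miss (a sketch with a made-up bot name, not from the thread): consecutive `User-agent:` lines all share the single rule group that follows them, and `Crawl-delay` is a non-standard, per-group directive - Google ignores it, while crawlers such as Bing and Yandex honour it when it sits inside a group rather than before any `User-agent:` line. Python's parser confirms both behaviours:

```python
from urllib import robotparser

rules = """\
User-agent: Baiduspider
User-agent: Yandex
Disallow: /

User-agent: *
Crawl-delay: 20
Disallow: /admin.php
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Both stacked agents share the same Disallow: /
print(rp.can_fetch("Baiduspider", "http://example.com/"))  # False
print(rp.can_fetch("Yandex", "http://example.com/"))       # False
# Anyone else gets the catch-all group, including its crawl delay
print(rp.can_fetch("SomeBot", "http://example.com/"))      # True
print(rp.crawl_delay("SomeBot"))                           # 20
```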
    
     
  15. CyclingTribe

    CyclingTribe Well-Known Member

    Code:
    User-agent: proximic
    Disallow: /
    
    User-agent: Baiduspider
    Disallow: /
    
    User-agent: magpie-crawler
    Disallow: /
    
    User-agent: MJ12bot
    Disallow: /
    
    User-agent: Mediapartners-Google
    Disallow:
    
    User-agent: *
    Disallow: /account/
    Disallow: /admin.php
    Disallow: /attachments/
    Disallow: /chat/
    Disallow: /conversations/
    Disallow: /find-new/
    Disallow: /goto/
    Disallow: /js/
    Disallow: /login/
    Disallow: /logos/
    Disallow: /members/
    Disallow: /search/
    
     
  16. jamalfree

    jamalfree Active Member

    It's a very important thread. I hope the professional XenForo staff will help us create the best robots.txt for search engines.
    Thanks
     
  17. tommydamic68

    tommydamic68 Well-Known Member

    I just found something odd. My site's 301 redirect is set up correctly, yet I'm getting two different robots.txt files. On my server there is only one "robots.txt" file - it's a www vs. non-www issue, I imagine?

    [Screenshots: the two different robots.txt responses, Mar 1, 2014]
     
  18. CyclingTribe

    CyclingTribe Well-Known Member

    Do you have two "sites" configured on your server, one for www and one for non-www? If so, look at the root directory of each to see whether they're the same.
     
  19. tommydamic68

    tommydamic68 Well-Known Member

    no.
     
  20. CyclingTribe

    CyclingTribe Well-Known Member

    The same file won't serve different content, so logic dictates that these are two different robots.txt files.

    And from what I can see, Apache is delivering two different files. Look through your site's directory structure for the two robots.txt files, note their respective directories, and see how those directories relate to the site configs in Apache (and correct it). (y)
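A common way this happens (a hypothetical Apache layout with made-up paths - the real configuration would need checking): the www and non-www hostnames are answered by different virtual hosts with different document roots, so each serves its own copy of robots.txt:

```apache
# Hypothetical vhosts: each ServerName resolves to a different DocumentRoot,
# so http://example.com/robots.txt and http://www.example.com/robots.txt
# are two different files on disk.
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/site-a    # one robots.txt lives here
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/site-b    # a different robots.txt lives here
</VirtualHost>
```

Pointing both vhosts at the same DocumentRoot, or issuing the 301 redirect to the canonical hostname at the vhost level, makes both URLs serve a single file.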
     
