What's your robots.txt file look like? - XenForo

tommydamic68

Just trying to get some ideas of what everyone is currently using for the robots.txt on their XenForo site.

Here is mine:

Code:
# robots.txt file for Sphynxlair
# The Largest Sphynx Cat Community in the world!

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /community/find-new/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/register/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/admin.php
Disallow: /community/ajax/
Disallow: /community/misc/contact/
Disallow: /community/data/
Disallow: /community/forums/-/
Disallow: /community/forums/tweets/
Disallow: /community/conversations/
Disallow: /community/events/birthdays/
Disallow: /community/events/monthly/
Disallow: /community/events/weekly/
Disallow: /community/help/
Disallow: /community/internal_data/
Disallow: /community/js/
Disallow: /community/library/
Disallow: /community/search/
Disallow: /community/styles/
Disallow: /community/lost-password/
Disallow: /community/online/
Disallow: /credits/

Allow: /

Sitemap: http://www.sphynxlair.com/community/sitemap/sitemap.xml.gz
 
I *borrowed* some of mine from @Brogan

Code:
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /login/
Disallow: /admin.php
Disallow: /conversations/
Allow: /
 
I was using the default XenForo one.

Code:
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /search/
Disallow: /admin.php
Allow: /
 
Mine:
Code:
User-agent: Mediapartners-Google
Disallow:

User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-video
Disallow: /

User-agent: Baiduspider-image
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: *
Disallow: /account*
Disallow: /help*
Disallow: /misc/style*
Disallow: /misc/quick-navigation-menu*
Disallow: /login*
Disallow: /logout*
Disallow: /lost-password*
Disallow: /register*
Disallow: /reports*
Disallow: /search*
Disallow: /conversations*
Disallow: /css.php
Disallow: /cron.php
Disallow: /admin.php
Disallow: /js
Disallow: /styles
Disallow: /members/*
Disallow: /profile-posts/*
Disallow: /online/*
Disallow: /recent-activity/*

Sitemap: http://mywebsite.com/sitemap/sitemap.xml.gz

I use the Sitemap add-on: http://xenforo.com/community/resources/sitemap-for-xenforo-1-2-compatible.67/
 
I am getting crawl errors, for example: community/member/johndoe/234 - should I add /community/member/ to the Disallow rules in my robots.txt?
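If I did block them, I guess it would just mean adding one line to the existing User-agent: * block (assuming the forum lives under /community/ and uses the default XenForo member routes):

Code:
# hypothetical addition to the existing User-agent: * group -
# keeps crawlers out of member profile pages
# (default XenForo installs route these under /members/)
Disallow: /community/members/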
 
Out of interest, why does XenForo's own robots.txt file, along with those of many other XenForo sites (including @Brogan's), disallow /find-new/?

I must confess that I based mine on these, but I've noticed that AdSense is giving me crawler errors for doing this. It says:

Our crawler was unable to access your page to determine its content and display relevant ads. When our crawler can’t access the content, often we won’t show ads resulting in lower revenue and coverage. Other times, we’ll show ads that are irrelevant resulting in lower CTR. Follow the links in the ‘How to fix.’ column to correct these errors and improve AdSense performance

It's flagging it up because there are adverts on the find-new page (as there are on other people's sites).

Removing the disallow from robots.txt would solve this but I don't want to do that if there's a good reason that it's in everyone else's.
 
Add this at the top of your robots.txt:
Code:
User-agent: Mediapartners-Google
Disallow:
Just like this.
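Combined with the default XenForo rules posted above, the whole file would end up looking something like this (a sketch - adjust the paths to match your own install):

Code:
# AdSense crawler: an empty Disallow means it can fetch everything,
# including pages that are blocked for other bots below
User-agent: Mediapartners-Google
Disallow:

# everyone else
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /search/
Disallow: /admin.php
Allow: /

Because Mediapartners-Google matches its own group, it ignores the User-agent: * rules, which should clear those AdSense crawler errors.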
 
I do have that, @RoldanLT, and I still get similar errors from Webmaster Tools. And member profiles give me thousands of errors - it's due to not allowing member profiles to be viewed unless logged in, I guess.
 
Code:
Crawl-delay: 20
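# Note: a Crawl-delay sitting above the first User-agent line is outside every
# group, so crawlers that support the directive will likely ignore it here;
# it is normally placed inside a group (e.g. under User-agent: *).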

User-agent: BoardReader
User-agent: BoardTracker
User-agent: Gigabot
User-agent: Twiceler
User-agent: dotbot
User-agent: Baidu
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
User-agent: NaverBot
User-agent: Sosospider
User-agent: Yandex
User-agent: YoudaoBot
User-agent: Yeti
Disallow: /

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /account/
Disallow: /attachments/
Disallow: /conversations/
Disallow: /find-new/
Disallow: /goto/
Disallow: /login/
Disallow: /search/
 
Code:
User-agent: proximic
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: magpie-crawler
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /account/
Disallow: /admin.php
Disallow: /attachments/
Disallow: /chat/
Disallow: /conversations/
Disallow: /find-new/
Disallow: /goto/
Disallow: /js/
Disallow: /login/
Disallow: /logos/
Disallow: /members/
Disallow: /search/
 
This is a very important thread. I hope the XenForo staff will help us create the best robots.txt for search engines.
Thanks
 
I just found something odd. My site's 301 redirect is on point, so how can I be getting two different robots.txt files? On my server there is only one "robots.txt" file - it's a www vs. non-www issue, I imagine?

[Attached screenshots: the robots.txt content returned by the www and non-www versions of the site]
 
Do you have two "sites" configured on your server, one for www and one for non-www? If so, look at the root directory for each and see whether they're the same.
 
The same file won't contain/serve different content, so logic dictates that this is two different robots.txt files.

And from what I can see, Apache is delivering two different files. Look in your site's directory structure, find the two different robots.txt files (and their respective directories), then see how those directories relate to the site configs in Apache and correct it. (y)
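For example, something along these lines (a rough sketch with made-up hostnames and paths - your vhost names and DocumentRoots will differ) is how you can end up with two robots.txt files, and collapsing the bare domain into a redirect is one way to fix it:

Code:
# Hypothetical example of what may be happening: two vhosts with different
# DocumentRoots, so www and non-www each serve their own robots.txt
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/old-site      # stale copy with its own robots.txt
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/site          # the live site and its robots.txt
</VirtualHost>

# One possible fix: replace the bare-domain vhost above with a plain 301,
# so only one robots.txt is ever served
<VirtualHost *:80>
    ServerName example.com
    Redirect permanent / http://www.example.com/
</VirtualHost>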
 