
What's your robots.txt file look like? - XenForo

tommydamic68

Well-known member
#1
Just trying to get some ideas of what everyone is currently using for the robots.txt on their XenForo site.

Here is mine:

Code:
# robots.txt file for Sphynxlair
# The Largest Sphynx Cat Community in the world!

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /community/find-new/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/register/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/admin.php
Disallow: /community/ajax/
Disallow: /community/misc/contact/
Disallow: /community/data/
Disallow: /community/forums/-/
Disallow: /community/forums/tweets/
Disallow: /community/conversations/
Disallow: /community/events/birthdays/
Disallow: /community/events/monthly/
Disallow: /community/events/weekly/
Disallow: /community/help/
Disallow: /community/internal_data/
Disallow: /community/js/
Disallow: /community/library/
Disallow: /community/search/
Disallow: /community/styles/
Disallow: /community/lost-password/
Disallow: /community/online/
Disallow: /credits/

Allow: /

Sitemap: http://www.sphynxlair.com/community/sitemap/sitemap.xml.gz
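Rules like these can be sanity-checked locally with Python's standard-library robots.txt parser before deploying them. A quick sketch, using an abridged copy of the rule set above (the host comes from the posted sitemap URL):

```python
from urllib import robotparser

# Abridged copy of the rules above, fed straight to the stdlib parser.
rules = """\
User-agent: *
Disallow: /community/find-new/
Disallow: /community/login/
Disallow: /community/admin.php
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

base = "http://www.sphynxlair.com"
print(rp.can_fetch("Googlebot", base + "/community/forums/"))  # True  (falls through to Allow: /)
print(rp.can_fetch("Googlebot", base + "/community/login/"))   # False (matches the Disallow)
```

Handy for catching typos in paths before a crawler ever sees them.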
 

MattW

Well-known member
#2
I *borrowed* some of mine from @Brogan

Code:
User-agent: Baiduspider
Disallow: /
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /login/
Disallow: /admin.php
Disallow: /conversations/
Allow: /
 

nodle

Well-known member
#7
I was using the default XenForo one.

Code:
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /search/
Disallow: /admin.php
Allow: /
 

kezako

Active member
#8
Mine:
Code:
User-agent: Mediapartners-Google
Disallow:

User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-video
Disallow: /

User-agent: Baiduspider-image
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: *
Disallow: /account*
Disallow: /help*
Disallow: /misc/style*
Disallow: /misc/quick-navigation-menu*
Disallow: /login*
Disallow: /logout*
Disallow: /lost-password*
Disallow: /register*
Disallow: /reports*
Disallow: /search*
Disallow: /conversations*
Disallow: /css.php
Disallow: /cron.php
Disallow: /admin.php
Disallow: /js
Disallow: /styles
Disallow: /members/*
Disallow: /profile-posts/*
Disallow: /online/*
Disallow: /recent-activity/*

Sitemap: http://mywebsite.com/sitemap/sitemap.xml.gz
I use the Sitemap add-on: http://xenforo.com/community/resources/sitemap-for-xenforo-1-2-compatible.67/
 

Martok

Well-known member
#10
Out of interest, why does XenForo's own robots.txt file, along with many other XenForo sites (including @Brogan's), disallow /find-new/?

I must confess that I had based mine on these, but I've noticed that AdSense is giving me crawler errors for doing this. It says:

Our crawler was unable to access your page to determine its content and display relevant ads. When our crawler can’t access the content, often we won’t show ads resulting in lower revenue and coverage. Other times, we’ll show ads that are irrelevant resulting in lower CTR. Follow the links in the ‘How to fix.’ column to correct these errors and improve AdSense performance
It's flagging it up because there are adverts on the find-new page (as there are on other people's sites).

Removing the disallow from robots.txt would solve this but I don't want to do that if there's a good reason that it's in everyone else's.
 

RoldanLT

Well-known member
#11
Martok said:
Out of interest, why does XenForo's own robots.txt file, along with many other XenForo sites (including @Brogan's), disallow /find-new/?
Add this at the top of your robots.txt:
Code:
User-agent: Mediapartners-Google
Disallow:
Just like this.
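For what it's worth, the effect of that extra group can be demonstrated with Python's stdlib parser: the AdSense crawler (Mediapartners-Google) gets its own group with an empty Disallow, so it can still reach pages the wildcard group blocks. A small sketch (example.com is a placeholder):

```python
from urllib import robotparser

# The AdSense crawler matches its own group (empty Disallow = allow all),
# while every other bot falls through to the wildcard group.
rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /find-new/
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Mediapartners-Google", "http://example.com/find-new/"))  # True
print(rp.can_fetch("Googlebot", "http://example.com/find-new/"))             # False
```

So the page stays out of the search index while the ad crawler can still read it to pick relevant ads.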
 

tommydamic68

Well-known member
#13
Martok said:
Out of interest, why does XenForo's own robots.txt file, along with many other XenForo sites (including @Brogan's), disallow /find-new/?

RoldanLT said:
Add this at the top of your robots.txt ... Just like this.
I do have that, @RoldanLT, and still get similar errors from Webmaster Tools. And member profiles give me thousands of errors; it's due to not allowing viewing of member profiles unless logged in, I guess.
 

Mouth

Well-known member
#14
Code:
User-agent: BoardReader
User-agent: BoardTracker
User-agent: Gigabot
User-agent: Twiceler
User-agent: dotbot
User-agent: Baidu
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
User-agent: NaverBot
User-agent: Sosospider
User-agent: Yandex
User-agent: YoudaoBot
User-agent: Yeti
Disallow: /

User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Crawl-delay: 20
Disallow: /admin.php
Disallow: /account/
Disallow: /attachments/
Disallow: /conversations/
Disallow: /find-new/
Disallow: /goto/
Disallow: /login/
Disallow: /search/
 

CTXMedia

Formerly CyclingTribe
#15
Code:
User-agent: proximic
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: magpie-crawler
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /account/
Disallow: /admin.php
Disallow: /attachments/
Disallow: /chat/
Disallow: /conversations/
Disallow: /find-new/
Disallow: /goto/
Disallow: /js/
Disallow: /login/
Disallow: /logos/
Disallow: /members/
Disallow: /search/
 

tommydamic68

Well-known member
#17
I just found something odd. My site's 301 redirect is on point, so how can I be getting two different robots.txt files? On my server there is only one file, "robots.txt". It's a www vs. non-www issue, I imagine?

[Attached screenshots: Screen Shot 2014-03-01 at 7.26.00 AM.png, Screen Shot 2014-03-01 at 7.26.12 AM.png]
 

CTXMedia

Formerly CyclingTribe
#18
Do you have two "sites" configured on your server, one for www and one for non-www? If so, look at the root directory for each and check whether they contain the same robots.txt file.
 

CTXMedia

Formerly CyclingTribe
#20
The same file won't contain/serve different content, so logic dictates that these are two different robots.txt files.

And from what I can see, Apache is delivering two different files. Look in your site's directory structure and see if you can find the two different robots.txt files (and their respective directories), then see how those directories relate to the site configs in Apache and correct it. (y)
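If the two hostnames really are separate vhosts, one way to rule this out for good is to serve a single site and 301-redirect the other hostname to it. A minimal sketch, with a hypothetical domain and DocumentRoot (adjust to your own Apache setup):

```apache
# Hypothetical vhost pair: the bare domain 301-redirects to www,
# so both hostnames resolve to a single DocumentRoot and can never
# serve two different robots.txt files.
<VirtualHost *:80>
    ServerName example.com
    Redirect permanent / http://www.example.com/
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/community
</VirtualHost>
```

This also keeps search engines and AdSense seeing one canonical host.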