What's your robots.txt file look like? - XenForo

The same file can't serve different content, so logic dictates that these are two different robots.txt files.

And from what I can see, Apache is delivering two different files. Look through your site's directory structure, find the two robots.txt files (and their respective directories), and see how those directories relate to the site configs in Apache (then correct it). (y)
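
If you want to confirm exactly what each hostname is serving (and whether a redirect is involved), here's a rough Python sketch you can run from anywhere - the URLs are placeholders for your own www and non-www addresses:

Code:
import hashlib
import urllib.request

# Placeholder URLs - substitute your own www and non-www hostnames.
urls = [
    "http://example.com/robots.txt",
    "http://www.example.com/robots.txt",
]

for url in urls:
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
        # geturl() shows the final URL after any redirects, which helps
        # spot a rewrite rule sending the www host somewhere else.
        print(url)
        print("  final URL:", resp.geturl())
        print("  status:   ", resp.status)
        print("  sha256:   ", hashlib.sha256(body).hexdigest()[:16])

# If the two hashes differ, Apache really is serving two different files
# (or two different document roots / redirect targets).
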
That's the odd part: if I search all directories on my server, I get one robots.txt file.
 
You have just one robots.txt; hard-refresh the two pages.
Or visit them in an incognito window.
 
There's definitely two different robots.txt files showing for the www and non-www version of your site.

Are you using caching of some sort that could be serving an older version on the www URL? I'm assuming this is older as it has less in it.
Have you searched through all of your files to see if there is more than one robots.txt file?
Have you checked your .htaccess file to make sure you have nothing in there pointing to robots.txt?
Does your hosting control panel have a facility to add a robots.txt file that might be causing the second version?
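
If it helps, here's a rough way to hunt for stray copies from a script rather than the file manager - a small Python sketch that walks a directory tree looking for robots.txt (the starting path is an assumption; point it at your account's home or web root):

Code:
import os

# Starting path is an assumption - use your account's home directory
# or web root (e.g. /home/youruser or public_html).
start = "/home/youruser"

for dirpath, dirnames, filenames in os.walk(start):
    for name in filenames:
        if name.lower() == "robots.txt":
            full = os.path.join(dirpath, name)
            print(full, os.path.getsize(full), "bytes")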
 
Cache - not that I am aware of.
robots.txt - only one, in public_html.
No other option in my cPanel to place a robots.txt file.
.htaccess file - the only thing with robots.txt in it is this .htaccess file, which I think is the default with XenForo. I did not place it there, and it's in the public_html/community folder: "RewriteRule ^(data/|js/|styles/|install/|favicon\.ico|crossdomain\.xml|robots\.txt) - [NC,L]"

Code:
#    Mod_security can interfere with uploading of content such as attachments. If you
#    cannot attach files, remove the "#" from the lines below.
#<IfModule mod_security.c>
#    SecFilterEngine Off
#    SecFilterScanPOST Off
#</IfModule>

ErrorDocument 401 default
ErrorDocument 403 default
ErrorDocument 404 default
ErrorDocument 500 default


RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www\.sphynxlair\.com)?$
RewriteRule ^(.*)$ http://sphynxlair.com/community/$1 [R=301,L]

<IfModule mod_rewrite.c>
    RewriteEngine on

    #    If you are having problems with the rewrite rules, remove the "#" from the
    #    line that begins "RewriteBase" below. You will also have to change the path
    #    of the rewrite to reflect the path to your XenForo installation.
    #RewriteBase /xenforo

    #    This line may be needed to enable WebDAV editing with PHP as a CGI.
    #RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

    RewriteCond %{REQUEST_FILENAME} -f [OR]
    RewriteCond %{REQUEST_FILENAME} -l [OR]
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteRule ^.*$ - [NC,L]
    RewriteRule ^(data/|js/|styles/|install/|favicon\.ico|crossdomain\.xml|robots\.txt) - [NC,L]
    RewriteRule ^.*$ index.php [NC,L]
</IfModule>
 
Hi everyone, I have a question please.
Can someone explain why we should disallow /find-new/?
Isn't it better if search engines can go into recent posts to see what is new and index new content faster?

Thanks in advance for your feedback.
 
I've updated mine recently to block a few more scrapers/crawlers:

Code:
User-agent: AhrefsBot
Disallow: /

User-agent: Baidu
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-video
Disallow: /

User-agent: Baiduspider-image
Disallow: /

User-agent: Cliqzbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: EasouSpider
Disallow: /

User-agent: Exabot
Disallow: /

User-agent: linkdexbot
Disallow: /

User-agent: linkdexbot-mobile
Disallow: /

User-agent: magpie-crawler
Disallow: /

User-agent: meanpathbot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: NaverBot
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: proximic
Disallow: /

User-agent: Rogerbot
Disallow: /

User-agent: SiteBot
Disallow: /

User-agent: sogou
Disallow: /

User-agent: sogou spider
Disallow: /

User-agent: Sogou web spider
Disallow: /

User-agent: spbot
Disallow: /

User-agent: trendictionbot
Disallow: /

User-agent: Twiceler
Disallow: /

User-agent: URLAppendBot
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: YoudaoBot
Disallow: /

User-agent: Yeti
Disallow: /

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /account/
Disallow: /admin.php
Disallow: /attachments/
Disallow: /conversations/
Disallow: /find-new/
Disallow: /goto/
Disallow: /login/
Disallow: /logos/
Disallow: /posts/
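
If anyone wants to sanity-check how a particular crawler will read rules like these, here's a quick Python sketch using the standard library's robotparser - the URL and paths are just placeholders for your own site:

Code:
import urllib.robotparser

# Placeholder URL - point this at your own live robots.txt.
rp = urllib.robotparser.RobotFileParser("http://example.com/robots.txt")
rp.read()

# Each crawler is matched against its own User-agent group if one exists;
# otherwise it falls back to the "*" group.
for agent, path in [
    ("MJ12bot", "/threads/example-thread.123/"),    # blocked entirely above
    ("Googlebot", "/find-new/"),                    # caught by the "*" group
    ("Googlebot", "/threads/example-thread.123/"),  # allowed
]:
    print(agent, path, "allowed" if rp.can_fetch(agent, path) else "blocked")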

Interestingly, I had someone from Yandex visit CycleChat and send a contact message asking me to unblock their crawler in my robots file. I've blocked it on the basis that it's a Russian search engine, my site is English, and I'm not aware of having many (if any) English-speaking Russian members - so why give up crawl bandwidth for seemingly little benefit? I do wonder, though, whether anyone else here blocks or allows Yandex, and why?

Cheers,
Shaun :D
 
I was using the default XenForo one.

Code:
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /search/
Disallow: /admin.php
Allow: /

There's a default robots.txt included with XenForo? That's weird, because I can't find one in my installation :unsure: Or do you mean the one at https://xenforo.com/robots.txt?
 
Nope, there is no default robots.txt file. You have to create one yourself and put it on the server.
 

Shaun, I have Baidu blocked via robots.txt, yet they still visit my site. Have you checked whether they've stopped crawling yours?

[Attachment: Screen Shot 2014-09-13 at 6.30.59 AM.webp]

Code:
User-agent: Mediapartners-Google
Disallow:

User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-video
Disallow: /

User-agent: Baiduspider-image
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: *
Disallow: /community/find-new/
Disallow: /community/conversations/
Disallow: /community/members/
Disallow: /community/media/users/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/register/
Disallow: /community/posts/
Disallow: /community/js/
Disallow: /community/gallery/
Disallow: /community/media/
Disallow: /community/login/
Disallow: /community/admin.php
Disallow: /community/credits/
Disallow: /blog.php
Disallow: /calendar.php
Disallow: /tags.php
Disallow: /album.php
Disallow: /search.php
Disallow: /announcement.php
Disallow: /community/ishop/
Allow: /

Sitemap: http://www.mysite.com/community/sitemap.php
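
If it's any use, here's a rough sketch for checking the raw access log to see whether a blocked bot is still actually hitting the site (the log path is an assumption and varies by host):

Code:
from collections import Counter

# Path is an assumption - point it at your raw Apache access log.
log_path = "/home/youruser/access-logs/example.com"

bots = ["Baiduspider", "Yandex", "MJ12bot", "AhrefsBot"]
hits = Counter()

with open(log_path, errors="replace") as log:
    for line in log:
        for bot in bots:
            if bot in line:
                hits[bot] += 1

# A crawler that honours the block should only be requesting /robots.txt;
# lots of other hits mean it is ignoring the file.
for bot, count in hits.most_common():
    print(bot, count)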
 
Hi everyone, I have a question please.
Can someone explain why we should disallow /find-new/?
Isn't it better if search engines can go into recent posts to see what is new and index new content faster?

I would like to know this as well.

This is my current robots.txt

Code:
User-agent: Mediapartners-Google
Disallow:

User-agent: Adsbot-Google
Disallow:

User-agent: Googlebot-Mobile
Disallow:

User-agent: AhrefsBot
Disallow: /

User-agent: Baidu
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: Baiduspider-video
Disallow: /

User-agent: Baiduspider-image
Disallow: /

User-agent: Cliqzbot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: EasouSpider
Disallow: /

User-agent: Exabot
Disallow: /

User-agent: linkdexbot
Disallow: /

User-agent: linkdexbot-mobile
Disallow: /

User-agent: magpie-crawler
Disallow: /

User-agent: meanpathbot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: NaverBot
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: proximic
Disallow: /

User-agent: Rogerbot
Disallow: /

User-agent: SiteBot
Disallow: /

User-agent: sogou
Disallow: /

User-agent: sogou spider
Disallow: /

User-agent: Sogou web spider
Disallow: /

User-agent: spbot
Disallow: /

User-agent: trendictionbot
Disallow: /

User-agent: Twiceler
Disallow: /

User-agent: URLAppendBot
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: YoudaoBot
Disallow: /

User-agent: Yeti
Disallow: /

User-Agent: *
Disallow: /?page=
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Disallow: /members/
Disallow: /conversations/
Allow: /

Sitemap: http://mysite.co.uk/sitemap.php
 