Robots.txt

Quick question ... is the information in this thread still relevant for version 1.4.x of XenForo? I am trying to find a way to stop search engines from crawling member profiles. I have already set the permissions for the user group Unregistered / Unconfirmed to Not Set (No) ... do I need to do more than that to stop search engines from crawling member profiles?
Here is my current robots.txt. I believe this is the line you're looking for:

Disallow: /community/members/

Code:
User-agent: Mediapartners-Google
Disallow:

User-agent: Baiduspider
Disallow: /

User-agent: baiduspider
Disallow: /

User-agent: Baiduspider+
Disallow: /

User-agent: Baiduspider-video
Disallow: /

User-agent: Baiduspider-image
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: *
Disallow: /community/find-new/
Disallow: /community/conversations/
Disallow: /community/members/
Disallow: /community/media/users/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/register/
Disallow: /community/posts/
Disallow: /community/js/
Disallow: /community/gallery/
Disallow: /community/media/
Disallow: /community/login/
Disallow: /community/admin.php
Disallow: /community/credits/
Disallow: /blog.php
Disallow: /calendar.php
Disallow: /tags.php
Disallow: /album.php
Disallow: /search.php
Disallow: /announcement.php
Disallow: /community/ishop/
Allow: /

Sitemap: http://www.sphynxlair.com/community/sitemap.php
 
If you have guest searching should you have /search/ disallowed?

Here is what I have, and somehow Google is reporting that I have over 1M pages blocked:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /search/
Disallow: /conversations/
Disallow: /admin.php

Sitemap: https://www.physicsforums.com/sitemap.php
 
Disallowing something in robots.txt only stops spiders from crawling the content behind that link, and thus from creating entries in Google for it. It doesn't prevent anyone from actually going to the link or using it.

IMO, letting spiders crawl /search/ is not useful because it only returns content on your site that is already indexed and already has an entry. It's similar to /goto/, which just jumps directly to a post that will already be indexed.
 
I'm guessing this

User-agent: Mediapartners-Google
Disallow:

means Google IS allowed? Or is not? I want Google to find everything from images, videos, text, etc.
 
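Yes, it means that crawler is allowed. There's nothing after the Disallow:, so it essentially means disallow nothing (allow everything). Note that Mediapartners-Google is the AdSense crawler; Google's regular search crawler, Googlebot, follows the User-agent: * group unless it has a group of its own.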
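
If you want to check the empty Disallow: behaviour yourself, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the thread path /threads/example.1/ are made up for illustration, and Python's parser is only one implementation, so treat it as a sanity check rather than a statement of how Google behaves.

```python
from urllib.robotparser import RobotFileParser

# The two-line group quoted above, parsed from a string instead of a live URL.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given robots.txt text.

    is_allowed is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # /threads/example.1/ is an assumed path, used only for illustration.
    print(is_allowed(ROBOTS_TXT, "Mediapartners-Google", "/threads/example.1/"))  # True
```

Changing the second line to "Disallow: /" flips the result to False for the same crawler.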
 
I'm using the first one:

User-agent: *
Disallow: /test/
Disallow: /account/
Disallow: /admin.php
Disallow: /ajax/
Disallow: /conversations/
Disallow: /events/birthdays/
Disallow: /events/monthly
Disallow: /events/weekly
Disallow: /find-new/
Disallow: /forums/-/
Disallow: /forums/tweets/
Disallow: /goto/
Disallow: /help/
Disallow: /login/
Disallow: /lost-password/
Disallow: /media/category/
Disallow: /media/keyword/
Disallow: /media/user/
Disallow: /media/service/
Disallow: /media/submit/
Disallow: /misc/style?*
Disallow: /misc/quick-navigation-menu?*
Disallow: /online/
Disallow: /pages/conduct/
Disallow: /pages/privacy/
Disallow: /posts/
Disallow: /threads/tera-tweet-from-*
Disallow: /wiki/special/
Allow: /

I just removed 3 things (test, tera-tweet and wiki), but I have a question:

If "Allow" is empty, is my content indexed, or should I add something?
In fact, I don't want to lose my indexing, so when I see Disallow for forums and posts, what will happen if it stays as it is on the list?

Thanks.
 
The "Allow: /" says "Allow Everything." Some parsers process these lists from top to bottom, while Google and the current standard (RFC 9309) apply the most specific matching rule. Either way, what you're doing is disallowing a few things and then saying that anything you didn't disallow is allowed. That's what you want.

With regard to /forums/ and /posts/, keep in mind that these reference data that already exists in your /threads/. For example, any thread title listed in the /forums/ list is also shown inside the thread itself, which is a more useful place for that content. Same with /posts/: that's just a way to link to a very specific post. That content is already indexed in the thread, so re-indexing a specific post is probably not useful.
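
To see the "disallow a few things, allow the rest" pattern in action, here is a rough sketch with Python's standard urllib.robotparser. The robots.txt text is a shortened version of the list above, and check_paths, ExampleBot and the sample paths are invented for illustration. Python's parser applies rules in file order, whereas Google picks the most specific match; for a list shaped like this one both approaches should agree.

```python
from urllib.robotparser import RobotFileParser

# A shortened version of the list discussed above (not the full file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Disallow: /goto/
Disallow: /posts/
Allow: /
"""


def check_paths(robots_txt, paths, user_agent="ExampleBot"):
    """Map each path to True (crawlable) or False (blocked).

    check_paths and ExampleBot are hypothetical names for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


if __name__ == "__main__":
    # Sample paths are assumptions, not taken from a real site.
    for path, ok in check_paths(
        ROBOTS_TXT, ["/posts/123/", "/threads/example.1/", "/account/"]
    ).items():
        print(path, "allowed" if ok else "blocked")
```

The /posts/ and /account/ paths come back blocked, while the thread path falls through to "Allow: /" and stays crawlable.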
 
I want to prevent social media from tracking my members and content. I know there are add-ons for this, but from what I've been told they don't block social media bots, like Facebook. Can I use robots.txt to do that?
 
No... what you're talking about isn't "bots."

Robots.txt blocks indexing spiders from crawling your site. Social media tracking of members is done through cookies on the users' computers, which is tracking what they're doing as they're doing it.

You can't really do anything about this. If your users don't want social media sites tracking them, they have to install add-ons to block those cookies. You can't (and shouldn't) manipulate those cookies on a user's computer.
 
When I click on Facebook, a "robot" from Facebook instantly appears, as can be seen in "Members Online."

There is an add-on that claims to block social media tracking.
 
When using route filters, in robots.txt, does one specify original routes, or their replacements, or perhaps both?
 
Robots.txt is matched against the URLs that crawlers actually request, so you'd use the URL that it routes to (the replacement), as this is the URL that will be seen and indexed.
 
Then use the correct path for where your forum is installed.

How about /members?
I received lots of crawl errors from Moz, flagged as duplicate URLs.

Could you please suggest the best robots.txt for this?

Your reply would be appreciated.
 
my robots.txt
Code:
User-agent: *
Disallow: /account/
Disallow: /goto/
Disallow: /login/
Disallow: /lost-password/
Disallow: /misc/style/
Disallow: /online/
Disallow: /register/
Disallow: /admin.php
Disallow: /index.php?account/
Disallow: /index.php?goto/
Disallow: /index.php?login/
Disallow: /index.php?lost-password/
Disallow: /index.php?misc/style/
Disallow: /index.php?online/
Disallow: /index.php?register/
Disallow: /admin.php
Allow: /

My site: https://mecuabe.com/forum/robots.txt
 
Unless I'm missing something here, the robots.txt used by XenForo is wrong, or at least wrong for everyone else. For starters, there is no folder called "account" in the XenForo root by default (unless that's used for customers specific to this forum). There is also no folder called "/attachments/" there by default in a XenForo installation.

There are two located elsewhere, though: "/data/attachments" and "/internal_data/attachments/". The same goes for most of the other folder paths XenForo is using in the robots.txt file. Most of the entries just look wrong going by the default XenForo folder structure (that we install).

User-agent: *
Disallow: /find-new/     # no such folder
Disallow: /account/      # no such folder
Disallow: /attachments/  # folder exists in two places (data and internal_data), but not in the XenForo root
Disallow: /goto/         # no such folder
Disallow: /posts/        # no such folder
Disallow: /login/        # no such folder
Disallow: /admin.php     # correct
Allow: /

I've also read before that you said most other folders in XenForo are covered and don't need to be added to robots.txt for exclusion. I have a question about that: how are you blocking them, if I may ask? For example, let's take indexing of the "styles" folder here.

The index.html files used in many folders like that are blank (there's no content in them) to tell spiders not to index those folders. That's unlike the example code below, which you would put in an index.html file to do it, as part of the full markup that also includes the html, title and body tags, etc.


<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
same question @Brogan
 
However, there is a URL path at xenforo.com/community/attachments/, so that is probably what robots.txt is referencing: its rules match URL paths, not folders on disk.
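
A small sketch of that point, again with Python's standard urllib.robotparser: the rule is compared against the path part of the requested URL, so a folder of the same name elsewhere on disk doesn't come into it. The helper blocked_urls, the ExampleBot name and the example file names in the URLs are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# One rule, written against the URL path mentioned above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /community/attachments/
"""

# Hypothetical URLs; only the /community/attachments/ prefix comes from the thread.
URLS = [
    "https://xenforo.com/community/attachments/example.123/",
    "https://xenforo.com/community/data/attachments/example.jpg",
    "https://xenforo.com/community/threads/example.1/",
]


def blocked_urls(robots_txt, urls, user_agent="ExampleBot"):
    """Return the URLs whose path is disallowed for user_agent.

    blocked_urls and ExampleBot are made-up names for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    for url in blocked_urls(ROBOTS_TXT, URLS):
        print("blocked:", url)
```

Only the first URL is reported as blocked, because only its path starts with /community/attachments/.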
 