For those who do not know, a robots.txt file tells search engine crawlers which parts of a site they may and may not explore. These are only recommendations, and crawlers are free to ignore them, but Google and Bing follow them for the most part.
There are other ways to keep content out of search results, such as the "noindex" meta tag, but the robots.txt file is the fastest and easiest way. Before examining my robots.txt file, there are a few additional notes:
- I have XenPorta and all supporting add-ons installed
- Your robots.txt file is publicly viewable, so never try to hide private data simply by adding a robots.txt entry for it
- A primary reason to use the robot text file is to prevent unwanted pages from appearing in search results. If you have a forum discussing Chevy Corvettes then you may wish to block your "off topic" section and other irrelevant pages.
- A secondary reason to block areas of your site is to keep "junk" out of the search engines. A search engine will usually not crawl your whole site. By eliminating the junk, you help crawlers locate your quality content faster so it gets indexed.
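The original file isn't reproduced here, but based purely on the entries explained below, it likely looked something like this. This is a reconstruction, not the exact file: the User-agent line is an assumption, and whether /user, /service, and /submit sit at the root or under /media/ isn't stated, so they're shown as written.

```
User-agent: *
Disallow: /test/
Disallow: /-/
Disallow: /tweets/
Disallow: /goto/
Disallow: /media/keyword
Disallow: /user
Disallow: /service
Disallow: /submit
Disallow: /threads/tera-tweet-from-
Disallow: /wiki/special
```

Note that robots.txt rules are prefix matches, so a trailing * (as in /threads/tera-tweet-from-*) is unnecessary for crawlers like Google and Bing, though they do support it.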
The above entries explained:
- /test is my backup and testing site. It should not be crawled.
- /-/ is a page for marking all forums as read.
- /tweets/ is a forum I use to automatically post all tweets related to my site.
- /goto/ is used to go to a specified post.
- /media/keyword, /user, /service, and /submit are all XenMedio support pages.
- /threads/tera-tweet-from-* is an RSS feed that auto-creates a thread for each tweet.
- /wiki/special are the wiki support pages.
NOTE: Ideally, a robots.txt file is blank. You should let crawlers access your whole site and control which pages shouldn't be indexed with the "noindex" tag. The robots.txt method is used here because XF doesn't offer the flexibility to easily add the noindex tag to pages on an individual basis.
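For reference, the noindex approach mentioned above is a meta tag placed in the head of each page you want kept out of the index. This is a generic HTML sketch, not XenForo template code:

```html
<!-- Lets crawlers fetch the page but tells them not to index it -->
<meta name="robots" content="noindex">

<!-- Variant that also asks crawlers not to follow the page's links -->
<meta name="robots" content="noindex, nofollow">
```

This is why the two methods differ: a robots.txt Disallow stops the page from being crawled at all, while noindex requires the page to be crawled so the tag can be seen, but guarantees it stays out of the index.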