robots.txt question

Ryan Kent · May 20, 2011

I have an RSS feed which automatically posts in a forum. This forum is used exclusively for this feed. I want to block this forum from being crawled. What is the best method?

I tried adding the path to the node to my robots.txt file and that did not help. I realize the path to all threads is /threads. Can I use a wildcard inline in robots.txt?

What I mean is all the threads begin as:
www.mysite.com/threads/tweet-from-

Can I add www.mysite.com/threads/tweet-from-* to the robots.txt file?

Steve F · May 20, 2011

Have you tried this?

~~Disallow: /threads/tweet-from-~~

Edit: Looking at your site, I see what you are talking about. Try:

Disallow: /forums/twitter/

Ryan Kent · May 20, 2011

I had that code already. To the best of my knowledge that would block the actual main forum page, but none of the threads contained within the forums. I like the XF URL structure, but this is one of the drawbacks.

/threads/anything.... is not contained within /forums/twitter
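A minimal sketch of how the two rules discussed so far behave, assuming Python's standard urllib.robotparser as a stand-in for a compliant crawler (real search engines run their own matchers). The thread slugs are made up for illustration. Disallow values are matched as path prefixes, so a rule ending in tweet-from- already covers every thread whose path starts that way, with no trailing * needed.

```python
from urllib.robotparser import RobotFileParser

# The two candidate rules from this thread, each as a tiny robots.txt.
FORUM_RULE = "User-agent: *\nDisallow: /forums/twitter/\n"
THREAD_RULE = "User-agent: *\nDisallow: /threads/tweet-from-\n"

def allowed(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a rule-abiding crawler may fetch url under rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical URLs; only the /threads/tweet-from- prefix comes from the thread.
feed_thread = "http://www.mysite.com/threads/tweet-from-example.123/"
other_thread = "http://www.mysite.com/threads/welcome.1/"
forum_page = "http://www.mysite.com/forums/twitter/"

for name, rules in (("forum rule", FORUM_RULE), ("thread rule", THREAD_RULE)):
    for url in (forum_page, feed_thread, other_thread):
        print(name, url, "allowed" if allowed(rules, url) else "blocked")
```

Under these assumptions the forum rule blocks only the forum index page, while the thread-prefix rule blocks the feed threads and leaves other threads crawlable.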

Steve F · May 20, 2011

That should ONLY block what is under "/forums/twitter/". This would block the forums completely: Disallow: /forums/

For reference check out the robots.txt here on the XF site.

http://xenforo.com/robots.txt

Code:

User-agent: *
Disallow: /community/find-new/
Disallow: /community/forums/-/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/admin.php
Allow: /
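As a rough way to check what a file like this covers, here is a sketch that feeds the rules above to Python's standard urllib.robotparser and tests a few paths. The sample paths, other than the rule prefixes themselves, are assumptions for illustration, and a real search engine may resolve rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, verbatim.
XF_RULES = """\
User-agent: *
Disallow: /community/find-new/
Disallow: /community/forums/-/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/admin.php
Allow: /
"""

parser = RobotFileParser()
parser.parse(XF_RULES.splitlines())

def blocked(path: str, agent: str = "Googlebot") -> bool:
    """Return True if a rule-abiding crawler should skip this path."""
    return not parser.can_fetch(agent, "http://xenforo.com" + path)

# Sample paths; the attachment and thread slugs are hypothetical.
for path in (
    "/community/attachments/example.1/",
    "/community/forums/-/",
    "/community/forums/",
    "/community/threads/example-thread.1/",
    "/community/admin.php",
):
    print(path, "blocked" if blocked(path) else "allowed")
```

This parser applies the first matching rule in file order, whereas Google documents picking the most specific match. Because Allow: / comes last and is the shortest rule, both approaches agree for this file.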

Ryan Kent · May 20, 2011

I understand all of those entries except /community/forums/-/. What is that designed to block?

Jake Bunce · May 20, 2011

If you disallow access to that forum for guests then search engines won't have access:

Admin CP -> Users -> Node Permission

Paul B · May 20, 2011

Oracle said:
I understand all of those entries except /community/forums/-/. What is that designed to block?

That is for Mark all forums read.

There's an old thread here about it: http://xenforo.com/community/threads/my-robots-txt-file.6752/

Ryan Kent · May 20, 2011

@Jake, I have taken your suggestion. Outside of the admin forums, our site is open to all. I don't wish to restrict public access to any sections if reasonably possible. So there isn't a means to use a wildcard in the thread title like www.mysite.com/threads/tweet-from-*?

@anyone, if I use the below, then all of my attachments would presumably be blocked. At times I do Google Image searches and see files there. I presume those images would then all be blocked for my site?

Disallow: /community/attachments/

Paul B · May 20, 2011

Including a directory in robots.txt doesn't mean the content is blocked as such, it just prevents crawlers from accessing it.

However, only crawlers which abide by the convention will heed the robots.txt file.
A lot of them don't.
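To illustrate why compliance is voluntary, here is a small sketch contrasting a crawler that consults robots.txt with one that ignores it. Everything here is hypothetical (the ExampleBot name, the URLs, both functions); it only shows that the file is a request the client chooses to honor, not an access control.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /community/attachments/
"""

def polite_fetch_list(urls, agent="ExampleBot"):
    """Return only the URLs a rule-abiding crawler would request."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]

def rogue_fetch_list(urls):
    """A crawler that never reads robots.txt simply requests everything."""
    return list(urls)

# Hypothetical crawl queue.
queue = [
    "http://www.mysite.com/community/attachments/photo.1/",
    "http://www.mysite.com/community/threads/hello.2/",
]

print("polite:", polite_fetch_list(queue))
print("rogue: ", rogue_fetch_list(queue))
```

The server plays no part in either function, which is why content that must stay private needs permissions rather than a robots.txt entry.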

Ryan Kent · May 20, 2011

That's fine. I really only care about Google + Bing. Together they account for most of the traffic. I presume (perhaps wrongly so) any good smaller search engine would follow the same standards. If a search engine decides to go rogue, they probably don't have all that much traffic anyway.

Paul B · May 20, 2011

All of the well known search engines should comply with the standards.

Forsaken · May 20, 2011

Brogan said:
Including a directory in robots.txt doesn't mean the content is blocked as such, it just prevents crawlers from accessing it.

However, only crawlers which abide by the convention will heed the robots.txt file.
A lot of them don't.

Robots.txt isn't a method of stopping them from accessing pages, but a way of telling them not to index them.

Paul B · May 20, 2011

Yes, poor choice of words on my part.

Forsaken · May 20, 2011

Brogan said:
Yes, poor choice of words on my part.

I was just correcting it because most people believe it does stop bots from accessing pages.

Ryan Kent · May 21, 2011

Any reason not to block /misc/?

I notice the /misc/quick-navigation-menu? URLs, but I am not sure if there are other, more useful URLs under /misc/.
