Why disallow "posts" in robots.txt?

Weppa333

Hello all

First of all, congratulations on releasing 1.2.

I've already asked this question in the past, but didn't receive an answer from staff (it was during the litigation period).

May I kindly ask why, on XF.com, someone decided to add

Disallow: /community/posts/

to robots.txt?
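As a quick way to see what that rule actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a robots.txt containing only that one rule for all user agents (the real xenforo.com file may contain more), and checks a `/posts/` redirect URL against the thread URL it points to.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: only the rule quoted above, applied to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /community/posts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

post_redirect = "http://xenforo.com/community/posts/618539/"
thread_page = "http://xenforo.com/community/threads/xenforo-1-2-1-released.58022/"

# The /posts/ redirect is blocked; the thread page it redirects to is not.
print(parser.can_fetch("Googlebot", post_redirect))  # False
print(parser.can_fetch("Googlebot", thread_page))    # True
```

The Disallow path is a prefix match, so it has to reflect where the forum is installed: xenforo.com's forum lives under /community/, while a forum installed at the site root would use `Disallow: /posts/`.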

What exactly is the SEO rationale for this decision?
 
No SEO reason. There's just no point in having Googlebot waste time on redirects back to content it has already seen.
 
I'm asking because there is some debate in the SEO community about 301 redirects and what Google makes of them.
XF installs are a very good example showing that Google does NOT do what it is supposed to do (index the 301 destination rather than the redirecting URL). It really looks like Google takes a two-pass approach to these redirects: first it indexes /posts/blah, knowing it's a 301 redirect, and after days or weeks some "clean-up" process removes the redirect and points to the final URL.

I thought that your decision was based on some SEO observation.

There is something seriously wrong with the way Google handles 301 redirects, but this has nothing to do with XF, of course.

My 2 cents.
 
I completely forgot to add a robots.txt to a forum that was converted to XenForo a couple of days ago, and found that individual posts were getting indexed as well. So I checked the robots.txt here and added the same rules, which include /posts.

The SEO explanation is that if you let Google index /posts, your main thread URL may sometimes not rank well, and threads that would otherwise have ranked well may be hit with a duplicate-content penalty. Even if it's temporary, why lose any traffic at all?
 
That is indeed the best way of doing it, if it doesn't involve too much work. I didn't include a robots.txt in the forum I converted, and I have a few hundred individual posts showing up.

Google already hates communities like never before, and ranks sites like Stack Overflow, which only contain content ripped off from our forums, in the top results. It makes sense not to give Google even small reasons to rank us worse than it already does.

There is a strong case for using the # approach. Or just add to the installation instructions that you do need a robots.txt that disallows certain paths, as Google is stupid enough to get confused.

Or, if possible, show guests a direct link to the thread instead of to individual posts, and let members see the individual post URLs.
 
This is a mod I did on my own install, but the problem remains that the front page should link directly to the second, third, etc. pages of threads. Currently the only way to do that is with the "post" redirect.

But by "censoring" all /posts/ links from the front page (either with robots.txt or with the mod you describe for guests), you lose the SERP benefit of a link on your home page (the home page passes additional authority to the pages it links to).
If you have to wait for Googlebot to discover the N-th page by itself, you can wait for weeks, and the SERP result will be catastrophic (for Googlebot, it's a page only reachable after N-1 clicks on "next page").

The only proper way is the one I describe, I'm afraid.
 
Oh, by the way, I agree with you when you say Google now hates communities.
This is mostly due to the Panda update, and very few people talk about it.
If I may say so, XF's SEO is very good in theory, but in practice (and I've installed all the other forum software packages) it's the one that ranks the worst for me.

vB seems to have serious friends at Google, because the maths just doesn't add up: on-page SEO for XF is far superior, yet there are apparently "specific" scripts at Google for vB, to make Googlebot "better understand" vB-based pages. I'm sure about this... Long story short: the same page ranks better in vB than in XF. I know I'm going to be flamed, but that is my experience at the moment.

If Google is of any importance to the team, the thing to fix urgently is this /posts/ redirect.
 
There is really no reason/excuse for having a /posts/ link instead of having the /thread.ID/page-Y#post.

If you open a new thread in the Suggestions section, I would be happy to help gather support for this change. There is no reason anyone would not support this suggestion.

If there is a tiny performance hit (which I don't think there would be), I am willing to take it to improve the chances of my content ranking higher. I would like to hear a few very good reasons for not changing this.
 
Your quote does it: the arrow points back to the quoted comment.

The problem with doing it as suggested is that you'd have to store the topic URL, or at least the topic ID, inside the message itself, and that falls apart if you split and merge topics (as other forum software has learned).
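To make the split/merge problem concrete, here is a small sketch with hypothetical in-memory tables (not XenForo's actual schema; thread 60001 and its slug are invented for illustration). A thread link baked in when the post is created goes stale after a merge, while a link resolved from the post ID at click time, which is what a /posts/ID redirect does, follows the post to its new thread.

```python
# Hypothetical in-memory tables for illustration only; not XenForo's schema.
post_thread = {618539: 58022}  # post_id -> thread the post currently lives in
thread_slug = {58022: "xenforo-1-2-1-released", 60001: "merged-thread"}


def thread_link(post_id: int) -> str:
    """Build a direct thread link from the post's *current* thread."""
    thread_id = post_thread[post_id]
    return f"/threads/{thread_slug[thread_id]}.{thread_id}/#post-{post_id}"


# Option A: store the thread link inside the message when the post is created.
stored_link = thread_link(618539)

# A moderator later merges the post into thread 60001.
post_thread[618539] = 60001

# Option B (the /posts/ID redirect): resolve the thread at click time instead.
resolved_link = thread_link(618539)

print(stored_link)    # /threads/xenforo-1-2-1-released.58022/#post-618539 (stale)
print(resolved_link)  # /threads/merged-thread.60001/#post-618539
```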
 
Each forum keeps a cache that is updated when a new post is inserted into a thread that resides in that forum.

That cache consists of the last post ID, last thread title etc.

What this means is that all of the information required to display the last post is already in the forum table, so there's no need for complex joins, subqueries or anything like that. The /posts/postid URL is then a redirect that works out the exact position and page of that post within the thread.
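The split described above (cheap link from the cached forum row, page calculation deferred to the redirect) can be sketched like this. The class, function names, sample title and the 20-posts-per-page default are all assumptions for illustration, not XenForo's code; the URL shape follows the thread links quoted in this discussion.

```python
from dataclasses import dataclass


@dataclass
class ForumCache:
    # Hypothetical denormalized fields, refreshed whenever a post is inserted.
    last_post_id: int
    last_thread_title: str


def forum_list_link(forum: ForumCache) -> str:
    # The forum list needs only the cached ID: no join, no position lookup.
    return f"/posts/{forum.last_post_id}/"


def redirect_target(slug: str, thread_id: int, post_id: int,
                    position: int, per_page: int = 20) -> str:
    """Where /posts/<post_id>/ would redirect.

    position is the post's 0-based index in its thread; per_page=20 is an
    assumed setting. Only this step needs the post's position, and it runs
    only when someone actually follows the link.
    """
    page = position // per_page + 1
    page_part = "" if page == 1 else f"page-{page}"
    return f"/threads/{slug}.{thread_id}/{page_part}#post-{post_id}"


forum = ForumCache(last_post_id=618539, last_thread_title="XenForo 1.2.1 Released")
print(forum_list_link(forum))  # /posts/618539/
print(redirect_target("xenforo-1-2-1-released", 58022, 618539, position=45))
# /threads/xenforo-1-2-1-released.58022/page-3#post-618539
```

Linking the forum list straight to the thread page would mean computing `position` for every forum's last post on every forum list render, which is the cost the redirect avoids.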

So, Parminder, the "reason/excuse" is actually a very good one, even if you don't agree with it yourself.

Am I correct in saying that the steps you've taken to disallow /posts have essentially solved the issue for you? If so, surely there just needs to be a documented best practice for people to disallow /posts themselves?
 
Could you please provide an example of how XenForo currently links to a post and how you suggest it should be done?

If you go to the home page and look at the Announcements forum, it lists the latest post with this link: http://xenforo.com/community/posts/618539/. When you click it, you are taken to http://xenforo.com/community/threads/xenforo-1-2-1-released.58022/#post-618539, which is what should appear on the home page instead of http://xenforo.com/community/posts/618539/.
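One reason the "#post-..." form doesn't create a duplicate URL: the fragment after # is handled by the browser and is not sent to the server in the HTTP request, so a crawler following the direct link requests the thread page itself. A minimal sketch with Python's standard `urllib.parse.urldefrag`, using the URL quoted above:

```python
from urllib.parse import urldefrag

direct = "http://xenforo.com/community/threads/xenforo-1-2-1-released.58022/#post-618539"

# Split off the fragment; only the url part is what gets requested.
url, fragment = urldefrag(direct)

print(url)       # http://xenforo.com/community/threads/xenforo-1-2-1-released.58022/
print(fragment)  # post-618539
```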

Now, the problem with disallowing /posts in robots.txt is that your home page no longer has any link pushing Google toward the latest content. That delays indexing, and without a direct link from the home page the new content won't rank as well as it would with a link Google is allowed to follow.
 

Sorry, call it my lack of IQ or whatever you like, but I can't relate your post to anything being discussed here. Would you mind rephrasing it for me?
 

Home page is not the correct term; I think you mean the Forum List page. At any rate, I'm not sure how beneficial the Latest post link is to Google anyway. There would be many other new posts and threads, so Google has to find all of them, and I assume it does this by following each forum link.

It might actually be a benefit to have the "http://xenforo.com/community/posts/618539/" format and exclude that in the robots.txt to avoid duplicate links.
 
My understanding is that if you have this link, http://xenforo.com/community/threads/xenforo-1-2-1-released.58022/#post-618539, as opposed to http://xenforo.com/community/posts/618539/, Google would follow it from the Forum List page, find the updated page with the new post, and might rank it for a few more long-tail searches if the content matches a given phrase or word. Yes, Google will eventually find its way even with http://xenforo.com/community/posts/618539/ instead of the direct link. But I personally prefer fresh content to be indexed as soon as possible, and during the few hours that link stays on the Forum List page it would pass some authority to the thread page. A /posts link that we have disallowed in robots.txt won't be followed, so it gets no benefit from the time it spends listed on the Forum List page.

As mentioned by Weppa333 in one of the posts above (http://xenforo.com/community/threads/why-disallow-posts-in-robots-txt.56006/#post-624965), everything is in that post. It says it all.
 

I just checked my robots.txt file, and it didn't have the following entry:

Disallow: /posts/

I just added it to see how that will change things.

My new posts and threads are indexed by Google extremely fast, typically within an hour.
 
Within an hour isn't what most people would call fast. It would be fast if you posted a new thread, searched for it on Google right after clicking Submit, and it showed up as "1 min ago", and that would be more likely if a direct link appeared on the Forum List page, even briefly.
 
Wouldn't it be cool if Google had an API we could use to update it directly? That way, every time a new post is created or an edited post is saved, the new data would be sent to Google for its index.
 
Another thing to consider is that Google can, and I'm sure does, use the Recent Posts link under the Forums tab to find the latest posts.
 