Robots.txt and /posts/ logic (Google, digitalpoint.com)

Weppa333

Well-known member
Hello,

I'm trying to figure out why XenForo.com decided to prevent Google from crawling /posts/, which are probably the most prominent links on a default XF install.

By doing so, you actually kill each link to your threads from the homepage.
Also, /posts/ URLs are actually 301 redirects to /threads/.
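To see concretely what a "Disallow: /posts/" rule does to the homepage links, here is a minimal Python sketch using the standard-library robots.txt parser. The robots.txt content is an assumption of the kind discussed in this thread, not a copy of xenforo.com's actual file, and forum.example is a placeholder host. Python's parser does plain prefix matching, which is all this single rule needs.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for illustration only (not any real site's file):
# every URL whose path starts with /posts/ is blocked for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /posts/
"""


def crawl_allowed(url: str, robots_txt: str = ROBOTS_TXT,
                  agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://forum.example/posts/123/",
                "https://forum.example/threads/hello.456/"):
        print(url, crawl_allowed(url))
```

Under this rule the /posts/ link from the homepage is never fetched, so the crawler never gets to see the 301 that would have led it to the /threads/ page.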

However, I observe that when Google indexes /posts/, it keeps the /posts/ URL in the SERP (it does not seem to care at all that the URL is a 301 redirect). The expected behaviour would be for Google to show the /threads/ destination in the SERP, but I've looked at the results for various XF installs, and it does not.
So your content is crawled, but with an "unfriendly" URL in the results.
E.g. this example I took from someone who posted his forum in the "Showcase" forum:
[Screenshot: Image1.webp]


I also noticed that some very large and respectable installs, like digitalpoint, decided (as I did) to let Google crawl /posts/. Otherwise you make it very hard for Google to discover content quickly (it has to make 2 or 3 "clicks" to find any URL starting with /threads/)...

I noticed that DP.com URLs in Google results were mostly /threads/. I looked at their HTTP headers, and they simply 301-redirect /posts/, which is the default behaviour of XF. They do nothing special.
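For anyone who wants to repeat that header check, a HEAD request that does not follow redirects is enough to see the status code and Location header. This is a minimal sketch using Python's http.client, which never follows redirects on its own. The small local server is a hypothetical stand-in so the example runs offline; its /threads/ Location value is made up and does not reflect what XF or digitalpoint.com actually return. Note that some servers may answer HEAD differently from GET.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple
from urllib.parse import urlsplit


def redirect_target(url: str) -> Tuple[int, Optional[str]]:
    """Send a HEAD request and return (status, Location) without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


class _DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical forum stand-in: /posts/<id>/ answers 301, everything else 200."""

    def do_HEAD(self):
        segments = self.path.strip("/").split("/")
        if segments[0] == "posts":
            post_id = segments[1] if len(segments) > 1 else ""
            self.send_response(301)
            # Made-up destination, only to show the shape of the check.
            self.send_header("Location", f"/threads/example-thread.456/#post-{post_id}")
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_demo_server() -> HTTPServer:
    """Run the demo server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), _DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_demo_server()
    port = srv.server_address[1]
    print(redirect_target(f"http://127.0.0.1:{port}/posts/123/"))
    srv.shutdown()
```

Against a real forum you would call redirect_target with its /posts/ URL directly; a 301 with a /threads/ Location is the default behaviour described above.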

What do you all think about indexing /posts/?
If @digitalpoint reads this, what do you think?

Why is Google indexing /posts/ on certain installs (which it should NOT do, since it's a 301 redirect!) and /threads/ on others?
 
You wouldn't want Google to independently crawl every single posts URL because you'd get penalised for duplicate content.

Every single /posts/ link that points into the same thread will be flagged as duplicate content. The /posts/ links are only there to take your users directly to the correct position in the thread.

What is crawled, however, is the /threads/ link. These are the most relevant: each /threads/ link points to the one copy of all the posts in that thread, so it is the most appropriate URL.

Do not allow Google to crawl /posts/ links.

That's just my opinion on it. I might be wrong, but I'm pretty sure that's why it is like it is.
 
Thanks Chris,

I understand this reasoning, however,

Usually, Google gives more "weight" to pages linked from your root.
On a default XF install, no /threads/ URL is ever linked from the homepage, only /posts/.
So basically, it takes at least 2 "pages" before Google finally sees /threads/. It's unlikely Google will crawl this deep on a new domain, for example.
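The "2 pages" argument is a click-depth question: the fewest link hops from the homepage to a thread. Below is a minimal breadth-first-search sketch over a hypothetical toy site (the URLs and link structure are made up, loosely following the description above: the homepage links to a forum listing and to a /posts/ URL). It assumes a followed 301 costs no extra hop and that a blocked URL is simply never fetched.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence

# Hypothetical toy site: the homepage links to a forum listing and to one
# /posts/ URL; only the forum listing links to the thread directly.
SITE_LINKS: Dict[str, List[str]] = {
    "/": ["/forums/general.1/", "/posts/10/"],
    "/forums/general.1/": ["/threads/hello.5/"],
    "/threads/hello.5/": [],
}
# Hypothetical 301s: /posts/10/ redirects to its thread.
SITE_REDIRECTS: Dict[str, str] = {"/posts/10/": "/threads/hello.5/"}


def click_depth(links: Dict[str, List[str]], redirects: Dict[str, str],
                start: str, target: str,
                blocked_prefixes: Sequence[str] = ()) -> Optional[int]:
    """Fewest link hops from start to target, or None if unreachable.

    Links under a blocked prefix (as in a robots.txt Disallow) are skipped;
    a 301 is resolved to its destination without counting an extra hop.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if url == target:
            return depth
        for nxt in links.get(url, []):
            if any(nxt.startswith(prefix) for prefix in blocked_prefixes):
                continue  # disallowed: the crawler never requests it
            nxt = redirects.get(nxt, nxt)  # follow the 301 to its destination
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


if __name__ == "__main__":
    print(click_depth(SITE_LINKS, SITE_REDIRECTS, "/", "/threads/hello.5/"))
    print(click_depth(SITE_LINKS, SITE_REDIRECTS, "/", "/threads/hello.5/",
                      blocked_prefixes=("/posts/",)))
```

On this toy graph the thread sits one hop away when /posts/ may be crawled and two hops away when it is blocked; how much that extra hop matters to Google in practice is exactly what the thread debates.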

Also, blocking /posts/ from crawling (in robots.txt) will get you flooded with AdSense warnings that it can't access /posts/, yet you often request advertisements on those pages.

And again, a very large install (dp.com) decided to let /posts/ be indexed with really nothing special in place (I looked at their source and at the HTTP headers for the redirection), so I'm very curious.
 
Also, blocking /posts/ from crawling (in robots.txt) will get you flooded with AdSense warnings that it can't access /posts/, yet you often request advertisements on those pages.

You don't; you ask for advertisements on threads.

As it's a 301 redirect, you're only ever requesting the ads on the threads links.

It's unlikely Google will crawl this deep on a new domain, for example.
We have a new domain and it is indexing threads fine (with /posts/ disallowed).

I'm sure it's going to be one of those things with pros and cons for both sides.

That's how I understand the default behavior. Hopefully someone as expert as Shawn can clarify the thinking behind how digitalpoint.com does it.
 
Thanks,
I'm really taking into account here that, at some point, KAM decided that "/posts/ should not be indexed" on their own forum. It's their software, they understand it better than I do, and I do value their "opinion" as reflected in this site's robots.txt.

However, xf.com is not the largest XF install, and many larger XF installs don't have a robots.txt at all, or seem to agree to index /posts/, which is a bit weird...

Finally, Google simply does not work "as expected" with those /posts/ 301 redirects: it should index the destination URL, not the redirect URL! But what it seems to do now is treat the "first URL ever seen with that content" as the canonical.

For me, the "proper fix" for this problem would be to hack XF so that, for unregistered users (guests), it shows /threads/ on the homepage, not /posts/.
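The proposed fix boils down to choosing the link per viewer. Here is a language-neutral sketch in Python of that decision only; it is not XenForo's template syntax or PHP API, the LatestPost record is hypothetical, and the /threads/slug.id/ URL shape is an assumption. Guests (and therefore crawlers, which never log in) get the thread URL, while members keep the /posts/ link that jumps to the exact post.

```python
from dataclasses import dataclass


@dataclass
class LatestPost:
    """Hypothetical record behind one 'latest post' entry on the homepage."""
    post_id: int
    thread_id: int
    thread_slug: str


def homepage_link(post: LatestPost, is_guest: bool) -> str:
    """Pick which URL to render for a homepage entry.

    Guests get the thread URL (assumed /threads/<slug>.<id>/ shape), so a
    crawler sees /threads/ straight from the root; logged-in members keep
    the /posts/ link, which redirects to the post's position in the thread.
    """
    if is_guest:
        return f"/threads/{post.thread_slug}.{post.thread_id}/"
    return f"/posts/{post.post_id}/"


if __name__ == "__main__":
    entry = LatestPost(post_id=123, thread_id=456, thread_slug="hello")
    print(homepage_link(entry, is_guest=True))
    print(homepage_link(entry, is_guest=False))
```

The trade-off is that guests land on the first page of the thread rather than on the latest post.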
 
Does XenForo even have any /posts/ links normally if the user isn't logged in? The Like link is one, but search engines will never see it since they don't log in.

You won't get a duplicate content penalty because there is no duplicate content (it does a proper redirect to the correct thread). Just one of those things that really makes very little difference either way.
 
I verified on many installs, including yours :), that unregistered users see /posts/ on the homepage.
I agree with what you say about /posts/ not being duplicates; the fact remains that Google includes the wrong URL in the SERP.
 
For me, the "proper fix" for this problem would be to hack XF so that, for unregistered users (guests), it shows /threads/ on the homepage, not /posts/.

Good point. If you want to do it correctly, this fix should be made.
However, in real life it seems to be fine either way.
 
Thanks,
I'm not an SEO purist and I believe you are right; I've seen successful communities both with and without /posts/ indexed...
I'm glad to hear what experienced admins think about the issue, though :)
 