
Robots.txt and /posts/ logic + Google + digitalpoint.com

Discussion in 'Server Configuration and Hosting' started by Weppa333, Feb 16, 2013.

  1. Weppa333

    Weppa333 Active Member


    I'm trying to figure out why XenForo.com decided to prevent Google from crawling /posts/, which are probably the most prominent links on a default XF install.

    By doing so, you effectively kill every link to your threads from the homepage.
    Also, /posts/ URLs are actually 301 redirects to /threads/.

    However, I observe that Google, when indexing /posts/, decides to keep the /posts/ URL in the SERP (it seems not to care at all that the URL is a 301 redirect). The expected behaviour would be for Google to use the /threads/ destination in the SERP, but I've looked at the results for various XF installs, and it does not.
    So your content is crawled, but with an "unfriendly" URL in the results.
    E.g., this example I took from someone who posted his forum in the "showcase" forum.

    I also noticed that some very large and respectable installs, like digitalpoint, decided (as I did) to let Google crawl /posts/; otherwise you really make it impossible for Google to quickly discover content (it has to make 2 or 3 "clicks" to find any URL starting with /threads/)...

    I noticed that DP.com URLs in Google results were mostly /threads/. I looked at their HTTP headers and they simply 301 redirect /posts/, which is the default behaviour of XF. They do nothing special.

    What do you all think about indexing /posts/?
    If @digitalpoint reads this, what do you think?

    Why is Google indexing /posts/ on certain installs (which it should NOT do, since it's a 301 redirect!) and /threads/ on others?
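    A minimal sketch of what a `Disallow: /posts/` rule means to a compliant crawler, using Python's standard `urllib.robotparser`. The one-rule robots.txt and the forum.example.com URLs are assumptions for illustration, not the actual xenforo.com file. This parser applies the first matching rule, whereas Google documents longest-match precedence; with a single rule the outcome is the same.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a single rule blocking /posts/ for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /posts/
"""

def crawl_allowed(url, user_agent="Googlebot"):
    """Return True if a robots.txt-compliant crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # The /posts/ link a guest sees on the homepage is skipped...
    print(crawl_allowed("https://forum.example.com/posts/12345/"))        # False
    # ...while the /threads/ URL it redirects to is still crawlable.
    print(crawl_allowed("https://forum.example.com/threads/hello.123/"))  # True
```

    The rule only stops the crawler from requesting the /posts/ URL; it never sees the 301, so it can only reach the thread through some other link.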
  2. Chris D

    Chris D XenForo Developer Staff Member

    You wouldn't want Google to independently crawl every single posts URL because you'd get penalised for duplicate content.

    Every single posts link that points to a single thread will be flagged as duplicate content. The posts links are only there to take your users directly to the correct position in the thread.

    What does get crawled, however, are the threads links. These are the most relevant: each threads link points to the one copy of all the posts in that thread, so it is the most appropriate.

    Do not allow Google to crawl posts links.

    That's just my opinion on it. I might be wrong, but I'm pretty sure that's why it is like it is.
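    The duplicate-content argument rests on many /posts/ URLs leading to the same thread page. A hypothetical sketch of that many-to-one mapping: given a post's zero-based position in its thread and a posts-per-page setting (20 here is an assumption), it builds the /threads/ URL a /posts/ link would redirect to. The slug.id/page-N#post-ID shape is modelled loosely on the links discussed in this thread and is an assumption, not XenForo's routing code.

```python
from urllib.parse import urldefrag

POSTS_PER_PAGE = 20  # assumed forum setting

def thread_url_for_post(slug, thread_id, post_id, position):
    """Hypothetical redirect target for /posts/<post_id>/: the thread page
    that holds the post (position is zero-based), plus a #post- anchor."""
    page = position // POSTS_PER_PAGE + 1
    base = f"/threads/{slug}.{thread_id}/"
    if page > 1:
        base += f"page-{page}"
    return f"{base}#post-{post_id}"

if __name__ == "__main__":
    # Twenty different posts in one thread...
    targets = {thread_url_for_post("hello", 10, 100 + i, i) for i in range(20)}
    # ...redirect to twenty URLs that differ only in the fragment, which is
    # not sent in the HTTP request, so a crawler fetches one single page.
    pages = {urldefrag(url).url for url in targets}
    print(len(targets), pages)  # 20 {'/threads/hello.10/'}
```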
  3. Weppa333

    Weppa333 Active Member

    Thanks Chris,

    I understand this reasoning; however:

    Usually, Google gives more "weight" to pages linked from your root.
    On a default XF install, no /threads/ URL is ever linked from the homepage, only /posts/.
    So basically, it takes at least 2 "pages" to finally reach /threads/. It's unlikely Google will crawl this deep on a new domain, for example.

    Also, blocking /posts/ from being crawled (in robots.txt) means you get flooded by AdSense warnings telling you it can't access /posts/, yet you often request advertisements on it.

    And again, a very large install (dp.com) decided to let /posts/ be indexed with nothing special in place (I looked at their source and HTTP headers for the redirection), so I'm very curious.
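    The crawl-depth argument above can be sketched as a breadth-first search over a toy link graph. Everything here is hypothetical: the three pages, the single /posts/ link on the homepage, and the convention that following a 301 does not count as an extra click. URLs under a blocked prefix are skipped, as a robots.txt-compliant crawler would.

```python
from collections import deque

# Hypothetical toy site: the links a guest sees on each page.
LINKS = {
    "/": ["/posts/101/", "/forums/general.1/"],
    "/forums/general.1/": ["/threads/hello.10/"],
    "/threads/hello.10/": [],
}
# /posts/ URLs are not pages of their own: they 301 to a thread.
REDIRECTS = {"/posts/101/": "/threads/hello.10/"}

def click_depths(start, blocked_prefixes=()):
    """Return the number of clicks needed to reach each page from `start`.

    Following a redirect is not counted as an extra click; links under a
    blocked prefix are never requested.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in LINKS.get(page, []):
            if link.startswith(tuple(blocked_prefixes)):
                continue  # disallowed by robots.txt
            target = REDIRECTS.get(link, link)
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

if __name__ == "__main__":
    print(click_depths("/")["/threads/hello.10/"])                # 1
    print(click_depths("/", ("/posts/",))["/threads/hello.10/"])  # 2
```

    With /posts/ disallowed the thread is still reachable in this toy graph, just one click deeper via the forum listing; whether that extra hop matters in practice is what the rest of the thread debates.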
  4. Chris D

    Chris D XenForo Developer Staff Member

    You don't; you ask for advertisements on threads.

    As it's a 301 redirect, you're only ever requesting the ads on the threads links.

    We have a new domain and it is indexing threads fine (with /posts/ disallowed).

    I'm sure it's going to be one of those things with pros and cons for both sides.

    That's how I understand the default behavior. Hopefully someone as expert as Shawn can clarify the thinking behind how digitalpoint.com does it.
  5. Weppa333

    Weppa333 Active Member

    I'm really taking into account here that, at some point, KAM decided that "/posts/ should not be indexed" on their own forum. It's their software, they understand it better than I do, and I do value their "opinion" as reflected in this site's robots.txt.

    However, xf.com is not the largest XF install, and many larger XF installs don't have a robots.txt at all, or seem happy to let /posts/ be indexed, which is a bit weird...

    Finally, Google simply does not behave "as expected" with those /posts/ 301 redirects: it should index the destination URL, not the redirecting URL! Instead, it seems to treat the "first URL ever seen with that content" as the canonical one.

    For me, the "proper fix" for this problem would be to hack XF so that, for unregistered users (guests), it shows /threads/ on the homepage, not /posts/.
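    A hypothetical sketch of the proposed fix, written in Python rather than XenForo's PHP and not using any XenForo API: when the visitor is a guest, the homepage link points straight at the /threads/ destination instead of the /posts/ URL that would 301 there. The function name, its parameters and the URL shapes are all assumptions.

```python
def homepage_post_link(slug, thread_id, post_id, is_guest):
    """Return the link for a 'latest post' entry on the homepage.

    Members keep the /posts/<id>/ link, which redirects to the exact
    position in the thread. Guests (crawlers never log in) get the thread
    URL directly, so no redirect is involved.
    """
    if is_guest:
        return f"/threads/{slug}.{thread_id}/"
    return f"/posts/{post_id}/"

if __name__ == "__main__":
    print(homepage_post_link("hello", 10, 101, is_guest=True))   # /threads/hello.10/
    print(homepage_post_link("hello", 10, 101, is_guest=False))  # /posts/101/
```

    The trade-off is that guests land on the first page of the thread rather than on the latest post; a fuller version would also append the page number and post anchor.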
  6. digitalpoint

    digitalpoint Well-Known Member

    Does XenForo even have any /posts/ links normally if the user isn't logged in? The Like link is one, but search engines will never see it since they don't log in.

    You won't get a duplicate content penalty because there is no duplicate content (it does a proper redirect to the correct thread). Just one of those things that really makes very little difference either way.
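    Checking what a /posts/ URL actually returns, as was done earlier with the digitalpoint.com headers, can be sketched with Python's standard `http.client`, which does not follow redirects and so reports the 301 and its `Location` header as-is. To stay self-contained, the sketch runs against a hypothetical local stand-in server; the redirect target it returns is made up.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def redirect_target(host, port, path):
    """Send a HEAD request and return (status, Location header or None).
    http.client never follows redirects, so a 301 is reported as-is."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

class FakeForum(BaseHTTPRequestHandler):
    """Hypothetical stand-in: /posts/ URLs answer with a 301 to a thread."""

    def do_HEAD(self):
        if self.path.startswith("/posts/"):
            self.send_response(301)
            self.send_header("Location", "/threads/hello.10/#post-101")
        else:
            self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

def start_fake_forum():
    """Start the stand-in server on a free local port; return (server, port)."""
    server = HTTPServer(("127.0.0.1", 0), FakeForum)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]

if __name__ == "__main__":
    server, port = start_fake_forum()
    print(redirect_target("127.0.0.1", port, "/posts/101/"))
    # (301, '/threads/hello.10/#post-101')
    print(redirect_target("127.0.0.1", port, "/threads/hello.10/"))
    # (200, None)
    server.shutdown()
    server.server_close()
```

    Pointing `redirect_target` at a real forum host (port 80, plain HTTP) would show the same status/Location pair that a crawler sees before it follows the redirect.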
  7. Weppa333

    Weppa333 Active Member

    I verified on many installs, including yours :), that unregistered users see /posts/ on the homepage.
    I agree with what you say about /posts/ not being duplicates; the fact remains that Google includes the wrong URL in the SERP.
  8. HWS

    HWS Well-Known Member

    Good point. If you want to do it correctly, this fix should be made.
    However, in real life it seems to be fine either way.
  9. Weppa333

    Weppa333 Active Member

    I'm not an SEO purist and I believe you are right; I've seen successful communities both with and without /posts/ indexed...
    I'm glad to hear what experienced admins think about the issue, though :)
  10. digitalpoint

    digitalpoint Well-Known Member

    I think you are just over thinking it. :)
  11. Weppa333

    Weppa333 Active Member

    Yeah, the SEO paranoia finally hit me :LOL:
