
XF 1.4 Stuff blocked by robots.txt gets indexed

Discussion in 'Troubleshooting and Problems' started by Pavle123, Jan 27, 2015.

  1. Pavle123

    Pavle123 Active Member

    A few members on the forum suggested a while ago that blocking things with robots.txt would stop search engines from indexing them.

    However, take a look at the attachments that got indexed. It looks pretty bad in search engines. I would like to keep my indexed pages nice, clean and content-rich, but these sorts of pages bother me.

    Is there a way to noindex attachments? I've been using *******'s Advanced NoIndex, but it does not have an option for that.
     
  2. Mike

    Mike XenForo Developer Staff Member

    The pages were probably indexed at one point. They may be removed over time since they're listed in robots.txt. That said, they're actually error pages (since you don't allow guests to view attachments) so they likely would have been removed without the robots.txt.
     
  3. Mr Lucky

    Mr Lucky Well-Known Member

    I've noticed it is possible for files in robots.txt to be indexed.

    Google has suggested this can happen, especially when there are links to the pages.

    A meta robots noindex tag is better, but it is not easy to apply to specific pages.
     
  4. crogoon

    crogoon Member

    About robots.txt: do I also need to list custom route filters for blocked paths? For example, if I have a route filter profile/ for account/, do I need to put them both in robots.txt?
     
  5. cmeinck

    cmeinck Well-Known Member

    I can confirm this. Lost a ton of traffic during the last Panda refresh because of it. No-index > robots.txt.
     
    Pavle123 likes this.
  6. Pavle123

    Pavle123 Active Member

    How can we noindex those pages, then? It really can be a serious issue. It would be nice if one of the administrators could help out.
     
  7. Pavle123

    Pavle123 Active Member

    @Mike Is there a way we can put rel="nofollow" on attachment links? Either via a template edit or something?
     
  8. Mike

    Mike XenForo Developer Staff Member

    See the attached_files and bb_code_attach templates.
     
  9. Pavle123

    Pavle123 Active Member

    @Mike Can you please tell me exactly which line to edit and what I have to put in those templates so that attachment links get rel="nofollow"?
    Sorry, it's my first month with XF, so I need some step-by-step help if you do not mind.
    Thanks in advance, Mike.
     
  10. Mike

    Mike XenForo Developer Staff Member

    You'd need to add that attribute to every <a> tag with an href that starts with "{xen:link attachments..." in those templates. There are a number based on different paths being true.
     
  11. Pavle123

    Pavle123 Active Member

    Hi @Mike

    Thanks, but isn't rel="nofollow" a rule for links?

    From what I read online, <meta name="robots" content="noindex, nofollow" /> should be put somewhere in the HTML of the page rather than in the href?

    From Google's guidelines: "You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code."
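    To make the difference concrete, here is a minimal Python sketch (standard library only) that checks whether a page's HTML carries a robots meta tag with noindex. The names RobotsMetaFinder and has_noindex are made up for this illustration and have nothing to do with XenForo. Note that rel="nofollow" on a link does not count, since it only tells crawlers not to follow that one link.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

    For example, has_noindex('<meta name="robots" content="noindex, nofollow" />') is True, while a page that only contains <a rel="nofollow" href="..."> links gives False.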

    I hope you can help me out; I am really having trouble with my attachments being indexed. As it's a new forum, Google shows more attachments than actual threads from it.

    Thank you.
     
  12. Mike

    Mike XenForo Developer Staff Member

    I don't understand what you mean. You asked how to nofollow attachment links.

    It's still a moot point, as noted earlier, because you don't seem to allow attachments to be viewed by guests, so Google won't index the no-permission page (as it's sent with a 403 status). Or you can just block them via robots.txt as you've mentioned.

    Attachments don't have an HTML page so this isn't applicable.

    If you're just doing searches based on the site, you're really not seeing results that are relevant to how a user would actually search. You're probably seeing plenty of results that will effectively never come up in real usage.
     
  13. Pavle123

    Pavle123 Active Member

    Hi @Mike

    I asked how I can noindex the attachments so they drop out of the search results. As you can see in my opening post, there are so many attachments indexed. I have blocked them with robots.txt, but with no result.

    In WordPress I used to have an option to put a noindex meta tag on an attachment, but I guess here it's not possible for some reason? Why would we want to have attachments indexed anyway?

    Can you guys somehow make attachments noindex in a future release? From an SEO point of view they look like junk in the search results: low-quality, thin pages.
     
  14. Mike

    Mike XenForo Developer Staff Member

    You can only put a noindex entry in a meta tag if it's an HTML page. Attachments aren't HTML pages so there's no meta tag to add. (It seems that you can send an HTTP header, though it's not something I've seen before.)
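    The HTTP header in question is X-Robots-Tag, which takes the same directives as the robots meta tag and also works for non-HTML responses such as images or PDFs. Below is a rough Python sketch of the idea, not XenForo code: attachment_headers is a hypothetical helper that builds the headers a server might send with an attachment.

```python
def attachment_headers(content_type, filename):
    """Build response headers for a file that should stay out of search indexes.

    Hypothetical helper for illustration; not part of XenForo.
    """
    return {
        "Content-Type": content_type,
        "Content-Disposition": 'inline; filename="%s"' % filename,
        # Header equivalent of <meta name="robots" content="noindex, nofollow">.
        "X-Robots-Tag": "noindex, nofollow",
    }


headers = attachment_headers("image/png", "screenshot.png")
```

    A crawler only sees this header if it is allowed to request the URL, so the path must not also be disallowed in robots.txt.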

    Robots.txt doesn't strictly prevent it from indexing the record of the page -- it just won't index the content. Since your attachments aren't viewable by guests, the page won't be indexed (without the robots.txt in place) because it's just an error page. So if you're really concerned about the entry of a page being there, you may wish to allow it to be accessed via robots.txt and simply let it fall out of the index or explicitly request URL removals for it.
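    As a small illustration of that distinction, Python's standard urllib.robotparser can evaluate robots.txt rules. The rules and the example.com URLs below are assumptions made up for this sketch. The answer it gives is only "may this URL be fetched?"; it says nothing about whether the bare URL can still be listed from links elsewhere.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules; substitute the real robots.txt of the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /attachments/
Disallow: /members/
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if robots.txt allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


blocked = is_crawlable(ROBOTS_TXT, "https://example.com/attachments/photo-jpg.123/")
allowed = is_crawlable(ROBOTS_TXT, "https://example.com/threads/hello.1/")
```

    Here blocked is False and allowed is True. A disallowed URL is never requested, so the crawler never sees the 403 response or any noindex signal on it, which is why unblocking the path can let the entry drop out.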
     
    Pavle123 likes this.
  15. Kuma

    Kuma Active Member

    The error pages should have a no-index meta tag, right?
     
  16. Mike

    Mike XenForo Developer Staff Member

    Correct (they also send a 4xx or 5xx level HTTP code so that wouldn't be indexed anyway).
     
  17. Pavle123

    Pavle123 Active Member

    @Mike thank you, that makes a lot of sense now.

    Can you tell me if my robots.txt is blocking something that prevents search engines from crawling my forum lists and seeing their noindex tag?

    Example of list with noindex meta.

    Example of my robots.txt

    Here is how it looks in Google, like thin content.

    What am I doing wrong?

    Additional info.

    I am also seeing. All of those blocked by robots.txt by the way.
    Appreciate your help.

    Also maybe it would be a good idea for XF to have its own noindex nofollow resource or something? @Brogan what do you think?
     
  18. dethfire

    dethfire Well-Known Member

    I can confirm this. Robots.txt is near useless. I set up my robots.txt prior to migrating to XF so there was no opportunity for Google to sneak in. Looking now, I have 91k member profiles indexed in Google. Unbelievable.
     
  19. cmeinck

    cmeinck Well-Known Member

    You nailed it, unbelievable.
     
  20. Pavle123

    Pavle123 Active Member

    And yet nobody from the Support team seems to care to explain to us how to put rel="noindex" on those useless pages that get indexed.
    Edit: sorry, I meant
    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> not rel=nofollow, my bad.
     
    Last edited: Feb 27, 2015
    dethfire likes this.
