XF 1.4 Stuff blocked by robots.txt gets indexed

Pavle123

Active member
A few members on the forum suggested a while ago that blocking things with robots.txt will stop search engines from indexing them.

However, take a look at the attachments that got indexed. It looks pretty bad in search engines. I would like to keep my indexed pages nice, clean and content-rich, but these sorts of pages bother me.

Is there a way to noindex attachments? I've been using *******'s Advanced NoIndex, but it does not have an option for that.
 
The pages were probably indexed at one point. They may be removed over time since they're listed in robots.txt. That said, they're actually error pages (since you don't allow guests to view attachments) so they likely would have been removed without the robots.txt.
 
I've noticed it is possible for files in robots.txt to be indexed.

Google has suggested this can happen, especially when there are links to the pages.

Meta robots no index is better, but not easy for specific pages.
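
To illustrate why this can happen, here is a minimal Python sketch using the standard library's urllib.robotparser. The rule, the example.com URLs and the can_crawl helper are made up for illustration and are not taken from anyone's real site. The only question robots.txt answers is "may a crawler fetch this URL?"; it says nothing about whether the URL may appear in the index, so a blocked URL that is linked from elsewhere can still be listed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, assumed for this example only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /attachments/
"""


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if robots.txt allows `agent` to fetch `url`.

    This is a crawl permission only. It carries no information about
    whether the URL itself may be shown in search results.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/attachments/photo-jpg.123/",
        "https://example.com/threads/hello.1/",
    ):
        print(url, "crawlable:", can_crawl(ROBOTS_TXT, url))
```

A crawler that obeys the Disallow line never downloads the attachment URL, which also means it never gets to see a noindex instruction on it.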
 
About robots.txt: do I also need to add my custom route filters for blocked paths? For example, if I have the route filter profile/ for account/, do I need to put them both in robots.txt?
 
I've noticed it is possible for files in robots.txt to be indexed.

Google has suggested this can happen, especially when there are links to the pages.

Meta robots no index is better, but not easy for specific pages.

I can confirm this. Lost a ton of traffic during the last Panda refresh because of it. No-index > robots.txt.
 
@Mike Can you please tell me exactly what I have to put in those templates, and on which line, so that attachment links get rel="nofollow"?
Sorry, it's my first month with XF, so I need some step-by-step instructions if you do not mind.
Thanks in advance Mike.
 
You'd need to add that attribute to every <a> tag with an href that starts with "{xen:link attachments..." in those templates. There are a number of them, depending on which conditions are true.
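
Once the templates are edited, one way to confirm that no attachment link was missed is to scan the rendered HTML of a thread page. This is a rough Python sketch using the standard library's html.parser; the "attachments/" marker, the sample markup and the helper names are assumptions for illustration, not XenForo code.

```python
from html.parser import HTMLParser


class AttachmentLinkAudit(HTMLParser):
    """Collect attachment links that lack rel="nofollow"."""

    def __init__(self, marker: str = "attachments/"):
        super().__init__()
        self.marker = marker  # assumed URL fragment identifying attachment links
        self.missing = []     # hrefs of attachment links without nofollow

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        # rel can hold several space-separated tokens, e.g. "nofollow noopener"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if self.marker in href and "nofollow" not in rel_tokens:
            self.missing.append(href)


def links_missing_nofollow(html: str, marker: str = "attachments/") -> list:
    """Return hrefs of attachment links in `html` that are not nofollowed."""
    audit = AttachmentLinkAudit(marker)
    audit.feed(html)
    audit.close()
    return audit.missing


if __name__ == "__main__":
    sample = (
        '<a href="attachments/a-jpg.1/" rel="nofollow">a</a>'
        '<a href="attachments/b-jpg.2/">b</a>'
        '<a href="threads/c.3/">c</a>'
    )
    print(links_missing_nofollow(sample))
```

An empty list means every attachment link on that page carries the attribute. Keep in mind that nofollow only affects how links are followed; it is not a noindex instruction.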
 
Hi @Mike

Thanks, but shouldn't that be a rule for rel="nofollow" on links?

From what I read online, shouldn't <meta name="robots" content="noindex, nofollow" /> be put somewhere in the HTML instead of in the href?

From Google's guidelines "You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code."

Hope you can help me out, I am really having trouble with my attachments being indexed. It results in more attachments shown in Google than actual threads from my forum, as it's a new forum.

Thank you.
 
Thanks, but shouldn't that be a rule for rel="nofollow" on links?
I don't understand what you mean. You asked how to nofollow attachment links.

It's still a moot point as noted earlier because you don't seem to allow attachments to be viewed by guests so Google won't index the no permission page (as it's sent with a 403 response). Or you can just block them via robots.txt as you've mentioned.

From Google's guidelines "You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code."
Attachments don't have an HTML page so this isn't applicable.

It results in more attachments shown in Google than actual threads from my forum, as it's a new forum.
If you're just doing searches based on the site, you're really not seeing results that are relevant to how a user would actually search. You're probably seeing plenty of results that will effectively never come up in real usage.
 
Hi @Mike

I asked how I can noindex the attachments so they drop out of the search results. As you can see in my opening post, there are so many attachments indexed. I have blocked them with robots.txt, but with no result.

In WordPress I used to have an option to put rel="noindex" meta on an attachment, but I guess here it's not possible for some reason? Why would we want to have attachments indexed anyway?

Can you guys somehow implement this as a noindex option in the future? From an SEO point of view it looks like junk in the search results: low-quality, thin pages.
 
You can only put a noindex entry in a meta tag if it's an HTML page. Attachments aren't HTML pages so there's no meta tag to add. (It seems that you can send an HTTP header, though it's not something I've seen before.)

Robots.txt doesn't strictly prevent Google from indexing the record of the page -- it just won't index the content. Since your attachments aren't viewable by guests, the page won't be indexed (without the robots.txt in place) because it's just an error page. So if you're really concerned about the entry of a page being there, you may wish to allow it to be accessed via robots.txt and simply let it fall out of the index or explicitly request URL removals for it.
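
The HTTP header mentioned above is X-Robots-Tag, which Google documents as the way to send noindex for non-HTML files. Below is a minimal sketch of the idea, written in Python purely for illustration; XenForo is PHP, and the helper name and the "/attachments/" prefix are assumptions rather than anything from XenForo itself.

```python
def robots_headers(path: str, noindex_prefixes=("/attachments/",)) -> dict:
    """Return extra response headers for a request path.

    Hypothetical helper: any path under one of `noindex_prefixes`
    gets an X-Robots-Tag header asking crawlers not to index it.
    """
    if path.startswith(tuple(noindex_prefixes)):
        return {"X-Robots-Tag": "noindex, nofollow"}
    return {}


if __name__ == "__main__":
    print(robots_headers("/attachments/photo-jpg.123/"))
    print(robots_headers("/threads/hello.1/"))
```

As with the meta tag, a crawler only sees this header if it is allowed to fetch the URL, so it would not take effect on paths that are also disallowed in robots.txt.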
 
@Mike thank you, that makes a lot of sense now.

Can you tell me whether my robots.txt is blocking something that prevents search engines from crawling and noindexing my forum lists?

Example of list with noindex meta.

Example of my robots.txt

Here is how it looks in Google, like thin content.

What am I doing wrong?

Additional info.

I am also seeing these. All of them are blocked by robots.txt, by the way.
Appreciate your help.

Also maybe it would be a good idea for XF to have its own noindex nofollow resource or something? @Brogan what do you think?
 
I can confirm this. Robots.txt is near useless. I set up my robots.txt prior to migrating to XF so there was no opportunity for Google to sneak in. Looking now I have 91k member profiles indexed in Google. Unbelievable.
 
And yet nobody from the support team seems to care to explain to us how to put rel="noindex" on those useless pages that get indexed.
Edit: sorry, I meant
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> not rel=nofollow, my bad.
 