XF 2.2 No index

JoyFreak

Well-known member
If I don’t want my member pages to be indexed, what would be the best way to do it?

1. In options there’s a setting to uncheck/disable ‘User’ content from being included in the sitemap. Not sure if this has anything to do with it.

2. There’s also a noindex meta tag wrapped in a conditional in the member_view template. Removing the condition could do it.

3. Use the robots.txt instead.

Out of the three methods, which would be best practice?
 

JoyFreak

Well-known member
Yes, that does it, but it also revokes guests’ access to viewing those pages, which is not what I want. I just ended up adding it to my robots.txt.
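For reference, the sort of rule I mean is just something like this (assuming the default /members/ route):

Code:
User-agent: *
Disallow: /members/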
 

JoyFreak

Well-known member
OK, maybe it is, but I don’t think it stops Google from indexing disallowed pages if it sees links to those pages, so I prefer a noindex meta tag.
This, right?
Code:
<xf:head option="metaNoindex"><meta name="robots" content="noindex" /></xf:head>
Do you have to add this to every page by editing every template, or is there a way to target everything under a URL, for instance website.com/example/*?

Like, would adding the above code to "member_view" be sufficient and target all the tab pages on the profile, or will I need to add it to the other templates like "member_about" etc.?
 

Mr Lucky

Well-known member
Aha, in that link I mentioned above, it seems Google is going to try to do better about not indexing pages when there is a link to a disallowed page:

  • Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won't be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.
 

JoyFreak

Well-known member
This is true. But how does one remove already indexed content? For example, Google has indexed all of my member profile pages, and I've only recently added the disallow for /members/ to robots.txt. Google will still attempt to recrawl them because they're already indexed.
 

Mr Lucky

Well-known member
This is true. But how does one remove already indexed content? For example, Google has indexed all of my member profile pages, and I've only recently added the disallow for /members/ to robots.txt. Google will still attempt to recrawl them because they're already indexed.
Surely if they're disallowed, and Google does actually do that these days even when there are links, then it will drop them.

Have you also investigated X-Robots-Tag?


You may be able to block /members/ using a regular expression.
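For example, something along these lines in .htaccess might do it (a rough, untested sketch assuming Apache 2.4 with mod_headers):

Code:
# Send a noindex header for anything under /members/
<If "%{REQUEST_URI} =~ m#^/members/#">
    Header set X-Robots-Tag "noindex"
</If>

Unlike a robots.txt disallow, the page can still be crawled, but the header tells Google not to index it.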
 

Mr Lucky

Well-known member
Like, would adding the above code to "member_view" be sufficient and target all the tab pages on the profile, or will I need to add it to the other templates like "member_about" etc.?

I don't think so, but just wondering, this might work in PAGE_CONTAINER:

Code:
<xf:if is="$template == 'member_view'">
    <meta name="robots" content="noindex" />
</xf:if>
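If the other profile tabs render their own templates (member_about etc., I haven't checked the exact template names), you could presumably widen the condition in the same way, e.g.:

Code:
<xf:if is="$template == 'member_view' OR $template == 'member_about'">
    <meta name="robots" content="noindex" />
</xf:if>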
 

Mr Lucky

Well-known member
OK, since writing the above, I stumbled upon this. I find it slightly confusing though:



  • If you can't consolidate them as described in the first bullet, block these unimportant (for search) pages using robots.txt or the URL Parameters tool (for duplicate content reached by URL parameters).
  • Don't use noindex, as Google will still request, but then drop the page when it sees the noindex tag, wasting crawling time.
  • Don't use robots.txt to temporarily reallocate crawl budget for other pages; use robots.txt to block pages or resources that you don't want Google to crawl at all. Google won't shift this newly available crawl budget to other pages unless Google is already hitting your site's serving limit.
 