XF 2.2 No index

JoyFreak

Well-known member
If I don’t want my member pages to be indexed, what would be the best way to do it?

1. There’s an option in the admin options to exclude ‘User’ content from the sitemap. Not sure if this has anything to do with it.

2. There’s also a noindex tag wrapped in a conditional in the member_view template. Removing the condition could do it.

3. Use the robots.txt instead.

Out of the 3 methods, what would be best practice?
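
For reference, option 3 would look something like this in robots.txt (a minimal sketch, assuming the default /members/ route):

Code:
User-agent: *
Disallow: /members/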
 
OK, maybe it is, but I don’t think it stops Google from indexing disallowed pages if it sees links to those pages. So I prefer a noindex meta tag.
This, right?
<xf:head option="metaNoindex"><meta name="robots" content="noindex" /></xf:head>
Do you have to add this to every page by editing every template, or is there a way to target everything under a URL, for instance website.com/example/*?

Like, would adding the above code to "member_view" be sufficient and target all the tab pages on the profile, or will I need to add it to the other templates like "member_about" etc.?
 
Aha, in that link I mentioned above, it seems like Google is going to try to do better about not indexing pages when there is a link to a disallowed page:

  • Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won't be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.
 
This is true. But how does one remove already indexed content? For example, all of my site’s member profile pages have been indexed, and I only recently added the disallow for /members/ to robots.txt. Google will still attempt to recrawl them because they’re already indexed.
 
This is true. But how does one remove already indexed content? For example, all of my site’s member profile pages have been indexed, and I only recently added the disallow for /members/ to robots.txt. Google will still attempt to recrawl them because they’re already indexed.
Surely, if the pages are disallowed and Google really does behave that way these days even when there are links, then it will drop them.

Have you also investigated X-Robots-Tag?


You may be able to block /members/ using a regular expression, for instance at the web server level, as sketched below.
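
As a rough sketch of the X-Robots-Tag idea (Apache shown here; assumes mod_headers is enabled and the default /members/ route):

Code:
# Send a noindex header for every URL under /members/
<LocationMatch "^/members/">
    Header set X-Robots-Tag "noindex"
</LocationMatch>

Like a meta tag, this only works if the pages are not also blocked in robots.txt, since the crawler has to fetch the response to see the header.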
 
Like would adding that above code to "member_view" be sufficient and targets all tab pages on the profile page or will I need to add these into the other templates like the "member_about" etc.?

I don't think so, but just wondering: this might work in PAGE_CONTAINER:

Code:
<xf:if is="$template == 'member_view'">
    <meta name="robots" content="noindex"></xf:if>
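
If member_view alone doesn't cover the other profile tabs, the same PAGE_CONTAINER check could be widened to a list of templates (a sketch; only member_view and member_about are named in this thread, so check which member_* templates your profile tabs actually render):

Code:
<xf:if is="in_array($template, ['member_view', 'member_about'])">
    <meta name="robots" content="noindex" />
</xf:if>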
 
OK, since writing the above, I stumbled upon this. I find it slightly confusing though:



If you can't consolidate them as described in the first bullet, block these unimportant (for search) pages using robots.txt or the URL Parameters tool (for duplicate content reached by URL parameters). Don't use noindex, as Google will still request, but then drop the page when it sees the noindex tag, wasting crawling time. Don't use robots.txt to temporarily reallocate crawl budget for other pages; use robots.txt to block pages or resources that you don't want Google to crawl at all. Google won't shift this newly available crawl budget to other pages unless Google is already hitting your site's serving limit.
 
From my understanding, you can still use Disallow in robots.txt and it’s still being read.

Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won't be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.

The above is from here

Bit of a mixed message though:


Important: For the noindex rule to be effective, the page or resource must not be blocked by a robots.txt file, and it has to be otherwise accessible to the crawler. If the page is blocked by a robots.txt file or the crawler can't access the page, the crawler will never see the noindex rule, and the page can still appear in search results, for example if other pages link to it.

So what they are also saying is that if they can't crawl the page due to a disallow in robots.txt, they can't know it has a noindex meta tag. If that is true, it may mean @AndyB 's tip above in post #2 has the same problem: if Google cannot see the page but knows it is there from a link to it, it may still index it.

See here (that report was crawled with both robots.txt Disallow: /members/ in place and guests denied permission to view profiles):


[Screenshot: indexing report, 2023-09-26]

This makes me think the ideal answer is not to block, but to use a noindex meta tag (and make sure the pages are NOT in the sitemap).

So I have removed the disallow from robots.txt. Just wondering now about giving guest permission back and conditionally noindexing with a meta tag.
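
One way to sanity-check the end result, once guest permission is back and the disallow is gone, would be to fetch a profile as a logged-out visitor and look for the tag (hypothetical URL):

Code:
curl -s https://example.com/members/example-user.1/ | grep -i 'name="robots"'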

However, I'm confused by the conditional in member_view that @JoyFreak mentions above.

Code:
<xf:if is="!$user.isSearchEngineIndexable()">
    <xf:head option="metaNoindex"><meta name="robots" content="noindex" /></xf:head>
</xf:if>

I took the conditional to apply to users who are excluded from the sitemap and/or denied permissions, but that doesn't seem to work: if I exclude users via the sitemap option and/or permissions, there is still no noindex meta tag.

So what does this conditional actually mean, and under what condition is the noindex tag actually shown?
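
For completeness, if the goal is simply to noindex every member page regardless of that check, option 2 from the opening post (removing the condition) would leave just the inner line in member_view:

Code:
<xf:head option="metaNoindex"><meta name="robots" content="noindex" /></xf:head>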
 