XF 2.2 No index

JoyFreak

If I don’t want my member pages to be indexed, what would be the best way to do it?

1. There’s an option to exclude ‘User’ content from the sitemap, under options. I’m not sure whether this has anything to do with indexing.

2. There’s also a noindex meta tag wrapped in a conditional in the member_view template. Removing the condition could do it.

3. Use robots.txt instead (a sketch of this option is shown below).

Out of the three methods, which would be best practice?
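
For reference, option 3 would be a rule along these lines in robots.txt — a sketch only, assuming member profiles live under the default /members/ path:

Code:
# Block crawling of member profile pages (assumes the default /members/ route)
User-agent: *
Disallow: /members/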
 
OK, maybe it is, but I don’t think it stops Google from indexing disallowed pages if it sees links to those pages. So I prefer the noindex meta tag.
This, right?
Code:
<xf:head option="metaNoindex"><meta name="robots" content="noindex" /></xf:head>
Do you have to add this to every page by editing every template, or is there a way to target everything under a given URL, for instance website.com/example/*?

For example, would adding the above code to "member_view" be sufficient to cover all the tab pages on the profile, or will I need to add it to the other templates like "member_about" etc.?
 
Aha, in that link I mentioned above, it seems like Google is going to try to do better about not indexing pages that are disallowed but still linked to:

  • Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won't be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.
 
This is true. But how does one remove already indexed content? For example, Google has indexed all of my site's member profile pages, and I've only recently added the disallow for /members/ in robots.txt. Google will still attempt to recrawl them because they're already indexed.
 
Surely if the pages are disallowed, and Google really does do that these days even when there are links, then it will drop them.

Have you also investigated X-Robots-Tag?


You may be able to block /members/ using a regular expression
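
For reference, a minimal sketch of that idea on Apache 2.4 with mod_headers enabled (the regex and placement are assumptions, not a tested config):

Code:
# Sketch: send an X-Robots-Tag: noindex header for any URL whose path
# starts with /members/, matched via a regular expression (vhost or .htaccess)
<If "%{REQUEST_URI} =~ m#^/members/#">
    Header set X-Robots-Tag "noindex"
</If>

The same caveat discussed further down applies here too: Google only sees this header if it is allowed to fetch the page, so it would not work in combination with a robots.txt disallow for the same paths.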
 
For example, would adding the above code to "member_view" be sufficient to cover all the tab pages on the profile, or will I need to add it to the other templates like "member_about" etc.?

I don't think so, but just wondering whether this might work in PAGE_CONTAINER:

Code:
<xf:if is="$template == 'member_view'">
    <meta name="robots" content="noindex"></xf:if>
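
If the other profile tabs render their own content templates, a variant covering several of them might look like the sketch below; the extra template names are an assumption, and in PAGE_CONTAINER the tag would need to sit within the <head> output to be valid:

Code:
<xf:if is="in_array($template, ['member_view', 'member_about'])">
    <meta name="robots" content="noindex" />
</xf:if>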
 
OK, since writing the above, I stumbled upon this. I find it slightly confusing though:



If you can't consolidate them as described in the first bullet, block these unimportant (for search) pages using robots.txt or the URL Parameters tool (for duplicate content reached by URL parameters). Don't use noindex, as Google will still request, but then drop the page when it sees the noindex tag, wasting crawling time. Don't use robots.txt to temporarily reallocate crawl budget for other pages; use robots.txt to block pages or resources that you don't want Google to crawl at all. Google won't shift this newly available crawl budget to other pages unless Google is already hitting your site's serving limit.
 
From my understanding, you can still use disallow in robots.txt and it’s still being read.

Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won't be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.

The above is from here

Bit of a mixed message though:


Important: For the noindex rule to be effective, the page or resource must not be blocked by a robots.txt file, and it has to be otherwise accessible to the crawler. If the page is blocked by a robots.txt file or the crawler can't access the page, the crawler will never see the noindex rule, and the page can still appear in search results, for example if other pages link to it.

So what they are also saying is that if they can't crawl a page because of a disallow in robots.txt, they can't know it has a noindex meta tag. And if that is true, then with @AndyB's tip above in post #2, Google cannot see the page but still knows it's there because of links to it, so it may still index it.

See here (that report was crawled with both robots.txt Disallow: /members/ in place and guests denied view permission):


[Attached screenshot: Screenshot 2023-09-26 at 12.29.39.png]

This makes me think the ideal answer is not to block, but to have the noindex meta tag (and to make sure member pages are NOT in the sitemap).

So I have removed the disallow in robots.txt. Just wondering now about giving guest permission back and conditionally noindexing with the meta tag.

However, I'm confused by the conditional in member_view that @JoyFreak mentions above:

Code:
<xf:if is="!$user.isSearchEngineIndexable()">
    <xf:head option="metaNoindex"><meta name="robots" content="noindex" /></xf:head>
</xf:if>

I took the conditional to apply to users who are excluded from the sitemap and/or denied permissions, but that doesn't seem to be the case: if I do exclude users via the sitemap option and/or permissions, there is still no noindex meta tag.

So what does this conditional actually check, and under what conditions does the noindex tag get included?
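
For what it's worth, if the aim is simply to noindex every member page regardless of what that check does, a blunt sketch (per option 2 in the first post) would be to keep only the inner line and drop the surrounding conditional:

Code:
<xf:head option="metaNoindex"><meta name="robots" content="noindex" /></xf:head>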
 