XF 1.5 Google Crawl error 403 on disallowed pages

snoopy5

Well-known member
#1
Hi,

I switched to https, made a new sitemap, and Google has now started to index my site. But I am now getting a Google crawl error 403 on a few pages. The surprising thing is that some of them are pages which are disallowed in my robots.txt, like the member pages:

I have this in my robots.txt
------------------------------
User-agent: *
Disallow: /account/
Disallow: /admin.php
Disallow: /ajax/
Disallow: /attachments/
Disallow: /conversations/
Disallow: /find-new/
Disallow: /goto/
Disallow: /help/
Disallow: /login/
Disallow: /lost-password/
Disallow: /members/
Disallow: /mobiquo/
Disallow: /online/
Disallow: /posts/
Disallow: /recent-activity/
Disallow: /register/
Disallow: /search/


Sitemap: https://www.mydomain.com/sitemap.php

-------------------------------

So why then do I get an error for this link:

https://www.mydomain.com/index.php?members/username.12345/

Any idea?
 

Mike

XenForo developer
Staff member
#2
As you don't appear to be using friendly URLs, your various disallow rules won't match the URLs being generated, so they'll effectively be ignored. I presume you want friendly URLs to be enabled, so that may be the most straightforward option to change.
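A quick way to sanity-check this is with Python's standard-library robots.txt parser. It is not the parser Google uses, but Disallow rules are prefix-matched against the URL path in the same way; the domain and member URL below are just the examples from post #1:

# Sketch: show why "Disallow: /members/" matches the friendly URL
# but not the index.php?members/... form.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /members/",
]

rp = RobotFileParser()
rp.parse(rules)

# Friendly URL: the path starts with /members/, so the rule applies.
print(rp.can_fetch("*", "https://www.mydomain.com/members/username.12345/"))
# -> False (crawling disallowed)

# Non-friendly URL: the path is /index.php, so "Disallow: /members/" never matches.
print(rp.can_fetch("*", "https://www.mydomain.com/index.php?members/username.12345/"))
# -> True (crawling allowed, and Google may then run into the 403)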
 

snoopy5

Well-known member
#3
Mike said:
As you don't appear to be using friendly URLs, your various disallow rules won't match the URLs being generated, so they'll effectively be ignored. I presume you want friendly URLs to be enabled, so that may be the most straightforward option to change.
I do use friendly URLs. See the screenshot of the ACP settings:

xf_seo_settings.png
 

Mike

XenForo developer
Staff member
#4
You gave a link that wasn't using friendly URLs, so your robots.txt rules won't apply to it. I can't really comment on where that link necessarily came from, as we wouldn't generate that link while friendly URLs are enabled.

We do canonicalize the URL to the "correct" one, though that only happens when we know that the page is viewable. If you don't have profiles exposed to guests (or users use privacy to hide their profiles), returning a 403 for such a request would be the correct behavior.
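If it helps to verify, here is a minimal sketch of checking what a logged-out client (which is how Googlebot sees the page) gets back; the URL is the example from post #1, and the User-Agent string is just a placeholder:

# Sketch: request the member page unauthenticated and print the HTTP status.
# A 403 here confirms that guests (and therefore crawlers) are blocked.
import urllib.error
import urllib.request

url = "https://www.mydomain.com/index.php?members/username.12345/"
req = urllib.request.Request(url, headers={"User-Agent": "status-check"})

try:
    with urllib.request.urlopen(req) as resp:
        # 200, or a redirect to the canonical friendly URL, means guests can view it.
        print(resp.status, resp.geturl())
except urllib.error.HTTPError as e:
    # 403 when profiles are not viewable by guests.
    print(e.code)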