XF 1.5 Migrated from vbulletin to Xenforo - robots.txt question

LaxmiSathy

Member
Hello,

I recently migrated my vBulletin board to XenForo.
vBulletin is installed at /public_html/forums
and XenForo is installed at /public_html/community.

Now I have some questions regarding robots.txt:

1) Should I include
Disallow: /forums/

to tell Google not to crawl the /forums pages?

2) I have included
Disallow: /community/members/

for all the member pages, but I am getting "Access Denied" crawl errors for the member pages.
I have submitted the updated robots.txt and also did a "Fetch as Google", but it still shows 300,000+ member pages with the access denied crawl error.
Is there anything else I should do?

3) Also, I previously had vBulletin blogs, and there were a few blog pages like
/forums/blogs/anitap
which no longer exist, as I have imported all of the blog content into XenForo threads. Should I also disallow these URLs from crawling in robots.txt?
 
1) A robots.txt file is probably not appropriate for this. Normally you would set up redirects for the old vB URLs.

2) I am not sure this is a problem. "Access Denied" might be the expected response to your robots.txt file which tells the robot not to visit that page. Or if that response is not expected then perhaps the record just needs time to update.

3) I guess you could. Most people prefer to set up specific redirects for threads, posts, etc. and then use a catch-all for all other URLs. That way anyone visiting the old forum is redirected to the new forum.
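
To illustrate the "specific redirects first, then a catch-all" ordering, here is a minimal Python sketch. It is not how XenForo or any redirect add-on works; the old and new URL shapes, the `SPECIFIC_RULES` table, and the `redirect_target` helper are all hypothetical, and it assumes the import kept the original thread and member IDs. On a real site this logic would normally live in the web server's rewrite rules or a redirect script.

```python
import re
from typing import Optional

# Hypothetical new forum root, used as the catch-all destination.
NEW_ROOT = "/community/"

# Hypothetical (old vB pattern, new XF template) pairs, checked in order.
# Assumes content IDs were preserved by the import.
SPECIFIC_RULES = [
    (re.compile(r"^/forums/showthread\.php\?t=(\d+)"), "/community/threads/{0}/"),
    (re.compile(r"^/forums/member\.php\?u=(\d+)"), "/community/members/{0}/"),
]


def redirect_target(old_path: str) -> Optional[str]:
    """Return the new path for an old /forums/ URL, or None if it is not an old URL."""
    if not old_path.startswith("/forums/"):
        return None
    # Specific rules win when they match.
    for pattern, template in SPECIFIC_RULES:
        match = pattern.match(old_path)
        if match:
            return template.format(*match.groups())
    # Catch-all: any other old URL goes to the new forum root.
    return NEW_ROOT


if __name__ == "__main__":
    for path in ("/forums/showthread.php?t=123", "/forums/blogs/anitap/"):
        print(path, "->", redirect_target(path))
```

With these made-up rules, an old thread URL maps to its matching new thread, while a URL with no specific rule, such as an old blog page, falls through to the forum root.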
 
@Jake Bunce @Alfa1

Thanks for the responses.
#2) But I am getting lots of "Access Denied" crawl errors for the member profile pages.

crawl_error_accessDenied.webp


The error detail for the member page community/members/rith.278948 is shown below:
crawl_error_accessDenied1.webp

So how do I fix the 300,000+ member pages that are showing as access denied in the Google webmaster console?

#3) Yes, I have set up a catch-all URL redirect, so this URL - http://www.indusladies.com/forums/blogs/anitap/
is redirected to - http://www.indusladies.com/community/

crawl_error_soft404.webp
But the webmaster console still shows 249 soft 404 errors, and most of them have links like /forums/blogs/<blogger's user name>
How do I fix this error in the webmaster console?
 
2) Your members page is denied to guests so XF returns a 403. This is normal. It is not a problem to be fixed. 403 is appropriate when a user is not allowed to view a page.

I am not sure what a robots.txt disallow will show as in that report. Presumably it hasn't taken effect yet.

3) The redirect appears to be working so you shouldn't get a 404 on that page in the future.
 
Jake, for the soft 404 errors on the /forums/blogs/ URLs,
should I do a "Mark as Fixed" as below:

crawl_error_soft404a.webp

OR should I do a "Fetch as Google" and submit to index as below:

crawl_error_soft404b.webp
 
@Jake Bunce @Alfa1
Additionally, I changed the setting below in the admin CP:
XML Sitemap Generator > included sitemap content > users - unchecked
so that the community/members URLs do not get submitted for Google to crawl.
But I am still seeing these URLs in Google Search Console > Crawl error > Access denied

crawlError_AccessDenied.webp


How do I fix the Access denied error for these URLs?
 
The sitemap just gives an indication of things for Google to index. It still indexes things outside of it, which would include your members. The display here is informational, to let you know that it ran into these errors in case you weren't expecting them. As I assume your member profiles aren't public, that would be expected.
 
@Mike
I understand that the access denied error is expected, since the member profile pages aren't public on my board. But is there any specific action that needs to be taken at my end so that these errors do not show up in my Google Search Console?
 
Robots.txt would be the main way to prevent Google from attempting to crawl the pages. Looking at your robots.txt file, I don't see anything attempting to block /community/members/.
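
One way to confirm whether a given robots.txt actually blocks the member pages is to test it locally. The sketch below uses Python's standard `urllib.robotparser`. The `ROBOTS_TXT` content is a hypothetical file containing only the rule quoted earlier in this thread, not the site's real file, and `is_blocked` is an illustrative helper. Python's parser does not reproduce every detail of Google's matching, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the Disallow line is the rule from this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /community/members/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://www.indusladies.com"
    for path in ("/community/members/rith.278948", "/community/"):
        state = "blocked" if is_blocked(ROBOTS_TXT, base + path) else "allowed"
        print(path, state)
```

If the member URL comes back as allowed when the live file's contents are pasted in, the Disallow rule is missing or is not in the file that is served from the site root.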
 