
XF 1.5 Migrated from vBulletin to XenForo - robots.txt question

#1
Hello,

Recently I migrated my vBulletin board to XenForo.
vBulletin is in the path /public_html/forums
and the XenForo installation is in the path /public_html/community.

Now I have some questions regarding robots.txt:

1) Should I include
Disallow: /forums/

to tell Google not to crawl the /forums pages?

2) I have included
Disallow: /community/members/

for all the member pages, but I am getting an "Access Denied" crawl error for them.
I have resubmitted the updated robots.txt and also did "Fetch as Google", but it still shows 300,000+ member pages with the "Access Denied" crawl error.
Is there anything else I should do?

3) My vBulletin board also had blogs, with a few blog pages like
/forums/blogs/anitap
which no longer exist now that I have imported all of the blog content into XenForo threads. Should I also disallow this URL from crawling in robots.txt?
 

Jake Bunce

XenForo moderator
Staff member
#2
1) A robots.txt file is probably not appropriate for this. Normally you would set up redirects for the old vB URLs.

2) I am not sure this is a problem. "Access Denied" might be the expected response given your robots.txt file, which tells the robot not to visit that page. Or, if that response is not expected, perhaps the record just needs time to update.

3) I guess you could. Most people prefer to set up specific redirects for threads, posts, etc., and then use a catch-all for all other URLs. That way anyone visiting the old forum is redirected to the new forum.
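For anyone reading along, a redirect setup like this usually lives in the old directory's .htaccess. A minimal sketch, assuming Apache with mod_rewrite and the /forums and /community paths from this thread; the example.com domain and the rule itself are illustrative assumptions, not a tested configuration:

```
# Hypothetical sketch for /public_html/forums/.htaccess
# Specific thread/post URLs would normally be handled first by a
# vBulletin-to-XenForo redirect script; this catch-all 301s any
# remaining old /forums/ URL to the new forum root.
RewriteEngine On
RewriteRule ^ http://www.example.com/community/ [R=301,L]
```

The 301 (permanent) status tells search engines to transfer the old URLs' standing to the new location, which is why a redirect is generally preferred over a robots.txt disallow for moved content.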
 
#4
@Jake Bunce @Alfa1

Thanks for the responses.
#2) But I am getting lots of "Access Denied" crawl errors for the member profile pages. crawl_error_accessDenied.jpg


The error detail for the member page - community/members/rith.278948 shows as below:
crawl_error_accessDenied1.jpg

So how do I fix these 300,000+ member pages that are showing as access denied in the Google webmaster console?

#3) Yes, I have set up a catch-all URL redirect for URLs like /forums/blogs/anitap/.

So this URL - http://www.indusladies.com/forums/blogs/anitap/
is redirected to - http://www.indusladies.com/community/

crawl_error_soft404.jpg
But the webmaster console still shows 249 soft 404 errors, and most of them have links like /forums/blogs/<blogger's user name>.
How do I fix this error in the webmaster console?
 

Jake Bunce

XenForo moderator
Staff member
#5
2) Your members page is denied to guests, so XF returns a 403. This is normal; it is not a problem to be fixed. A 403 is appropriate when a user is not allowed to view a page.

I am not sure what a robots.txt disallow will show as in that report. Presumably it hasn't taken effect yet.

3) The redirect appears to be working so you shouldn't get a 404 on that page in the future.
 
#7
Jake, for the soft 404 errors happening with the /forums/blogs/ URLs:
Should I do a "Mark as Fixed" as below:

crawl_error_soft404a.jpg OR should I do a "Fetch as Google" and submit to index as below: crawl_error_soft404b.jpg
 
#8
@Jake Bunce @Alfa1
Additionally, I made the following change in the admin CP:
XML Sitemap Generator > Included sitemap content > Users - unchecked
so that the community/members URLs do not get submitted for Google crawling.
But I am still seeing these URLs in Google Search Console > Crawl error > Access denied.

crawlError_AccessDenied.jpg


How do I fix the Access Denied error for these URLs?
 

Mike

XenForo developer
Staff member
#9
The sitemap just gives an indication of things for Google to index. It still indexes things outside of it, which would include your members. The display here is informational, to let you know that it ran into these errors in case you weren't expecting them. As I assume your member profiles aren't public, that would be expected.
 
#10
@Mike
I understand that the access denied error is expected since the member profile pages aren't public on my board. But is there any specific action that needs to be taken on my end so that these errors do not show up in my Google Search Console?
 

Mike

XenForo developer
Staff member
#11
Robots.txt would be the main way to prevent Google from attempting to crawl the pages. Looking at your robots.txt file, I don't see anything attempting to block /community/members/.
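To illustrate what that could look like: a minimal robots.txt sketch with a disallow rule for the member pages, using the path from this thread. Whether blocking crawling is the right trade-off here is the judgement call discussed above, and note that the file must be served from the site root, not from inside /community/:

```
# Served from the site root, e.g. http://www.example.com/robots.txt
User-agent: *
Disallow: /community/members/
```

Crawlers that honor the rule will stop requesting those URLs, so the "Access Denied" (403) entries should gradually stop accumulating, though already-reported errors can take time to clear from the console.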