
XF 1.4 Preventing Search Engine from crawling members list

#1
On my board, guests are not allowed to view user profiles. Since robots are technically guests, I found 25 URL crawl errors in Google Webmaster Tools, all related to denied access to the members list and user profiles. What should I do to prevent Google from crawling these pages? Does it require a robots.txt file?
 

melbo

Well-known member
#4
Use a *

See example:
https://xenforo.com/robots.txt

Code:
User-agent: *
Disallow: /community/find-new/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/admin.php
Allow: /

Sitemap: https://xenforo.com/community/sitemap.php
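To check how rules like these would treat the members list from the original question, here is a minimal sketch using Python's standard `urllib.robotparser`. The `Disallow: /community/members/` line, the `www.example.com` host and the sample paths are assumptions for illustration only; they are not part of the xenforo.com file above, so adjust them to your own install path.

```python
from urllib.robotparser import RobotFileParser

# Rules modelled on the example above, plus a HYPOTHETICAL members rule.
# Assumption: the forum lives under /community/ and member pages under
# /community/members/; change these to match your own board.
RULES = """\
User-agent: *
Disallow: /community/members/
Disallow: /community/account/
Disallow: /community/login/
Allow: /
"""

def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser

parser = build_parser(RULES)
base = "https://www.example.com"  # placeholder host

for path in (
    "/community/members/",
    "/community/members/example-user.1/",
    "/community/threads/some-thread.42/",
):
    # can_fetch() returns True when the rules permit crawling the URL
    print(path, parser.can_fetch("Googlebot", base + path))
```

With these rules the two members paths are reported as blocked and the thread path as allowed. Python's parser applies the first matching rule while Google applies the most specific one; both give the same answer here because the `Disallow` prefixes are longer than `Allow: /`.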
 
#8
In a nutshell
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
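The check a well-behaved robot performs before visiting a page can be sketched as follows. This is only an illustration using Python's standard `urllib.robotparser`; the `may_visit` helper and the `ExampleBot` name are made up for the example, and a real crawler would download the file rather than receive it as a string.

```python
from urllib.robotparser import RobotFileParser

def may_visit(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# The two-line file quoted above: every robot, every path disallowed.
blanket_rules = "User-agent: *\nDisallow: /\n"

allowed = may_visit(
    blanket_rules,
    "ExampleBot",  # hypothetical robot name
    "http://www.example.com/welcome.html",
)
print(allowed)
```

Here `allowed` comes out as `False`, since `Disallow: /` matches every path on the site.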

There are two important considerations when using /robots.txt:

  • robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
  • the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
So don't try to use /robots.txt to hide information.
So... robots.txt doesn't sound like a very strong deterrent for keeping robots from crawling members lists. Is this what most people do here? What other options are there?
 