XF 1.4 Preventing search engines from crawling the members list

On my board, I don't give guests the privilege to view user profiles. As robots are technically guests, I found 25 URL crawl errors in Google Webmaster Tools, all related to denied access to the members list and user profiles. What should I do to prevent Google from crawling these pages? Does it require a robots.txt?


Well-known member
Use a robots.txt file with a * (wildcard) user-agent.

See example:

User-agent: *
Disallow: /community/find-new/
Disallow: /community/account/
Disallow: /community/attachments/
Disallow: /community/goto/
Disallow: /community/posts/
Disallow: /community/login/
Disallow: /community/admin.php
Allow: /

Sitemap: https://xenforo.com/community/sitemap.php
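
Note that the example above doesn't actually cover the members list the original question asks about. Assuming the default XenForo routes (/community/members/ for profiles — adjust to your board's actual URL structure), you could add lines like these alongside the other Disallow rules:

User-agent: *
Disallow: /community/members/

This tells compliant crawlers to skip the members list and every profile page beneath it, so Google should stop reporting those access-denied URLs as crawl errors over time.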


Active member
In a nutshell
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
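The check a compliant robot performs can be sketched with Python's standard urllib.robotparser (the example.com URL is just a placeholder, as above):

```python
from urllib import robotparser

# Parse rules equivalent to the robots.txt shown above.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# "Disallow: /" under "User-agent: *" means every compliant
# robot must skip every URL on the site.
print(rp.can_fetch("*", "http://www.example.com/welcome.html"))        # False
print(rp.can_fetch("Googlebot", "http://www.example.com/members/"))    # False
```

A real crawler fetches http://www.example.com/robots.txt first (robotparser can do this via set_url() and read()) and only requests a page when can_fetch() returns True.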

There are two important considerations when using /robots.txt:

  • robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.
  • the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
So don't try to use /robots.txt to hide information.

So... it doesn't sound like too strong a deterrent to keep robots from crawling the members list. Is this what most people do here? What other options are there?