
XF 1.1 Bots, Robots.txt, and View Counts...

vVv

Guest
#1
I'm getting fed up with the bots on my site. They're viewing XFR user albums, and I believe they're racking up the album image view counts a lot. I'm not sure exactly how to set up the robots.txt file for that specific resource, but below is what I have so far.

I've already added it to the site, and I just saw Googlebot viewing an album image. >_< I'm resetting all useralbum views to zero via phpMyAdmin. Does the robots.txt file look correct, or am I missing something?

Code:
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /misc
Disallow: /help
Disallow: /search
Disallow: /members
Disallow: /register
Disallow: /online
Disallow: /lost-password
Disallow: /internal_data/
Disallow: /js/
Disallow: /library/
Disallow: /styles/
Disallow: /useralbums/
Disallow: /useralbums/create
Disallow: /useralbums/own
Disallow: /useralbums/image-comments
Disallow: /useralbums/view-image
Disallow: /members/useralbums
Disallow: /members/#useralbums
Disallow: /admin.php
Disallow: /admindav.php
 
Allow: /
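One detail in the file above is worth a quick check: in robots.txt, `#` starts a comment, so `Disallow: /members/#useralbums` is read as `Disallow: /members/`. URL fragments are never sent to the server anyway, so a rule cannot target them. A minimal sketch using Python's standard `urllib.robotparser`, assuming it parses the line the way a typical crawler would (the bot name and sample paths are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Only the rule under test; everything after '#' is treated as a comment.
FRAGMENT_RULE = """\
User-agent: *
Disallow: /members/#useralbums
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "ExampleBot" and the member path are hypothetical.
    for path in ("/members/somebody.1/", "/members/", "/threads/"):
        print(path, is_allowed(FRAGMENT_RULE, "ExampleBot", path))
```

With this rule alone, every URL under /members/ ends up blocked, not just an album tab.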
 
vVv

Guest
#2
updated it to this:

Code:
User-Agent: Googlebot
Disallow: /useralbums/
Disallow: /useralbums/create
Disallow: /useralbums/own
Disallow: /useralbums/image-comments
Disallow: /useralbums/view-image
Disallow: /members/useralbums
Disallow: /members/#useralbums
 
#Baiduspider
User-agent: Baiduspider
Disallow: /
Disallow: /useralbums/
Disallow: /useralbums/create
Disallow: /useralbums/own
Disallow: /useralbums/image-comments
Disallow: /useralbums/view-image
Disallow: /members/useralbums
Disallow: /members/#useralbums
 
User-agent: *
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /misc
Disallow: /help
Disallow: /search
Disallow: /members
Disallow: /register
Disallow: /online
Disallow: /lost-password
Disallow: /internal_data/
Disallow: /js/
Disallow: /library/
Disallow: /styles/
Disallow: /useralbums/
Disallow: /useralbums/create
Disallow: /useralbums/own
Disallow: /useralbums/image-comments
Disallow: /useralbums/view-image
Disallow: /members/useralbums
Disallow: /members/#useralbums
Disallow: /admin.php
Disallow: /admindav.php
 
Allow: /
and they're STILL going through and viewing album images... wtf?!?

Code:
66.249.75.117 crawl-66-249-75-117.googlebot.com
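A hostname like the one above can be double-checked, since anyone can send a Googlebot user-agent string. The usual approach is a reverse DNS lookup, a check of the domain suffix, and then a forward lookup to confirm the hostname maps back to the same IP. This is a rough sketch only: the suffix list is an assumption based on the hostname shown here and Google's commonly cited crawler domains, and the function names are made up.

```python
import socket

# Assumed suffixes for Google crawler hostnames; adjust if Google's guidance differs.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def looks_like_google_host(hostname: str) -> bool:
    """Check only the domain suffix of a reverse-DNS hostname."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES)


def verify_googlebot(ip: str) -> bool:
    """Reverse lookup, suffix check, then forward-confirm the IP (needs DNS access)."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not looks_like_google_host(hostname):
        return False
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses


if __name__ == "__main__":
    print(looks_like_google_host("crawl-66-249-75-117.googlebot.com"))
```

Only `verify_googlebot` touches the network; the suffix check alone is not proof, because reverse DNS records can be faked without the forward confirmation.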
No one knows how to do this? I find that funny lol. All the members who posted in the threads below apparently went robots.txt-dumb since then and have no tips or help to offer me lol. Oh right, I'm not a "clique buddy/elder member/buddy leg humper", that's why lol.

http://xenforo.com/community/threads/robots-txt.16735/
http://xenforo.com/community/threads/adding-noindex-to-some-pages.39513/
http://xenforo.com/community/threads/google-bot-searching-search-at-my-private-forum.36705/
http://xenforo.com/community/threads/sitemap-for-xenforo.26785/page-16#post-417202
http://xenforo.com/community/threads/robots-txt.16735/page-2#post-414196
http://xenforo.com/community/threads/should-i-disallow-community-attachments-in-robots-txt.36089
http://xenforo.com/community/threads/robots-txt.35374/
 

MagnusB

Well-known member
#3
You only need to disallow the highest-level folder. For example, Disallow: /useralbums/ will disallow everything under that route (e.g. useralbums/create would be blocked).

Also, bots don't update their index list live, at least not all of them, so give it some time. And if you really want to, and don't care about Russian or Chinese users, ban Yandex and Baidu. If I am not mistaken, those two take robots.txt lightly and are very aggressive indexers.
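To illustrate the prefix matching described above, here is a small sketch with Python's standard `urllib.robotparser`. It assumes the parser behaves like a well-behaved crawler; the bot name and the non-album path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A single rule for the whole album route.
ALBUM_RULE = """\
User-agent: *
Disallow: /useralbums/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/useralbums/create", "/useralbums/view-image",
                 "/useralbums", "/threads/"):
        print(path, is_allowed(ALBUM_RULE, "ExampleBot", path))
```

Note that the rule is a plain prefix match, so `/useralbums` without the trailing slash is not covered by `Disallow: /useralbums/`. Dropping the slash from the rule would cover both forms.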
 
vVv

Guest
#4
Thanks brah.. :) I'll make those adjustments then. :) But yeah, even with what I have now, they're still indexing. Does it take a while to "tame" the bots? Lol! Baidu is a bytch too; I keep seeing that one on my site as well. I saw somewhere else online that people were suggesting .htaccess edits, blocking IPs, and such. I dunno.
 

MagnusB

Well-known member
#5
The only way to kill Baidu is to ban it; they are not respecting robots.txt, or at least that is the theory. Baidu is the biggest search engine in China though, so if you depend on traffic from China you kinda depend on Baidu as well. Banning it means banning its IP range. I'm not sure what that is, but a Google search should give it to you.

robots.txt updates should be detected within a day or so, but IIRC Google says it might take a while before the crawler picks up your updated robots.txt, and even longer before their index is updated. The big ones (Bing and Google) do respect robots.txt, and Google even offers a test tool in their Webmaster Tools, where you can test URLs to see if they are allowed or blocked.
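For a quick local check before waiting on the crawlers, something like the following sketch can test URLs against a draft file. It uses Python's standard `urllib.robotparser`, which is only an approximation of how Google evaluates rules, and the trimmed rule set and "OtherBot" name are made up for illustration. It also shows one thing to keep in mind: a crawler with its own User-agent group follows only that group, not the `*` group as well.

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down draft: Googlebot has its own group, everyone else falls under '*'.
DRAFT_ROBOTS = """\
User-agent: Googlebot
Disallow: /useralbums/

User-agent: *
Disallow: /account/
Disallow: /useralbums/
"""


def check(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "OtherBot"):
        for path in ("/useralbums/view-image", "/account/"):
            print(agent, path, check(DRAFT_ROBOTS, agent, path))
```

Here Googlebot is kept out of the albums but is still allowed into /account/, because that rule only exists in the `*` group. Rules meant for every crawler need to be repeated in each named group.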
 
vVv

Guest
#6
Baidu: http://www.simplemachines.org/community/index.php?topic=350439.0

My site doesn't depend on the China region per se, but I'm not frowning on Chinese users visiting and joining my site either. I'm mainly just concerned about the bots racking up my albums' view counts. I don't mind bots or users from anywhere in the world finding and joining my site. Everyone is welcome. The only thing I don't welcome is bots raising view counts on images in albums, sigh. I want the views to come from actual human visitors or members who view the images in the albums lol. I'll make those edits as you said above, top-level-folder-wise. :)
 

MagnusB

Well-known member
#7
I wouldn't worry about view counts. They are inflated anyway: people refresh, view the same thing multiple times, etc. It is an ego-boost metric more than anything. Also, every entry should gain about the same view count from bots (or relatively the same), so it should still be a valid metric (to some degree) for popularity.
 
vVv

Guest
#8
Yeah, you're probably right about that lol. Some members are more after "views" in general, or "responses", etc. Like you said, ego boost. "Yay, I got more views on ALL my images than John got!" lol. :ROFLMAO: Hopefully none of them freak out when they come back to the site and see zero views lmao. I've made those adjustments in robots.txt. I might add a few more from this thread: http://xenforo.com/community/threads/robots-txt.16735/ Then just let it go.. :)
 
vVv

Guest
#9
current one now.. the "vVv keep the f out bot bytches robots.txt deluxe special" lmao..

Code:
User-Agent: Googlebot
Disallow: /ajax/
Disallow: /conversations/
Disallow: /events/birthdays/
Disallow: /events/monthly
Disallow: /events/weekly
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /misc
Disallow: /help
Disallow: /search
Disallow: /members
Disallow: /register
Disallow: /online
Disallow: /lost-password
Disallow: /internal_data/
Disallow: /data/
Disallow: /js/
Disallow: /library/
Disallow: /styles/
Disallow: /useralbums/
Disallow: /members/useralbums/
Disallow: /online/
Disallow: /media/category/
Disallow: /media/keyword/
Disallow: /media/user/
Disallow: /media/service/
Disallow: /media/submit/
Disallow: /misc/style?*
Disallow: /admin.php
Disallow: /admindav.php
 
#Baiduspider
User-agent: Baiduspider
Disallow: /
Disallow: /ajax/
Disallow: /conversations/
Disallow: /events/birthdays/
Disallow: /events/monthly
Disallow: /events/weekly
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /misc
Disallow: /help
Disallow: /search
Disallow: /members
Disallow: /register
Disallow: /online
Disallow: /lost-password
Disallow: /internal_data/
Disallow: /data/
Disallow: /js/
Disallow: /library/
Disallow: /styles/
Disallow: /useralbums/
Disallow: /members/useralbums/
Disallow: /online/
Disallow: /media/category/
Disallow: /media/keyword/
Disallow: /media/user/
Disallow: /media/service/
Disallow: /media/submit/
Disallow: /misc/style?*
Disallow: /admin.php
Disallow: /admindav.php
 
 
User-agent: *
Disallow: /ajax/
Disallow: /conversations/
Disallow: /events/birthdays/
Disallow: /events/monthly
Disallow: /events/weekly
Disallow: /find-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /misc
Disallow: /help
Disallow: /search
Disallow: /members
Disallow: /register
Disallow: /online
Disallow: /lost-password
Disallow: /internal_data/
Disallow: /data/
Disallow: /js/
Disallow: /library/
Disallow: /styles/
Disallow: /useralbums/
Disallow: /members/useralbums/
Disallow: /online/
Disallow: /media/category/
Disallow: /media/keyword/
Disallow: /media/user/
Disallow: /media/service/
Disallow: /media/submit/
Disallow: /misc/style?*
Disallow: /admin.php
Disallow: /admindav.php
 
Allow: /