
XF 1.1 Bots, Robots.txt, and View Counts...

Discussion in 'XenForo Questions and Support' started by vVv, Nov 13, 2012.

  1. vVv

    vVv Guest

    I'm getting fed up with the bots on my site. They're viewing XFR useralbums, and I believe they're racking up the album image view counts a lot. I'm not sure exactly how to write the robots.txt file for that specific resource, but below is what I have so far in robots.txt.

    I've already added it to the site, and just saw a Googlebot viewing an album image. >_< I'm resetting all useralbum views to zero via phpMyAdmin. Does the robots.txt file look correct, or am I missing something?

    Code:
    User-agent: *
    Disallow: /find-new/
    Disallow: /account/
    Disallow: /attachments/
    Disallow: /goto/
    Disallow: /posts/
    Disallow: /login/
    Disallow: /misc
    Disallow: /help
    Disallow: /search
    Disallow: /members
    Disallow: /register
    Disallow: /online
    Disallow: /lost-password
    Disallow: /internal_data/
    Disallow: /js/
    Disallow: /library/
    Disallow: /styles/
    Disallow: /useralbums/
    Disallow: /useralbums/create
    Disallow: /useralbums/own
    Disallow: /useralbums/image-comments
    Disallow: /useralbums/view-image
    Disallow: /members/useralbums
    Disallow: /members/#useralbums
    Disallow: /admin.php
    Disallow: /admindav.php
     
    Allow: /
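
A note on the `Disallow: /members/#useralbums` line above: in robots.txt a `#` starts a comment, and browsers never send the `#fragment` part of a URL to the server anyway, so that rule most likely behaves like a plain `Disallow: /members/`. The sketch below checks this with Python's standard `urllib.robotparser`. The member and thread paths are made-up examples, and other parsers may differ in details.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, path):
    """Parse robots.txt text and report whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The rule as posted, and the rule with everything from '#' onward dropped.
WITH_FRAGMENT = "User-agent: *\nDisallow: /members/#useralbums\n"
PLAIN = "User-agent: *\nDisallow: /members/\n"

# Hypothetical paths, used only for illustration.
PATHS = ["/members/", "/members/someone.1/", "/useralbums/", "/threads/"]

for path in PATHS:
    # Print the verdict under each version of the rule side by side.
    print(path,
          is_allowed(WITH_FRAGMENT, "Googlebot", path),
          is_allowed(PLAIN, "Googlebot", path))
```

With this parser both versions give the same answer for every path, so the `#useralbums` part adds nothing beyond what `Disallow: /members` already covers.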
     
  2. vVv

    vVv Guest

    updated it to this:

    Code:
    User-Agent: Googlebot
    Disallow: /useralbums/
    Disallow: /useralbums/create
    Disallow: /useralbums/own
    Disallow: /useralbums/image-comments
    Disallow: /useralbums/view-image
    Disallow: /members/useralbums
    Disallow: /members/#useralbums
     
    #Baiduspider
    User-agent: Baiduspider
    Disallow: /
    Disallow: /useralbums/
    Disallow: /useralbums/create
    Disallow: /useralbums/own
    Disallow: /useralbums/image-comments
    Disallow: /useralbums/view-image
    Disallow: /members/useralbums
    Disallow: /members/#useralbums
     
    User-agent: *
    Disallow: /find-new/
    Disallow: /account/
    Disallow: /attachments/
    Disallow: /goto/
    Disallow: /posts/
    Disallow: /login/
    Disallow: /misc
    Disallow: /help
    Disallow: /search
    Disallow: /members
    Disallow: /register
    Disallow: /online
    Disallow: /lost-password
    Disallow: /internal_data/
    Disallow: /js/
    Disallow: /library/
    Disallow: /styles/
    Disallow: /useralbums/
    Disallow: /useralbums/create
    Disallow: /useralbums/own
    Disallow: /useralbums/image-comments
    Disallow: /useralbums/view-image
    Disallow: /members/useralbums
    Disallow: /members/#useralbums
    Disallow: /admin.php
    Disallow: /admindav.php
     
    Allow: /
    
    and they're STILL going through and viewing album images... wtf?!?

    Code:
    66.249.75.117 crawl-66-249-75-117.googlebot.com
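
A hostname like the one in that log line can be checked, since anyone can put "Googlebot" in a user-agent string. The usual approach is a reverse DNS lookup on the IP followed by a forward lookup on the returned hostname. Below is a minimal sketch of that idea, not an official tool: the function name, the suffix list, and the injectable lookup parameters are assumptions made for illustration.

```python
import socket

# Assumed suffixes for Google crawler hostnames; check Google's current
# documentation before relying on this list.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google crawler hostname
    and that hostname resolves back to the same ip."""
    # Default to real DNS lookups; callers may inject fakes for testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES):
            return False
        # The forward lookup must include the original address.
        return ip in forward_lookup(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (failed lookups).
        return False
```

Calling `is_verified_googlebot("66.249.75.117")` with the defaults performs live DNS queries, so the result depends on the network at the time it is run.
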
    No one knows how to do this? I find that funny lol. It's funny because all the members in these threads so far apparently went robots.txt dumb since posting in them, and don't know what tips / help to offer me lol. Oh right, I'm not a "clique buddy/elder member/buddy leg humper", that's why lol.

    http://xenforo.com/community/threads/robots-txt.16735/
    http://xenforo.com/community/threads/adding-noindex-to-some-pages.39513/
    http://xenforo.com/community/threads/google-bot-searching-search-at-my-private-forum.36705/
    http://xenforo.com/community/threads/sitemap-for-xenforo.26785/page-16#post-417202
    http://xenforo.com/community/threads/robots-txt.16735/page-2#post-414196
    http://xenforo.com/community/threads/should-i-disallow-community-attachments-in-robots-txt.36089
    http://xenforo.com/community/threads/robots-txt.35374/
     
  3. MagnusB

    MagnusB Well-Known Member

    You only need to disallow the highest-level folder. For example, Disallow: /useralbums/ will disallow everything under that route (e.g. useralbums/create would be blocked).

    Also, bots don't update their index live, at least not all of them, so give it some time. And if you really want to and don't care about Russian or Chinese users, ban Yandex and Baidu. If I am not mistaken, those two take robots.txt lightly and are very aggressive indexers.
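
The prefix behaviour described here can be checked locally. This is a small sketch using Python's standard `urllib.robotparser`; the `is_allowed` helper and the `/threads/` path are only illustrative, and real crawlers may implement matching slightly differently.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /useralbums/
"""


def is_allowed(robots_txt, user_agent, path):
    """Parse robots.txt text and report whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Every path that starts with /useralbums/ is covered by the single rule.
for path in ("/useralbums/", "/useralbums/create",
             "/useralbums/view-image", "/threads/"):
    print(path, is_allowed(ROBOTS_TXT, "Googlebot", path))
```

Only `/threads/` comes back as allowed, so the extra `/useralbums/create`, `/useralbums/own`, and similar lines are redundant once `/useralbums/` is listed.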
     
    Jake Bunce and vVv like this.
  4. vVv

    vVv Guest

    Thanks brah.. :) I'll make those adjustments then. :) But yeah, even with what I have now, they're still indexing etc. Does it take a while to "tame" the bots? Lol! This one is a bytch too: Baidu. I keep seeing that one on my site as well. I saw somewhere else online that people were suggesting .htaccess edits, blocking IPs and such. I dunno.
     
  5. MagnusB

    MagnusB Well-Known Member

    The only way to kill Baidu is to ban it; they are not respecting robots.txt, or at least that is the theory. Baidu is the biggest search engine in China though, so if you depend on traffic from China you kinda depend on Baidu as well. The only way to kill it is to ban its IP range. I'm not sure what it is, but a Google search should give it to you.

    robots.txt updates should be detected within a day or so, but IIRC Google says it might take a while before the crawler picks up your updated robots.txt, and even longer before their index is updated. The big ones (Bing and Google) do respect robots.txt, and Google even offers a test tool in their Webmaster Tools, where you can test URLs to see if they are allowed or blocked.
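
In practice an IP ban is usually set up in the web server or firewall rather than in application code, but the range-matching logic itself is simple. Here is a rough sketch using Python's standard `ipaddress` module. The ranges are placeholders from the reserved documentation blocks, not Baidu's real ranges, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (RFC 5737).
# They are NOT Baidu's real ranges; look those up before blocking anything.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(ip, blocked_ranges=BLOCKED_RANGES):
    """Return True if ip falls inside any of the blocked CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in blocked_ranges)


print(is_blocked("192.0.2.55"))   # inside the first placeholder range
print(is_blocked("203.0.113.9"))  # outside both placeholder ranges
```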
     
    Jake Bunce and vVv like this.
  6. vVv

    vVv Guest

    Baidu: http://www.simplemachines.org/community/index.php?topic=350439.0

    My site doesn't depend on the China region per se, but I'm not frowning on Chinese users visiting and joining my site either. I guess I'm mainly just concerned about the bots racking up my albums' view counts. I don't mind bots / users from anywhere in the world coming to my site, finding my site, joining my site. Everyone is welcome. The only thing I don't welcome is the bots raising view counts on images in albums, sigh. I want the views to be actual human visitors, or humans that join the site and view the images in the albums lol. I'll make some of those edits as you said above, top-level folder wise. :)
     
  7. MagnusB

    MagnusB Well-Known Member

    I wouldn't worry about view counts; they are inflated anyway, since people refresh, have multiple views, etc. It is an ego-boost metric more than anything. Also, every entry should gain about the same view count from bots (or relatively the same), so it should still be a valid metric (to some degree) for popularity.
     
    Jake Bunce likes this.
  8. vVv

    vVv Guest

    Yeah, you're probably right about that though. lol. Some members are more after "views" in general, or "responses" and etc etc. Like you said, ego boost. "Yay, I got more views to ALL my images, than John got!" lol. :ROFLMAO: Hopefully none of them freak out when they come back to site and see Zero views lmao. I've made those adjustments in Robots.txt. I might add a few more from this thread: http://xenforo.com/community/threads/robots-txt.16735/ Then just let it go.. :)
     
  9. vVv

    vVv Guest

    current one now.. the "vVv keep the f out bot bytches robots.txt deluxe special" lmao..

    Code:
    User-Agent: Googlebot
    Disallow: /ajax/
    Disallow: /conversations/
    Disallow: /events/birthdays/
    Disallow: /events/monthly
    Disallow: /events/weekly
    Disallow: /find-new/
    Disallow: /account/
    Disallow: /attachments/
    Disallow: /goto/
    Disallow: /posts/
    Disallow: /login/
    Disallow: /misc
    Disallow: /help
    Disallow: /search
    Disallow: /members
    Disallow: /register
    Disallow: /online
    Disallow: /lost-password
    Disallow: /internal_data/
    Disallow: /data/
    Disallow: /js/
    Disallow: /library/
    Disallow: /styles/
    Disallow: /useralbums/
    Disallow: /members/useralbums/
    Disallow: /online/
    Disallow: /media/category/
    Disallow: /media/keyword/
    Disallow: /media/user/
    Disallow: /media/service/
    Disallow: /media/submit/
    Disallow: /misc/style?*
    Disallow: /admin.php
    Disallow: /admindav.php
     
    #Baiduspider
    User-agent: Baiduspider
    Disallow: /
    Disallow: /ajax/
    Disallow: /conversations/
    Disallow: /events/birthdays/
    Disallow: /events/monthly
    Disallow: /events/weekly
    Disallow: /find-new/
    Disallow: /account/
    Disallow: /attachments/
    Disallow: /goto/
    Disallow: /posts/
    Disallow: /login/
    Disallow: /misc
    Disallow: /help
    Disallow: /search
    Disallow: /members
    Disallow: /register
    Disallow: /online
    Disallow: /lost-password
    Disallow: /internal_data/
    Disallow: /data/
    Disallow: /js/
    Disallow: /library/
    Disallow: /styles/
    Disallow: /useralbums/
    Disallow: /members/useralbums/
    Disallow: /online/
    Disallow: /media/category/
    Disallow: /media/keyword/
    Disallow: /media/user/
    Disallow: /media/service/
    Disallow: /media/submit/
    Disallow: /misc/style?*
    Disallow: /admin.php
    Disallow: /admindav.php
     
     
    User-agent: *
    Disallow: /ajax/
    Disallow: /conversations/
    Disallow: /events/birthdays/
    Disallow: /events/monthly
    Disallow: /events/weekly
    Disallow: /find-new/
    Disallow: /account/
    Disallow: /attachments/
    Disallow: /goto/
    Disallow: /posts/
    Disallow: /login/
    Disallow: /misc
    Disallow: /help
    Disallow: /search
    Disallow: /members
    Disallow: /register
    Disallow: /online
    Disallow: /lost-password
    Disallow: /internal_data/
    Disallow: /data/
    Disallow: /js/
    Disallow: /library/
    Disallow: /styles/
    Disallow: /useralbums/
    Disallow: /members/useralbums/
    Disallow: /online/
    Disallow: /media/category/
    Disallow: /media/keyword/
    Disallow: /media/user/
    Disallow: /media/service/
    Disallow: /media/submit/
    Disallow: /misc/style?*
    Disallow: /admin.php
    Disallow: /admindav.php
     
    Allow: /
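
Repeating the full rule list in each named group is not wasted effort: a crawler that finds a group naming it generally follows only that group and ignores `User-agent: *`. The sketch below illustrates this with Python's standard `urllib.robotparser` on a cut-down file. `SomeOtherBot` is a made-up crawler name, and the three-rule file is a simplified stand-in for the one above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /useralbums/

User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: /account/
Disallow: /useralbums/
"""


def is_allowed(robots_txt, user_agent, path):
    """Parse robots.txt text and report whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Compare a named crawler, a fully banned crawler, and an unnamed one.
for agent in ("Googlebot", "Baiduspider", "SomeOtherBot"):
    for path in ("/account/", "/useralbums/", "/threads/"):
        print(agent, path, is_allowed(ROBOTS_TXT, agent, path))
```

Here Googlebot is still allowed into `/account/` because its own group does not list it, while the unnamed crawler falls back to the `*` group and is blocked. The Baiduspider group also shows that `Disallow: /` alone already covers every path, so the lines after it in that group are redundant.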
    
     
