
XF 1.2 199 Bots - Most of Them Baidu

Discussion in 'Troubleshooting and Problems' started by DRE, Jul 28, 2013.

  1. DRE

    DRE Well-Known Member

    How do I block Baidu bots?
  2. MattW

    MattW Well-Known Member

    Add this into your robots.txt file

    User-agent: Baiduspider
    Disallow: /
    I'm not sure whether they are a "nice" spider that follows the rules in robots.txt, though. If they aren't, you'll need to block their IP addresses.
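    Before relying on the rule, it may help to confirm that it parses the way you expect. The sketch below feeds the two lines above to Python's standard `urllib.robotparser`. The helper name `is_allowed` and the `example.com` URL are placeholders, not anything from this thread, and the check only shows what a well-behaved crawler would conclude; it says nothing about whether Baidu actually obeys the file.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule suggested in the post above.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for your own forum's address.
    for agent in ("Baiduspider", "Googlebot"):
        print(agent, is_allowed(agent, "http://example.com/threads/1/"))
```

    Other crawlers are unaffected because the file has no rule group that applies to them.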
  3. DRE

    DRE Well-Known Member

    Last edited: Jul 28, 2013
  4. DRE

    DRE Well-Known Member

    Using this now

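    # Enable mod_rewrite before any RewriteCond/RewriteRule
    RewriteEngine on
    # Return 403 to Baiduspider; unanchored, since the token can appear mid-string in the User-Agent
    RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
    RewriteRule .* - [F,L]
    # Block image hotlinking from referrers other than this site
    RewriteCond %{HTTP_REFERER} !^http://www.8thos.com/.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^http://www.8thos.com$ [NC]
    RewriteCond %{HTTP_REFERER} !^http://8thos.com/.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^http://8thos.com$ [NC]
    RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
    # Same block via mod_setenvif, for servers without mod_rewrite
    SetEnvIfNoCase User-Agent "baiduspider" bad_bot
    <Limit GET POST>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>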
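    One thing worth checking with User-Agent rules like these is whether the pattern is anchored to the start of the string. The sketch below mimics the match in Python, with `re.IGNORECASE` standing in for Apache's `[NC]` flag. The two sample User-Agent strings are assumptions based on commonly reported Baiduspider headers; compare them against what your own access log shows.

```python
import re

# re.IGNORECASE plays the role of Apache's [NC] flag.
ANCHORED = re.compile(r"^Baiduspider", re.IGNORECASE)
UNANCHORED = re.compile(r"Baiduspider", re.IGNORECASE)

# Assumed sample headers, not taken from this thread's logs.
SAMPLE_AGENTS = [
    "Baiduspider+(+http://www.baidu.com/search/spider.htm)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)",
]


def matches(pattern: re.Pattern, user_agent: str) -> bool:
    """Return True if pattern is found anywhere it is allowed to match."""
    return pattern.search(user_agent) is not None


if __name__ == "__main__":
    for agent in SAMPLE_AGENTS:
        print(matches(ANCHORED, agent), matches(UNANCHORED, agent), agent)
```

    A pattern beginning with `^` only catches headers that start with the bot name, so a header that starts with `Mozilla/5.0` would slip past it.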
  5. DRE

    DRE Well-Known Member

    I'm closing my site from guests until this is fixed.
  6. MattW

    MattW Well-Known Member

  7. whynot

    whynot Well-Known Member

    Just ban their whole range in your control panel (cPanel?).
  8. DRE

    DRE Well-Known Member

    I don't have cPanel. I'll figure this stuff out later. For now I'm just blocking guests and registration because all this bot traffic slowed down my site. Thanks for the link, @MattW. I'll check it out when I'm fully awake.
  9. whynot

    whynot Well-Known Member

    A little harder, but you can ban them in your XenForo ACP:
    Users > Banned IPs
  10. DRE

    DRE Well-Known Member

    Man, I don't even care anymore. I'm just gonna keep the site private from now on.
  11. JulianD

    JulianD Well-Known Member

    There's nothing to be fixed here. If you don't care about search engine traffic, then keep your site private. Most of us do care about Google and other search engines, and we are willing to help you if you want it. Maybe you should ask why your site is not able to cope with Baidu spiders.
  12. DRE

    DRE Well-Known Member

    Baidu is the least of the drama associated with having a public forum. It was the icing on the cake.
  13. Rudy

    Rudy Well-Known Member

    The sites are fine. It is Baidu's bots that are abusive. They bombard a forum with enough requests to nearly qualify them as performing a denial of service attack on servers. They did that to our vB forum while we were still on our old server--they were hitting us with the equivalent of 150 (!) users making requests to the forum, one after the other.

    I have had them blocked in our firewall. Only thing is, now they have started using a server in the EU to bypass all of the blocks on Chinese traffic. If that isn't deceitful, I don't know what is. They are also known to ignore robots.txt, which is also highly dishonest (any legitimate search engine spider would honor it).

    I had to block this range in our firewall just recently:

    I have already had this one blocked for a couple of years now:

    If anyone else has other ranges we can block, I'd love to hear about them. ;)

    The big question should be, however, why does a Chinese search engine need to hit so much of our data, so hard and so fast, for a market we don't even serve, or even want to serve?

    Baidu is bad, plain and simple.

    EDIT: I also forgot that I have this range blocked also:

    They have this block registered under the name CHINANET. Yet if you look up an IP address like you can see it is once again Baidubot.
    Last edited: Oct 7, 2013
    0xym0r0n and DRE like this.
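    For anyone keeping a list of blocked ranges, a quick way to test whether a given address falls inside one of them is Python's standard `ipaddress` module. This is only a sketch: `180.76.0.0/16` is simply the `180.76.*.*` pattern mentioned elsewhere in this thread written as CIDR, the ranges missing from the post above are not reproduced, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Assumption: 180.76.*.* from this thread, expressed as a CIDR block.
# Add your own firewall ranges here.
BLOCKED_NETWORKS = [ipaddress.ip_network("180.76.0.0/16")]


def is_blocked(ip: str, networks=BLOCKED_NETWORKS) -> bool:
    """Return True if ip lies inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    for ip in ("180.76.5.10", "123.125.71.52"):
        print(ip, is_blocked(ip))
```

    An address outside every listed network returns False, which is a reminder that a block list is only as complete as the ranges entered into it.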
  14. MattW

    MattW Well-Known Member

    IP Address: 123.125.71.52
    [Querying whois.apnic.net]
    % [whois.apnic.net]
    % Whois data copyright terms    http://www.apnic.net/db/dbcopyright.html
    % Information related to ' -'
    inetnum: -
    Rudy likes this.
  15. DRE

    DRE Well-Known Member

    Thanks for confirming this. I tried explaining this to @Slavik, but he couldn't help, which is why I decided to get a managed VPS. I haven't done anything differently, so I have no idea why they aren't attacking now. I'm just glad they aren't. @MattW, did you add something to keep the Baidu bots out?
  16. Rudy

    Rudy Well-Known Member

    I knew about this two years ago, and it's not exactly anything new. The block is the one that really slammed our sites hard.

    I looked up "baidu" on APNIC and am compiling whichever netblocks I find there. I've found a few but I don't know if they are engaged in any 'bot activity or not.

    Sweet. Thanks! One more to add. :D

    I edited mine for another netblock I found.
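    APNIC whois output lists a netblock as an `inetnum` line of the form "first address - last address", while most firewalls expect CIDR notation. A minimal sketch of the conversion with Python's `ipaddress.summarize_address_range` follows. The function name `inetnum_to_cidrs` is hypothetical, and the sample range is a reserved documentation range rather than a real Baidu netblock.

```python
import ipaddress


def inetnum_to_cidrs(inetnum: str) -> list[str]:
    """Convert 'first - last' (whois inetnum style) to a list of CIDR strings."""
    first, last = (part.strip() for part in inetnum.split("-"))
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first),
        ipaddress.ip_address(last),
    )
    return [str(net) for net in networks]


if __name__ == "__main__":
    # 192.0.2.0/24 is reserved for documentation; substitute the inetnum
    # value from your own whois lookup.
    print(inetnum_to_cidrs("192.0.2.0 - 192.0.2.255"))
```

    A range that does not align to a single prefix comes back as several CIDR blocks, each of which needs its own firewall entry.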
  17. DRE

    DRE Well-Known Member

    This drama had me looking through my site to see if I or another user said something to upset China. :LOL:
  18. Rudy

    Rudy Well-Known Member

    I have fully managed servers but they basically take a hands-off approach. But if I need something blocked, I can either request them to add it (no charge), or I can add it myself. It could be that some hosts may already block Baidu, either by IP address or maybe just by detecting flooding.

    Right now I have 16 bots online, 10 of which are Bing. (Good luck with that search engine, Bing.) I also see one from Facebook. I detest Facebook's data-mining practices, but since it is building traffic to the site, let 'em at it. I also see something called Brandwatch, and Proximic (which spiders for advertising metrics).
    DRE likes this.
  19. Rudy

    Rudy Well-Known Member


    No need to worry. :D It was just the way Baidu's bots were (mis)behaving. I don't mind if we have a dozen different IPs from one company visiting the forum at a time. But when I was tracking our live access log in Apache and seeing line after line of 180.76.*.* packed in among normal forum requests (page requests, image requests, etc.), you could easily tell that between their various IPs, they were hitting a page at least once or twice per second. And if you recall how much of a load vB 3.x put on a server when building/loading/displaying a thread, it could easily bog a server down.

    I told my fellow staffers what had happened, and their general attitude (which I agreed with) was that we really had no reason to be giving anything to Baidu. My question was: why so much, and so fast? They were relentless! Seeing their IPs in the live Apache log certainly looked like a DoS attack to me...
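    The "once or twice per second" estimate can be checked against a saved access log rather than by eye. The sketch below counts requests per one-second bucket for client addresses starting with a given prefix. It assumes Apache's common or combined log format (client IP first, timestamp in square brackets); the sample lines are made up for illustration and `hits_per_second` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# Client IP, two ignored fields, then the bracketed timestamp.
LOG_LINE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")

# Made-up sample lines, not real log data.
SAMPLE_LINES = [
    '180.76.5.10 - - [07/Oct/2013:10:15:30 +0000] "GET /threads/1/ HTTP/1.1" 200 512',
    '180.76.6.22 - - [07/Oct/2013:10:15:30 +0000] "GET /threads/2/ HTTP/1.1" 200 512',
    '192.0.2.7 - - [07/Oct/2013:10:15:30 +0000] "GET / HTTP/1.1" 200 512',
    '180.76.5.10 - - [07/Oct/2013:10:15:31 +0000] "GET /threads/3/ HTTP/1.1" 200 512',
]


def hits_per_second(lines, prefix: str = "180.76.") -> Counter:
    """Count requests per second from client IPs that start with prefix."""
    buckets = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or not match.group(1).startswith(prefix):
            continue  # unparseable line or a different client
        stamp = datetime.strptime(match.group(2), "%d/%b/%Y:%H:%M:%S %z")
        buckets[stamp] += 1
    return buckets


if __name__ == "__main__":
    counts = hits_per_second(SAMPLE_LINES)
    print("peak requests in one second:", max(counts.values()))
```

    To run it on a real log, pass an open file object instead of `SAMPLE_LINES`; the path to the log depends on your server setup.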
  20. craigiri

    craigiri Well-Known Member

    You can say that again!

    There was something in the air this Sept after the Harvest Moon. We had one of our episodes where the inmates told me THEY run the jail......

    Human nature.....what a PITA.
