XF 1.2 199 Bots - Most of Them Baidu

Add this into your robots.txt file

Code:
User-agent: Baiduspider
Disallow: /

I'm not sure if they are a "nice" spider and follow the rules in robots.txt though. If they aren't, you'll need to block their IP addresses
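
For anyone who wants to sanity-check the rule before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a crawler that honors robots.txt would read the two lines above; it says nothing about whether Baidu actually obeys them. The function name `is_allowed` and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The exact rule from the post above.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user-agent token may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the rule.
    print(is_allowed("Baiduspider", "http://example.com/threads/1"))
    print(is_allowed("Googlebot", "http://example.com/threads/1"))
```

Note that the parser matches on the short user-agent token ("Baiduspider"), not on a full browser-style User-Agent header.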
 
Using this now

Code:
RewriteEngine on
# Refuse any request whose User-Agent starts with "Baiduspider"
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC]
RewriteRule .* - [F]
# Hotlink protection: refuse image requests referred from other sites
RewriteCond %{HTTP_REFERER} !^http://www.8thos.com/.*$         [NC]
RewriteCond %{HTTP_REFERER} !^http://www.8thos.com$    [NC]
RewriteCond %{HTTP_REFERER} !^http://8thos.com/.*$     [NC]
RewriteCond %{HTTP_REFERER} !^http://8thos.com$        [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
# Same bot block via mod_setenvif, in case mod_rewrite is unavailable
SetEnvIfNoCase User-Agent "^baiduspider" bad_bot
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
 
I don't have cPanel. I'll figure this stuff out later. For now I'm just blocking guests and registration because all this bot traffic slowed down my site. Thanks for the link @MattW, I'll check it out when I'm fully awake.
 
Man, I don't even care anymore. I'm just gonna keep the site private from now on.

There's nothing to be fixed here. If you don't care about search engine traffic, then keep your site private. Most of us do care about Google and other search engines, and we are willing to help you if you want it. Maybe you should ask why your site is not able to cope with Baidu spiders.
 
Maybe you should ask why your site is not able to cope with Baidu spiders.

The sites are fine. It is Baidu's bots that are abusive. They bombard a forum with enough requests to nearly qualify them as performing a denial of service attack on servers. They did that to our vB forum while we were still on our old server--they were hitting us with the equivalent of 150 (!) users making requests to the forum, one after the other.

I have had them blocked in our firewall. Only thing is, now they have started using a server in the EU to bypass all of the blocks on Chinese traffic. If that isn't deceitful, I don't know what is. They are also known to ignore robots.txt, which is also highly dishonest (any legitimate search engine spider would honor it).

I had to block this range in our firewall just recently:

185.10.104.0/22

I have already had this one blocked for a couple of years now:

180.76.0.0/16

If anyone else has other ranges we can block, I'd love to hear about them. ;)

The big question should be, however, why does a Chinese search engine need to hit so much of our data, so hard and so fast, for a market we don't even serve, or even want to serve?

Baidu is bad, plain and simple.

EDIT: I forgot that I also have this range blocked:

220.181.0.0/16

They have this block registered under the name CHINANET. Yet if you look up an IP address like 220.181.108.177 you can see it is once again Baidubot.
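
If you want to check whether an address from your logs already falls inside one of these blocks, here is a rough sketch with Python's standard `ipaddress` module. The three ranges are just the ones listed in this post, not a complete list of Baidu netblocks, and the name `matching_block` is made up for the example.

```python
import ipaddress

# Ranges quoted in this post only; other Baidu ranges may exist.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("185.10.104.0/22", "180.76.0.0/16", "220.181.0.0/16")
]


def matching_block(ip: str):
    """Return the blocked CIDR range containing ip, or None if there is none."""
    addr = ipaddress.ip_address(ip)
    for network in BLOCKED_NETWORKS:
        if addr in network:
            return str(network)
    return None


if __name__ == "__main__":
    # The address mentioned above as an example of the CHINANET block.
    print(matching_block("220.181.108.177"))
```

This is only a lookup helper. The actual blocking still has to happen in the firewall or the web server.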
 
123.125.71.0/24

Code:
IP Address: 123.125.71.52
Host: baiduspider-123-125-71-52.crawl.baidu.com
[Querying whois.apnic.net]
[whois.apnic.net]
% [whois.apnic.net]
% Whois data copyright terms    http://www.apnic.net/db/dbcopyright.html

% Information related to '123.125.71.0 - 123.125.71.255'

inetnum:        123.125.71.0 - 123.125.71.255
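
Since anyone can fake a User-Agent string, a reverse lookup like the one above is a more reliable way to tell whether an address really belongs to the crawler. Below is a minimal sketch of a reverse-then-forward DNS check in Python. It assumes genuine crawler hosts end in `.crawl.baidu.com`, which is taken only from the lookup shown here and may not cover every Baidu host. The function names are hypothetical, and the lookup functions are passed in as parameters so the logic can be tried without network access.

```python
import socket

# Assumption: suffix taken from the single lookup shown in this post.
CRAWL_SUFFIX = ".crawl.baidu.com"


def default_reverse(ip: str) -> str:
    """Reverse DNS: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def default_forward(host: str) -> list:
    """Forward DNS: hostname to list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_baiduspider(ip, reverse=default_reverse, forward=default_forward) -> bool:
    """True only if ip reverse-resolves to the crawl suffix and that host resolves back to ip."""
    try:
        host = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.lower().endswith(CRAWL_SUFFIX):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

The forward step matters because the owner of an IP range controls its reverse DNS and could point it at any hostname they like.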
 
The sites are fine. It is Baidu's bots that are abusive. They bombard a forum with enough requests to nearly qualify them as performing a denial of service attack on servers.
Baidu is bad, plain and simple.
Thanks for confirming this. Tried explaining this to @Slavik but he couldn't help which is why I decided to get a managed vps. I haven't done anything differently so I have no idea why they aren't attacking now. I'm just glad they aren't. @MattW did you add something to keep the Baidu bots out?
 
I knew about this two years ago, and it's not exactly anything new. The 180.76.0.0/16 block is the one that really slammed our sites hard.

I looked up "baidu" on APNIC and am compiling whichever netblocks I find there. I've found a few but I don't know if they are engaged in any 'bot activity or not.

123.125.71.0/24

Sweet. Thanks! One more to add. :D

I edited mine for another netblock I found.
 
Thanks for confirming this. Tried explaining this to @Slavik but he couldn't help which is why I decided to get a managed vps. I haven't done anything differently so I have no idea why they aren't attacking now. I'm just glad they aren't. @MattW did you add something to keep the Baidu bots out?

I have fully managed servers but they basically take a hands-off approach. But if I need something blocked, I can either request them to add it (no charge), or I can add it myself. It could be that some hosts may already block Baidu, either by IP address or maybe just by detecting flooding.

Right now I have 16 bots online, 10 of which are Bing. (Good luck with that search engine, Bing.) I also see one from Facebook. I detest Facebook's data-mining practices, but since it is building traffic to the site, let 'em at it. I also see something called Brandwatch, and Proximic (which spiders for advertising metrics).
 
This drama had me looking through my site to see if I or another user said something to upset China. :LOL:

:LOL:

No need to worry. :D It was just the way Baidu's bots were (mis)behaving. I don't mind if we have a dozen different IPs from one company visiting the forum at a time. But when I was tracking our live access log in Apache and seeing line after line of 180.76.*.* packed in among normal forum requests (page requests, image requests, etc.), you could easily tell that between their various IPs, they were hitting a page at least once or twice per second. And if you recall how much of a load vB 3.x put on a server when building/loading/displaying a thread, it could easily bog a server down.
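
If you would rather measure this than eyeball the live log, here is a rough sketch in Python that counts hits from one netblock and turns them into an average requests-per-second figure. It assumes the common Apache log layout where each line starts with the client IP and has the timestamp in square brackets; adjust the regex if your LogFormat differs. The function name is made up for the example.

```python
import ipaddress
import re
from datetime import datetime

# Assumed layout: "<ip> <ident> <user> [<day>/<Mon>/<year>:<H>:<M>:<S> <zone>] ..."
LOG_LINE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")
NETBLOCK = ipaddress.ip_network("180.76.0.0/16")


def requests_per_second(lines, network=NETBLOCK) -> float:
    """Average request rate from `network` between its first and last hit."""
    times = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        try:
            addr = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # first field is a hostname or garbage, skip it
        if addr in network:
            times.append(datetime.strptime(match.group(2), "%d/%b/%Y:%H:%M:%S %z"))
    if len(times) < 2:
        return 0.0
    span = (max(times) - min(times)).total_seconds()
    # If every hit shares one timestamp, report the raw count for that second.
    return len(times) / span if span > 0 else float(len(times))
```

Feed it an open log file, for example `requests_per_second(open("access.log"))`, where the file name is whatever your server uses.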

I told my fellow staffers what had happened, and their general attitude (which I agreed with) was that we really had no reason to be giving anything to Baidu. My question was: why so much, and so fast? They were relentless! Seeing their IPs in the live Apache log certainly looked like a DoS attack to me...
 
Baidu is the least of drama associated with having a public forum. It was the icing on the cake.

You can say that again!

There was something in the air this Sept after the Harvest Moon. We had one of our episodes where the inmates told me THEY run the jail......

Human nature.....what a PITA.
 