XF 1.2 199 Bots - Most of Them Baidu

There was something in the air this Sept after the Harvest Moon. We had one of our episodes where the inmates told me THEY run the jail......

At times like that, you WISH you had Baidu hammering on your server...just to keep the inmates from posting. :D

I must say for the big forum I admin, it runs fairly smoothly. We have a good staff. There are a couple of issues that crop up, but there are enough members out there who care for the community that they are either "self-policing" the forum or reporting anything that may seem wrong.

So many dynamics...
 
Thanks for confirming this. I tried explaining this to @Slavik, but he couldn't help, which is why I decided to get a managed VPS. I haven't done anything differently, so I have no idea why they aren't attacking now. I'm just glad they aren't. @MattW, did you add something to keep the Baidu bots out?
Nope, but I can add those IP ranges to your firewall if you want.
 
Well, the good news is that XF sold another copy to the inmates....who are now establishing their "con-air" version of one of my sites. That's gonna work out well.....
 
Our big board had one or two "enemy factions" at one point. It amazed me how they could sit there and watch every activity on the forum for hours (even bad stuff we'd find and delete within minutes), and copy/paste it there for their derision. Really didn't get them anywhere. Made us all wonder if they ever left the confines of their mothers' basements. :D
 
I just sifted through some old email and found that our first Baidu incident was on 11/29/2011. At the moment I checked, I counted over 150 of the bots. Even more telling was this statistic (keep in mind we were on vB at the time): "Most users ever online was 2,555, Today at 02:42 PM." Considering that our peak traffic at that time was about 700-750 members, that was indeed excessive. And abusive. That was around the time of day when I began getting emails from staff asking why the forum was not responding. The Baidu bot was also not accessing the archive version of the forum, but the full version.

This is why we continue blocking baidu-bot.
 
It'll be interesting to see if this works out well. I don't get too many of the Baidu bots crawling my site (or at least I didn't the last time I checked). Watching the thread, though, on the chance they come out in force. Nice find.
 
Good find!
If you're using nginx, the below works well also. The thing I like about nginx is that returning 444 (an nginx-specific code) drops the connection then and there, without sending a response.
Code:
    location / {
        ## Deny certain User-Agents (case insensitive)
        ## The ~* makes it case insensitive as opposed to just a ~
        if ($http_user_agent ~* (Ahrefsbot|Morfeus|ZmEu|Baiduspider|Jullo|Yandex|Sogou)) {
            return 444;
        }
        ## ... the rest of your location block goes here ...
    }
 
I have:
Code:
        ## The dot in Mail\.Ru is escaped so it only matches a literal "."
        if ($http_user_agent ~* (Ahrefsbot|Morfeus|BoardReader|BoardTracker|GigaBot|ZmEu|Baiduspider|Jullo|Yandex|Sogou|LWP::Simple|BBBike|wget|Purebot|Lipperhey|libwww-perl|Mail\.Ru)) {
            return 444;
        }
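
Before putting a list like this live, it can help to check which user agent strings it would actually catch. Below is a minimal sketch in Python, not part of anyone's setup in this thread. It assumes Python's `re` module treats this particular pattern the same way nginx's PCRE does, which should hold for plain alternation like this. The helper name `is_blocked` and the sample strings are made up for illustration, and the dot in `Mail.Ru` is escaped on the assumption that a literal dot was intended.

```python
import re

# Pattern taken from the longer nginx list above; "Mail\.Ru" escapes the dot
# (an assumption about intent). re.IGNORECASE mirrors nginx's "~*" operator.
BLOCKED_AGENTS = re.compile(
    r"(Ahrefsbot|Morfeus|BoardReader|BoardTracker|GigaBot|ZmEu|Baiduspider"
    r"|Jullo|Yandex|Sogou|LWP::Simple|BBBike|wget|Purebot|Lipperhey"
    r"|libwww-perl|Mail\.Ru)",
    re.IGNORECASE,
)


def is_blocked(user_agent: str) -> bool:
    """Return True if any blocked token appears anywhere in the user agent."""
    return BLOCKED_AGENTS.search(user_agent) is not None


# Illustrative user agent strings, not taken from real logs.
samples = [
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "WGET/1.21",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "",  # a client that sends no user agent at all
]

for ua in samples:
    print(repr(ua), "->", "blocked" if is_blocked(ua) else "allowed")
```

Note that the empty user agent passes straight through, so a list like this only catches bots that identify themselves.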
 
IMHO, and from my own experience, that article has flawed advice. Blocking by user agent is not fully effective: you are relying on the bots to always send their user agent string, which I have found is not always the case. Once you know their IP addresses, though, blocking by IP stops them no matter what user agent string they use to get around user agent blocking. If you feel there is a drain on resources, the first step should be to look at your live server access logs if you are able; if not, note the time and date and look at your nightly logs. The bots are easy to spot: many of the same or similar IP address blocks will be hitting your site at the same time. That is how I found Baidu originally.
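
For anyone who wants to do that log check with a script rather than by eye, here is a rough sketch in Python. It is only an illustration, not the method used above. It assumes the client IP is the first field on each log line (as in the common/combined access log format), groups IPv4 addresses into /24 blocks, and counts hits per block. The function name `count_by_subnet` and the sample lines are made up, and the addresses are reserved documentation ranges, not real Baidu IPs.

```python
from collections import Counter
import ipaddress


def count_by_subnet(log_lines, prefix_len=24):
    """Count requests per IPv4 block, assuming the IP is the first field."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with an IP address
        if addr.version != 4:
            continue  # this sketch only groups IPv4 addresses
        # strict=False lets us pass a host address and get its enclosing block
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


# Made-up sample lines using documentation address ranges.
sample_log = [
    '192.0.2.10 - - [01/Jan/2024:14:40:01 +0000] "GET /threads/1/ HTTP/1.1" 200 5120 "-" "-"',
    '192.0.2.11 - - [01/Jan/2024:14:40:01 +0000] "GET /threads/2/ HTTP/1.1" 200 4870 "-" "-"',
    '192.0.2.57 - - [01/Jan/2024:14:40:02 +0000] "GET /threads/3/ HTTP/1.1" 200 6011 "-" "-"',
    '198.51.100.7 - - [01/Jan/2024:14:40:02 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

counts = count_by_subnet(sample_log)
for block, hits in counts.most_common():
    print(block, hits)
```

Blocks that float to the top of the list during a slowdown are the ones worth a closer look before adding them to a firewall. A /24 is just a starting guess; pass a smaller `prefix_len` to group into wider ranges.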
 