
Baidu crawling my site like crazy

Luke B

Active member
#1
Hello all,

Over the last couple of days I've noticed a crazy amount of hits from Baidu. It's been consistent pretty much all day, even though not much content has been added at all.

Is this typical?
 

Dean

Well-known member
#2
My site is constantly being crawled by Baidu. I've no idea what they are indexing, or not indexing.

That seems to be the only spider that also constantly checks to see if my site is running without asking for content.
 

Luke B

Active member
#3
Dean said:
My site is constantly being crawled by Baidu. I've no idea what they are indexing, or not indexing.

That seems to be the only spider that also constantly checks to see if my site is running without asking for content.

Exactly what I'm experiencing. They do this all the time. Maybe they aren't as efficient as Googlebot?
 

Dean

Well-known member
#4
Possibly someone with more experience could give input. I've only just started looking into these types of issues.
 

SilverCircle

Well-known member
#6
Baidu is apparently running amok :) I started to notice it a couple of days ago.

Code:
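<Directory /...path/to/..>
  # Apache 2.2-style access control: Allow rules are evaluated first, then Deny
  Order allow,deny
  Allow from all
  # A partial address matches every IP that starts with 119.63.196.
  Deny from 119.63.196.
</Directory>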
And gone are the buggers...
 

Dean

Well-known member
#8
I firewalled most of the IPs at the server level, but wanted a few of them to continue to visit until I could understand what they are doing. Things like 'do they respect robots.txt?' and 'why are they doing status checks so often?'
 

Carlos

Well-known member
#9
Why would you want to block Baidu? Yes, really, I'm asking that question.

Mark Zuckerberg visited the Baidu headquarters a short while ago. This tells me that Facebook is interested in acquiring Baidu down the line. This in turn tells me that Facebook's search engine of choice is Baidu... So, if you want more traffic to your site, Baidu's interested in seeing what you've got. And if successful, they'll direct more 'human' visitors to your site, just like Google before them.
 

SilverCircle

Well-known member
#10
Carlos said:
Why would you want to block Baidu? Yes, really, I'm asking that question.

Because I can.

Carlos said:
Mark Zuckerberg visited the Baidu headquarters a short while ago. This tells me that Facebook is interested in acquiring Baidu down the line. This in turn tells me that Facebook's search engine of choice is Baidu... So, if you want more traffic to your site, Baidu's interested in seeing what you've got. And if successful, they'll direct more 'human' visitors to your site, just like Google before them.

And this has exactly what to do with the OP's request?
 

SilverCircle

Well-known member
#11
Dean said:
I firewalled most of the IPs at the server level, but wanted a few of them to continue to visit until I could understand what they are doing. Things like 'do they respect robots.txt?' and 'why are they doing status checks so often?'

AFAIK, it does respect robots.txt, but it will still hit your site like crazy, generating lots of unneeded traffic. Blocking it before it even sees the page makes sense.
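
If you want to test the robots.txt question rather than guess, one rough approach is to take the paths Baidu requests from your access log and check them against your own rules. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `is_allowed` helper are made up for illustration (nobody has posted a real robots.txt in this thread), so swap in your own file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /misc/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Pass the short bot token, not the full "Mozilla/5.0 (...)" string.
    for path in ("/misc/style?style_id=4", "/threads/"):
        print(path, is_allowed(ROBOTS_TXT, "Baiduspider", path))
```

If a path comes back as disallowed but still shows up in the log for Baiduspider after the rule has been in place for a while, that would suggest the rule is being ignored. Keep in mind that crawlers can cache robots.txt for some time, so one stray hit proves little.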
 

Carlos

Well-known member
#12
SilverCircle said:
And this has exactly what to do with the OP's request?

...Why the question? It has nothing to do with the request. I'm just stating that it's not really a good idea to block them, especially when the future looks better with Baidu as your search "king," because you're actually getting visitors FROM other countries - especially China.

China's business scene is so busy right now that if you run an online business with a worldwide audience, all it takes is one visitor who likes your site enough to tell their friends about it, and BOOM: new visitors start coming in like wildfire.

Oh, and FYI: Baidu is 6th in the Alexa rankings, according to Wikipedia.

But hey, it's not my site. I'm just offering an idea here.
 

iTuN3r

Well-known member
#13
Carlos said:
...Why the question? It has nothing to do with the request. I'm just stating that it's not really a good idea to block them, especially when the future looks better with Baidu as your search "king," because you're actually getting visitors FROM other countries - especially China.

China's business scene is so busy right now that if you run an online business with a worldwide audience, all it takes is one visitor who likes your site enough to tell their friends about it, and BOOM: new visitors start coming in like wildfire.

Oh, and FYI: Baidu is 6th in the Alexa rankings, according to Wikipedia.

But hey, it's not my site. I'm just offering an idea here.

What if your community is about a small city within one of the states... how will Baidu help my forum?
 

Kevin

Well-known member
#16
iTuN3r said:
What if your community is about a small city within one of the states... how will Baidu help my forum?

I've been asking myself the same thing about an XF site (recently converted from vB) that I am running for my wife's church. How the heck they are even finding it, let alone why they are indexing it, puzzles me.
 

Dean

Well-known member
#17
Kevin said:
I've been asking myself the same thing about an XF site (recently converted from vB) that I am running for my wife's church. How the heck they are even finding it, let alone why they are indexing it, puzzles me.

Have you gone through your cPanel access log?

Baidu is constantly trying to access the content of my site like this:
Code:
119.63.196.41 - - [24/Jul/2011:05:13:55 -0700] "GET /misc/style?style_id=4&redirect=%2Fmisc%2Fstyle%3Fredirect%3D%252Fthreads%random-thread-title252.16076%252Fpage-4 HTTP/1.1" 303 - "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
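
To get a feel for how heavy the crawling really is, you could tally the Baiduspider lines in the access log by IP and by status code. Here is a rough sketch in Python. It assumes the log is in Apache's combined format, like the line above. The `summarize_baidu_hits` function and the shortened sample line are illustrative only.

```python
import re
from collections import Counter

# Combined Log Format:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize_baidu_hits(lines):
    """Count Baiduspider requests per client IP and per HTTP status."""
    by_ip = Counter()
    by_status = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        # Skip lines that do not parse or that come from other clients.
        if match is None or "Baiduspider" not in match.group("agent"):
            continue
        by_ip[match.group("ip")] += 1
        by_status[match.group("status")] += 1
    return by_ip, by_status


if __name__ == "__main__":
    # Shortened, illustrative version of the log line quoted above.
    sample = [
        '119.63.196.41 - - [24/Jul/2011:05:13:55 -0700] '
        '"GET /misc/style?style_id=4 HTTP/1.1" 303 - "-" '
        '"Mozilla/5.0 (compatible; Baiduspider/2.0; '
        '+http://www.baidu.com/search/spider.html)"'
    ]
    ips, statuses = summarize_baidu_hits(sample)
    print(ips.most_common(5))
    print(statuses.most_common())
```

To run it on a real log, pass an open file object instead of the sample list. A pile of 303 responses on `/misc/style` URLs would suggest the spider is following style-chooser redirect links rather than fetching actual thread content.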
 

Kevin

Well-known member
#18
Dean said:
Have you gone through your cPanel access log?

I have, and it looks like they are hitting every link off of the main URL page. Thinking out loud... they are likely picking up the URL from the church's Twitter account and are then doing a blind crawl of the site regardless of the site's locale or content.
 

EQnoble

Well-known member
#19
No, I doubt that, because they used to crawl my site like mad and I don't think there is a single link from Twitter to my site.

SilverCircle said:
Baidu is apparently running amok :) I started to notice it a couple of days ago.

Code:
<Directory /...path/to/..>
  Order allow,deny
  Allow from all
  Deny from 119.63.196.
</Directory>
And gone are the buggers...

This is the first thing I do when an IP range pisses me off.
Code:
ip route add blackhole 119.63.196.0/24
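
Before blackholing a range, it's worth double-checking which addresses the rule actually covers. A quick sketch with Python's standard `ipaddress` module follows. The `is_blocked` helper is made up for illustration, and the range is the /24 from the command above.

```python
import ipaddress

# The /24 from the blackhole route above: 119.63.196.0 to 119.63.196.255.
BLOCKED_RANGE = ipaddress.ip_network("119.63.196.0/24")


def is_blocked(ip: str) -> bool:
    """Return True if ip falls inside BLOCKED_RANGE."""
    return ipaddress.ip_address(ip) in BLOCKED_RANGE


if __name__ == "__main__":
    print(BLOCKED_RANGE.num_addresses)   # number of addresses in the range
    print(is_blocked("119.63.196.41"))   # the IP from Dean's log line
    print(is_blocked("119.63.197.1"))    # a neighbouring range
```

Any crawler IPs outside that /24 are untouched by the route, so run your log's IPs through a check like this to see what would still get through.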
 

Luke B

Active member
#20
Kevin said:
I have, and it looks like they are hitting every link off of the main URL page. Thinking out loud... they are likely picking up the URL from the church's Twitter account and are then doing a blind crawl of the site regardless of the site's locale or content.

I've actually thought about this exact thing. It could very well be a coincidence, but it seems like every time I post something on Twitter, Baidu comes a-crawlin'. But then again, I didn't post anything yesterday and Baidu was all over SLRuser.

I'm not really concerned, only curious as to why there's so much of it.