Exactly what I'm experiencing. They do this all the time. Maybe they aren't as efficient as Googlebot?
My site is constantly being crawled by Baidu. I've no idea what they are indexing, or not indexing.
That seems to be the only spider that also constantly checks to see if my site is running without asking for content.
<Directory /...path/to/..>
Order allow,deny
Allow from all
Deny from 119.63.196.
</Directory>
Because I can.
Why would you want to block Baidu? Yes, really, I'm asking that question.
And this has exactly what to do with the OP's request?
Mark Zuckerberg visited the Baidu headquarters a short while ago. This tells me that Facebook is interested in acquiring Baidu down the line. This in turn tells me that Facebook's choice of search engine is Baidu... So, if you want more traffic to your site, Baidu's interested in seeing what you've got. And if successful, they'll direct more 'human' visitors to your site, just like Google before them.
AFAIK, it does respect robots.txt, but it will still hit your site like crazy, generating lots of unneeded traffic. Blocking it before it even sees the page makes sense.
I firewalled most of the IPs at the server level, but wanted a few of them to continue to visit until I could understand what they are doing. Things like 'do they respect robots.txt?' and 'why are they doing status checks so often?'
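For anyone wanting to see what a robots.txt rule would actually tell Baiduspider to do, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `is_allowed` helper are made up for illustration and are not from any site in this thread. It only shows what the file asks of a crawler; whether the crawler obeys, and how hard it hits the server anyway, is a separate matter.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not taken from the thread):
# shut out Baiduspider entirely, leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if the robots.txt text permits `agent` to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Pass the short crawler token, not the full "Mozilla/5.0 (...)" string.
    for agent in ("Baiduspider", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "/threads/"))
```

Note that the check is done with the short token `Baiduspider`; the parser matches on the part of the agent name before the first slash, so the full browser-style user-agent string would not match the rule.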
...Why the question? It has nothing to do with the request; I'm just stating that it's not really a good idea to block them, especially when the future looks better with Baidu as your search "king." Because you're actually getting visitors FROM other countries - especially China.
China's business scene is so busy right now that if you're an online business dealing with a worldwide community, once a visitor likes your site so much that he/she tells friends about it - BOOM. Your site then starts bringing in more visitors like wildfire.
Oh, and FYI: Baidu is 6th in the Alexa rankings, according to Wikipedia.
But hey, it's not my site. I'm just offering an idea here.
*Hides away from hard hitting questions*
I've been asking myself the same thing about an XF site (recently converted from vB) that I am running for my wife's church. How the heck they are even finding it, let alone why they are indexing it, puzzles me.
What if your community is about a small city within one of the states... how will Baidu help my forum?
Have you gone through your cPanel access log?
119.63.196.41 - - [24/Jul/2011:05:13:55 -0700] "GET /misc/style?style_id=4&redirect=%2Fmisc%2Fstyle%3Fredirect%3D%252Fthreads%random-thread-title252.16076%252Fpage-4 HTTP/1.1" 303 - "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
I have, and it looks like they are hitting every link off of the main URL page. Thinking out loud... they are likely picking up the URL from the church's Twitter account and then doing a blind crawl of the site regardless of the site's locale or content.
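If reading the access log by eye gets tedious, a small script can tally the crawler's hits. This is only a sketch, assuming the log is in Apache's combined format like the Baiduspider line quoted above. The sample lines are illustrative (the first is a shortened version of that line, the second is invented), and `LOG_PATTERN` and `count_hits` are hypothetical names.

```python
import re
from collections import Counter

# Fields of Apache's combined log format: client IP, identity, user,
# timestamp, request line, status, response size, referer, user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative sample lines only; the second entry is invented.
SAMPLE_LINES = [
    '119.63.196.41 - - [24/Jul/2011:05:13:55 -0700] '
    '"GET /misc/style?style_id=4 HTTP/1.1" 303 - "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; '
    '+http://www.baidu.com/search/spider.html)"',
    '203.0.113.7 - - [24/Jul/2011:05:14:02 -0700] '
    '"GET / HTTP/1.1" 200 5120 "-" "ExampleBrowser/1.0"',
]


def count_hits(lines, agent_token="Baiduspider"):
    """Count requests per (client IP, status) whose user agent contains agent_token."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and agent_token.lower() in match.group("agent").lower():
            hits[(match.group("ip"), match.group("status"))] += 1
    return hits


if __name__ == "__main__":
    for (ip, status), count in count_hits(SAMPLE_LINES).most_common():
        print(ip, status, count)
```

To run it against a real log, pass an open file object in place of `SAMPLE_LINES`; lines that do not match the pattern are simply skipped.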
This is the first thing I do when an IP range pisses me off.
Baidu is apparently running amok. I started to notice it a couple of days ago.
Code:
<Directory /...path/to/..>
Order allow,deny
Allow from all
Deny from 119.63.196.
</Directory>
And gone are the buggers...
ip route add blackhole 119.63.196.0/24
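The Apache partial-IP form `Deny from 119.63.196.` and the CIDR form `119.63.196.0/24` used in the route command cover the same 256 addresses. A quick way to sanity-check whether a given address from the log falls inside that range is Python's standard `ipaddress` module. This is just a sketch; `BLOCKED` and `is_blocked` are hypothetical names.

```python
import ipaddress

# The /24 range from the blackhole route above: 119.63.196.0 - 119.63.196.255.
BLOCKED = ipaddress.ip_network("119.63.196.0/24")


def is_blocked(address: str, network=BLOCKED) -> bool:
    """Return True if `address` falls inside `network`."""
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    # First address is the one from the access log line in this thread;
    # the second is an arbitrary address just outside the range.
    for addr in ("119.63.196.41", "119.63.197.1"):
        print(addr, is_blocked(addr))
```

Any crawler traffic arriving from outside that one /24 would not be caught by either rule.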
I've actually thought about this exact thing. It could very well be a coincidence, but it seems like every time I post something on Twitter, Baidu comes a crawl'n. But then again, I didn't post anything yesterday and Baidu was all over SLRuser.