Baidu crawling my site like crazy

Discussion in 'Off Topic' started by Luke B, Aug 10, 2011.

  1. Luke B

    Luke B Active Member

    Hello all,

    Over the last couple of days I've noticed a crazy number of hits from Baidu. It's been staying consistent pretty much all day, and not much content has been added at all.

    Is this typical?
     
  2. Dean

    Dean Well-Known Member

    My site is constantly being crawled by Baidu. I've no idea what they are indexing, or not indexing.

    That seems to be the only spider that also constantly checks to see if my site is running without asking for content.
     
  3. Luke B

    Luke B Active Member

    Exactly what I'm experiencing. They do this all the time. Maybe they aren't as efficient as Googlebot?
     
  4. Dean

    Dean Well-Known Member

    Possibly someone with more experience could give input. I've only just started looking at these types of issues.
     
  5. Lucas

    Lucas Well-Known Member

    Same here, I notice it a lot.
     
  6. SilverCircle

    SilverCircle Well-Known Member

    Baidu is apparently running amok :) I started to notice it a couple of days ago.

    Code:
    <Directory /...path/to/..>
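      # Apache 2.2-style access control: Allow rules are evaluated first,
      # then Deny rules, so a matching Deny wins.
      Order allow,deny
      Allow from all
      # A partial address matches every host in 119.63.196.x
      Deny from 119.63.196.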
    </Directory>
    
    And gone are the buggers...
     
  7. iTuN3r

    iTuN3r Well-Known Member

    Exactly the same problem here.
     
  8. Dean

    Dean Well-Known Member

    I firewalled most of the IPs at the server level, but wanted a few of them to continue to visit until I could understand what they are doing. Things like 'do they respect robots.txt?' and 'why are they doing status checks so often?'
     
  9. Carlos

    Carlos Well-Known Member

    Why would you want to block Baidu? Yes, really, I'm asking that question.

    Mark Zuckerberg paid a visit to the Baidu headquarters a short while ago. This tells me that Facebook is interested in acquiring Baidu down the line. This in turn tells me that Facebook's search engine of choice is Baidu... So, if you want more traffic to your site, Baidu's interested in seeing what you've got. And if successful, they'll direct more 'humans' to your site, just like Google before them.
     
  10. SilverCircle

    SilverCircle Well-Known Member

    Because I can.
    And this has exactly what to do with the OP's request?
     
  11. SilverCircle

    SilverCircle Well-Known Member

    AFAIK, it does respect robots.txt, but it will still hit your site like crazy, generating lots of unneeded traffic. Blocking it before it even sees the page makes sense.
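
A minimal sketch of what a robots.txt rule set would permit for Baiduspider, using Python's standard `urllib.robotparser`. The robots.txt content here is hypothetical (nobody in the thread posted theirs), and the `/misc/` path is only an assumption for illustration. This shows what the rules say, not whether any given crawler actually obeys them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Baiduspider out of /misc/, allow everything else.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /misc/

User-agent: *
Disallow:
"""

def build_parser(robots_text):
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# can_fetch() matches on the crawler's product token, not the full UA string.
print(parser.can_fetch("Baiduspider", "/misc/style?style_id=4"))
print(parser.can_fetch("Baiduspider", "/threads/"))
print(parser.can_fetch("Googlebot", "/misc/style?style_id=4"))
```

Even with rules like these in place, a compliant crawler can still request the allowed pages as often as it likes, which is why blocking at the server was suggested instead.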
     
  12. Carlos

    Carlos Well-Known Member

    ...Why the question? It has nothing to do with the request; I'm just saying that it's not really a good idea to block them, especially when the future looks better with Baidu as your search "king," because you're actually getting visitors from other countries, especially China.

    China's business scene is so busy right now that if you run an online business and deal with a worldwide community, all it takes is one visitor who likes your site enough to tell friends about it, and BOOM: the visitors start coming in like wildfire.

    Oh, and FYI: Baidu is 6th in the Alexa rankings, according to Wikipedia.

    But hey, it's not my site. I'm just offering an idea here.
     
  13. iTuN3r

    iTuN3r Well-Known Member

    What if your community is about a small city within one of the states... how will Baidu help my forum?
     
  14. Carlos

    Carlos Well-Known Member

    That's a whole 'nother topic. *Hides away from hard-hitting questions*
     
  15. Dean

    Dean Well-Known Member

    :ROFLMAO:

    (I do understand your point, it could be useful for some forums)
     
  16. Kevin

    Kevin Well-Known Member

    I've been asking myself the same thing about an XF site (recently converted from vB) that I am running for my wife's church. How the heck they are even finding it, let alone why they are indexing it, puzzles me.
     
  17. Dean

    Dean Well-Known Member

    Have you gone through your cpanel access log?

    Baidu is constantly trying to access the content of my site like this:
    Code:
    119.63.196.41 - - [24/Jul/2011:05:13:55 -0700] "GET /misc/style?style_id=4&redirect=%2Fmisc%2Fstyle%3Fredirect%3D%252Fthreads%random-thread-title252.16076%252Fpage-4 HTTP/1.1" 303 - "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
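
A rough sketch of how hits like the one above could be tallied per IP from an access log, assuming the log is in Apache's combined format (as the line above appears to be). The function names and the `agent_token` parameter are made up for illustration, and the sample line is a shortened copy of the one in the post.

```python
import re
from collections import Counter

# Combined Log Format:
# ip identity user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def count_crawler_hits(lines, agent_token="Baiduspider"):
    """Count hits per client IP where the user agent contains agent_token."""
    hits = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields and agent_token.lower() in fields["agent"].lower():
            hits[fields["ip"]] += 1
    return hits

# Shortened version of the log line quoted in the post.
sample = (
    '119.63.196.41 - - [24/Jul/2011:05:13:55 -0700] '
    '"GET /misc/style?style_id=4 HTTP/1.1" 303 - "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; '
    '+http://www.baidu.com/search/spider.html)"'
)

print(count_crawler_hits([sample]))
```

To run it over a real log, pass an open file object as `lines`; the user-agent string can be spoofed, so the counts are only a first approximation of who is crawling.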
     
  18. Kevin

    Kevin Well-Known Member

    I have, and it looks like they are hitting every link off of the main URL page. Thinking out loud... they are likely picking up the URL from the church's Twitter account and are then doing a blind crawl of the site regardless of the site's locale or content.
     
  19. EQnoble

    EQnoble Well-Known Member

    No, I doubt that, because they used to crawl my site like mad and I don't think there is a single link on Twitter pointing to my site.

    This is the first thing I do when an IP range pisses me off.
    Code:
    ip route add blackhole 119.63.196.0/24
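
Before dropping a whole range, it can help to confirm which addresses it covers. A small sketch using Python's standard `ipaddress` module, assuming the /24 above; `is_blocked` is a hypothetical helper name, and the first test address is the one from the log line quoted earlier in the thread.

```python
import ipaddress

# The range from the blackhole route above.
BLOCKED_NETWORK = ipaddress.ip_network("119.63.196.0/24")

def is_blocked(ip_text, network=BLOCKED_NETWORK):
    """Return True if the given address falls inside the blocked network."""
    return ipaddress.ip_address(ip_text) in network

print(is_blocked("119.63.196.41"))    # address seen in the access log
print(is_blocked("119.63.197.1"))     # neighbouring range, not covered
print(BLOCKED_NETWORK.num_addresses)  # size of the /24
```

A crawler that uses addresses outside this /24 would not be affected by the route, so the access log is worth rechecking afterwards.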
    
     
  20. Luke B

    Luke B Active Member

    I've actually thought about this exact thing. It could very well be a coincidence, but it seems like every time I post something on Twitter, Baidu comes a-crawlin'. But then again, I didn't post anything yesterday and Baidu was all over SLRuser.

    I'm not really concerned, only curious as to why so much.
     
