
Tip for XenForo users on Shared Hosting

mrGTB

Well-known member
OK, since installing XenForo yesterday there have been times when things are running slow, but looking at the online list there have been quite a few bots present. At times I've hit my server resource limit (my host runs a monitoring tool on the site) showing too many connections.

Earlier today I decided to add this to my robots.txt file, and things have been running much smoother and faster since. It may prove useful to others here to limit all spiders (not just some) if you're on shared hosting. It may also benefit you to stop using things like "TwitterFeed" to auto-send all threads to Facebook, Twitter, etc., which only sends lots of spiders to your site non-stop, slowing things down. Shared hosting isn't really cut out for it!

I use this in my robots.txt, the Crawl-delay line being the thing I'm talking about here. If you're on shared hosting, it might pay to start limiting them.

Code:
# Applies to all crawlers
User-agent: *
# Ask compliant bots to wait 10 seconds between requests
Crawl-delay: 10
# Keep crawlers out of CGI scripts
Disallow: /cgi-bin/

Can you move this to Tips for people using shared hosting? I've posted it in the wrong forum by mistake.
 
What I posted above doesn't block any search engines. It just slows the rate at which they can crawl your site (if they follow the rule). Not all bots do of course, but search engines like Google, Yahoo and MSN do. Basically, it's telling Yahoo and the others to slow their crawl rate using that set period of time, "Crawl-delay". That helps because spiders like Yahoo's have been known to hit a site in large numbers very quickly, which, if you're on shared hosting, can bring your server to a crawl.

It's not a perfect method, but it helps with shared hosting.

If you search the web for robots.txt, you can find many guides showing how to slow only certain spiders down. There are tons of guides out there. I posted the above because it's easier to slow the whole lot down with one entry covering all of them. I do use it on my own forum, but since yesterday I'm also using "CloudFlare", and so far that seems to have made a huge difference for my forum. Still seeing how things go right now, though.
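For illustration, a per-bot setup might look like the sketch below. The user-agent tokens (Slurp for Yahoo, Baiduspider for Baidu) are the commonly published ones, and the delay values are just examples; whether a given bot honors Crawl-delay varies.

Code:
# Yahoo's crawler: wait 20 seconds between requests
User-agent: Slurp
Crawl-delay: 20

# Baidu's crawler: wait 60 seconds between requests
User-agent: Baiduspider
Crawl-delay: 60

# Everything else: a gentler 10-second delay
User-agent: *
Crawl-delay: 10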
 
Some info:
Crawl-delay: 10 means that only one page can be crawled every 10 seconds.
If you set this to 60, only 1,440 pages can be crawled per day.
So it depends on your website how you set this.
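As a rough sanity check on those numbers (assuming a compliant bot fetches at most one page per delay interval):

Code:
# Rough daily crawl budget per compliant bot:
# 86,400 seconds/day / 10s delay = 8,640 pages/day at most
# 86,400 seconds/day / 60s delay = 1,440 pages/day at most
User-agent: *
Crawl-delay: 60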
 
Be careful, though, how high you set that Crawl-delay value; Google is your friend, and there's plenty of info on the best Crawl-delay times to use. It's not recommended to go with a high setting that slows them down too much, as it can have an adverse effect and stop your community being indexed very well.
 
Not all bots do of course, but search engines like Google, Yahoo and MSN do.
The Microsoft bots do not appear to respect crawl-delay at all. I've gone back and forth with them about this for some time, and while they insist that their bots respect robots.txt, whenever I've tried to implement crawl-delay, my logs contradict their claims.

In my experience so far there has not been any negative effect from completely blocking the MS bots. My site comes up at the top of Google and Bing, but the MS bots have been blocked for more than a year, which would seem to indicate that Bing relies pretty heavily on other, non-spider, data.
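For anyone wanting to try the same, blocking Microsoft's crawlers via robots.txt would look roughly like this (msnbot and bingbot are Microsoft's published user-agent tokens; note this only works for bots that actually obey robots.txt, so a server-level block may be needed for stubborn ones):

Code:
# Block Microsoft's crawlers entirely
User-agent: msnbot
Disallow: /

User-agent: bingbot
Disallow: /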
 
Hmm, like you say, it's been said MS do respect it, but if that's your finding I won't argue the point. Oddly, my domain has never been plagued much by MS spiders; they've never posed a problem in the numbers they send. Yahoo Slurp and Baiduspider are more my enemy bots, yet if I search Yahoo for my domain I get very poor results. If anything, I should block Yahoo, because they have proven useless to me.
 
Yeah, I've limited Slurp to something ridiculous like once an hour. That used to be the worst offender. Haven't been swarmed by Baidu yet.
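Expressed as a Crawl-delay, a once-an-hour limit for Slurp would presumably look like this (assuming Slurp reads the value as seconds between requests, as Yahoo documented):

Code:
# Yahoo Slurp: at most one request per hour
User-agent: Slurp
Crawl-delay: 3600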
 
My site comes up at the top of Google and Bing, but the MS bots have been blocked for more than a year, which would seem to indicate that Bing relies pretty heavily on other, non-spider, data.

Bing is powered by Google. ;)
http://xenforo.com/community/threads/bing-copying-google-results.11433/
 