Shared Hosting - Trying to save my site.

ChicagoCoin

Member
My site has been on shared hosting for the past 19 years. We have noticed sporadic response times: sometimes the site loads fine and other times it takes a while to load, and we also get some 503 errors. I contacted the host, and they said it's due to a procwatch daemon:

Sorry for the trouble with your site. It turns out that the account's processes are being killed by our procwatch daemon. Procwatch is a daemon that runs constantly on shared servers to monitor RAM/CPU usage and execution time, so that no single user can use an inappropriately high percentage of the shared resources and impact the overall health of the server or its ability to serve all users' pages.
Additionally, all processes run by all of the account's users on the server are counted together. A process that gets killed is generally not using too much memory by itself; it was just the process that tipped the total usage over the limit.
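To make the host's last point concrete, here is a toy model in Python. It is only an assumption about how per-account accounting might work, not DreamHost's actual procwatch logic. Memory from every process under the account is summed, and whichever process pushes the running total past a hypothetical limit is the one flagged, even if it is small on its own. The function name, PIDs, and numbers are all made up for illustration.

```python
from typing import Optional


def process_that_tips_limit(
    processes: list[tuple[int, int]], limit_mb: int
) -> Optional[int]:
    """Toy model (hypothetical, not procwatch itself).

    `processes` is a list of (pid, memory_mb) pairs in the order they started.
    Memory is summed across all of the account's processes; the function
    returns the pid whose memory pushes the account total over `limit_mb`,
    or None if the total never exceeds the limit.
    """
    total = 0
    for pid, mem_mb in processes:
        total += mem_mb
        if total > limit_mb:
            return pid
    return None


# Hypothetical numbers: four PHP workers under one account, 400 MB cap.
workers = [(101, 120), (102, 150), (103, 90), (104, 60)]
print(process_that_tips_limit(workers, limit_mb=400))
```

With these numbers the running total goes 120, 270, 360, 420, so process 104 is the one that crosses the line even though it uses the least memory of the four. That matches the host's explanation.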
Before receiving this response, I tried to get things back to normal by updating XF to the latest version. They first suggested using a robots.txt file to limit crawlers. After some research, it looks like Cloudflare might be another option; it's available on my account but not activated. They say the problem is memory related and suggest a VPS or private server. This has just been a hobby of mine, and given the size of the site, a VPS is not something I'd consider.

I'm wondering whether Cloudflare and/or a robots.txt file would help. I should also mention that I have a custom phprc file (php.ini) that I altered a long time ago to allow large file uploads. Maybe that needs to be changed too?
 

motowebmaster

Well-known member
I haven't used shared hosting in a long time, but I have helped some forums on shared hosting that ran very well. You should shop around.

I limit particular "agents" and block them at my firewall, but only to keep the baddies out. Crawlers are going to be a fact of life, and for a site that is optimized for guest caching, they're really not going to add that much load.
 

ChicagoCoin

Member
I've been with DreamHost since 2007. I'm trying to find out whether Cloudflare and/or a robots.txt file might work.

I really haven't had many problems with them until now. However, a few times a year they have started deleting my backup folder, which contains a bunch of daily and weekly database files. They also started signing me up for something called DreamObjects, where my deleted backups were placed for a 60-day trial. I keep rejecting it, and it's getting annoying.

This is a fairly new and serious problem, and I'd like to see whether their suggestions of a robots.txt file or Cloudflare might help.
 

ChicagoCoin

Member
They linked to a help page about blocking spiders with robots.txt, and they also have a per-domain setting in the control panel with a bunch of options. I'm wondering which ones I should activate:
Bing (User agent: bingbot)
Google (User agents: Googlebot, Mediapartners-Google, Adsbot-Google)
Majestic (User agent: MJ12bot)
Yandex (User agent: YandexBot)

And then directories to block from EVERY spider:
File extensions to block from EVERY spider:
Ask all spiders to pause? how many seconds:

Do you think any of this might help, and what would harm XF?
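To give a rough idea of what those control-panel fields turn into, here is a hypothetical Python sketch that builds a robots.txt from the same kinds of inputs: bots to block, directories, file extensions, and a pause. The exact text DreamHost's panel writes is an assumption, and `build_robots_txt` is a made-up name. The output follows standard robots.txt syntax. File extensions use the `*` and `$` pattern characters described in RFC 9309. Crawl-delay is a non-standard extension that some crawlers ignore.

```python
def build_robots_txt(
    blocked_agents: list[str],
    blocked_dirs: list[str],
    blocked_exts: list[str],
    pause_seconds: int = 0,
) -> str:
    """Hypothetical mirror of a per-domain 'block spiders' panel."""
    lines: list[str] = []
    # Bots blocked entirely: one group with several User-agent lines.
    if blocked_agents:
        lines += [f"User-agent: {agent}" for agent in blocked_agents]
        lines += ["Disallow: /", ""]
    # Rules that apply to EVERY other spider.
    lines.append("User-agent: *")
    if pause_seconds > 0:
        lines.append(f"Crawl-delay: {pause_seconds}")  # advisory, non-standard
    lines += [f"Disallow: /{d.strip('/')}/" for d in blocked_dirs]
    lines += [f"Disallow: /*.{e.lstrip('.')}$" for e in blocked_exts]
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


text = build_robots_txt(["MJ12bot"], ["account", "login"], ["zip"], pause_seconds=10)
print(text)
```

This example blocks Majestic's MJ12bot completely, keeps every other spider out of two directories and .zip files, and asks them to pause 10 seconds between requests.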
 

VersoBit

Well-known member
From experience, relying on robots.txt is not going to provide enough mitigation to improve performance. Well-behaved crawlers follow it, but abusive bots simply ignore it. If the host says there is an issue with bots, it would make more sense to switch to Cloudflare, turn on Bot Fight Mode, and see whether conditions improve.

In the interim, you can use the robots.txt we use over on Fellowsfilm as a base template; it has worked pretty well for us:
Code:
User-agent: *
Crawl-Delay: 60

Disallow: /whats-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Allow: /

Sitemap: https://fellowsfilm.com/sitemap.php

This asks every user agent to wait 60 seconds between requests, and the value can be increased or decreased as needed. Note that Crawl-delay is only a request: Bingbot honors it, but Googlebot ignores it.
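Before uploading a template like this, it can be worth checking how a parser actually reads it. The sketch below runs Python's standard `urllib.robotparser` module on a copy of the template. The Sitemap line is omitted, and the post and thread URLs are made-up examples.

One thing the check turns up is that Python's parser follows the original robots.txt convention that a blank line ends a record. As a result, the blank line after Crawl-Delay leaves the Disallow lines orphaned, and the parser ignores them. Parsers that follow RFC 9309 (Google's among them) don't treat blank lines as record separators, so they should read the file as intended. Removing that blank line works for both kinds of parser.

```python
from urllib.robotparser import RobotFileParser

# The Fellowsfilm template, copied with its blank line (Sitemap omitted).
TEMPLATE = """\
User-agent: *
Crawl-Delay: 60

Disallow: /whats-new/
Disallow: /account/
Disallow: /attachments/
Disallow: /goto/
Disallow: /posts/
Disallow: /login/
Disallow: /admin.php
Allow: /
"""


def load(text: str) -> RobotFileParser:
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


original = load(TEMPLATE)
# Same rules with the blank line after Crawl-Delay removed.
compact = load(TEMPLATE.replace("Crawl-Delay: 60\n\n", "Crawl-Delay: 60\n"))

for name, rp in (("original", original), ("compact", compact)):
    print(name,
          rp.can_fetch("bingbot", "/posts/12345/"),        # should be blocked
          rp.can_fetch("bingbot", "/threads/example.1/"),  # normal thread page
          rp.crawl_delay("bingbot"))

# 86,400 seconds per day / 60-second delay = at most 1,440 requests per bot per day.
print(24 * 60 * 60 // compact.crawl_delay("bingbot"))
```

The `original` run reports `/posts/12345/` as fetchable, which means Python's parser dropped the Disallow rules. The `compact` run blocks it while still allowing normal thread pages. A 60-second delay caps a compliant bot at 1,440 requests a day.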
 

VersoBit

Well-known member
Additionally, from reviewing some other robots.txt files on XenForo-based forums around the web, it seems common to block a few bots specifically:

Code:
User-agent: PetalBot
User-agent: AspiegelBot
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: DotBot
User-agent: MauiBot
User-agent: MJ12bot
Disallow: /community/

User-agent: Amazonbot
Disallow: /community/threads/*/reply
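The same standard module can show how the grouped User-agent lines behave. All seven bots share the single `Disallow: /community/` rule, and bots that are not listed (Googlebot, for example) are unaffected. This sketch assumes the forum lives under /community/, as in the example, and the thread URL is made up. It leaves out the Amazonbot rule because Python's parser matches paths as plain prefixes and does not expand the `*` wildcard inside a path.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: PetalBot
User-agent: AspiegelBot
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: DotBot
User-agent: MauiBot
User-agent: MJ12bot
Disallow: /community/
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

thread = "/community/threads/example.123/"
for bot in ("AhrefsBot", "SemrushBot", "MJ12bot", "Googlebot", "bingbot"):
    # Python lowercases both names and compares them against the bot name.
    print(f"{bot:<11} allowed: {rp.can_fetch(bot, thread)}")
```

The three listed bots come back as not allowed, while Googlebot and bingbot are allowed. Paths outside /community/ stay open to every bot because the rule only covers that prefix.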

Depending on your host, some robots may already be blocked at the WAF level, but that really depends on who you host with and how they configure things per domain. We use a few different tools to block bots with NGINX and Cloudflare.
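Server-side blocking differs from robots.txt in that the bot doesn't get a choice. The sketch below is a minimal Python model of the kind of User-Agent rule a firewall, NGINX, or a WAF applies. The blocklist and `status_for` are hypothetical and not any product's actual configuration. The check is a case-insensitive substring match on the User-Agent header, answered with a 403 before the forum software ever runs.

```python
# Hypothetical blocklist, lowercased for case-insensitive matching.
BLOCKED_AGENTS = ("petalbot", "aspiegelbot", "ahrefsbot", "semrushbot",
                  "dotbot", "mauibot", "mj12bot")


def status_for(user_agent: str) -> int:
    """Return 403 for blocked crawlers, 200 otherwise (illustrative only)."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_AGENTS):
        return 403
    return 200


ahrefs = "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
google = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(status_for(ahrefs), status_for(google))
```

The User-Agent header is self-reported and easy to spoof, so a rule like this only stops bots that identify themselves honestly.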
 