So who's running HA on their XF install?

ichpen

Well-known member
I'm thinking more of a low-budget, low-key, somewhat high(er)-availability setup.

Previously I was running a simple setup with 2 VMs and a load balancer provided by one of my VPS providers:

CloudFlare -> LB node (pool of 2 with failover) -> all-in-one VM 1 (lsyncd and MariaDB/MySQL master) -> replicating to VM 2 (passive slave). If VM 1 failed, the LB would switch traffic over mostly automatically, but backfilling VM 1 was obviously a manual process.

Not an ideal setup, as I don't like running all-in-one servers with the DB etc., and I don't have geo-redundant LBs either, but on a shoestring budget it sort of worked. Plus I was at the mercy of the sync daemons and of slave writes not getting naughty during a crash.
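The lsyncd half of that is only a few lines of config, for reference. A rough sketch with placeholder paths and hostname (not my exact config):

# /etc/lsyncd/lsyncd.conf.lua (illustrative)
settings {
    logfile    = "/var/log/lsyncd.log",
    statusFile = "/var/log/lsyncd.status",
}
-- push the XF files to the passive VM over rsync+ssh as they change
sync {
    default.rsyncssh,
    source    = "/var/www/forum",
    host      = "vm2.internal",
    targetdir = "/var/www/forum",
    delay     = 5,
}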

My XF forum is not busy enough to warrant full-blown HA, so I'm curious what others are running on smaller forums to at least minimize outages or potentially roll out updates safely to prod (outside of a staging area, which we should all have).

Now I'm thinking of setting up a new XF 2 forum with more resilient infra, while keeping costs down and without resorting to 5+ VMs. Ideally you'd want 2 LBs, 2 WWWs, 3 DBs (Galera master-master replication) and likely 2 VMs for GlusterFS or some other distributed filesystem. It's too much admin for my liking. Is there a middle ground? :)

Opinions?
 
You don't need any of that nonsense, and this doesn't need to be complicated.

On any good provider, you're going to (or should) have 99.99% uptime or darn close to it. That said, you'll only be using the 2nd VPS a very small amount of the time, if at all. If you're finding you have so much downtime on your main site that you need a major HA system in place, there's something very wrong, and it's time to change hosts. So generally speaking, the second VPS does not have to match the specs of your main VPS (if you're looking to save costs), particularly when it comes to bandwidth, etc. Most of the time, it should just be sitting idle.

The easiest method is just to set up a series of cron jobs to copy your files and database from the main VPS to the backup VPS. Every hour or so is probably fine.
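Something along these lines would do it; the hostname and paths below are placeholders, and it assumes MySQL credentials in ~/.my.cnf:

#!/bin/bash
# /etc/cron.hourly/mirror-to-standby (illustrative)
set -euo pipefail

# consistent dump of the forum database without locking InnoDB tables
mysqldump --single-transaction --quick forum_db | gzip > /var/backups/forum_db.sql.gz

# push the XF files and the dump to the standby VPS over SSH
rsync -az --delete /var/www/forum/ standby.example.com:/var/www/forum/
rsync -az /var/backups/forum_db.sql.gz standby.example.com:/var/backups/

On the standby side, a matching cron job just imports the latest dump so it's ready to serve if the DNS fails over.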

Then use DNS Made Easy to set up failover DNS.

Super easy, and only 2 VPS involved.
 

You had an "Unexpected error has occurred" for a while? Lol :)

Sorry couldn't resist...

Fair enough, it's definitely not for everyone, but there are many other benefits to running site mirrors. I don't think you ever want to do a file-level copy of your database under any circumstances...
 
Fair enough, it's definitely not for everyone, but there are many other benefits to running site mirrors.

I didn't say there weren't benefits to running a mirror. However, it makes no sense to run 5 servers to achieve that mirror, unless you're either making some ultra-serious money on your site, or the host for your main server is so bad that you're seeing constant downtime.

I don't think you ever want to do a file-level copy of your database under any circumstances...

To a backup server that is presumably only ever going to be used for 50 minutes or less a year? Not going to make any difference, I'm afraid.
 
I didn't say there weren't benefits to running a mirror. However, it makes no sense to run 5 servers to achieve that mirror, unless you're either making some ultra-serious money on your site, or the host for your main server is so bad that you're seeing constant downtime.
If you re-read my post, I'm looking for a less-than-5-server way of achieving some redundancy and adding some resiliency. In the traditional sense, 5 servers is the minimum for a proper HA approach.

To a backup server that is presumably only ever going to be used for 50 minutes or less a year? Not going to make any difference, I'm afraid.

Well, I'm no DBA, but I'm fairly sure you risk serious corruption doing a hot copy of the SQL data files on an in-use server (unless you lock the tables). There are plenty of safe MySQL replication and backup tools out there. To do a safe pure file copy you'd want to shut down MySQL first, copy, restart, check for errors, etc. You may 'have' a backup server, but if the data is crap you may as well avoid this altogether. I assume that's what you were referring to when you said 'copy'.
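In shell terms the difference is basically this (forum_db is a placeholder, credentials assumed in ~/.my.cnf):

# Risky: copying live InnoDB data files can hand you an unusable "backup"
#   rsync -a /var/lib/mysql/ standby:/var/lib/mysql/   # not while mysqld is running

# Safe, non-blocking alternatives:
mysqldump --single-transaction --quick forum_db > /var/backups/forum_db.sql   # consistent logical dump
mariabackup --backup --target-dir=/var/backups/snapshot/                      # or a physical hot backup (Percona XtraBackup works the same way)

Proper replication is better still, but either of those at least gives you a restorable copy.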
 
Can't register on your site Matt (highly masked invisible captcha not enabled). I guess I'll follow this up with some questions here.
Ah, let me have a look at that. Only just upgraded to XF2 and sorting some stuff out.
 
Spacebattles has a high degree of redundancy, largely because Linode's pricing model means the $20 and $40 Linodes are the best CPU/memory value at the low end.
  • Linode's NodeBalancer terminates requests before forwarding to the web nodes.
  • 3x web nodes, using nginx/PHP/Redis, plus a 3-node Elasticsearch cluster.
    • An Elasticsearch node sits on each web node; this uses disk space that would otherwise go unused.
    • A custom csync2 wrapper is used to keep PHP files, avatars & (the few) attachments up to date across each node.
    • MaxScale (to be replaced with ProxySQL when I have the time) is used to locate & communicate with the database.
      • This is used to send writes to a single node, but non-transactional reads to all nodes.
    • Redis Sentinel, which locates the Redis master and provides a list of Redis slaves to PHP.
    • A Redis slave for read-only cache hits for CSS & other XF caching. This Redis slave is not a failover target.
  • 3x database nodes
    • MariaDB with Galera clustering, using xtrabackup to transfer state without blocking a node (a config sketch follows below).
    • A Redis instance. Redis Sentinels will promote one of these to master when required.
  • Outbound mail gateway/testing site.
    • If this node is down, email queues in the database.
    • I hate email notifications, so this being 100% redundant isn't a design goal.
This is vast overkill, but it runs the site with a high degree of redundancy, lets me upgrade components at any time with effectively no downtime, and recovers automatically and very quickly from an unexpected node failure.
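The Galera side of that is only a handful of my.cnf settings. A rough sketch with placeholder IPs and credentials (not the exact Spacebattles config):

# /etc/mysql/conf.d/galera.cnf (illustrative)
[mysqld]
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2

wsrep_on              = ON
wsrep_provider        = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name    = forum_cluster
wsrep_cluster_address = gcomm://10.0.0.11,10.0.0.12,10.0.0.13   # the three DB nodes
wsrep_node_address    = 10.0.0.11                               # this node

# state snapshot transfer via xtrabackup, so a joining node doesn't block its donor
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth   = sstuser:sstpassword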

The biggest issue is that MaxScale 1.x (before the licence change) tends to fall over every so often, but the load balancer routes around that until I get an alert and restart the process.
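For reference, the read/write split in maxscale.cnf looks roughly like this; server names, addresses and credentials are placeholders (ProxySQL replaces this with query rules):

# /etc/maxscale.cnf (illustrative, MaxScale 1.x style)
[db1]
type     = server
address  = 10.0.0.11
port     = 3306
protocol = MySQLBackend
# ...[db2] and [db3] defined the same way...

[Galera-Monitor]
type             = monitor
module           = galeramon
servers          = db1,db2,db3
user             = maxscale
passwd           = secret
monitor_interval = 1000

[Split-Service]
type    = service
router  = readwritesplit
servers = db1,db2,db3
user    = maxscale
passwd  = secret

[Split-Listener]
type     = listener
service  = Split-Service
protocol = MySQLClient
port     = 4006

PHP connects to the listener port, and MaxScale sends writes to whichever node galeramon has picked as the master, with reads spread across all three.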

Still cheaper than the single old server Space Battles used to run on when I took over sysadmin in 2014 Q4.
 
I assume that's what you were referring to when you said 'copy'.

You assumed incorrectly. If you're setting up an HA system, then you presumably know and understand how to copy a database safely from one server to another. If not, you probably shouldn't be setting up your own HA system. You can call it replicate, copy, move, whatever you want; I'm not going to play the semantics game.

Like I said, 99.99% uptime should only yield about 52 minutes of downtime a year, or just under 5 minutes a month on average. You need a cheap backup VPS to receive the copied database and files, and the $49/year or whatever it costs these days business DNS plan from DNS Made Easy. That's it.

You said you wanted a simple and low-cost way of doing it. That's how you do it. Would cost you no more than $10/mo. for the whole thing.
 
Thanks @MattW for sharing that.

Interesting stuff for the ANAME part. I know AWS does the same with aliases on its own services (for instance, targeting a CloudFront distribution or an S3 bucket from your root DNS if it's managed with Route 53), but I didn't know some DNS providers were working on CNAME-like records for the root.

Sounds like you're experimenting here :) Happy with the setup?

With 5 servers I'd have tried something more traditional, like a cluster of 2x servers dedicated to the LB, then a cluster of 2x servers holding the web files for the front-end part, and the last one for the DB (I'd have added a slave DB on a 6th one if the budget allowed). It has always seemed to me that master/master replication for the database works well until... the day you have a failure... when it's more complex to rebuild than just a single active master (but maybe that's just me :p), though to be honest I know nothing about Galera clustering. Also, no problems regarding performance with GlusterFS?
 
Still cheaper than the single old server Space Battles used to run on when I took over sysadmin in 2014 Q4.

... but does that take the cost of your time into consideration? :p

Seriously though - surely the extra time for setup & maintenance of a more complex setup like this does reduce the cost savings compared to a simple vertically scaled (i.e. a large server!) installation?

Of course, then you need to consider the cost factor of outages (including during upgrades and system maintenance etc), which the more complex solution does let you avoid ... so the equation isn't necessarily as simple as it might seem.

Curious because I'm thinking through the same calculations myself.
 
Sounds like you're experimenting here :) Happy with the setup?
The setup worked, and I did test the failover for each part; it worked as expected. Each node could self-heal and re-attach following downtime.

Also, no problems regarding performance with GlusterFS?
It was OK, but opcache sped it up quite a bit with a longer cache (3 minutes on the timestamp validation). Syncing the files over was pretty slow TBH, but again, for day-to-day stuff it was acceptable. I'm back on a single VPS setup now, but at least I tried it :)
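For anyone curious, the opcache knobs for that look something like this; the 3 minutes is the timestamp revalidation mentioned above, the rest are just illustrative values:

; php.ini (illustrative)
opcache.enable=1
opcache.validate_timestamps=1
opcache.revalidate_freq=180        ; only re-stat files on the shared FS every 3 minutes
opcache.memory_consumption=256
opcache.max_accelerated_files=20000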
 
Impressive what you’ve done with the old ship!
SSD-based VM hosting is what makes it all possible.

... but does that take the cost of your time into consideration? :p

Seriously though - surely the extra time for setup & maintenance of a more complex setup like this does reduce the cost savings compared to a simple vertically scaled (i.e. a large server!) installation?
The initial setup and learning had a fair overhead, but the day-to-day running is utterly trivial. And since I'm doing this as a hobby, I'm OK with sinking a bunch of time into technologies I wouldn't normally touch.

The primary reason is CPU & RAM per dollar: the $20 and $40 VMs from Linode and DigitalOcean give the best value per CPU compared to the larger VMs, which for a PHP front-end is critical for handling load.

I went with a 3-node database cluster because I wanted a failover database instance to survive the "oops, we lost your VM" events that can happen. With a synchronous cluster I can just point connections at any node without worrying about slave replication delay. And at $80 vs $120 for a slave vs a cluster, the hosting cost is a drop in the bucket.
 
  • A custom csync2 wrapper is used to keep PHP files, avatars & (the few) attachments up to date across each node.

Curious how you managed this. I currently have a setup using an nginx reverse proxy as a load balancer between two separate web servers that connect to a Galera cluster for the database, using an NFS mount for the data and internal_data directories, and I would love to be able to improve on this :p The csync2 setup is something I hadn't even wanted to touch; it seems like it'd be way too easy for something to sync wrong and mess something up. This site is fairly heavy on attachments as well, around 500 GB or so.
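From the docs, a bare csync2 config looks deceptively simple - something like this, with hosts, key and paths made up for illustration - which is partly why I'm wary of what the wrapper has to handle:

# /etc/csync2.cfg (illustrative)
group forum {
    host web1 web2;
    key  /etc/csync2.key_forum;

    include /var/www/forum/src;            # php files
    include /var/www/forum/data;           # avatars
    include /var/www/forum/internal_data;  # attachments
    exclude /var/www/forum/internal_data/temp;

    auto younger;    # on conflict, keep the newer copy
}
# each node then runs "csync2 -x" from cron or an inotify/lsyncd-style wrapper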
 