XenForo cluster for an upcoming forum

Rait

New member
Hello!

I've just bought XenForo and have been building its backend infrastructure for a couple of days now. I wanted to post this thread so that people who would like to do the same thing can ask me for guidance.
Let's get started...

First of all, I decided I want to start a forum for all Ops people (ITOps, SysOps, DevOps, etc.) with a lot of helpful discussion and guides for people to develop their skills in the big IT world. I did not want it to run on a single host and be doomed if that VPS goes down, so I decided to build a fault-tolerant cluster. I plan to write up a full guide on how to do this, but it will be a long process to document every aspect, because I used Oracle Linux 7 as my main distro and it had its ups and downs in getting everything running: some packages were missing from the repos, I needed to compile some of the software myself, etc.

This cluster is built with the software listed below:
  • HAProxy (floating VIP)
  • NGINX (3 nodes)
  • MySQL + ClusterControl (2 master nodes)
  • Memcached (2 nodes)
  • GlusterFS (2 nodes)
  • Plus monitoring software (Grafana, Prometheus, etc.)
Let me know what you think, was this all worth it? :D

Here is a little diagram of how it all works:
[Attached image: Untitled Diagram.png]
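
A common way to implement the floating VIP in front of HAProxy is keepalived with VRRP. This is only a minimal sketch to illustrate the idea; the interface name, VIP address, and password are placeholders, not the actual config from this cluster:

Code:
# /etc/keepalived/keepalived.conf on the primary HAProxy node
vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"   # succeeds while haproxy is running
    interval 2
}

vrrp_instance VI_1 {
    state MASTER                # BACKUP on the standby node
    interface eth0              # placeholder interface name
    virtual_router_id 51
    priority 101                # lower (e.g. 100) on the standby
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme      # placeholder
    }
    virtual_ipaddress {
        10.0.0.100/24           # the floating VIP
    }
    track_script {
        chk_haproxy
    }
}

If HAProxy dies or the node goes down, the standby node wins the VRRP election and takes over the VIP within a few seconds.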
 

andybond

Well-known member
Hi fella,

Interesting idea

For ultimate resilience, have you considered:

Placing each NGINX node in a different datacenter, offering a geographically disparate route?
Failing that, keeping each NGINX node in the same datacenter, but UPS-powered by separate feeds, with two power supplies per host in case of power failure? I spec dual power supplies per host in my datacenters.
You should also keep each VPS, or ideally dedicated host, fed by two separate NICs in LACP, supplied by two separate feeds, on two separate LOMs.

I am not 100% convinced of the requirement to do it, but technically, as a challenge, it's fun. Well, something like that!
 

Rait

New member
Hi :)

Yep, it is an interesting journey to get this thing fully redundant. The whole cluster runs on 2 dedicated servers running a hypervisor, which are in a cluster with HA migration. I have two ISP links coming into my firewall with BGP configured, so if one goes down the other takes over. The dedicated hosts are connected by two separate NICs in LACP, which are connected to two stacked switches. I was thinking about going with AWS for this project to get Route 53, but I decided that the 2x 1Gbit/s links are fine; I've never had issues with my production racks in the 6 years I've had them there.

I basically did this because I like the technical side of the whole cluster and setting it up is a fun process.
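
For reference, an LACP bond like the one described above can be configured on Oracle Linux 7 with the standard network-scripts files. A rough sketch, with placeholder device names and addresses:

Code:
# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=fast"   # 802.3ad = LACP
BOOTPROTO=none
IPADDR=10.0.0.11        # placeholder address
PREFIX=24
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
DEVICE=eth0
TYPE=Ethernet
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes

The switch ports on both stacked switches need a matching LACP port-channel for the bond to come up in 802.3ad mode.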
 

andybond

Well-known member
Even better that you are in an HA cluster on a hypervisor.

RAID disks (a mirror will be fine) on the hypervisors?
 

Rait

New member
The servers are running hardware RAID1, and VM disks are on datastores in storage (Unity).
 

andybond

Well-known member
I was referring to the hypervisor OS on the RAID1 rather than the VM storage.

Either way sounds comprehensive.
 

briansol

Well-known member
To answer your question: was it worth it? Probably not.

I've had next to zero downtime that wasn't my own fault (touching things I shouldn't have) in the past 20 years of running servers. The one crash that wasn't my fault, I lost a hard drive; even in RAID 1+0, I still had to turn the site off to do the disk rebuild.

For a new forum that will likely see little traffic for a long time, it's complete overkill, and the hardware costs likely are not worth it.

You built an architecture for a forum with 100k active users, and you have 0 active users.
 

Rait

New member
Yeah, I probably overbuilt it, to be honest, but at least I won't need to scale up in the future.
But I also did this to write a guide on how to do it, since a lot of people have been asking about it here. I figured it would make a good tutorial.
I've had a lot of downtime that wasn't my fault, as most of my configuration comes from automation (Ansible): hardware faults, bad network cables, etc.
Wait, why did you have to turn the site off? RAID 1+0 supports hot-swapping drives. Did something really bad happen? I've never had to take down a site to replace an HDD in RAID 1+0.
 

digitalpoint

Well-known member
I've been running a fairly high-traffic cluster for 10+ years at this point, so I'll share what my setup looks like in case it helps you (or anyone else). Over the years I've found that for some things, less is more (fewer things that can go wrong). At one point I had redundant load balancers routing traffic to web servers, and a database setup with one master and a bunch of slaves. Getting away from that was the single best thing I did... just too many issues where, under high load, transaction locking would cause replication lag, and it didn't scale well enough for SQL writes (things like Galera Cluster and InnoDB Cluster were a little better, but you are still bottlenecked by a disk-based database system). Anyway, this is what I run currently:

  • MySQL Cluster with 8 data nodes (ndbcluster storage engine). I can't stress enough how good ndbcluster is. You can scale into billions of SQL reads per second and tens of millions of SQL writes per second, and users see zero downtime (even through server reboots and software updates; backups are non-locking). Good video here:
  • All 8 servers have Nginx running, but Nginx is so efficient that only 1 is live and the other 7 are on standby. A cron job checks the status of the live web server every 60 seconds, and if it's not available (for any reason) it makes an API call to Cloudflare to route traffic through the next available web server. Basically, Cloudflare acts as the load balancer, cutting out the latency that load balancers add. A 60-second max failover isn't a big deal, because in the last 10 years there have been exactly 0 unplanned failovers. The systemd management of the nginx daemon automatically switches the Cloudflare DNS entry for the live web server before the process is stopped/restarted (so no need to remember to switch the DNS when doing maintenance). This is done with the ExecStop entry in the nginx systemd config.
  • I have 2 servers running memcached. I've thought about switching to Redis for some of the more advanced things it does (like not losing the cache on a server reboot), but memcached works so well and has been so stable that I haven't gone down that road. If I were starting fresh, it's something I'd look at.
  • All 8 servers form an Elasticsearch cluster (2 copies of the data across all the shards).
  • The 8 servers also form a GlusterFS volume with lots of redundancy (4 copies of everything spread across the 8 servers). This is NOT used for static files (PHP files, templates, etc.): the overhead of a network file system isn't worth it for files that rarely change. Specifically, Gluster is used for the sub-directories within internal_data and data (but NOT internal_data/temp or internal_data/code_cache).

Code:
twin1:/home/sites/rlqry.com/web # ls -al *data
data:
total 12
drwxrwxrwx  2 root root 4096 Jul 15  2019 .
drwxr-xr-x 10 root root 4096 Oct 14  2019 ..
lrwxrwxrwx  1 root root   42 Jul 15  2019 attachments -> /gluster/sites/rlqry.com/data/attachments/
lrwxrwxrwx  1 root root   38 Jul 15  2019 avatars -> /gluster/sites/rlqry.com/data/avatars/
lrwxrwxrwx  1 root root   48 Jul 15  2019 imported_reactions -> /gluster/sites/rlqry.com/data/imported_reactions
-rwxrwxrwx  1 root root    1 Nov 22  2017 index.html
lrwxrwxrwx  1 root root   35 Jul 15  2019 video -> /gluster/sites/rlqry.com/data/video

internal_data:
total 28
drwxrwxrwx  4 root   root 4096 Jul 20  2019 .
drwxr-xr-x 10 root   root 4096 Oct 14  2019 ..
lrwxrwxrwx  1 root   root   50 Jul 15  2019 addon_batch -> /gluster/sites/rlqry.com/internal_data/addon_batch
lrwxrwxrwx  1 root   root   51 Jul 15  2019 attachments -> /gluster/sites/rlqry.com/internal_data/attachments/
drwxrwxrwx  5 root   root 4096 Jun 16 10:57 code_cache
lrwxrwxrwx  1 root   root   50 Jul 15  2019 file_check -> /gluster/sites/rlqry.com/internal_data/file_check/
-rwxrwxrwx  1 root   root   31 Nov 22  2017 .htaccess
lrwxrwxrwx  1 root   root   50 Jul 15  2019 image_cache -> /gluster/sites/rlqry.com/internal_data/image_cache
lrwxrwxrwx  1 root   root   51 Jul 15  2019 imported_xml -> /gluster/sites/rlqry.com/internal_data/imported_xml
-rwxrwxrwx  1 root   root    1 Nov 22  2017 index.html
-rwxrwxrwx  1 wwwrun www    86 Jul 20  2019 install-lock.php
lrwxrwxrwx  1 root   root   51 Jul 15  2019 oembed_cache -> /gluster/sites/rlqry.com/internal_data/oembed_cache
lrwxrwxrwx  1 root   root   48 Jul 15  2019 sitemaps -> /gluster/sites/rlqry.com/internal_data/sitemaps/
drwxrwxrwx  2 wwwrun www  4096 Aug  7 15:57 temp

There is a custom addon that triggers a csync2 -x command when things like templates are written to the file system. Again, csync2 is used to keep things in sync that rarely change (and typically only via an administrator action). In case you missed it above: running live PHP files on a networked file system isn't a good idea, because PHP (normally) constantly checks whether the files have changed; better to keep them local on each server.
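
For anyone unfamiliar with csync2, the setup described above boils down to a small config shared by all nodes, plus running csync2 -x whenever something changes. A rough sketch; the hostnames and paths here are illustrative, not the actual ones:

Code:
# /etc/csync2.cfg - same file on every node
group web_static {
    host twin1 twin2;                          # illustrative hostnames
    key /etc/csync2.key;                       # pre-shared key (csync2 -k)
    include /home/sites/example.com/web;       # PHP files, templates, etc.
    exclude /home/sites/example.com/web/internal_data;  # lives on Gluster instead
    auto younger;                              # newest file wins on conflict
}

After editing files on one node, csync2 -x pushes the changes to the other hosts in the group, which is cheap for files that only change on admin actions.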

Hardware-wise, all servers run RAID-6 with 6 hard drives (any two drives can fail without downtime; usable space is 4 drives' worth), have 1TB of RAM, and are interconnected with 54Gbit/s InfiniBand.
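
The ExecStop trick described above can be sketched as a systemd drop-in override. The script name and IDs below are placeholders; the commented curl call is the standard Cloudflare DNS-record update endpoint. Note that ExecStop= lines in a drop-in are appended to the unit's existing stop commands:

Code:
# /etc/systemd/system/nginx.service.d/failover.conf
[Service]
# Re-point the Cloudflare DNS record at the next available web server
# whenever nginx on this host is stopped or restarted.
# The script path is hypothetical.
ExecStop=/usr/local/bin/cf-failover.sh

# cf-failover.sh would do something along these lines:
#   curl -X PUT "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID" \
#        -H "Authorization: Bearer $CF_API_TOKEN" \
#        -H "Content-Type: application/json" \
#        --data '{"type":"A","name":"example.com","content":"<standby-ip>","ttl":60,"proxied":true}'

With a proxied record, Cloudflare clients never see the origin change; they just get routed to whichever server the record currently points at.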
 

Rait

New member
And I thought my cluster was too much for XenForo :D Man, that is awesome. One thing I would push back on is GlusterFS: I'm running multiple high-traffic sites on GlusterFS and have never had any issues. I even run a medium-sized crypto trading site cluster on it and haven't had a single outage.
 

digitalpoint

Well-known member
It certainly will work and shouldn't cause any issues, but it's not the most efficient thing to do. The way PHP normally works, it is constantly checking whether PHP files have changed so it can recompile them. A normal XenForo request uses over 100 PHP files... so why add hundreds of network I/O requests per second for things that are accessed often and (very) rarely change? If you must use a networked file system for PHP files, you may want to adjust how often (if at all) PHP checks for file changes. The annoying part is that changed files might not be seen right away (or at all, until you restart PHP-FPM).
 

Rait

New member
You have a really solid point there. I might make some changes in the near future, but right now the network has not been anywhere near its limits. In the future I will likely consider keeping some of the XF PHP files on the NGINX hosts. I just need to keep monitoring the network, and if it becomes a bottleneck, then it's time to think of a new solution.
 

digitalpoint

Well-known member
Yep... like I said, it will work; it's just a matter of cutting out unnecessary overhead. Sure, it works... but why not make your servers run more efficiently? The "check" for changes on PHP files will be faster when it doesn't need to talk to other Gluster nodes over the network, so it will actually make your site (slightly) faster by cutting out that overhead. Just something to think about going forward. At the very least, you might want to consider raising the opcache.revalidate_freq setting for PHP.

 

Rait

New member
Yep, I've tuned the opcache settings. These are mine, and I use OpCacheGUI for monitoring.
Code:
opcache.enable=1
opcache.revalidate_freq=120
opcache.validate_timestamps=1
opcache.max_accelerated_files=7500
opcache.memory_consumption=192
opcache.interned_strings_buffer=16
; fast_shutdown was removed in PHP 7.2 (the behavior is always on there);
; the line below only has an effect on older PHP versions
opcache.fast_shutdown=1

Some of my bigger clients have the opcache revalidate frequency set to 2-4 hours. When they deploy code to production, the deployment script calls flush_cache.php, which clears the opcache so the files are checked again.
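
The flush_cache.php script itself isn't shown in this thread, but a minimal version could look something like this. This is an illustrative sketch, not the actual script; the token guard is a placeholder, and the only real API used is PHP's built-in opcache_reset():

Code:
<?php
// flush_cache.php - illustrative sketch of a deploy-time opcache flush
// Guard so only the deployment pipeline can trigger a flush (placeholder token)
if (($_GET['token'] ?? '') !== 'deploy-secret-placeholder') {
    http_response_code(403);
    exit('Forbidden');
}

if (function_exists('opcache_reset') && opcache_reset()) {
    // Entire opcode cache invalidated; files are recompiled on next request
    echo "opcache flushed\n";
} else {
    http_response_code(500);
    echo "opcache not available\n";
}

Since opcache is per-process-pool, the script has to be hit through the same PHP-FPM pool that serves the site; flushing from the CLI would clear a different cache.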
 