XenForo high performance benchmarking (What sized server do I need?)

Slavik

XenForo moderator
Staff member
Foreword
This thread will serve as the basis for an upcoming project I am undertaking to demonstrate and evaluate how XenForo performs under high-stress situations. Much planning has gone into this project, and the notes which have been made will be transferred into this thread in due course.

The aim is simple: to provide a baseline for estimating the approximate server power any particular forum may require, based on its number of posts and active users.


Contents
  1. Introduction
  2. Special Thanks
  3. Total Number of Tests
  4. Testing Methodology
  5. Server Setups
  6. Server Stacks
  7. Test Data
  8. Data Size
  9. Results
  10. Analysis
  11. Conclusions
 
Introduction

One of the most common questions when people are looking for hosting for their XenForo forum, or are looking to transfer a large system into XenForo, is "what sized server do I need?".

Whilst estimations and approximations will point someone in the right direction, it would be good to have some solid numbers to go by, giving people a much closer idea of how much power they need and when they might need to consider upgrading.

The following posts will detail the testing I intend to carry out in order to generate a workable dataset.
 
Special Thanks

Special thanks goes out to Rackspace for crediting my account with £50 to use towards the testing.

A shoutout to Linode support for similarly supporting the testing.
 
Total Number of Tests

Below is the potential number of tests that will be run. We do not know how much of an effect database size has on a server, so before any testing is done an experiment will take place running Server 1 first with an intentionally small dataset (500k posts) and then with a large dataset (2 million posts). Depending on the results, we may trim the testing down to use only one dataset.

Server 1
  Stack 1: Small Dataset, Large Dataset
  Stack 2: Small Dataset, Large Dataset
  Stack 3: Small Dataset, Large Dataset
  Stack 4: Small Dataset, Large Dataset
  Stack 5: Small Dataset, Large Dataset

Server 2
  Stack 1: Small Dataset, Large Dataset
  Stack 2: Small Dataset, Large Dataset
  Stack 3: Small Dataset, Large Dataset
  Stack 4: Small Dataset, Large Dataset
  Stack 5: Small Dataset, Large Dataset

Server 3
  Stack 1: Small Dataset, Large Dataset
  Stack 2: Small Dataset, Large Dataset
  Stack 3: Small Dataset, Large Dataset
  Stack 4: Small Dataset, Large Dataset
  Stack 5: Small Dataset, Large Dataset

Server 4
  Stack 1: Small Dataset, Large Dataset
  Stack 2: Small Dataset, Large Dataset
  Stack 3: Small Dataset, Large Dataset
  Stack 4: Small Dataset, Large Dataset
  Stack 5: Small Dataset, Large Dataset
As you can see, with 10 tests per server and 4 servers, this amounts to a total of 40 tests.
 
Testing Methodology

Thought was given from the very beginning to how best to run this test. Whilst tools such as JMeter or Siege are popular, in my opinion they only give an indication of how best to optimise your server to extract every last cycle of power out of it. What they don't show is the whole stack at work, from the network through to the application, to give you some real, usable numbers.

As such I have decided to harness the power of the cloud to perform these tests. Using the cloud allows us to fire up multiple load-generating servers, each able to simulate several thousand users per instance, giving us a much more realistic overview.

This then brings us onto the subject of Virtual Users and Simulated Users.

When running testing such as this, thought has to be given to how the load is generated and what it actually represents. The two types of load we will be dealing with are Virtual Users and Simulated Users, and they give drastically different results. So what's the difference?

A Virtual User is exactly that: it represents a fake person accessing your website by opening a single connection to your server, loading the page and recording the result. On the face of it that might sound perfect; however, this is not the behaviour today's browsers demonstrate.

Just a few years ago, Virtual Users may have been representative of how browsers accessed your site: your browser of choice would have opened a single connection to a site and loaded the page. In modern browsers, however, opening 4-10 simultaneous connections is quite normal, loading pages much more quickly by downloading the various resources in parallel. Unfortunately, simply adding 4-10x the load doesn't make up for this change in behaviour, and as such we need Simulated Users.

Simulated Users, in essence, aim to provide more accurate results by working in a similar way to how a real browser would load your site. A simulated user is generated, then proceeds to open multiple connections to a page, loads the elements of the page and records the result, much as an end user would. This provides a much more realistic result, and as such is the type of testing we will undertake.
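To make the distinction concrete, here is a minimal PHP sketch of the two behaviours. This is illustrative only, not the actual test harness, and the URLs are placeholders: a Virtual User makes one connection per page, while a Simulated User also pulls the page's assets over parallel connections via curl_multi, much as a browser would.

```php
<?php
// Hypothetical sketch only; not the real harness used for these tests.

// Virtual User: a single connection loads the page and the time is recorded.
function virtualUser($url)
{
    $start = microtime(true);
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);
    return microtime(true) - $start;
}

// Simulated User: load the page, then fetch its assets over several
// parallel connections, the way a modern browser would (4-10 at once).
function simulatedUser($pageUrl, array $assetUrls)
{
    $start = microtime(true);

    // First grab the page itself over one connection...
    $ch = curl_init($pageUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);

    // ...then download its assets in parallel, as a browser does.
    $mh = curl_multi_init();
    $handles = array();
    foreach ($assetUrls as $assetUrl) {
        $h = curl_init($assetUrl);
        curl_setopt($h, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $h);
        $handles[] = $h;
    }

    // Drive the transfers until every parallel connection has finished.
    do {
        curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh) === -1) {
            usleep(1000); // select failed; back off briefly
        }
    } while ($running > 0);

    foreach ($handles as $h) {
        curl_multi_remove_handle($mh, $h);
        curl_close($h);
    }
    curl_multi_close($mh);

    return microtime(true) - $start;
}
```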

Now the final point in the methodology to look at is random vs pre-selected content. Many benchmarks of this kind only hit one page, or a handful of pre-selected pages. This creates a false set of results, as the reality of any forum is that users are all over the place, not just looking at a handful of threads.

To overcome this we have set up a special page which selects a random post from the database and then loads it. The Simulated User records the result and repeats, pulling a new post, and so on. This does create one problem: we have no way to repeat the exact same page loads when running the alternate setups, so a true like-for-like, page-for-page comparison will not be possible. However, I feel this is much more representative of a real forum's behaviour and will provide a much more usable set of results than simply loading a handful of pre-selected pages.
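For illustration, here is a hedged sketch of what such a random-post page might look like; the actual script used for the tests is not shown in this thread. The table and column names (xf_post, post_id) follow the stock XenForo schema, the credentials are placeholders, and ORDER BY RAND() is deliberately avoided because it scans the whole table.

```php
<?php
// Hedged sketch of the random-post page; not the actual script used.
$db = new PDO('mysql:host=localhost;dbname=xenforo', 'user', 'password');

// ORDER BY RAND() is slow on millions of rows, so pick a random ID in
// range and take the nearest existing post instead.
$maxId  = (int) $db->query('SELECT MAX(post_id) FROM xf_post')->fetchColumn();
$randId = mt_rand(1, $maxId);

$stmt = $db->prepare(
    'SELECT post_id FROM xf_post WHERE post_id >= ? ORDER BY post_id LIMIT 1'
);
$stmt->execute(array($randId));
$postId = (int) $stmt->fetchColumn();

// Send the Simulated User on to the real page for that post.
header('Location: /index.php?posts/' . $postId . '/');
```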

So that concludes our methodology. In short, we will be using Simulated Users to open concurrent connections to a randomly selected page on our test site, recording the result and going again. Each time this occurs an extra user will be added, ramping all the way up to 10,000 concurrent users. If we find 10,000 simply doesn't provide enough load, the flexibility of the cloud-based testing we are using means we can easily ramp this up even further. However, loads of 10,000 concurrent users are rarely seen even by some of the largest forum sites out there, such as IGN*, Digitalpoint* or Bodybuilding.com*.

*To the best of my knowledge
 
Server Setups

Setting a universal baseline for server hardware is near impossible. So many variables come into play, and hardware can change from order to order with a single host, let alone when comparing host to host.

However, we need some point of reference, and as one of the most popular hosts available for VPS solutions, Linode provides a solid baseline set of packages which are fairly representative of packages across the internet.

Therefore, the following setups will be used.

Setup 1
RAM: 1GB
vCPUs: 1 (2.8GHz)
HDD: 7200 RPM SATA

Setup 2
RAM: 2GB
vCPUs: 2 (2.8GHz)
HDD: 7200 RPM SATA

Setup 3
RAM: 4GB
vCPUs: 2 (2.8GHz)
HDD: 7200 RPM SATA

Setup 4
RAM: 8GB
vCPUs: 4 (2.8GHz)
HDD: 7200 RPM SATA

These were chosen as the baseline for the tests. Of particular interest will be the difference in results between Setup 2 and Setup 3: both still have only 2 vCPUs, so the RAM difference will be the contributing factor in the user capacity of the server.

Whilst the option to test higher capacities is available, at the price of an 8GB VPS you are already able to acquire a dedicated server for a similar or lower cost which would offer higher performance, so testing any further would be a pointless exercise.
 
Server Stacks

Choosing the server stack and deciding how to configure it was quite tricky. My personal preference would be to optimise each one to the maximum possible, using a multitude of tools and personal experience, before attempting any benchmarks. However this creates an issue: not all servers respond in the same way to the same enhancements.

Because of this, the decision was made that each server would be deployed using only the basic configuration. No advanced configuration or tuning would take place.

So which Linux distro to use? One could spend a lifetime browsing the internet for people's opinions on which version of Linux is best. Regardless, it soon became apparent the only two real contenders would be CentOS and Debian.

I ran down a list of features for each, including things such as updates, stability, end-user ease, commercial support, package management, out-of-the-box performance and, finally, my own preference.

In the end CentOS won by a small margin. It really came down to this: even though Debian is more cutting edge and receives package updates much more quickly, I feel that CentOS by its nature (being based on RHEL, which applies its own independent testing and requirements to Fedora packages to ensure stability and quality are maintained) is much more geared towards being an enterprise-level system, with a focus on stability and being methodical. It also provides a much easier migration path if users later want to upgrade to a mission-critical or highly supported system such as RHEL (CentOS is mostly RHEL with the branding and trademarks removed, so it maintains near-100% binary compatibility) or Scientific Linux.

As such, all these tests will be based on the latest version of CentOS, currently CentOS 6.3.

The remaining choice comes down to what software runs the web stack. Whilst testing a comprehensive list of web servers such as Hiawatha and lighttpd would be nice, it simply adds to an already large list of items to test.

As such the stacks tested will be as follows.

CentOS, Apache, MySQL, PHP (as Apache module)
- Not much to say here; this is the most popular stack setup going, so it will provide our baseline.
CentOS, nginx, MySQL, PHP-FPM
- A common performance-oriented setup.
CentOS, Cherokee, MySQL, PHP-FPM
- While nginx is more popular, Cherokee is often overlooked because of the GUI it provides. However it is still a very high-performance web server (sometimes beating nginx) and is considerably more user friendly in my experience.
CentOS, nginx, Percona, PHP-FPM, APC
- Superstack one. Replacing MySQL with Percona and adding in APC to provide caching.
CentOS, Cherokee, Percona, PHP-FPM, APC
- Superstack two. Replacing nginx with the more user-friendly Cherokee.
Why no MariaDB or PostgreSQL? As with the web servers, adding more simply adds to the list. MySQL was chosen, obviously, for its popularity. Percona was chosen for its excellent performance, features, documentation and support.

Why no XCache, eAccelerator, Varnish or Memcached? As previously, this is due to time restrictions. Also, APC has proven itself to be the cache of choice for many and, due to its popularity and excellent features, is set to be included in future versions of PHP itself. Memcached on XenForo seems to offer no benefit over APC except in some very specific circumstances, and Varnish offers questionable gains in already-tuned performance stacks.
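For reference, pointing XenForo 1.x at APC is a small library/config.php change. The snippet below reflects the commonly documented cache settings; the option values should be checked against the manual for your version.

```php
<?php
// library/config.php: commonly documented XenForo 1.x settings for using
// APC as the cache backend, so sessions and data are served from memory.
$config['cache']['enabled'] = true;
$config['cache']['cacheSessions'] = true;
$config['cache']['frontend'] = 'Core';
$config['cache']['frontendOptions'] = array(
    'caching'                 => true,
    'automatic_serialization' => true,
    'lifetime'                => 1800,
    'cache_id_prefix'         => 'xf_',
);
$config['cache']['backend'] = 'Apc';
```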

So this concludes the Server Stacks section of the experiment.

Each server will be subject to 5 stacks of testing. As it currently stands I am planning to run these tests on 4 different servers, meaning 20 sets of tests. These tests will then be run on 2 data sizes, meaning 40 tests in total. That's a lot of testing! As such, some of the datasets may be removed to bring the total number of tests down, or one or two of the servers may be dropped instead.
 
Will you include any information about addons? Many XenForo users utilize addons extensively and this can have a significant impact on what hardware is required.
 

For these tests there will be a script in place adding a couple of queries (this will be explained in the methodology), but no, in general the usage of addons will not be included in the tests, due to several factors:

1) Quality of coding: depending on the addon creator, the code ranges from poor to exceptional; there's no uniform standard.
2) Addons are unique to most forums; even installing the top downloaded addons isn't representative.
3) Many addons collect their own data, so I have no way of adding that into a legitimate dataset as it currently stands.
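For what it's worth, the "couple of queries" script mentioned above could be as simple as the hedged sketch below. The real script will be detailed in the methodology; the queries and credentials here are placeholders against the stock XenForo schema.

```php
<?php
// Hedged guess at the query-overhead script: two throwaway queries per
// page load, roughly what a small, well-written addon might add.
$db = new PDO('mysql:host=localhost;dbname=xenforo', 'user', 'password');

$db->query('SELECT COUNT(*) FROM xf_user')->fetchColumn();
$db->query('SELECT option_value FROM xf_option LIMIT 1')->fetchColumn();
```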
 
1) Quality of coding: depending on the addon creator, the code ranges from poor to exceptional; there's no uniform standard.

I think this is even more reason to include more about addons in your testing. It would be really helpful if anyone reading this could at least compare how a clean XenForo install performs versus one with a multitude of addons installed, some of which may not be coded particularly well. That might help people plan capacity for wanted addons, as well as for upgrades down the road. More information is always preferred to less, and I think information about the percentage of server performance that addons could take up would be invaluable, considering how popular and easy addons are to both install and even create.
 

Point taken. However, likewise, all addons should aim to be of extremely high quality, generating as close to 0 queries as possible, and thus the load impact should be minimal.

However, this would be an independent test carried out at the end, after all the other data is collected.
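To illustrate that "close to 0 queries" ideal, here is a hedged sketch of the usual pattern: an addon caches its one lookup in APC so most page loads never touch the database. The function, cache key and table names are invented for the example.

```php
<?php
// Hypothetical illustration: the addon's settings are fetched from the
// database once, then served from APC for subsequent page loads.
function getAddonSettings(PDO $db)
{
    $cached = apc_fetch('my_addon_settings');
    if ($cached !== false) {
        return $cached; // served from memory: zero queries this page load
    }

    // Cache miss: run the query once, then keep the result for 5 minutes.
    $settings = $db->query('SELECT * FROM my_addon_settings')
                   ->fetchAll(PDO::FETCH_ASSOC);
    apc_store('my_addon_settings', $settings, 300);
    return $settings;
}
```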
 
Given the current situation you should really consider testing the performance of XenForo with very many activated addons too. It is not only the queries that need to be considered for performance loss.
 

As previously stated, there is no representative set of addons, so any testing with addons will be done at the end, for a select number of setups only.
 

I understand. How about simulating the hooking-in of addons by just calling the event listeners?

Digitalpoint suggested that there will be a performance loss with many addons:
http://xenforo.com/community/threads/best-way-to-determine-current-controller.22128/#post-278587

I just thought it would be good to find out whether this is true, and if so, to pick up his solution in a future XF version: compiling each event listener type into a single PHP file.
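As a rough way to put a number on that overhead, one could use a simplified stand-in like the sketch below. This is not the real XenForo_CodeEvent API, just no-op closures timed over many simulated page loads, with the listener and dispatch counts chosen arbitrarily.

```php
<?php
// Simplified stand-in for measuring raw event-dispatch cost: register a
// pile of no-op listeners, then time firing an event through all of them.
$listeners = array();
for ($i = 0; $i < 200; $i++) {
    $listeners[] = function (&$data) { /* no-op addon hook */ };
}

$start = microtime(true);
for ($page = 0; $page < 1000; $page++) {  // simulate 1,000 page loads
    $data = array();
    foreach ($listeners as $listener) {
        $listener($data);                 // fire the event
    }
}
$elapsed = microtime(true) - $start;

printf("200 listeners x 1000 dispatches: %.4f seconds\n", $elapsed);
```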
 