tristan9
Member
Hi,
We are soon going to roll out XenForo (2.2.12 on PHP 8.1.13, though that's not super relevant) as our public forums solution. As the one responsible for the ops side of things, I have some questions about its performance, mostly around 2 main concerns:
- latency penalty due to uncacheability of even guest pages (due to CSRF and session cookies)
- difficulty of efficient edge-side DoS mitigation for the same reasons
For the pure latency/performance side: we do serve millions of daily users (though not all will visit our forums) spread all around the world, so the latency penalty of network roundtrips all the way to our hosting facilities might be pretty bad (which is why we run our edge network).
Even with Xon's excellent Redis plugins and a page cache setup, that still leaves a ~240ms penalty at best for guests on the other side of the globe, even if the render time were instantaneous (which it understandably isn't). You can't really beat the 10ms latency of a regional cache, even on a cache hit :/
For the DoS mitigation concerns: we are regularly on the receiving end of DoS attacks. They're not world-record breaking, and we have a well-oiled granular rate-limit system. Still, assuming XF non-asset paths (i.e. not /js, /style, /data, /css.php, etc.) are limited to 5r/s/ip with a 60s window, we would see spikes of just under 50'000 requests/s from an attacker with a botnet of some 10'000 hosts/IPs (each IP being technically under the limit), and that's not super uncommon for us.
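For reference, the arithmetic behind that estimate, as a tiny Python sketch. The function name is purely illustrative, and it assumes every bot stays exactly at the per-IP limit:

```python
def worst_case_rps(bot_count: int, per_ip_limit_rps: int) -> int:
    """Aggregate request rate when every bot sits right at the per-IP limit."""
    return bot_count * per_ip_limit_rps


# 10'000 IPs, each allowed 5 r/s, so none of them trips the limiter.
print(worst_case_rps(10_000, 5))  # 50000
```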
Usually we can handle that kind of abuse (and many times more, but this is a good rough estimate of the average skiddie attack) just fine through aggressive caching (and active cache-busting in our apps) and other techniques. But with XF insisting on passing CSRF and session cookies, we'd have to transport all of that noise to our XF deployment, and it's unlikely that it would be able to handle it gracefully (we do deploy it in our on-prem k8s clusters, w/ autoscaling etc, but we don't have the compute capacity to scale up infinitely). At least not without introducing a UAM-like interstitial of some type, which we've tried to avoid so far.
Also, as I imagine it will (understandably) be suggested from this, we cannot use Cloudflare (or similar services) due to a mix of bandwidth limits on their free plan (multi-PB monthly traffic site), costs associated with that (we are non-profit), and privacy concerns.
Either way, I was looking at how to improve XF's cacheability, and came up with the following setup:
- Forcing caching of asset paths is not a problem with a few overrides in varnish & friends (/js, /style, /data, /css.php, ...)
- Forcing caching of cookie-less requests to guest pages is also fine (i.e. no cookie on the request + X-XF-Cache-Status set to HIT means we can safely cache the response after removing its Set-Cookie header)
- Ofc we have the generic performance stuff enabled (our main backend is in PHP so we're used to that tuning+setup): opcache, general+session HA redis cluster, page-cache-dedicated HA redis cluster, ES cluster+enhanced search addon enabled, nginx+fpm correctly tuned, relevant internal_data subdirectories (non-temp ones, basically) on a high-performance distributed filesystem (40Gbps & jumbo frames internal network, SSD-based CephFS) for XF containers, container-local SSD storage for temp dir, separate (caching) network route and nginx cluster for the data-serving path, tuned mysql cluster deployment for XF with the replica-aware DB adapter configured (tho it seems to seldom use replicas at all?), ...
- We're aware this will mess with XF's visitor counts, but that's not relevant to us as we have our own audience monitoring already set up
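To make the first two rules above concrete, here is a minimal Python sketch of the decision logic. It is not our actual varnish config: the function name and the plain prefix matching are illustrative assumptions, and stripping Set-Cookie from cached asset responses as well is an assumption on top of what is described for guest pages.

```python
# Asset paths as listed in the post. Plain prefix matching is an assumption;
# a real rule would likely anchor these more strictly.
ASSET_PREFIXES = ("/js", "/style", "/data", "/css.php")


def edge_cache_decision(path, request_headers, response_headers):
    """Return (cacheable, headers_to_store) for one request/response pair."""
    # Header names are case-insensitive, so compare on lowercased keys.
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}

    if path.startswith(ASSET_PREFIXES):
        # Rule 1: asset paths are cached unconditionally.
        cacheable = True
    else:
        # Rule 2: guest pages are cached only when the request carried no
        # cookies at all and XF reported a page-cache hit.
        cacheable = "cookie" not in req and resp.get("x-xf-cache-status") == "HIT"

    if not cacheable:
        return False, dict(response_headers)

    # Never store Set-Cookie with a shared cache object.
    stored = {k: v for k, v in response_headers.items() if k.lower() != "set-cookie"}
    return True, stored
```

For example, a cookie-less request whose response carries `X-XF-Cache-Status: HIT` comes back as cacheable with Set-Cookie stripped, while the same response to a request that sent any Cookie header is passed through untouched.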
However, I was hoping to also cache guest pages at the edge (that is, any page marked guest-cacheable by X-XF-Cache-Status being HIT, where the request lacks an xf_user cookie) to round it all out. Unfortunately I'm seeing some confusing inconsistencies from that point onward:
- Upon logout, the cached copy of / (or of the redirect, as we use an OAuth2 login) somehow seems to end up poisoned with the logged-out user's information
- Login occasionally gets messed up due to the CSRF cookie not being set by the time /login/?xf_... gets called
I was wondering what other people are doing about that, assuming there is a solution at all (I see that XenForo's own forums are using CF's dynamic mode even for guests, so I guess not a no-compromises one at least); idk who runs the biggest deployments and how they achieve it confidently.
For the record, I've read the following threads before, looking for similar information:
- https://xenforo.com/community/threa...ion-cloudflare-full-html-page-caching.202315/
- https://xenforo.com/community/threa...-amazon-s3-for-file-storage-in-xf-2-1.156282/
- https://xenforo.com/community/threads/xenforo-on-amazon-ec2.111804/
Thanks