Xenforo and edge caches

tristan9

Member
Hi,

We are soon going to roll out Xenforo (2.2.12 on PHP 8.1.13 though that's not super relevant) as our public forums solution. As the one responsible for the ops side of things, I have some questions regarding the performance of it all. Mostly around 2 main concerns:
  • latency penalty due to uncacheability of even guest pages (due to CSRF and session cookies)
  • difficulty of efficient edge-side DoS mitigation for the same reasons

For the pure latency/performance side: we do serve millions of daily users (though not all will visit our forums) spread all around the world, so the latency penalty of network roundtrips all the way to our hosting facilities might be pretty bad (which is why we run our edge network).
Even with Xon's excellent Redis plugins and a page cache setup, that is still a penalty of ~240ms at best for guests on the other side of the globe, even if the render time were instantaneous (which it understandably isn't). You can't really beat the 10ms latency of a regional cache, even on an origin cache hit :/

For the DoS mitigation concerns: we are regularly on the receiving end of DoS attacks. They're not world-record breaking, and we have a well-oiled granular rate-limit system. But assuming something like XF non-asset paths (i.e. not /js, /style, /data, /css.php, etc.) limited to 5r/s/ip with a 60s window, we would still see spikes of just under 50'000 requests/s from an attacker with some 10'000 host+ip botnet (each IP being technically under the limit), and that's not super uncommon for us.
Usually we can handle that kind of abuse (and many times more, but this is a good rough estimate of the average skiddie attack) just fine through aggressive caching (and active cache-busting in our apps) and other techniques. But with XF insisting on passing CSRF and session cookies, we'd have to transport all of that noise to our XF deployment, and it's unlikely that it would be able to handle it gracefully (we do deploy it in our on-prem k8s clusters, w/ autoscaling etc., but we don't have the compute capacity to scale up infinitely). At least not without introducing a UAM-like interstitial of some type, which we've tried to avoid so far.
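
To make the arithmetic above concrete, here is a minimal sketch. The two figures are the ones quoted in the post; the function and constant names are only illustrative.

```python
# Figures quoted in the post above; names are illustrative only.
PER_IP_LIMIT_RPS = 5    # per-IP rate limit on non-asset paths (requests/second)
BOTNET_SIZE = 10_000    # distinct host+ip pairs in the attacking botnet


def max_aggregate_rps(per_ip_limit_rps: int, botnet_size: int) -> int:
    """Upper bound on traffic that slips past a per-IP limit
    when every bot stays right at (or just under) that limit."""
    return per_ip_limit_rps * botnet_size


print(max_aggregate_rps(PER_IP_LIMIT_RPS, BOTNET_SIZE))  # 50000
```

In other words, a per-IP limit alone only caps each source; the total that reaches the origin still scales linearly with the size of the botnet.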

Also, as I imagine it will (understandably) be suggested from this, we cannot use Cloudflare (or similar services) due to a mix of bandwidth limits on their free plan (multi-PB monthly traffic site), costs associated with that (we are non-profit), and privacy concerns.

Either way, I was looking at how to improve XF's cacheability, and came up with the following setup:
  • Forcing caching of asset paths is not a problem with a few overrides in varnish & friends (/js, /style, /data, /css.php, ...)
  • Forcing caching of cookie-less requests to guest pages is also fine (ie no-cookie + X-XF-Cache-Status set to HIT means we can safely cache it after removing the Set-Cookie response)
  • Ofc we have the generic performance stuff enabled (our main backend is in PHP so we're used to that tuning+setup): opcache, general+session HA redis cluster, page-cache-dedicated HA redis cluster, ES cluster+enhanced search addon enabled, nginx+fpm correctly tuned, relevant internal_data subdirectories (non-temp ones, basically) on a high-performance distributed filesystem (40Gbps & jumbo frames internal network, SSD-based CephFS) for XF containers, container-local SSD storage for temp dir, separate (caching) network route and nginx cluster for the data-serving path, tuned mysql cluster deployment for XF with the replica-aware DB adapter configured (tho it seems to seldom use replicas at all?), ...
  • We're aware this will mess with XF's visitor counts, but that's not relevant to us as we have our own audience monitoring already set up
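
As a rough illustration of the first two rules in the list above, the decision could be sketched like this in Python (the real thing would live in varnish or similar). This is only a sketch under assumptions: the function names are made up, the asset list is the non-exhaustive one from the post, and restricting caching to GET/HEAD and stripping Set-Cookie on asset responses are assumptions that the post does not spell out.

```python
from typing import Dict, Mapping, Tuple

# Asset prefixes named in the post; the original list ends with "...",
# so this tuple is NOT exhaustive.
ASSET_PREFIXES = ("/js", "/style", "/data", "/css.php")


def is_asset_path(path: str) -> bool:
    """True if the path is one of the asset prefixes or lives below one."""
    return any(path == p or path.startswith(p + "/") for p in ASSET_PREFIXES)


def edge_cache_decision(
    method: str,
    path: str,
    request_headers: Mapping[str, str],
    response_headers: Mapping[str, str],
) -> Tuple[bool, Dict[str, str]]:
    """Return (cacheable, response headers to store), with lower-cased header names."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}

    # Assumption: only safe methods are ever cached.
    if method not in ("GET", "HEAD"):
        return False, resp

    if is_asset_path(path):
        cacheable = True  # forced caching of asset paths
    else:
        # Guest page rule from the post: no cookie on the request AND
        # X-XF-Cache-Status: HIT on the response.
        has_cookie = bool(req.get("cookie", "").strip())
        cacheable = (not has_cookie
                     and resp.get("x-xf-cache-status", "").upper() == "HIT")

    if cacheable:
        # Never store Set-Cookie in a shared cache.
        resp = {k: v for k, v in resp.items() if k != "set-cookie"}
    return cacheable, resp
```

For example, `edge_cache_decision("GET", "/threads/x.1/", {}, {"X-XF-Cache-Status": "HIT", "Set-Cookie": "xf_csrf=abc"})` is treated as cacheable and the stored headers no longer contain Set-Cookie, while the same request carrying any Cookie header is passed through uncached.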

However, I was hoping to be able to cache guest pages at the edge (that is, any guest-cacheable page, as indicated by X-XF-Cache-Status being HIT, where the request lacks an xf_user cookie) to cover the remaining cases. Unfortunately I'm seeing some confusing inconsistencies from that point onward:
  • Upon logout, it seems the / (or redirect, as we use an OAuth2 login) cache somehow ends up poisoned with the logged out user's information
  • Login occasionally gets messed up due to the CSRF cookie not being set by the time /login/?xf_... gets called

I was wondering what other people were doing for that? Assuming there is a solution at all (I see that Xenforo's own forums are using CF's dynamic mode even for guests, so I guess not a no-compromises one at least); idk who runs the biggest deployments and how they achieve it confidently.

For the record, I've read the following threads before, looking for similar information:

Thanks
 
I use AWS Cloudfront as CDN and it works very well. Everything is served from the CDN, on the same hostname, with the exception of every XenForo page (guest and logged in). The "am I still logged in" requests to index.php and the PWA (where XenForo appears as a webpage) are also served directly from the server. I have very low traffic at the moment so I don't even need a CDN, but it works out of the box for me.

For attacks / DoS: what kind of Web Application Firewall (WAF) do you use? I currently use ModSecurity, after some setup, on the highest paranoia level (4), along with some special rules in nginx such as limiting the number of POST/PUT requests a "visitor" can make per minute. There are some crazy attacks with thousands of POST requests, but most are aimed at WordPress installations.
 
For our main site we use HAProxy stick tables for rate-limit counting and enforcing, and as WAF we also use ModSecurity on a mix of level 1 and 2, mainly because false-positives are already a difficult problem and because our API is relatively clean so it's more of an over-the-top thing than a necessity.

And for Xenforo so far, while not in prod yet, yeah the caching works just fine for non-pages content, and we added ModSecurity at level 2 with the XF exclusions. It's unlikely we'll have the time to raise that to PL4 however unless it just works out of the box...
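
For readers unfamiliar with per-source rate counting, the idea can be sketched as a sliding-window counter keyed by IP. This is a conceptual Python sketch only, not how HAProxy stick tables are implemented or configured; the class name is hypothetical, and reading "5r/s/ip with a 60s window" as 300 requests per 60 seconds is an assumption.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class PerIpRateLimiter:
    """Sliding-window request counter keyed by client IP (conceptual sketch)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` (seconds) and say whether it is allowed."""
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit; rejected requests are not counted
        hits.append(now)
        return True


# Assumed reading of the figure from the first post: 5 r/s averaged over 60 s.
limiter = PerIpRateLimiter(max_requests=300, window_seconds=60.0)
print(limiter.allow("203.0.113.7", now=0.0))  # True
```

Each IP gets its own counter, which is exactly why a large botnet that keeps every source under the limit still gets through in aggregate.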
 
I have not really optimized ModSecurity for XenForo. I had to exclude more than a handful of ModSecurity rules for the admin control panel so that it's possible to update templates and so on. The XenForo exclusions are pretty good for the public area.

Good choice with HAProxy.
 
I had to exclude more than a handful of ModSecurity rules for the admin control panel
Ah, our admin panel isn't routable via the internet at all (exposed only on the internal network, with another domain, etc.), so the route for it doesn't have ModSecurity enabled. At first I left it on by default, with a "doesn't hurt" type of thinking, but it caused quite a few issues so I took it out, yeah.
 
I'm using Litespeed's LSCache to cache pages for guests only (I don't run apache or nginx). Redis is being used as well for everyone, but not full-page caching. I understand that Redis can support that, but haven't tried it. Litespeed also has an excellent Web Application Firewall feature that I'm using a simple config on, but also use ConfigServer Firewall, a separate LAN-based Firewall with a simple config that opens a few ports, and my host's DDOS protection feature.

For external caching, I'm using AWS Cloudfront, but only for avatars and my XF theme's images. Admittedly my Cloudfront billing is practically nothing, but it works best for my content-heavy XF site.
 