Make data & internal_data filesystem adaptor more granular

digitalpoint

Well-known member
The problem:

You want to store attachments and avatars in a non-local filesystem.

The solution:

There are currently three locations where you can override filesystem adapters: data, internal-data and code-cache. So to solve "the problem", you can override data and internal-data to use something like AWS S3. So far, so good...

The issue:

Overriding the data and internal-data filesystems is (for the most part) an "all or nothing" thing. So when all I really wanted was to store internal_data/attachments and data/avatars in the cloud, I'm now forced to also store a whole slew of other things in the cloud:

Where                       What
data/attachments            Attachment thumbnails
data/audio                  User-uploaded audio
data/imported_reactions     Reactions imported from other forum software
data/profile_banners        User-uploaded banners
data/styles                 Styles imported from an archive (why is this in data rather than internal_data?)
data/video                  User-uploaded video
internal_data/addon_batch   Data only used when installing add-ons from an archive
internal_data/file_check    Info about nightly file checks
internal_data/image_cache   Images from image proxy
internal_data/imported_xml  Imported smilies
internal_data/keys          Email DKIM key
internal_data/oembed_cache  Media embeds
internal_data/sitemaps      Sitemaps for search engines

Examples of why you wouldn't want to store some of these items in the cloud:
  • Making cloud API calls on the backend every time a user views a reaction on a post (or is given an opportunity to use that reaction) just to get the reaction image. Not efficient.
  • Reading and writing what ultimately is temporary data to the cloud when you install an add-on. Not efficient.
  • Reading proxied images via cloud API calls every time someone views a proxied image (at least partially) defeats the purpose of the image proxy.
  • Making cloud API calls to read what is a singular DKIM key every time an email is sent? This one is SO bad. We've now slowed down outgoing mail and added an actual cost to sending email (cloud API calls aren't terribly expensive, but if you are sending a ton of emails from XenForo, which large sites will be doing, 10,000,000 individually cheap emails add up to a real cost). Real-world example: let's say the API call takes only 1/4 of a second to fully complete, and you send a mass email to 10,000 users. Make the same API call once for each email (to get the same key each time), and you've just added about 42 minutes of extra time to send those 10,000 emails, on top of paying for each API call.
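A quick back-of-the-envelope check of the arithmetic in the last point, as a minimal Python sketch. The 0.25 s per call and the 10,000 recipients are the example figures from the post; `added_delay_minutes` is a hypothetical helper, and the model assumes the key reads happen one after another (no parallelism).

```python
def added_delay_minutes(emails: int, seconds_per_call: float, calls_per_email: int = 1) -> float:
    """Extra wall-clock time (in minutes) spent fetching the DKIM key,
    assuming each email triggers `calls_per_email` sequential remote reads."""
    total_seconds = emails * calls_per_email * seconds_per_call
    return total_seconds / 60


# One remote read per email, 10,000 emails, 0.25 s per read = 2,500 s in total.
per_email = added_delay_minutes(10_000, 0.25)
print(f"{per_email:.1f} minutes")  # prints "41.7 minutes"
```

Reading the key once per script run instead would cost a single 0.25 s read, regardless of how many emails are sent.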
There's already a suggestion thread specifically related to the DKIM keys, but I think it would be better to have that simply be part of what could be a more granular system:


I don't think it would be terribly hard to do while also maintaining backward compatibility (see my post in the above thread).
 

Kirby

Well-known member
Where                       What
data/imported_reactions     Reactions imported from other forum software

Those can easily be moved after import (for example, to styles); as this is a one-time effort, I think this isn't too bad.
Where                       What
data/styles                 Styles imported from an archive (why is this in data rather than internal_data?)
This directory contains style assets (images, fonts, etc.); those resources must be in data as they are directly accessed by browsers, and internal_data isn't directly accessible.
Saving those resources in data rather than styles is purely done for convenience reasons; when importing in development mode, assets will go to styles.

Making cloud API calls on the backend every time a user views a reaction on a post (or is given an opportunity to use that reaction) just to get the reaction image. Not efficient.
I am not sure I understand this: which API calls are you talking about here?

Reading and writing what ultimately is temporary data to the cloud when you install an add-on. Not efficient.
Agreed, though I don't think it is a significant issue.

Reading proxied images via cloud API calls every time someone views a proxied image (at least partially) defeats the purpose of the image proxy.
The main purpose of the image proxy is to deliver images that otherwise couldn't be served (because they are on HTTP while the forum is running HTTPS).
It's also more or less required for GDPR compliance.

Making cloud API calls to read what is a singular DKIM key every time an email is sent? This one is SO bad.
Absolutely, that's why I made a specific bug report for this: even if the key is stored in internal_data, it shouldn't be read for every single mail (that's insane!) but once per script call.
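A minimal sketch of the "once per script call" behaviour, written in Python rather than XenForo's PHP. `DkimKeyProvider` and the `fetch` callable are hypothetical names, not XenForo classes; `fetch` stands in for whatever actually reads the key (a local file, a cloud object, a registry entry).

```python
from typing import Callable, Optional


class DkimKeyProvider:
    """Reads the DKIM private key at most once per process, then serves it from memory."""

    def __init__(self, fetch: Callable[[], bytes]) -> None:
        self._fetch = fetch                # hypothetical hook doing the slow read
        self._key: Optional[bytes] = None  # nothing loaded yet

    def key(self) -> bytes:
        if self._key is None:  # first use in this script run: do the expensive read
            self._key = self._fetch()
        return self._key       # every later call is a plain memory lookup


reads = 0


def fake_cloud_read() -> bytes:
    """Stand-in for a remote read; counts how often it is actually called."""
    global reads
    reads += 1
    return b"dummy-key-material"


provider = DkimKeyProvider(fake_cloud_read)
for _ in range(10_000):  # one key lookup per outgoing email
    provider.key()
print(reads)  # prints 1
```

With the read cached like this, where the key lives matters far less: the slow read happens once per run instead of once per mail.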

I don't think it would be terribly hard to do while also maintaining backward compatibility (see my post in the above thread).
Unless I am missing something, or you are suggesting an OverlayFS-type Flysystem adapter, I don't think it is possible to implement this suggestion without breaking backward compatibility, as every single VFS path used in XenForo and add-ons would have to be changed, e.g. data://attachments -> attachments://
 

digitalpoint

Well-known member
Those can easily be moved after import (for example, to styles); as this is a one-time effort, I think this isn't too bad.
Yep, but it's not really ideal.

This directory contains style assets (images, fonts, etc.); those resources must be in data as they are directly accessed by browsers, and internal_data isn't directly accessible.
Saving those resources in data rather than styles is purely done for convenience reasons; when importing in development mode, assets will go to styles.
Ah... that makes sense. I've never installed a style on any version of XenForo, so I had no clue. :)

I am not sure I understand this: which API calls are you talking about here?
Might be me not understanding what's stored in there. I assumed it was images from imported reactions. If it's just temporary and nothing stays there, it's the same as the first point. Workable, but not ideal.


Agreed, though I don't think it is a significant issue.
Yep, it can also be worked around, but it's also not ideal.


The main purpose of the image proxy is to deliver images that otherwise couldn't be served (because they are on HTTP while the forum is running HTTPS).
It's also more or less required for GDPR compliance.
Right, that's why I said it "partially" defeats the purpose. Another purpose some of us (myself included) use it for is speeding up the site and not being reliant on third-party domains. But now we are back to fetching the image from a remote server just to serve it from cache. Caching things on geographically remote servers kind of defeats the purpose of a cache. Imagine if your browser cache lived on a server 1,000 miles from your computer.

Absolutely, that's why I made a specific bug report for this: even if the key is stored in internal_data, it shouldn't be read for every single mail (that's insane!) but once per script call.
Right... I understand it thoroughly. I was the one who made the original suggestion for DKIM support, and as part of that suggestion, my implementation added the DKIM key to options, because it's a write-once, read-a-billion-times type of thing. The file system isn't the right place for it. I didn't suggest it go in options because options was the only place I could think of to store it. haha


Unless I am missing something, or you are suggesting an OverlayFS-type Flysystem adapter, I don't think it is possible to implement this suggestion without breaking backward compatibility, as every single VFS path used in XenForo and add-ons would have to be changed, e.g. data://attachments -> attachments://
Nah, it can be done without changing the VFS paths. In fact, I have an idea of how to do it simply by extending a single XenForo method. I haven't had time to build it yet, so I can't say for certain that there isn't something I'm not thinking of that would stop it from working. But we'll see... :)
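One way such routing could work without touching any VFS path, sketched in Python under stated assumptions: callers keep using paths like data://avatars/... and internal-data://keys/..., and a thin router picks a backend by the longest matching directory prefix, falling back to the mount's default adapter (roughly the "OverlayFS-type" idea mentioned above). This is not digitalpoint's add-on or XenForo's actual mount code; `MemoryBackend`, `PrefixRouter` and the mapping shown are hypothetical.

```python
from typing import Dict, Optional, Tuple


class MemoryBackend:
    """Hypothetical stand-in for a storage backend (local disk, S3 bucket, registry...)."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Stores the full VFS path as the key for simplicity; a real adapter
        # would strip the mount prefix before talking to its storage.
        self.files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def read(self, path: str) -> bytes:
        return self.files[path]


class PrefixRouter:
    """Routes full VFS paths to a backend; callers never change their paths."""

    def __init__(self, defaults: Dict[str, MemoryBackend]) -> None:
        self.defaults = defaults                       # one backend per mount name
        self.overrides: Dict[str, MemoryBackend] = {}  # 'mount://dir' -> backend

    def override(self, prefix: str, backend: MemoryBackend) -> None:
        self.overrides[prefix.rstrip("/")] = backend

    def resolve(self, path: str) -> MemoryBackend:
        # Longest matching directory prefix wins; the "/" check stops
        # 'data://avatars' from also matching 'data://avatarsX/...'.
        best: Optional[Tuple[str, MemoryBackend]] = None
        for prefix, backend in self.overrides.items():
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best[0]):
                    best = (prefix, backend)
        if best is not None:
            return best[1]
        mount = path.split("://", 1)[0]  # no override: use the mount's default
        return self.defaults[mount]

    def write(self, path: str, data: bytes) -> None:
        self.resolve(path).write(path, data)

    def read(self, path: str) -> bytes:
        return self.resolve(path).read(path)


local_data = MemoryBackend("local data")
local_internal = MemoryBackend("local internal_data")
s3 = MemoryBackend("s3")

router = PrefixRouter({"data": local_data, "internal-data": local_internal})
router.override("data://avatars", s3)
router.override("internal-data://attachments", s3)

router.write("data://avatars/l/0/1.jpg", b"avatar")
router.write("data://imported_reactions/heart.png", b"reaction")
print(router.resolve("data://avatars/l/0/1.jpg").name)            # prints "s3"
print(router.resolve("data://imported_reactions/heart.png").name)  # prints "local data"
```

Because unmatched paths fall through to the mount's default, a site that configures no overrides would behave exactly as it does today, which is where the backward compatibility would come from.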
 

Kirby

Well-known member
Might be me not understanding what's stored in there. I assumed it was images from imported reactions.
Yes, those are images for imported reactions and they are not temporary, but as I said before, I don't get what API calls you are talking about.

I didn't suggest it go in options because options was the only place I could think of to store it. haha
I probably wouldn't keep DKIM key material in options: it's not really an option, it's quite a bit more data than the average option, and it's not needed in many cases (for example, for guest views). I probably would have put it into the registry.

Nah, it can be done without changing the VFS paths.
Sounds interesting :)
 

digitalpoint

Well-known member
Yes, those are images for imported reactions and they are not temporary, but as I said before, I don't get what API calls you are talking about.
Okay, say you had images in there... and data is using AWS S3 as your filesystem. In that case, whenever an image in there needs to be presented to the user, we need to read it via the AWS API (generically, "cloud API calls"). :)

I probably wouldn't keep DKIM key material in options: it's not really an option, it's quite a bit more data than the average option, and it's not needed in many cases (for example, for guest views). I probably would have put it into the registry.
Other "keys" are stored in options (for example, for CAPTCHA authentication), as are credentials that aren't human-usable (like Google OAuth credentials for email transport). On the flip side, some things that I consider actual options are stored in the registry (avatarSizeMap and inlineImageTypes come to mind, but there are others).

But yes... the registry works too. Anything that isn't the file system and can have an in-memory cache is fine by me, so either would work.
 

Kirby

Well-known member
In that case, whenever an image in there needs to be presented to the user, we need to read it via the AWS API (generically, "cloud API calls"). :)
Erm ... no?

Files stored in data (like avatars, attachment thumbnails, etc.) are usually directly accessed by the browser to present them to the user, no API calls are involved there.
 

digitalpoint

Well-known member
Erm ... no?

Files stored in data (like avatars, attachment thumbnails, etc.) are usually directly accessed by the browser to present them to the user, no API calls are involved there.
Usually, but not always. Say you used something like Cloudflare R2 (a drop-in replacement for AWS S3). You don't have to use alternate sub-domains for serving that content (for example, if you didn't want people knowing you were using AWS for whatever reason, having those image URLs prefixed with something like https://s3.eu-west-2.amazonaws.com might not be an option). There may also be GDPR ramifications in some cases, where fetching images that way discloses things like your end users' IP addresses and user agents to a third party (AWS). What you can do in that case is use Cloudflare Workers to intercept the HTTP request to your domain, make the backend API calls at that point, and serve the files directly without an alternate sub-domain. Again, it's not the way everyone does it, but some do.


Doing it with something like AWS sub-domains is effectively the same thing on the backend, just without YOU needing to make the API calls; AWS does it for you seamlessly (but ultimately, API calls are still being made to read your bucket/object).
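A minimal Python sketch of that edge-proxy pattern, to show where the backend requests happen (a real Cloudflare Worker would be written against the Workers runtime, not this code). Requests arrive on the forum's own hostname, the proxy maps the path to an object key and fetches it from the bucket, so the visitor's browser never contacts the storage provider directly, yet every uncached hit is still one backend request. `handle_request`, `ALLOWED_PREFIXES` and the `bucket_get` callable are hypothetical.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for a bucket read (e.g. an S3/R2 GET); in the real
# world each call is a billable request.
BucketGet = Callable[[str], Optional[bytes]]

# Assumption: only these public directories are proxied; nothing else is exposed.
ALLOWED_PREFIXES = ("data/avatars/", "data/attachments/")


def handle_request(path: str, bucket_get: BucketGet) -> Tuple[int, bytes]:
    """Serve public files under the forum's own domain by fetching them from the bucket."""
    key = path.lstrip("/")  # '/data/avatars/l/0/1.jpg' -> 'data/avatars/l/0/1.jpg'
    if not key.startswith(ALLOWED_PREFIXES) or ".." in key.split("/"):
        return 404, b""     # refuse anything outside the allowed public directories
    body = bucket_get(key)  # the proxy, not the browser, talks to the object store
    if body is None:
        return 404, b""
    return 200, body


bucket = {"data/avatars/l/0/1.jpg": b"jpeg bytes"}
requests_made = []


def fake_get(key: str) -> Optional[bytes]:
    requests_made.append(key)  # count backend reads
    return bucket.get(key)


print(handle_request("/data/avatars/l/0/1.jpg", fake_get))       # prints (200, b'jpeg bytes')
print(handle_request("/internal_data/keys/dkim.key", fake_get))  # prints (404, b'')
print(len(requests_made))                                        # prints 1
```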
 

digitalpoint

Well-known member
I tend to think more in terms of how things work on the backend. Wherever I said "cloud API calls", read "cloud requests". My whole point was that imported_reactions (and everything else) gets forced into S3 if someone really just wanted to have avatars in S3. Cloud API calls and cloud requests (each of which costs money) are synonymous for what I was (poorly) trying to convey. :)

Ultimately, my point is that it would be better if each directory within data and internal_data could be set up however a site wants/needs (maybe it's as simple as using the same S3 service, but with each folder in a different bucket, if they needed it that way).

The current "all or nothing" way of doing it isn't ideal imo, that's all.
 

digitalpoint

Well-known member
If anyone wants to see the underlying code for how it can be done, I made a tiny add-on that allows you to store just the internal_data/keys folder in its own file system (I also made a Data Registry Flysystem adapter for that purpose). So you can store just the DKIM key in XenForo's registry (without moving all of internal_data there).
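The add-on's code isn't reproduced here, but the general idea of a registry-backed adapter can be sketched in Python: file paths become namespaced keys in a key-value store. `RegistryAdapter` is hypothetical and only loosely mirrors the read/write/has/delete/list operations a filesystem adapter exposes; it is not the Flysystem interface, and a plain dict stands in for XenForo's data registry.

```python
from typing import Dict, List, MutableMapping


class RegistryAdapter:
    """File-style operations on top of a key-value 'registry' (hypothetical sketch)."""

    def __init__(self, registry: MutableMapping[str, bytes], namespace: str = "fs:") -> None:
        self.registry = registry    # any mapping; stands in for a persistent registry table
        self.namespace = namespace  # keeps file entries apart from other registry keys

    def _key(self, path: str) -> str:
        return self.namespace + path.strip("/")

    def write(self, path: str, contents: bytes) -> None:
        self.registry[self._key(path)] = contents

    def read(self, path: str) -> bytes:
        try:
            return self.registry[self._key(path)]
        except KeyError:
            raise FileNotFoundError(path) from None

    def has(self, path: str) -> bool:
        return self._key(path) in self.registry

    def delete(self, path: str) -> None:
        self.registry.pop(self._key(path), None)

    def list_contents(self, directory: str = "") -> List[str]:
        # Every stored path under `directory`, returned without the namespace.
        prefix = self._key(directory)
        if directory.strip("/"):
            prefix += "/"
        n = len(self.namespace)
        return sorted(k[n:] for k in self.registry if k.startswith(prefix))


store: Dict[str, bytes] = {}
keys_fs = RegistryAdapter(store)
keys_fs.write("dkim/private.key", b"key material")
print(keys_fs.has("dkim/private.key"))  # prints True
print(keys_fs.list_contents("dkim"))    # prints ['dkim/private.key']
```

Since a registry like this can sit behind an in-memory cache, reading the key repeatedly stops being a file system (or cloud) round trip.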

 