Expand S3 support for mirroring and geo-location serving purposes

frm

Well-known member
I think XF should consider expanding S3 Object Storage so it can mirror to more than one Object Storage location. This could be useful if one location goes down, as the secondary location could take over. If both locations are online and operable, it might also improve the user experience if data is served from a location closer to the user.

Object Storage is affordable, so even with the trade-off of keeping redundant data in another location, it might still save on transfer costs, especially for larger buckets. For example, if users in the US access a US bucket while users in Europe, Africa, etc., access a bucket in Europe, it could split the transfer load, improve page load times, and potentially reduce transfer costs. Vultr’s base Object Storage plan includes 1TB free, with additional TBs at $6, so splitting transfers between buckets could save money on transfer costs, and those savings could offset the cost of a redundant, mirrored Object Storage location.

The same applies to a US West Coast/US East Coast Object Storage location. If traffic were split based on geo-location, even within the US, the savings on transfer could be put toward a secondary Object Storage location as a redundant backup.

Additionally, if a bucket becomes unavailable or unwritable, XF could begin writing to the second bucket and then sync the missed writes once the first comes back online, so the two stay essentially mirrored.
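
To make the failover idea concrete, here is a minimal sketch in Python. It is not how XF or any S3 adapter works today; the `MirroredStore` class, its method names, and the use of plain dicts as stand-in buckets are all assumptions for illustration. It writes to every bucket that is online and queues missed writes for replay when a bucket returns.

```python
from collections import deque


class MirroredStore:
    """Toy dual-bucket writer (hypothetical): write to online buckets, queue the rest."""

    def __init__(self, buckets):
        # buckets: name -> dict standing in for a real Object Storage bucket
        self.buckets = buckets
        self.online = {name: True for name in buckets}
        self.pending = {name: deque() for name in buckets}

    def put(self, key, data):
        targets = [name for name in self.buckets if self.online[name]]
        if not targets:
            raise RuntimeError("no bucket is writable")
        for name in self.buckets:
            if name in targets:
                self.buckets[name][key] = data
            else:
                # Remember the write so the offline bucket can catch up later.
                self.pending[name].append((key, data))
        return targets

    def get(self, key):
        # Read from the first online bucket that has the object.
        for name, bucket in self.buckets.items():
            if self.online[name] and key in bucket:
                return bucket[key]
        raise KeyError(key)

    def mark_offline(self, name):
        self.online[name] = False

    def mark_online(self, name):
        # Replay queued writes in order, then resume normal mirroring.
        self.online[name] = True
        queue = self.pending[name]
        while queue:
            key, data = queue.popleft()
            self.buckets[name][key] = data
```

A real implementation would also have to handle deletes, conflicting writes, and a queue that survives restarts, which this sketch ignores.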

This approach would benefit both data reliability and the user experience.

I know that keeping a redundant copy is already possible, but it would be much easier if it were baked in: write to both and read from the closest location.

It would probably require moving configuration from config.php to an S3/Object Storage option in the ACP, where you could designate which bucket is where so XF can serve from the bucket closest to the user. (That would also make Object Storage a tad easier to set up.)

Three-part suggestion:

  1. Object Storage Mirroring
  2. Geo-location serving the closest Object Storage location to the user's request
  3. ACP option for setting up S3/Object Storage location(s)
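
As a rough illustration of item 2, here is a sketch in Python of picking the closest bucket by great-circle distance. The bucket names and coordinates are made up, and it assumes the user's approximate latitude/longitude is already known (for example from a GeoIP lookup, which is not shown).

```python
import math

# Hypothetical bucket locations as (latitude, longitude) in degrees.
BUCKETS = {
    "eu-amsterdam": (52.37, 4.90),
    "us-east": (40.71, -74.01),
    "us-west": (34.05, -118.24),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_bucket(user_location, available=None):
    """Return the name of the nearest bucket, optionally limited to `available`."""
    candidates = list(available) if available is not None else list(BUCKETS)
    # min() raises ValueError if no bucket is available at all.
    return min(candidates, key=lambda name: haversine_km(user_location, BUCKETS[name]))
```

Passing `available` lets the same function double as the fallback: if the Amsterdam bucket is down, a user in Paris would be routed to the nearest remaining bucket instead.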

@Chris D - is it possible to mirror data to more than one object storage location with something like the above? And if so, in the event a bucket doesn't resolve, would it request from the 2nd bucket and mirror later, once the first comes back online?
Only the last one defined would take effect. Why would you want to do this?
Redundancy.

If one bucket goes offline (say the Netherlands for maintenance; it's happened), the US bucket is still online to serve content.

Also, perhaps, for geo-location-based bucket access: grab from the bucket closest to the user for faster access speeds.
 
You can set up S3 Cross-Region Replication yourself, but you'd still need to change the bucket URL to the replicated one. However, I'm not sure how the syncing between buckets would work once the main bucket comes back online.

Another thing to understand is that if you're in the S3 Standard storage class, your files are designed for 99.99% availability and 99.999999999% durability (files are automatically replicated to a minimum of 3 Availability Zones within your region). At 99.99%, that means it shouldn't be unavailable for more than about 53 minutes per year; the 9ish-hour figure corresponds to 99.9%.
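
As a quick sanity check on the availability figures, the arithmetic can be sketched in Python. The function name is only for illustration; it converts an availability percentage into the maximum downtime over a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def max_downtime_hours(availability_percent):
    """Hours per year a service may be down at the given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for pct in (99.9, 99.99):
    hours = max_downtime_hours(pct)
    print(f"{pct}% -> {hours:.2f} h/year ({hours * 60:.0f} min)")
```

This gives roughly 8.76 hours per year at 99.9% and roughly 0.88 hours (about 53 minutes) at 99.99%.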
 
Was unaware of that as I'm a Vultr customer.

But, to add to it, I didn't do a full cost analysis on savings with redundancy. There are too many variables to consider (space and transfer as well as services and their prices).

My theory was based on 1TB space/1TB transfer Object Storage mirrored with 1TB/1TB Object Storage on the Vultr network.

With Vultr, each would cost $6, so together you get 2TB of space (mirrored, so effectively 1TB) and 2TB of transfer. With a single Object Storage, however, you might use 2TB of transfer, costing another $6 in transfer. If you were able to mirror the two or toggle between them, you would split that 2TB of transfer over the two Object Storages, so there would be nearly a 100% savings on the mirrored data (perhaps less if geo-location serving was on, as one would carry a heavier load). Beyond 2TB and 2 mirrors, if you exceed 1TB in space, there is not much cost savings, just peace of mind from redundancy, as well as a better user experience if data can be pulled from the Object Storage nearest to the user.
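
The comparison above can be worked through in a few lines of Python. This is only a sketch under my own assumptions: each Object Storage plan is $6 and includes 1TB of space and 1TB of transfer, extra transfer is billed at $6 per whole TB, and traffic splits evenly across the buckets. Actual pricing and traffic distribution will differ.

```python
import math

PLAN_PRICE = 6       # assumed $ per Object Storage plan per month
INCLUDED_TB = 1      # assumed transfer included in each plan
EXTRA_TB_PRICE = 6   # assumed $ per additional TB of transfer


def monthly_cost(buckets, total_transfer_tb):
    """Monthly cost when transfer is split evenly across mirrored buckets."""
    per_bucket_tb = total_transfer_tb / buckets
    extra_tb = max(0, math.ceil(per_bucket_tb - INCLUDED_TB))
    return buckets * (PLAN_PRICE + extra_tb * EXTRA_TB_PRICE)


single = monthly_cost(1, 2)    # one bucket serving 2TB of transfer
mirrored = monthly_cost(2, 2)  # two mirrored buckets serving 1TB each
print(single, mirrored)
```

Under these assumptions both cases come to $12 a month, so the mirror effectively pays for itself at 2TB of transfer. At 1TB of transfer the single bucket is $6 and the mirrored pair is $12, so the second bucket is then purely a redundancy cost.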
 