Using DigitalOcean Spaces or Amazon S3 for file storage in XF 2.1+

Does your config file have all three adapter instances in it? It's likely a mismatch caused by your half-complete migration, where XF is trying to load an image from a location it has not been uploaded to.
 
Yes, it has all three adapter instances in it. The only portion I have not completed yet is migrating the old attachments to S3. The problem I am facing is for the newly uploaded files that actually get to the S3 Bucket. Again, the attachment thumbnails do not work, while when opening the image in the lightbox, it shows up correctly. The same broken image problem with the avatars as well.
 
Is the data folder in your S3 bucket actually being populated? If you right click a thumbnail, click Inspect, and expand the anchor tag - what URL is it showing for the <img> tag?
 
I can see your XF site from its IP address via your license callbacks so I was able to find a broken image and get the URL for an image we're trying to load:

Code:
https://s3.eu-central-1.amazonaws.com/data/avatars/s/6/6816.jpg?1613728101

Visiting this URL gives an error:

XML:
<Error>
<Code>PermanentRedirect</Code>
<Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message>
<Endpoint>data.s3-ap-northeast-1.amazonaws.com</Endpoint>
<Bucket>data</Bucket>
<RequestId>3233044C22A99172</RequestId>
<HostId>BiZe3oWFWudDZmK/hHa7dL9My0BpkJm2/uGlS20TXX5UboQDx8QHswhvK4JazBYyJnkakxRq78E=</HostId>
</Error>

I'm assuming the issue is with the entry in your config.php that looks similar to this:

PHP:
$config['externalDataUrl'] = function($externalPath, $canonical)
{
   return 'https://xftest.s3.eu-west-2.amazonaws.com/data/' . $externalPath;
};

I guess you currently have it set to:

PHP:
$config['externalDataUrl'] = function($externalPath, $canonical)
{
   return 'https://s3.eu-central-1.amazonaws.com/data/' . $externalPath;
};

But the error seems to suggest you need to change it to:

PHP:
$config['externalDataUrl'] = function($externalPath, $canonical)
{
   return 'https://data.s3-ap-northeast-1.amazonaws.com/data/' . $externalPath;
};

Or something like that anyway. Fair to say I'm no Amazon AWS/S3 expert.
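
One possible reading of the error above, offered as a sketch rather than a confirmed diagnosis: in a path-style URL such as https://s3.eu-central-1.amazonaws.com/data/..., S3 treats the first path segment as the bucket name, which would explain the <Bucket>data</Bucket> in the response. Putting the real bucket name into the host name, as the xftest example does, avoids that. The bucket name my-bucket is a placeholder, and the region is assumed from the URL being loaded.

```php
// Placeholder values: replace with your real bucket name and region.
$bucket = 'my-bucket';
$region = 'eu-central-1';

$config['externalDataUrl'] = function($externalPath, $canonical) use ($bucket, $region)
{
   // Virtual-hosted-style URL: the bucket is part of the host name,
   // and 'data' is only a key prefix inside that bucket.
   return 'https://' . $bucket . '.s3.' . $region . '.amazonaws.com/data/' . $externalPath;
};
```

With these placeholders, an avatar would resolve to https://my-bucket.s3.eu-central-1.amazonaws.com/data/avatars/s/6/6816.jpg.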
 
Chris, thank you very much for the reply.
Much appreciated.

Unfortunately it doesn't work this way. I see that while it has created the folder structure in the bucket for the attachments, it has not done so for the avatars. No change at all.
Actually this northeast region is in Tokyo while the bucket I have created is in Frankfurt.

I will raise it with Amazon as well, and if they reply I will post it here for future reference. If anyone has any other ideas, they would be more than appreciated.
 
I may have a similar issue but have yet to analyze it (lots of errors in the server log related to /data/ image files). Thanks for posting about it.
 
Note that the attachment URL for your attachments will remain the same - it will still be your URL. But we stream this from the remote location.

If we didn’t do this then we wouldn’t be able to still handle permissions.
That doesn't have to be the case. You can create a pre-signed URL from PHP and redirect to said URL. It gives you a 36-hour valid URL to a private object. I would much prefer if this was possible in XenForo / adapters. One would need to extend XF\Pub\View\Attachment\View to cause a redirect instead of streaming the file.

Is this something you'd consider?
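
For reference, the pre-signed URL idea described above could look roughly like this with the AWS SDK for PHP. This is a minimal sketch and not XenForo code: the bucket name, region, object key and the 20-minute lifetime are all placeholder assumptions, and a real implementation would need to run XF's permission check before redirecting.

```php
<?php
// Assumes the AWS SDK for PHP is installed via Composer.
require 'vendor/autoload.php';

$s3 = new \Aws\S3\S3Client([
   'credentials' => ['key' => 'xxx', 'secret' => 'xxx'],
   'region' => 'eu-west-2',
   'version' => 'latest'
]);

// Hypothetical bucket and object key, for illustration only.
$command = $s3->getCommand('GetObject', [
   'Bucket' => 'xftest',
   'Key' => 'internal_data/attachments/example.data'
]);

// The request is signed locally; the URL stops working after the given lifetime.
$request = $s3->createPresignedRequest($command, '+20 minutes');
$url = (string) $request->getUri();

// Any permission check would have to happen before this redirect.
header('Location: ' . $url);
exit;
```

Anyone who obtains the URL can fetch the object until it expires, so a shorter lifetime limits how long a leaked link remains usable.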
 
It'd be a suggestion for the core software, if one doesn't already exist (it may do).

This resource just aims to provide a guide and easier access to the AWS SDK so doesn't and won't contain any custom code.
 
There is no internalDataPath and due to various approaches - mainly permission checks - we have to serve the attachments from a URL where we can do those checks.

The offloading issue is indeed an issue that we're conscious of but we've not taken steps to address that yet.
Chris, is this something that has been looked into further? If the attachments from /internal_data/ are streamed through the hosting server, wouldn't this then cause double the bandwidth usage?
 
I guess it depends how bandwidth is measured. If bandwidth is measured based on what a client requests from the server then it's no different because from a client perspective, it just downloads a file from the server. If you're measuring all inbound and outbound traffic from the server then, yes, I guess bandwidth is effectively doubled when both downloading an attachment and uploading an attachment.

Even if we made a change for the download case, I assume uploads would still use double bandwidth. The file has to be uploaded to the server before it is then uploaded to the remote storage.

The download case is something we're aware of and ideally we'd like to support in the future, though it would be optional as it would effectively make potentially private attachments accessible to anyone who happens to have the URL and no longer directly protected by node / conversation permissions.
 
Thanks Chris, it's more from a server point of view that I'm currently looking at it.

Basically, image heavy site (500GB of attachments), that was previously behind Cloudflare, with page rules to cache /attachments/ (negating permissions anyway on images cached on the edge):

[Attached image: 1617205907221.webp, network usage graph]

You can see from the above where Cloudflare was disabled, and network usage went from 5Mbps to ~50Mbps, which shows what a good job CF itself was doing in offloading the network load (plus getting the content closer to the end users).

So offloading the images to S3 buckets would double this if the images are streamed through the hosting server.
 
I also have one question. I have been running this for 2 months now.

Is it normal that besides internal_data/attachments folder that internal_data/filecheck and internal_data/sitemaps are also the ones being uploaded to the s3 server? Just to be safe I uploaded the whole internal_data folder to the s3 storage server and I see that those 3 sub-folders get regularly updated by new files. And there is nothing mentioned about that so just to be safe, wanted to ask it here.
 
The download case is something we're aware of and ideally we'd like to support in the future, though it would be optional as it would effectively make potentially private attachments accessible to anyone who happens to have the URL and no longer directly protected by node / conversation permissions.

I think there's a simple per-node setting here that could simply allow admins to choose their privacy concerns.

eg, I probably only care about my admin forum being permission checked and streamed through the server. all the rest can be served direct through the cdn (eg, cloudfront on top of s3), regardless of banned permissions, ability to see the forum the attachment is in, etc.

something like
if (nodeAttachmentPermsOpen()) { $url = 'cdn.domain.com/attachment...'; }
else { $url = 'domain.com/attachment...'; }

let the url dictate needing a permission check.
 
So offloading the images to S3 buckets would double this if the images are streamed through the hosting server.
Yeah pretty much.

I also have one question. I have been running this for 2 months now.

Is it normal that besides internal_data/attachments folder that internal_data/filecheck and internal_data/sitemaps are also the ones being uploaded to the s3 server? Just to be safe I uploaded the whole internal_data folder to the s3 storage server and I see that those 3 sub-folders get regularly updated by new files. And there is nothing mentioned about that so just to be safe, wanted to ask it here.
The abstracted file system in XF works on the basis of "mounts". We actually have three: data, internal-data and code-cache. This resource goes through the steps of changing the data and internal-data mounts to use the S3 adapter rather than the local file system adapter.

We don't distinguish further than that, so everything in data and everything in internal_data is hosted remotely, whether it is an attachment or some other file. Bear in mind that while attachments are the primary thing people may want to host remotely, add-ons may store other larger files elsewhere in internal_data, and we might do the same for other things in future, so it makes much more sense to work on the entire internal_data directory to ensure we catch everything. The benefits of that are significant, whereas hosting smaller files remotely, even if arguably not beneficial, doesn't have any negative effects overall.

Two exceptions:
  • Because code-cache is its own mount, we still write out files to the local directory internal_data/code_cache. We have to do this because this contains things like compiled templates which need to be executed directly on the server. You don't need to maintain a copy of this remotely if you copied it over.
  • The default temp file location we will use is still internal_data/temp. There isn't a separate mount for this and it cannot be changed; it is always written to internal_data/temp. Again, you don't need to maintain a copy of this remotely.
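
As a rough sketch of how the mounts described above map onto config.php: only the data and internal-data mounts get an adapter entry, and nothing is configured for code-cache, which stays on the local disk. The bucket name xftest and region eu-west-2 are placeholders borrowed from the example URL earlier in the thread, and the credentials are dummies.

```php
$s3 = function()
{
   return new \Aws\S3\S3Client([
      'credentials' => ['key' => 'xxx', 'secret' => 'xxx'],
      'region' => 'eu-west-2',
      'version' => 'latest'
   ]);
};

// 'data' mount: everything under data/ is stored under the 'data' prefix in the bucket.
$config['fsAdapters']['data'] = function() use($s3)
{
   return new \League\Flysystem\AwsS3v3\AwsS3Adapter($s3(), 'xftest', 'data');
};

// 'internal-data' mount: attachments, filecheck, sitemaps and anything else in internal_data/.
$config['fsAdapters']['internal-data'] = function() use($s3)
{
   return new \League\Flysystem\AwsS3v3\AwsS3Adapter($s3(), 'xftest', 'internal_data');
};

// No adapter is set for the code-cache mount, so internal_data/code_cache
// continues to be written locally, as does internal_data/temp.
```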
 
I think there's a simple per-node setting here that could simply allow admins to choose their privacy concerns.

eg, I probably only care about my admin forum being permission checked and streamed through the server. all the rest can be served direct through the cdn (eg, cloudfront on top of s3), regardless of banned permissions, ability to see the forum the attachment is in, etc.

something like
if (nodeAttachmentPermsOpen()) { $url = 'cdn.domain.com/attachment...'; }
else { $url = 'domain.com/attachment...'; }

let the url dictate needing a permission check.
Yeah I think roughly that might have been the approach we have discussed in the past.
 
@Chris D

I am using XF 2.2 and trying to upload my files to DO. I installed the plugin (v.2.1) and put the following code into my config.php:

Code:
$s3 = function()
{
   return new \Aws\S3\S3Client([
      'credentials' => [
         'key' => 'xxx',
         'secret' => 'xxx'
      ],
      'region' => 'sfo3',
      'version' => 'latest',
      'endpoint' => 'https://sfo3.digitaloceanspaces.com'
   ]);
};

$config['fsAdapters']['data'] = function() use($s3)
{
   return new \League\Flysystem\AwsS3v3\AwsS3Adapter($s3(), 'name', 'data');
};

$config['externalDataUrl'] = function($externalPath, $canonical)
{
   return 'https://name.sfo3.digitaloceanspaces.com/data/' . $externalPath;
};

$config['fsAdapters']['internal-data'] = function() use($s3)
{
   return new \League\Flysystem\AwsS3v3\AwsS3Adapter($s3(), 'name', 'internal_data');
};


It looks like it works, but I noticed that only thumbnails, avatars and profile banner images show the DO URL. Full-size images still show an internal URL, and I can see them in the XF admin panel. I logged in to my FTP account and they are still on my local server (data folder). I don't know what's going on here.
 
Good morning everyone.
I'm having the exact same result via S3. I've read through this thread a few times and I see where others have reported the same but didn't see a resolution note. Can someone confirm that full-size attachments should show the S3 or DO URL instead of an internal URL?
 
Full size attachments will still have an internal URL. XF has a layer of permission checking for attachments, so if you happen to know the attachment URL, you can only see the attachment if you have permission to view the content it is associated with.

The exception is audio and video attachments, which have to be handled differently, so they will be served from an S3/DO URL.

If you can see that the attachments are being uploaded to S3/DO and not to your local disk, then that is also where they are streamed from, just via an internal URL on your site.
 
Thanks @Chris D! Also, the use of xftest in the instructions for both the bucket name and the user was a bit confusing for this newbie to the S3 world when I was debugging an issue with using a CNAME for my S3 bucket. Just wanted to let you know in case you ever revisit those instructions. Thanks as always for your quick replies and information.
 