XF 2.3 XF's standard importer & abstracted filesystem

eDaddi

Not sure if it's a bug, something overlooked or I'm chasing my tail. Here is what I am seeing:

  1. I'm running Social Groups for XenForo [Paid] on my live site (v2.2.16) and like many others, we've been waiting too long for 2.3 compatibility.
  2. I copied my live site to dev, and updated XF to 2.3.4
  3. I installed [DBTech] DragonByte Social Groups [Paid] which is 2.3 compatible & has an importer to migrate from Song's add-on.
  4. I use R2 buckets for attachment storage.
  5. The importer errors out if attachments are not local. (this is where I think there is a bug)
  6. I have to duplicate everything from the R2 buckets to local storage to run the importer ... which is not practical.
  7. After completing #6, Import runs successfully.
  8. I remove the attachments so XF looks back to R2 buckets for those items.
  9. Things like images within the group threads stop loading.
After extensive discussions with [DBTech] and an R2 expert (both of whom I've already pestered enough), I wonder if the default XF importer doesn't use the abstract file system. If it does, I’m unsure why issue #5 persists.

Here are some quotes from this forum and the devs discussing this that seem relevant to troubleshooting:
The "magic" where everything just inherently works automatically does rely on third-party addons using XenForo's abstracted filesystem for things.
It's only the importer portion, because it does the same as XenForo's standard importer. Social Groups doesn't refer to attachment paths at all outside of the importer. Chances are the attachments are created with the file_path column set in the xf_attachment_data table.
Both add-ons are confirmed to use the abstracted filesystem for normal operations. Every image in my live and dev DBs has a blank file_path; the only exceptions are audio and video entries. I haven't tested those two formats yet, as 99% of the attachments not showing are images.

It doesn't make sense for importers to use abstracted paths.
Since data is just being moved around in DB tables, this makes sense... but I'm guessing XF's standard importer checks whether files exist locally, which is why that portion of the import fails. Yes/No?
 
For added context since my add-on is mentioned; I'm using the XF default \XF\Import\Data\Attachment handler and the actual attachment import is a copy of src/addons/XFI/Import/Importer/XenForo2's stepAttachments with adjustments to the sourceDb query only to fetch the correct attachment records.

I don't know what part of the process actually causes failure when attempting to read files from remote storage during import, as I don't run any kind of external storage so I have no testing environment for this.
 
I wonder if the default XF importer doesn't use the abstract file system.
This. We generally expect the source files to be hosted locally. At least, that's how all our importers do it.

the actual attachment import is a copy of src/addons/XFI/Import/Importer/XenForo2's stepAttachments
This is somewhat the root cause. In our own importers, we do this:

PHP:
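// Resolve the attachment's source file to a local path
$sourceFile = $this->getSourceAttachmentDataPath(
    $attachment['data_id'],
    $attachment['file_path'],
    $attachment['file_key'] ?? $attachment['file_hash']
);
// Files that aren't present locally (e.g. stored only in a remote bucket) are skipped
if (!file_exists($sourceFile) || !is_readable($sourceFile))
{
    continue;
}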

Which ultimately results in this:

PHP:
// Map the abstracted prefixes onto the source install's local directories
return strtr($path, [
    'internal-data://' => $this->baseConfig['internal_data_dir'] . '/',
    'data://' => $this->baseConfig['data_dir'] . '/',
]);

In other words we convert the abstracted file path to a local one.
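
A minimal, self-contained sketch of that translation, combined with the existence check from the first snippet, in plain PHP. The function names and the example directory values are hypothetical; only the `internal-data://` / `data://` prefixes and the `internal_data_dir` / `data_dir` keys come from the snippets above. It illustrates why remote-only files get skipped: the translated path points at a local directory that no longer holds them.

```php
<?php
// Hypothetical helper mirroring the strtr() mapping above. strtr() with an
// array tries the longest keys first, so 'internal-data://' wins over 'data://'.
function toLocalPath(string $path, array $baseConfig): string
{
    return strtr($path, [
        'internal-data://' => $baseConfig['internal_data_dir'] . '/',
        'data://' => $baseConfig['data_dir'] . '/',
    ]);
}

// Hypothetical helper: return the abstracted paths whose local copy is missing
// or unreadable, i.e. the ones the importer loop would `continue` past.
function findSkippedPaths(array $abstractedPaths, array $baseConfig): array
{
    $skipped = [];
    foreach ($abstractedPaths as $path) {
        $local = toLocalPath($path, $baseConfig);
        if (!file_exists($local) || !is_readable($local)) {
            $skipped[] = $path;
        }
    }
    return $skipped;
}

// Example values (assumed, not from a real install)
$baseConfig = [
    'internal_data_dir' => '/var/www/forum/internal_data',
    'data_dir' => '/var/www/forum/data',
];
echo toLocalPath('internal-data://attachments/0/1-abc.data', $baseConfig), "\n";
// prints /var/www/forum/internal_data/attachments/0/1-abc.data
```

If the files only exist in R2, every path passed to `findSkippedPaths()` comes back as skipped.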

This is entirely intentional: in the vast majority of scenarios files will be stored locally, and that is usually better, as it avoids some of the inherent issues with doing such things remotely.

That being said, as importers such as this would presumably be working on a much smaller subset of the overall files, you could do something like this:

PHP:
// Assumes `use XF\Import\Data\Attachment;` at the top of the importer file
$attachData = \XF::em()->find(\XF\Entity\AttachmentData::class, $attachment['data_id']);
if (!$attachData)
{
    continue; // no attachment data record for this ID; skip it
}
$abstractedPath = $attachData->getAbstractedDataPath();

// Copy the file from wherever it is stored (local disk or remote bucket) to a local temp file
$sourceFile = \XF\Util\File::copyAbstractedPathToTempFile($abstractedPath);

// code continues as normal, e.g. from XenForo2.php

/** @var Attachment $import */
$import = $this->newHandler(Attachment::class);
$import->bulkSet($this->mapKeys($attachment, [
    'content_type',
    'attach_date',
    'temp_hash',
    'unassociated',
    'view_count',
]));
$import->content_id = $contentId;
$import->setDataExtra('upload_date', $attachment['upload_date']);
$import->setDataExtra('file_path', $attachment['file_path']);
$import->setDataUserId($this->lookupId('user', $attachment['user_id'], 0));
$import->setSourceFile($sourceFile, $attachment['filename']);
$import->setContainerCallback([$this, 'rewriteEmbeddedAttachments']);

I believe the downstream code from this should use the remote buckets if configured.
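
The idea behind copyAbstractedPathToTempFile() is to pull a file from wherever it is stored into a local temp file, so the existing local-file import code keeps working unchanged. Below is a rough, plain-PHP stand-in for that idea, not XF's implementation; `copyToTempFile()` is a hypothetical name, and it assumes any source that fopen() can open (a local path or a registered stream wrapper).

```php
<?php
// Hypothetical stand-in for the "copy to a local temp file" idea. XF's own
// \XF\Util\File::copyAbstractedPathToTempFile() reads through XF's abstracted
// filesystem; this sketch just reads any source fopen() can open.
function copyToTempFile(string $source): string
{
    $in = fopen($source, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open source file: $source");
    }

    $tempPath = tempnam(sys_get_temp_dir(), 'import');
    $out = $tempPath === false ? false : fopen($tempPath, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException('Cannot create a local temp file');
    }

    // Stream the bytes across without loading the whole file into memory
    stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);

    // Local path that code expecting a local file (e.g. setSourceFile()) can use
    return $tempPath;
}
```

In this sketch the caller is responsible for calling unlink() on the returned path once the attachment has been imported.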
 
That being said, as importers such as this would presumably be working on a much smaller subset of the overall files
It's unknowable how big of a subset of the files I'd be working with and whether the system would run out of disk space partway through the import process even if the temp files are cleaned up on the next iteration 🤔

I think it's reasonable to ask to move the attachments back to the local file system temporarily for the import, but the ultimate problem is: what does @eDaddi need to do to get attachments working again after moving them back to R2 post-import?
 
It's unknowable how big of a subset of the files I'd be working with and whether the system would run out of disk space partway through the import process even if the temp files are cleaned up on the next iteration 🤔
That's not really your concern, that's an issue for whoever is running the server. But temp files are cleaned up automatically after each batch anyway.
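
To put numbers on that: if temp copies are removed after every batch, the temp space needed at any moment is roughly the total size of the largest single batch, not the size of all attachments combined. A small sketch of that arithmetic; the function name, batch size, and file sizes are made-up inputs for illustration.

```php
<?php
// Peak temp-disk usage when temp copies are deleted after each batch:
// the maximum, over all batches, of the summed file sizes in that batch.
function peakTempUsage(array $fileSizes, int $batchSize): int
{
    $peak = 0;
    foreach (array_chunk($fileSizes, $batchSize) as $batch) {
        $peak = max($peak, array_sum($batch));
    }
    return $peak;
}

// Hypothetical sizes in MB for five attachments, imported two at a time
echo peakTempUsage([5, 3, 8, 1, 2], 2), "\n"; // 9 (the [8, 1] batch)
```

The total here is 19 MB, but only 9 MB of temp space is needed at once.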

I think it's reasonable to ask to move the attachments back to the local file system temporarily for the import
Yeah I agree.

what does @eDaddi need to do to get attachments working again after moving them back to R2 post-import?
I'm actually not sure. I assume @eDaddi isn't actually moving them back to R2 as that isn't mentioned in the OP steps. If the R2 add-on/bucket configuration remains enabled, then in theory, the "new" attachment should already be saved remotely.

There would be a few troubleshooting steps.

1. Make sure the imported attachments/files have made their way to the remote bucket and exist on the file system
2. Create new content with new attachments/files after the import to see if it's only imported content affected or if new content is affected too
3. If all else fails, disable the R2 bucket configuration during the import and manually sync the data back to the buckets after the import
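
For step 1, one low-tech check is to compare the file paths the import should have produced against a listing exported from the bucket (for example, from the provider's dashboard or CLI). Both input lists below are hypothetical, and the paths are placeholders rather than XF's actual attachment naming scheme.

```php
<?php
// Return expected object keys that are absent from the bucket listing.
function missingFromBucket(array $expectedKeys, array $bucketKeys): array
{
    // array_flip() turns the listing into a lookup table for fast isset() checks
    $present = array_flip($bucketKeys);
    $missing = [];
    foreach ($expectedKeys as $key) {
        if (!isset($present[$key])) {
            $missing[] = $key;
        }
    }
    return $missing;
}

// Placeholder data: one of two expected files never reached the bucket
$expected = ['attachments/0/101.data', 'attachments/0/102.data'];
$listing = ['attachments/0/101.data'];
echo implode("\n", missingFromBucket($expected, $listing)), "\n"; // attachments/0/102.data
```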

Aside from that, not really seeing where the issue would lie.
 
@Chris D & @DragonByte Tech, I admit I didn't know which parts of the import are XF's and which are DBT's. I thought this was written by DBT just for his add-on, which is why I was originally peppering him with questions. Sorry for that, DBT. 🍻

It's good to know XF has import functions to build on. Anyway ... just trying to help the devs and community.

I don't know what part of the process actually causes failure when attempting to read files from remote storage during import, as I don't run any kind of external storage so I have no testing environment for this.
After a ton of testing, I think this is the only real issue I ran into with the importer for THIS import process:
[Screenshot 2024-12-05: importer error]
This happens as soon as you kick off the import process.

what does @eDaddi need to do to get attachments working again after moving them back to R2 post-import?
I did finally find a process that works:
  1. To get past the error shown above, I created the 3 empty folders the importer looks for:
    1. data/avatars
    2. data/attachments
    3. data/groups
  2. That lets me get past the error above and gets me here:

    [Screenshot 2024-11-25: importer options screen]
    I have to uncheck the 'Content IDs' or I get duplicate ID errors but I'm OK with new IDs.

  3. I delete the folders created above BEFORE clicking the 'Continue...' button in the above pic.
  4. Import process does its thing and DBT's add-on works just fine. All images (attachments) appear in threads from R2 buckets as expected.
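
The folder workaround in steps 1 and 3 can be scripted. A minimal PHP sketch, assuming the three paths are relative to the XF root; the root path and the function names are hypothetical. Removal uses rmdir(), which only succeeds on empty directories, and the sketch also checks emptiness first, so it won't delete real files.

```php
<?php
const IMPORT_DIRS = ['data/avatars', 'data/attachments', 'data/groups'];

// Step 1: create any of the placeholder folders that don't exist yet.
// Returns only the folders this call actually created.
function createPlaceholderDirs(string $root): array
{
    $created = [];
    foreach (IMPORT_DIRS as $dir) {
        $path = $root . '/' . $dir;
        if (!is_dir($path) && mkdir($path, 0755, true)) {
            $created[] = $path;
        }
    }
    return $created;
}

// Step 3: remove only the folders we created, and only while still empty.
function removePlaceholderDirs(array $created): void
{
    foreach ($created as $path) {
        // scandir() on an empty directory returns just "." and ".."
        if (is_dir($path) && count(scandir($path)) === 2) {
            rmdir($path);
        }
    }
}
```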
So overall I'm good, I'm up and running as expected on my dev site.

FWIW: I did test by rsyncing those 3 folders to my dev site. The import process ran fine, but once it completed, I removed those folders and the attachments in the group threads did not display. The rest of the site still worked as expected.

I think it's reasonable to ask to move the attachments back to the local file system temporarily for the import
Yeah I agree.
For small sites, sure. Moving to R2 storage let me downgrade my hosting plan to save money. Moving hundreds of GB back locally is a hassle. Besides transfer time for a 1-minute import, I had to upgrade my hosting plan for 2 months just to test it. Testing at the end of the month and falling into the next was my mistake, but you get the point.

This. We generally expect the source files to be hosted locally. At least, that's how all our importers do it.
This is entirely intentional as in the sheer majority of scenarios files will be locally stored and that would usually be better to avoid some of the inherent issues with doing such things remotely.
I totally get that and agree that the vast majority of XF instances keep everything local. But there are threads here detailing how to configure remote storage for several hosts, XF now has core features to support it, and there are add-ons like [DigitalPoint] App for Cloudflare that make it super simple to do. I'm sure you'll see a lot more admins implementing this to save money and increase site speed.

Given my two points above, and especially with XF bundling support for remote object storage in 2.3, I thought it was an issue that importer processes do not account for that.
 