Fixed Importing from SMF 2.0 - missing old attachments (and solution?)

Overscan

Active member
So, I bit the bullet at last, paid the money and am test migrating my 12-year-old forum from SMF to XenForo. It's done a pretty good job in general, but I am missing the oldest attachments.

SMF 1.0 named attachment files differently from later versions and also did not store a file_hash value in the database. This caused identical problems with missing old attachments in a previous import of my forum to ElkArte (an SMF fork), where the importer could not find the attachments due to their different filename format.

I've looked at the XenForo import code and there is obviously code to deal with attachments missing a file_hash:

Code:
if ($attachment['file_hash'] === '')
{
    $filePath = "$attachmentPath/$attachment[filename]";
}
else
{
    $filePath = "$attachmentPath/$attachment[id_attach]_$attachment[file_hash]";
}
However, it is mostly wrong.

The actual filenames for my old attachments are like this:

790_Rockwell-1_gif1930c2099b9fe0e6077e09dcf4c68d25

But of course the hash value isn't stored in the database.

This is the relevant code from SMF for handling these legacy attachments:

Code:
function getLegacyAttachmentFilename($filename, $attachment_id, $dir = null, $new = false)
{
    global $modSettings;
    $clean_name = $filename;
    // Remove international characters (windows-1252)
    // These lines should never be needed again. Still, behave.
    // Sorry, no spaces, dots, or anything else but letters allowed.
    $clean_name = preg_replace(array('/\s/', '/[^\w_\.\-]/'), array('_', ''), $clean_name);
    $enc_name = $attachment_id . '_' . strtr($clean_name, '.', '_') . md5($clean_name);
    $clean_name = preg_replace('~\.[\.]+~', '.', $clean_name);
    if ($attachment_id == false || ($new && empty($modSettings['attachmentEncryptFilenames'])))
        return $clean_name;
    elseif ($new)
        return $enc_name;
    // Are we using multiple directories?
    if (!empty($modSettings['currentAttachmentUploadDir']))
    {
        if (!is_array($modSettings['attachmentUploadDir']))
            $modSettings['attachmentUploadDir'] = unserialize($modSettings['attachmentUploadDir']);
        $path = $modSettings['attachmentUploadDir'][$dir];
    }
    else
        $path = $modSettings['attachmentUploadDir'];
    if (file_exists($path . '/' . $enc_name))
        $filename = $path . '/' . $enc_name;
    else
        $filename = $path . '/' . $clean_name;
    return $filename;
}
I've never written any PHP code, but all my missing attachments match the $enc_name format, so presumably I can do something like this:

Code:
if ($attachment['file_hash'] === '')
            {
                $clean_name = $attachment[filename]
                $clean_name = preg_replace(array('/\s/', '/[^\w_\.\-]/'), array('_', ''), $clean_name);
                $enc_name = $attachment[id_attach] . '_' . strtr($clean_name, '.', '_') . md5($clean_name);
                $filePath = "$attachmentPath/$enc_name
            }
            else
            {
                $filePath = "$attachmentPath/$attachment[id_attach]_$attachment[file_hash]";
            }
and reimport my data?

This would seem to be a good candidate to add to the importer codebase to improve the SMF-to-XenForo migration experience.
 

Chris D

XenForo developer
Staff member
That certainly would seem like the correct code in your case.

It seems like there are other cases here, and that's pretty annoying; for example, I didn't realise it was possible to have multiple attachment upload directories. It also seems like there's an option, attachmentEncryptFilenames, to control whether the file names are encrypted or not.

Frankly, I've always found the SMF code base to be a bit of a mess in places. Rather than migrating internally to new formats (for things like attachment file names), they've just layered new stuff on top of the old, which makes things like this much more difficult than they need to be (not only for people trying to decipher their code, but presumably also for themselves maintaining it).

I think we'd need to do some more testing. I'm actually going to move this to the XF2 bug reports forum. I'm sure we'll port any fixes back, but right now this change is going to be more relevant to us when we start work on the SMF importer for XF2 in the near future.

Let us know how you get on with those changes.
 

Overscan

Active member
In case anyone else needs this:

Just replace this in SMF.php, at about line 1992:

Code:
if ($attachment['file_hash'] === '')
{
    $filePath = "$attachmentPath/$attachment[filename]";
}
With this:

Code:
if ($attachment['file_hash'] === '')
{
    $clean_name = $attachment['filename'];
    $clean_name = preg_replace(array('/\s/', '/[^\w_\.\-]/'), array('_', ''), $clean_name);
    $enc_name = $attachment['id_attach'] . '_' . strtr($clean_name, '.', '_') . md5($clean_name);
    $filePath = "$attachmentPath/$enc_name";
}
The first version I posted above was missing essential semicolons and quotes in various places. It took me a few tries to get the syntax correct. PHP is all new to me...

With this minor change, my 12-year-old SMF forum, with 300k posts and 30GB of attachments, migrated just fine.

One thing to note: the Import Custom Avatars step doesn't show any progress, and it just sat there for almost 6 minutes before completing. The first time I tried it, I thought it wasn't working and went on to the next steps, then found I got rolled back after importing forums and had to abandon that import attempt. Just wait on the main screen until the Import Custom Avatars task finishes.
 

bluecrab

Member
I'm migrating an SMF 2.0 forum with over 165,000 image attachments, and the above fix by Overscan resolved a problem that I was having. Thank you!!
 

Overscan

Active member
bluecrab said:
I'm migrating an SMF 2.0 forum with over 165,000 image attachments, and the above fix by Overscan resolved a problem that I was having. Thank you!!

I have 146,962 image attachments. I also hit an issue with a single corrupt JPG which crashed the import, and I had to find it and delete it.
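
For anyone who wants to look for suspect files before starting an import, here is a rough sketch of a pre-scan. It is only an assumption that a check like this would have caught the file in question: findUnreadableImages() is a hypothetical helper, not part of SMF or XenForo, and getimagesize() only reads the file header, so an image damaged further in can still pass. Non-image attachments (zip, pdf) and housekeeping files such as index.php will be listed too, so treat the output as a list of things to inspect by hand, not to delete.

```php
<?php
// Hypothetical helper: lists files in an attachment directory that PHP
// cannot recognise as an image. Header-level check only.
function findUnreadableImages($attachmentPath)
{
    $bad = array();
    foreach (scandir($attachmentPath) as $entry) {
        $file = "$attachmentPath/$entry";
        if (!is_file($file)) {
            continue; // skips ".", ".." and subdirectories
        }
        // getimagesize() returns false when it cannot identify the file as an image.
        if (@getimagesize($file) === false) {
            $bad[] = $entry;
        }
    }
    return $bad;
}

// Usage from the command line: php scan.php /path/to/attachments
if (isset($argv[1])) {
    foreach (findUnreadableImages($argv[1]) as $name) {
        echo $name, "\n";
    }
}
```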
 

Chris D

XenForo developer
Staff member
We're implementing an SMF importer in the 1.3.0 release of the importers (not the next release, but the release after).

This issue has been resolved in the new version of the importer.
 

bluecrab

Member
I just tried the 1.3.0 Beta release in v2.1.7. It's a good first effort, but there's a rather serious bug: it doesn't import any private messages or avatars. So, for now, I'm using the importer in v1.5 with the code modification shown above.
 

bluecrab

Member
Apparently there are three different filename formats that can be used by an SMF forum. The first generation uses just a file name; the second generation uses a combination of attachment id, filename, and hash. The third (and current) generation uses a combination of attachment id and hash. See below for examples (in order of generation):

Code:
thisismyphoto.jpg
465_thisismyphoto_jpg1a5cf6d4fd3e79c234912d00ca00a1af
465_39621b067b9a387779d75c3b4a54b3de1c8fd93a
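
To check which of these formats a given attachment directory actually contains, the three patterns can be told apart by name alone. The following is a heuristic sketch, not SMF or XenForo code: smfAttachmentGeneration() is a made-up helper, and the rules are inferred purely from the three examples above (a 32-character md5 suffix for gen 2, a 40-character hash for gen 3).

```php
<?php
// Hypothetical helper: guesses the SMF naming generation from a file name on disk.
function smfAttachmentGeneration($name)
{
    // Gen 3: attachment id, underscore, then exactly 40 hex characters.
    if (preg_match('/^\d+_[0-9a-f]{40}$/', $name)) {
        return 3;
    }
    // Gen 2: attachment id, underscore, cleaned file name, then a 32-character md5.
    if (preg_match('/^\d+_.+[0-9a-f]{32}$/', $name)) {
        return 2;
    }
    // Anything else is assumed to be a plain gen 1 file name.
    return 1;
}

foreach (array(
    'thisismyphoto.jpg',
    '465_thisismyphoto_jpg1a5cf6d4fd3e79c234912d00ca00a1af',
    '465_39621b067b9a387779d75c3b4a54b3de1c8fd93a',
) as $name) {
    echo smfAttachmentGeneration($name), '  ', $name, "\n";
}
```

The guess can be wrong at the edges: a gen 2 file whose cleaned name happens to be eight hex characters would look like gen 3, and a gen 1 upload could have been named to resemble either of the other formats.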
So, with that said, the code modifications shown earlier only work with gen 2 and gen 3 attachments; none of my gen 1 attachments were pulled over. I have modified the code yet again, using code from the 1.3.0 import module as a guide, so that all three attachment types will be imported using XenForo 1.5.x. Enjoy.

Code:
if ($attachment['file_hash'] === '')
{
  $clean_name = $attachment['filename'];
  $clean_name = preg_replace(['/\s/', '/[^\w_\.\-]/'], ['_', ''], $clean_name);
  $enc_name = $attachment['id_attach'] . '_' . strtr($clean_name, '.', '_') . md5($clean_name);
  $clean_name = preg_replace('~\.[\.]+~', '.', $clean_name);
  $filePath = "$attachmentPath/$enc_name";
  if (!file_exists($filePath))
  {
    $filePath = "$attachmentPath/$clean_name";
  }
}
else
{
  $filePath = "$attachmentPath/$attachment[id_attach]_$attachment[file_hash]";
}
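
To try the lookup outside the importer first, the same logic can be wrapped in a standalone function. This is only a sketch: resolveSmfAttachmentPath() is a hypothetical name, and the array keys (id_attach, filename, file_hash) are assumed to match the importer's attachment row as used in the snippet above.

```php
<?php
// Hypothetical wrapper around the snippet above; not a XenForo or SMF function.
function resolveSmfAttachmentPath($attachmentPath, array $attachment)
{
    if ($attachment['file_hash'] !== '') {
        // Gen 3: id_attach plus the hash stored in the database.
        return "$attachmentPath/{$attachment['id_attach']}_{$attachment['file_hash']}";
    }

    $clean_name = preg_replace(array('/\s/', '/[^\w_\.\-]/'), array('_', ''), $attachment['filename']);
    $enc_name = $attachment['id_attach'] . '_' . strtr($clean_name, '.', '_') . md5($clean_name);

    // Gen 2 if the encoded file exists on disk.
    if (file_exists("$attachmentPath/$enc_name")) {
        return "$attachmentPath/$enc_name";
    }
    // Otherwise fall back to the gen 1 plain name (runs of dots collapsed).
    return "$attachmentPath/" . preg_replace('~\.[\.]+~', '.', $clean_name);
}

// Usage: nothing exists at this placeholder path, so this falls through to gen 1.
$row = array('id_attach' => 465, 'filename' => 'thisismyphoto.jpg', 'file_hash' => '');
echo resolveSmfAttachmentPath('/path/to/attachments', $row), "\n";
```

Like the snippet it wraps, this assumes a single attachment directory; the multiple upload directories mentioned earlier in the thread are not handled.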
 