Confirmed Importing from SMF 2.0 - missing old attachments (and solution?)

#1
So, I bit the bullet at last, paid the money and am test migrating my 12-year-old forum from SMF to Xenforo. It's done a pretty good job in general, but I am missing the oldest attachments.

SMF 1.0 named attachment files differently from later versions and also did not store a file_hash value in the database. I hit the identical issue with missing old attachments in a previous import of my forum to Elkarte (an SMF fork), where the importer could not find the attachments due to their different filename format.

I've looked at the Xenforo import code and there is obviously code to deal with attachments missing a file_hash:

Code:
if ($attachment['file_hash'] === '')
{
    $filePath = "$attachmentPath/$attachment[filename]";
}
else
{
    $filePath = "$attachmentPath/$attachment[id_attach]_$attachment[file_hash]";
}
However, the branch for an empty file_hash is wrong for my old attachments.

The actual filenames for my old attachments are like this:

790_Rockwell-1_gif1930c2099b9fe0e6077e09dcf4c68d25

But of course the hash value isn't stored in the database.
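For illustration, the name above appears to split into three parts: the attachment ID, the original filename with its dot replaced by an underscore, and a 32-character hex string that looks like an MD5 hash. A minimal sketch that takes such a name apart, assuming that layout (the function name splitLegacyAttachmentName is made up for this example and is not part of SMF or Xenforo):

```php
<?php
// Hypothetical helper: split a legacy on-disk attachment name into its parts.
// Assumes the layout "<id>_<name-with-underscores><32 hex chars>".
function splitLegacyAttachmentName(string $diskName): ?array
{
    if (!preg_match('/^(\d+)_(.+)([0-9a-f]{32})$/', $diskName, $m)) {
        return null; // does not look like a legacy name
    }
    return ['id' => (int) $m[1], 'name' => $m[2], 'hash' => $m[3]];
}

// The example filename from this post.
$parts = splitLegacyAttachmentName('790_Rockwell-1_gif1930c2099b9fe0e6077e09dcf4c68d25');
print_r($parts);
```

Only the ID and the original filename are in the database, so the hash part has to be recomputed rather than looked up.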

This is the relevant code from SMF for handling these legacy attachments:

Code:
function getLegacyAttachmentFilename($filename, $attachment_id, $dir = null, $new = false)
{
    global $modSettings;
    $clean_name = $filename;
    // Remove international characters (windows-1252)
    // These lines should never be needed again. Still, behave.
    // Sorry, no spaces, dots, or anything else but letters allowed.
    $clean_name = preg_replace(array('/\s/', '/[^\w_\.\-]/'), array('_', ''), $clean_name);
    $enc_name = $attachment_id . '_' . strtr($clean_name, '.', '_') . md5($clean_name);
    $clean_name = preg_replace('~\.[\.]+~', '.', $clean_name);
    if ($attachment_id == false || ($new && empty($modSettings['attachmentEncryptFilenames'])))
        return $clean_name;
    elseif ($new)
        return $enc_name;
    // Are we using multiple directories?
    if (!empty($modSettings['currentAttachmentUploadDir']))
    {
        if (!is_array($modSettings['attachmentUploadDir']))
            $modSettings['attachmentUploadDir'] = unserialize($modSettings['attachmentUploadDir']);
        $path = $modSettings['attachmentUploadDir'][$dir];
    }
    else
        $path = $modSettings['attachmentUploadDir'];
    if (file_exists($path . '/' . $enc_name))
        $filename = $path . '/' . $enc_name;
    else
        $filename = $path . '/' . $clean_name;
    return $filename;
}
I've never written any PHP code, but all my missing attachments match the $enc_name format, so presumably I can do something like this:

Code:
if ($attachment['file_hash'] === '')
            {
                $clean_name = $attachment[filename]
                $clean_name = preg_replace(array('/\s/', '/[^\w_\.\-]/'), array('_', ''), $clean_name);
                $enc_name = $attachment[id_attach] . '_' . strtr($clean_name, '.', '_') . md5($clean_name);
                $filePath = "$attachmentPath/$enc_name
            }
            else
            {
                $filePath = "$attachmentPath/$attachment[id_attach]_$attachment[file_hash]";
            }
and reimport my data?

This would seem to be a good candidate to add to the importer codebase to improve the SMF-to-Xenforo migration experience.
 

Chris D

XenForo developer
Staff member
#2
That certainly would seem like the correct code in your case.

It seems like there are other cases here and that's pretty annoying; for example, I didn't realise it was possible to have multiple attachment upload directories. It also seems like there's an option, attachmentEncryptFilenames, to control whether the file names are encrypted or not.
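To make the multiple-directory case concrete, here is a rough sketch of the directory lookup as it reads in the SMF function quoted above. It is an illustration only: resolveAttachmentDir is a made-up name, $modSettings is a stand-in array for SMF's settings, and the example paths are invented.

```php
<?php
// Sketch only: pick the attachment directory the way the quoted SMF code does.
// $dir is the directory key recorded for one attachment (hypothetical input).
function resolveAttachmentDir(array $modSettings, $dir = null): string
{
    if (!empty($modSettings['currentAttachmentUploadDir'])) {
        $dirs = $modSettings['attachmentUploadDir'];
        if (!is_array($dirs)) {
            // Stored as a serialized array when several directories are in use.
            $dirs = unserialize($dirs, ['allowed_classes' => false]);
        }
        return $dirs[$dir];
    }
    // Single-directory setup: the setting is just the path.
    return $modSettings['attachmentUploadDir'];
}

// Invented example settings with two directories.
$settings = [
    'currentAttachmentUploadDir' => 2,
    'attachmentUploadDir' => serialize([1 => '/srv/attachments', 2 => '/srv/attachments_2']),
];
echo resolveAttachmentDir($settings, 2), "\n";
```

An importer that handled this case would need to know which directory key each attachment belongs to, which is not covered here.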

Frankly, I've always found the SMF code base to be a bit of a mess in places. They've made changes over time and, rather than doing any sort of internal migration to new formats (for things like attachment file names), they've just layered new stuff on top of the old stuff, making things like this much more difficult than they need to be (not only for people trying to decipher their code, but presumably also for themselves maintaining it).

I think we'd need to do some more testing. I'm actually going to move this to the XF2 bug reports forum. I'm sure we'll port any fixes back, but right now this change is going to be more relevant to us when we start work on the SMF importer for XF2 in the near future.

Let us know how you get on with those changes.
 
#3
In case anyone else needs this:

Just replace this in SMF.php, at about line 1992 or so:

Code:
if ($attachment['file_hash'] === '')
{
    $filePath = "$attachmentPath/$attachment[filename]";
}
With this :

Code:
if ($attachment['file_hash'] === '')
{
    $clean_name = $attachment['filename'];
    $clean_name = preg_replace(array('/\s/', '/[^\w_\.\-]/'), array('_', ''), $clean_name);
    $enc_name = $attachment['id_attach'] . '_' . strtr($clean_name, '.', '_') . md5($clean_name);
    $filePath = "$attachmentPath/$enc_name";
}
The first version I posted above was missing essential semicolons and quotes in various places. It took me a few tries to get the syntax correct. PHP is all new to me...
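For anyone whose forum has a mix of old naming styles, a possible variation is to try the encoded name first and fall back to the plain filename only if that file is not on disk, similar to the file_exists check in the SMF function quoted earlier. This is an untested-against-the-importer sketch: legacyAttachmentPath is a made-up helper, not something in SMF.php, and the fallback to the raw filename simply mirrors what the importer did before the change.

```php
<?php
// Hypothetical helper, not part of the XenForo importer: build the path for an
// attachment with an empty file_hash, preferring the legacy encoded name and
// falling back to the plain filename when the encoded file is not on disk.
function legacyAttachmentPath(string $attachmentPath, array $attachment): string
{
    // Same clean-up and encoding steps as the replacement code above.
    $cleanName = preg_replace(['/\s/', '/[^\w_\.\-]/'], ['_', ''], $attachment['filename']);
    $encName = $attachment['id_attach'] . '_' . strtr($cleanName, '.', '_') . md5($cleanName);

    if (file_exists("$attachmentPath/$encName")) {
        return "$attachmentPath/$encName";
    }
    // Fallback: the importer's original guess for attachments without a hash.
    return "$attachmentPath/" . $attachment['filename'];
}
```

The body of the if block would then shrink to a single call such as $filePath = legacyAttachmentPath($attachmentPath, $attachment);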

With this minor change, my 12-year-old SMF forum (300k posts, 30GB of attachments) migrated just fine.

One thing to note: the Import Custom Avatars step doesn't show any progress, and just sat there for almost 6 mins before completing. The first time I tried it I thought it wasn't working, went on to the next steps, then found I got rolled back after importing forums and had to abandon that import attempt. Just wait on the main screen until the Import Custom Avatars task finishes.
 
#4
I'm migrating an SMF 2.0 forum with over 165,000 image attachments, and the above fix by Overscan resolved a problem that I was having. Thank you!!
 