Fixed Truncated posts after IPB 3.4 import

Any post with a < in it will be cut short by the importer.

For instance, this post, if imported from IPB, would only read "Any post with a" in XF. Clearly the importer is decoding entities before parsing HTML tags, so &lt; becomes < and the rest of the post is discarded because the < is assumed to be the start of a broken HTML tag.
 
@andwhyisit or me? I can confirm the raw content of the post I noticed:
Code:
<throwing my wand into the ring...>
That was the entire contents of the original post. It came through the converter as an empty post.

Without being able to search on a single character, I have no way of knowing how many other posts might have been affected.
 
I was referring to @andwhyisit's case. If the message contains a literal < and > rather than &lt; and &gt;, we basically have to process them as tags, since messages appear to be stored as HTML in newer versions of IPB.
 
The < and > were stored as &lt; and &gt; in the database in all cases.

Before:
Code:
<p>To set a flag, use &lt;FL+(add 4 digit flag number here - can't be more than 8000 IIRC)<br> &lt;FL-(4 digit flag number) to remove a flag<br> &lt;FLJ(event you want to skip to):(flag number) to skip to a certain event if a certain flag you specify is set<br> Also, theres a topic for this, <a href="http://www.cavestory.org/forums/index.php?/topic/4462-quick-moddinghacking-answers-thread/">http://www.cavestory.org/forums/index.php?/topic/4462-quick-moddinghacking-answers-thread/</a></p>

After:
Code:
To set a flag, use

I think I have traced the problem but let me know if I am mistaken.

XenForo_Importer_IPBoard34x::_parseIPBoardBbCode()
..calls XenForo_Importer_IPBoard::_parseIPBoardBbCode()
..which then calls XenForo_Importer_IPBoard34x::_parseIPBoardText()
..which then calls XenForo_Importer_Abstract::_convertToUtf8()
..which finally calls utf8_unhtml()

..which, from what I can tell, converts all entities back to their plaintext counterparts before the HTML is reparsed into BBCode. That normally wouldn't be a problem, except that XenForo_Importer_IPBoard34x::_parseIPBoardBbCode() calls strip_tags() at the end. &lt; and &gt; are exactly the entities you want to leave encoded until after strip_tags() is called.
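To illustrate the ordering problem described above, here is a minimal standalone sketch. It is not the importer's code: html_entity_decode() is used as a stand-in for utf8_unhtml(), and the two function names are made up for the example. The sample strings are based on the posts quoted earlier in this thread.

```php
<?php
// Hypothetical helpers, for illustration only.
// html_entity_decode() stands in for utf8_unhtml() (an assumption).

// Order that loses content: entities become real < and > first,
// so strip_tags() treats them as the start of a tag.
function decodeThenStrip($raw)
{
    return strip_tags(html_entity_decode($raw, ENT_QUOTES, 'UTF-8'));
}

// Order that keeps content: tags are stripped while &lt; and &gt;
// are still entities, and decoding happens last.
function stripThenDecode($raw)
{
    return html_entity_decode(strip_tags($raw), ENT_QUOTES, 'UTF-8');
}

$samples = array(
    '<p>&lt;throwing my wand into the ring...&gt;</p>',
    'To set a flag, use &lt;FL+1234',
);

foreach ($samples as $raw) {
    var_dump(decodeThenStrip($raw));
    var_dump(stripThenDecode($raw));
}
```

With the first order, the "wand" post should come out empty and the flag post should stop after "use", which matches what was reported above. With the second order, both posts keep their full text.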
 
I've fixed this now, thanks. Basically, you can replace the _parseIPBoardBbCode function in XenForo_Importer_IPBoard34x with:
Code:
    protected function _parseIPBoardBbCode($message, $autoLink = true)
    {
        $message = preg_replace('/<br( \/)?>(\r?\n)?/si', "\n", $message);
        $message = str_replace('&nbsp;' , ' ', $message);

        // handle the IPB media format
        if (stripos($message, '[media') !== false)
        {
            $message = $this->_parseIPBoardMediaCode($message);
        }

        $search = $this->_getIPBoardBBCodeReplacements();

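        $message = preg_replace(array_keys($search), $search, $message);

        // strip leftover HTML tags while &lt; and &gt; are still entities
        $message = strip_tags($message);

        // entities are only decoded here, after strip_tags() has run
        return $this->_convertToUtf8($message, true);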
    }
 
One thing to watch out for with this change is the &#39; entity. I've seen one instance of a quote stored in bbcode format in the IPB posts database whose opening tag looked something like this:

Code:
[quote name=&#39;username&#39;]

I don't think it's a common issue, but running str_replace on each match from a preg_replace_callback for /\[[a-z][^\]]+\]/ should fix it nicely.
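A rough sketch of that idea, assuming it runs on the message before the BBCode replacements are applied. The function name is hypothetical and not part of XenForo; the pattern is the one given above, so it only matches tags that start with a lowercase letter and leaves closing tags and the post body alone.

```php
<?php
// Hypothetical helper, not part of the XenForo importer.
// Decodes &#39; only inside BBCode-style opening tags such as
// [quote name=&#39;username&#39;]; text outside tags is untouched.
function fixQuoteEntitiesInTags($message)
{
    $result = preg_replace_callback(
        '/\[[a-z][^\]]+\]/',
        function ($match) {
            // $match[0] is the whole tag, brackets included
            return str_replace('&#39;', "'", $match[0]);
        },
        $message
    );

    // preg_replace_callback() returns null on error; fall back to the input
    return $result === null ? $message : $result;
}

echo fixQuoteEntitiesInTags('[quote name=&#39;username&#39;]It&#39;s here[/quote]'), "\n";
```

Only the opening tag is changed in this example; the &#39; in the quoted text is left for the normal entity decoding step.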
 