
Fixed Truncated posts after IPB 3.4 import

Discussion in 'Resolved Bug Reports' started by andwhyisit, Sep 8, 2015.

  1. andwhyisit

    andwhyisit Member

    Any post with a < in it will be cut short by the importer.

    For instance, this post, if imported from IPB, would read only "Any post with a" in XF. The importer is evidently decoding entities before parsing HTML tags, so &lt; becomes < and the rest of the post is discarded because the < is taken to be the start of a broken HTML tag.
  2. Tiki Tiki

    Tiki Tiki Active Member

  3. Mike

    Mike XenForo Developer Staff Member

    Can you confirm the raw content of the post in the database as it was in IPB?
  4. Tiki Tiki

    Tiki Tiki Active Member

    @andwhyisit or me? I can confirm the raw content of the post I noticed:
    <throwing my wand into the ring...>
    That was the entire contents of the original post. It came through the converter as an empty post.

    Without being able to search on a single character, I have no way of knowing how many other posts might have been affected.
  5. Mike

    Mike XenForo Developer Staff Member

    I was referring to @andwhyisit's case. If the message contains a literal < and > rather than &lt; and &gt;, we basically have to process them as HTML, since messages appear to be stored as HTML in newer versions of IPB.
  6. andwhyisit

    andwhyisit Member

    The < and > were stored as &lt; and &gt; in the database in all cases.

    For example, this post as stored in the IPB database:
    <p>To set a flag, use &lt;FL+(add 4 digit flag number here - can't be more than 8000 IIRC)<br> &lt;FL-(4 digit flag number) to remove a flag<br> &lt;FLJ(event you want to skip to):(flag number) to skip to a certain event if a certain flag you specify is set<br> Also, theres a topic for this, <a href="http://www.cavestory.org/forums/index.php?/topic/4462-quick-moddinghacking-answers-thread/">http://www.cavestory.org/forums/index.php?/topic/4462-quick-moddinghacking-answers-thread/</a></p>
    comes out of the importer as just:
    To set a flag, use
    I think I have traced the problem, but let me know if I am mistaken.

    ..calls XenForo_Importer_IPBoard::_parseIPBoardBbCode()
    ..which then calls XenForo_Importer_IPBoard34x::_parseIPBoardText()
    ..which then calls XenForo_Importer_Abstract::_convertToUtf8()
    ..which finally calls utf8_unhtml()

    ..which, from what I can tell, converts all entities back to their plain-text counterparts before the HTML is reparsed into BBCode. That normally wouldn't be a problem, except that XenForo_Importer_IPBoard34x::_parseIPBoardBbCode() calls strip_tags() at the end. &lt; and &gt; are exactly the entities you want to leave encoded until after strip_tags() has been called.
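    The ordering problem can be reproduced outside the importer with a minimal sketch. It makes some assumptions: html_entity_decode() stands in for utf8_unhtml(), the sample string is made up, and none of the importer's own BBCode replacements are run. It only compares "decode, then strip_tags()" with "strip_tags(), then decode".

```php
<?php
// Hypothetical sample: "<" is stored as the entity &lt;, as it was in the IPB database.
$stored = 'To set a flag, use &lt;FL+1234 and &lt;FL-1234 to remove it';

// Problematic order: entities are decoded first, so strip_tags() sees a raw "<"
// followed by a letter and treats everything after it as an unterminated tag.
// html_entity_decode() is only a stand-in for utf8_unhtml() here.
$decodedFirst = strip_tags(html_entity_decode($stored, ENT_QUOTES, 'UTF-8'));

// Safer order: strip tags while "<" is still an entity, then decode.
$strippedFirst = html_entity_decode(strip_tags($stored), ENT_QUOTES, 'UTF-8');

var_dump($decodedFirst);   // the text from the first "<" onwards should be missing
var_dump($strippedFirst);  // the full sentence, with literal "<" characters
```

    If this sketch matches what the importer does, the first result should lose everything from the first < onwards, while the second keeps the whole sentence.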
    Last edited: Sep 9, 2015
  7. Mike

    Mike XenForo Developer Staff Member

    I've fixed this now, thanks. Basically, you can replace the _parseIPBoardBbCode function in
    XenForo_Importer_IPBoard34x with:
        protected function _parseIPBoardBbCode($message, $autoLink = true)
        {
            $message = preg_replace('/<br( \/)?>(\r?\n)?/si', "\n", $message);
            $message = str_replace('&nbsp;' , ' ', $message);

            // handle the IPB media format
            if (stripos($message, '[media') !== false)
            {
                $message = $this->_parseIPBoardMediaCode($message);
            }

            $search = $this->_getIPBoardBBCodeReplacements();
            $message = preg_replace(array_keys($search), $search, $message);

            $message = strip_tags($message);

            return $this->_convertToUtf8($message, true);
        }
  8. andwhyisit

    andwhyisit Member

    One thing to watch out for with this change is the &#39; entity. I've seen one instance of a quote stored in bbcode format in the IPB posts database whose opening tag looked something like this:

    [quote name=&#39;username&#39;]
    I don't think it's a common issue, but running a str_replace on the matches from a preg_replace_callback for /\[[a-z][^\]]+\]/ should fix it nicely.
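    A rough sketch of that idea, assuming it runs on the message before the BBCode replacements are applied. The function name decodeQuotesInBbCodeTags is made up for illustration and is not part of the XenForo importer.

```php
<?php
// Hypothetical helper: decode &#39; only inside opening BBCode tags,
// leaving entities in the post body untouched.
function decodeQuotesInBbCodeTags(string $message): string
{
    return preg_replace_callback(
        '/\[[a-z][^\]]+\]/',
        function (array $match): string {
            // $match[0] is the whole tag, e.g. [quote name=&#39;username&#39;]
            return str_replace('&#39;', "'", $match[0]);
        },
        $message
    );
}

$in  = '[quote name=&#39;username&#39;]It&#39;s in the body too[/quote]';
$out = decodeQuotesInBbCodeTags($in);

var_dump($out);
```

    Because the pattern requires a lowercase letter right after the [, closing tags such as [/quote] are not matched, and the &#39; in the body is left for the later entity decoding step.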
