Fixed Import fails on posts with special characters from MS Word.

Discussion in 'Resolved Bug Reports' started by Baron, Oct 22, 2010.

  1. Baron

    Baron Member

    When doing an import last night with Beta 2, it failed about three-quarters of the way through. After checking the server error log, I was able to track down the post that was causing this. It appears that the user copied and pasted a research report originally written in MS Word, and the importer crashes when it encounters the code for a square-looking bullet point character. When I deleted this character, the import continued without a problem. Here's an excerpt from the source of the post:


    􀂙 Company reported a $0.01 for 1Q07, one cent lower than our
    estimates and consensus of $0.02. We are maintaining a Buy Rating,
    and lowering our FY07 estimates from $0.25 to $0.21 and FY08
    estimates from $0.30 to $0.27. We are maintaining a target price of $3 or
    11x 2008 projections.
     
  2. Mike

    Mike XenForo Developer Staff Member

    I can guess, but do you know the specific error message? It should be logged in the server error log part of the admin CP. It should be stripping out 4 byte UTF-8 characters, though maybe that is bugged.
     
  3. Baron

    Baron Member

    Zend_Db_Statement_Mysqli_Exception: Mysqli statement execute error : Incorrect string value: '\xF4\x80\x82\x99 C...' for column 'message' at row 1 - library/Zend/Db/Statement/Mysqli.php:214
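
The four escaped bytes at the start of the rejected value can be decoded by hand to see which character MySQL refused. The following is a minimal sketch, not part of XenForo; the function name `decodeFourByteUtf8` is hypothetical and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not from XenForo): decode one 4-byte UTF-8
// sequence into its Unicode code point.
function decodeFourByteUtf8(string $bytes): int
{
    if (strlen($bytes) !== 4) {
        throw new InvalidArgumentException('Expected exactly 4 bytes');
    }

    // unpack() returns a 1-indexed array; array_values() re-indexes it from 0.
    $b = array_values(unpack('C4', $bytes));

    // The lead byte carries 3 payload bits, each continuation byte carries 6.
    return (($b[0] & 0x07) << 18)
         | (($b[1] & 0x3F) << 12)
         | (($b[2] & 0x3F) << 6)
         |  ($b[3] & 0x3F);
}

// The bytes quoted in the error message above.
printf("U+%06X\n", decodeFourByteUtf8("\xF4\x80\x82\x99")); // prints U+100099
```

U+100099 lies outside the Basic Multilingual Plane, so it needs four bytes in UTF-8.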
     
  4. Baron

    Baron Member

    I've attached the source of the entire post if you want to try to duplicate the error.
     

  5. Mike

    Mike XenForo Developer Staff Member

    Looks like the fix for stripping out 4-byte UTF-8 characters was incorrect (my fault!). It appears to actually be stripping 5-byte UTF-8 characters (with a slight mistake), which aren't allowed by the RFC anyway. (As a note, MySQL's utf8 character set only supports UTF-8 characters of up to 3 bytes, which covers the BMP.)
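
To make the byte ranges concrete, here is a rough sketch that maps a UTF-8 lead byte to the length of the sequence it starts, following the ranges in RFC 3629. The function name `utf8SequenceLength` is hypothetical and is not part of the importer.

```php
<?php
// Hypothetical helper: return how many bytes a UTF-8 sequence occupies,
// judged by its lead byte. Returns 0 for bytes that cannot start a
// sequence under RFC 3629 (continuation bytes, 0xC0, 0xC1, 0xF5 and above).
function utf8SequenceLength(int $leadByte): int
{
    if ($leadByte >= 0x00 && $leadByte <= 0x7F) {
        return 1; // ASCII
    }
    if ($leadByte >= 0xC2 && $leadByte <= 0xDF) {
        return 2;
    }
    if ($leadByte >= 0xE0 && $leadByte <= 0xEF) {
        return 3; // longest form that stays within the BMP
    }
    if ($leadByte >= 0xF0 && $leadByte <= 0xF4) {
        return 4; // beyond the BMP
    }
    return 0;
}

echo utf8SequenceLength(0xF4), "\n"; // 4
echo utf8SequenceLength(0xF8), "\n"; // 0
```

Lead bytes in the 0xF8 to 0xFB range once introduced 5-byte forms in older definitions of UTF-8, which is why a pattern keyed on them never matches a valid 4-byte character.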

    To anyone having this issue, try this as a fix. In library/XenForo/Importer/vBulletin.php, change the following line:
    Code:
    return preg_replace('/[\xF8-\xFB].../', '', $string);
    to:
    Code:
    return preg_replace('/[\xF0-\xF4].../', '', $string);
    This code is effectively the last line in the file. I believe that should prevent/fix this error then.
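
As a quick check outside the importer, the two patterns can be compared on the bytes from the error message. This is only a sketch: the wrapper functions `stripOld`, `stripNew` and `stripStrict` are hypothetical, and the stricter pattern is an optional variation rather than what the thread's fix uses.

```php
<?php
// Hypothetical wrappers around the two patterns quoted above.
function stripOld($string)
{
    return preg_replace('/[\xF8-\xFB].../', '', $string);
}

function stripNew($string)
{
    return preg_replace('/[\xF0-\xF4].../', '', $string);
}

// Optional stricter variant (an assumption, not from the thread): only
// remove the three following bytes when they are continuation bytes.
function stripStrict($string)
{
    return preg_replace('/[\xF0-\xF4][\x80-\xBF]{3}/', '', $string);
}

$post = "\xF4\x80\x82\x99 Company reported";

var_dump(strlen($post));           // int(21)
var_dump(strlen(stripOld($post))); // int(21): the 4-byte character survives
var_dump(stripNew($post));         // string(17) " Company reported"
var_dump(stripStrict($post));      // string(17) " Company reported"
```

Neither pattern uses the `u` modifier, so PCRE matches byte by byte and each `.` consumes a single byte.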
     
  6. Walter

    Walter Well-Known Member

    Thanks, that worked! I had the same problem with posts, and now the import continues...
     
  7. AlexT

    AlexT Well-Known Member

    Deleted.
     
  8. Daracon

    Daracon Member

    Worked perfectly for me too :)
     
