
Import from IPB 3.2: posts not parsed at all?

Discussion in 'Installation, Upgrade, and Import Support' started by Filetrip, Oct 25, 2012.

  1. Filetrip

    Filetrip Member

    Hi everyone,

    I just purchased a Xenforo licence and started the Import process from my former big-ass IPB-based website (5 million posts, 300k users).

    Seeing as importing threads and posts is taking a while (got 5-6 GB worth to import) I am looking at the first imported posts -- about 30% processed so far. This is what I'm seeing:



    I hid a few names but that's not what I'm concerned about, lol. All the posts appear as complete raw HTML. None of the styling, bbcode tags or anything was imported.

    Is this supposed to happen? I couldn't find any information on the forums, doesn't look like anyone had major issues like this while importing their IPB...

    Looking into the source code of the importer--
    PHP:
    $message = preg_replace('/<br( \/)?>(\s*)/si', "\n", $message);
    ...

    $search = array(
        // HTML image <img /> smilies
        "/<img\s+src='([^']+)'\s+class='bbc_emoticon'\s+alt='([^']+)'\s+\/>/siU"
            => '\2',

        // strip anything after a comma in [FONT]
        '/\[(font)=(\'|"|)([^,\]]+)(,[^\]]*)(\2)\]/siU'
            => '[\1=\2\3\2]'
    );

    return preg_replace(array_keys($search), $search, $message);

    ...

    return preg_replace_callback('#\[media[^\]]*\](http://.*)\[/media\]#siU', array($this, '_convertIPBoardMediaTag'), $message);

    Is this all that the importer does when it comes to converting IPB posts?
    I've got what it takes to do the rest of the processing on my own, but if there is a hidden/secret function that I overlooked I'd be glad if someone would let me know.
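
    To make the question concrete, here is a minimal sketch that applies only the three patterns quoted above to a made-up post. The function name `convertSample` and the sample string are assumptions for illustration and are not part of the importer. It suggests that line breaks, emoticon images and `[font]` lists are handled, while other HTML such as `<strong>` passes through untouched.

```php
<?php
// Hypothetical helper: applies only the patterns quoted from the importer above.
function convertSample($message)
{
    // <br> or <br /> (plus any trailing whitespace) becomes a newline.
    $message = preg_replace('/<br( \/)?>(\s*)/si', "\n", $message);

    $search = array(
        // Emoticon <img /> tags are replaced by their alt text.
        "/<img\s+src='([^']+)'\s+class='bbc_emoticon'\s+alt='([^']+)'\s+\/>/siU"
            => '\2',
        // Anything after a comma inside [font=...] is stripped.
        '/\[(font)=(\'|"|)([^,\]]+)(,[^\]]*)(\2)\]/siU'
            => '[\1=\2\3\2]',
    );

    return preg_replace(array_keys($search), $search, $message);
}

// Made-up sample post, not taken from a real import.
$sample = "Hello<br /><strong>bold</strong> [font=arial, sans-serif]world[/font] "
    . "<img src='smile.gif' class='bbc_emoticon' alt=':)' />";

echo convertSample($sample), "\n";
```

    Any remaining tags (`<strong>`, `<p>`, and so on) would need their own patterns in a custom step.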

    I've already seen the quick plugin developed by Kier that does a regexp search & replace in the posts, but no, thank you very much. Running a query like that on 5 million posts at once? You can't be serious. I'll have to modify the importer and create my own custom step for post processing. Having read the code a bit, it doesn't look too difficult compared with what I've done before.

    Thanks for reading this!
     
  2. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    Yeah. Kier's Post Content Find / Replace works well, but it doesn't do batch processing, which would definitely be a problem with 5 million posts. You could modify it to limit the number of posts:

    library/PostReplace/ControllerAdmin/PostReplace.php

    Code:
    	public function actionReplace()
    	{
    		$this->_assertPostOnly();
    
    		$input = $this->_input->filter(array(
    			'find' => XenForo_Input::STRING,
    			'regex' => XenForo_Input::STRING,
    			'replace' => XenForo_Input::STRING,
    			'commit' => XenForo_Input::UINT,
    			'page' => XenForo_Input::UINT,
    		));
    
    		$posts = $this->_getPRPostModel()->getPostsContaining($input['find'], array(
    			'limit' => 5000,
    			'offset' => 0
    		));
    
    		foreach ($posts AS $postId => &$post)
    		{
    			if (preg_match_all($input['regex'], $post['message'], $matches))
    			{
    
    That should work. It limits the set of matching posts to 5000. You can potentially go much higher depending on your server limits. The idea here is that you would run each replacement multiple times until it matches zero posts which means it has done them all.

    Programming your replacements into the import itself would be faster. But it's kind of nice using the replacement tool because it allows you to supervise the process to make sure the results are what you want.
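
    The "run it until it matches zero posts" idea can be sketched as a loop. This is only an illustration under stated assumptions: the posts live in a plain array instead of the database, and `findPostsContaining` and `replaceInBatches` are hypothetical names, not functions from the add-on.

```php
<?php
// Hypothetical in-memory stand-in for the post table: post_id => message.
$posts = array(
    1 => 'one<br />two',
    2 => 'no markup here',
    3 => 'three<br>four<br />five',
    4 => 'six<br />seven',
);

// Hypothetical helper: return at most $limit posts whose message contains $find.
function findPostsContaining(array $posts, $find, $limit)
{
    $matches = array();
    foreach ($posts as $postId => $message) {
        if (strpos($message, $find) !== false) {
            $matches[$postId] = $message;
            if (count($matches) >= $limit) {
                break;
            }
        }
    }
    return $matches;
}

// Repeat limited batches until a batch finds nothing (or changes nothing).
function replaceInBatches(array &$posts, $find, $regex, $replace, $limit)
{
    $runs = 0;
    while (true) {
        $batch = findPostsContaining($posts, $find, $limit);
        if (!$batch) {
            break; // zero matching posts: everything has been processed
        }
        $changed = 0;
        foreach ($batch as $postId => $message) {
            $new = preg_replace($regex, $replace, $message);
            if ($new !== null && $new !== $message) {
                $posts[$postId] = $new;
                $changed++;
            }
        }
        if ($changed === 0) {
            break; // batch matched the find text but not the regex: avoid looping forever
        }
        $runs++;
    }
    return $runs;
}

$runs = replaceInBatches($posts, '<br', '/<br( \/)?>/i', "\n", 2);
echo "Batches run: ", $runs, "\n";
```

    Because each run rewrites the posts it matched, the next run no longer finds them, which is why the offset can stay at 0.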
     
  3. Filetrip

    Filetrip Member

    Thank you for the prompt reply Jake, much appreciated.

    I am going to proceed with my own methods. If I come up with something more conclusive, I'll come back and paste the source.
     
  4. Renegade

    Renegade Well-Known Member

    Hey Jake, I tried this but set the number of records to 10 to test the expression given by EQnoble. It worked for two runs but now it doesn't give any results.

    I think I have to set (increase) this limit manually after every run. Is that so?
     
  5. Jake Bunce

    Jake Bunce XenForo Moderator Staff Member

    Set it higher. The first query searches for the Quick Find value, not the Regular Expression value. It's possible that none of the selected records will match the regex, so you need to select more (increase the number) or refine your Quick Find value. 10 is way low.
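
    A small sketch of why a low limit can return nothing: the Quick Find stage picks the first N posts, and the regex is only tried on those. The array of posts and the function name `countRegexMatchesInWindow` are made up for illustration and are not taken from the add-on.

```php
<?php
// Hypothetical posts: all contain "<", but only the later ones contain a <br> tag.
$posts = array(
    1 => 'a < b',
    2 => 'x <- y',
    3 => 'line<br />break',
    4 => 'more<br>breaks',
);

// Hypothetical helper mirroring the two stages: Quick Find first, regex second.
function countRegexMatchesInWindow(array $posts, $find, $regex, $limit)
{
    // Stage 1: plain substring search, capped at $limit posts.
    $window = array();
    foreach ($posts as $postId => $message) {
        if (strpos($message, $find) !== false) {
            $window[$postId] = $message;
            if (count($window) >= $limit) {
                break;
            }
        }
    }

    // Stage 2: the regex is only tested against posts selected in stage 1.
    $hits = 0;
    foreach ($window as $message) {
        if (preg_match($regex, $message)) {
            $hits++;
        }
    }
    return $hits;
}

$regex = '/<br( \/)?>/i';
echo countRegexMatchesInWindow($posts, '<', $regex, 2), "\n";   // low limit, loose Quick Find
echo countRegexMatchesInWindow($posts, '<', $regex, 10), "\n";  // higher limit
echo countRegexMatchesInWindow($posts, '<br', $regex, 2), "\n"; // refined Quick Find
```

    With the loose Quick Find and a limit of 2, the window is filled by posts the regex never matches, and since those posts are never changed, the same window comes back on every run.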
     
  6. Renegade

    Renegade Well-Known Member

    I am increasing it by 5K in each run. It finds around 3K posts, hits the 30-second timeout, and finishes in 3-4 runs. I think it will take 2 months to complete the correction, considering I do 6 batches of 5K each day.
     
  7. Shakir

    Shakir Member

    Thank you very much.

    I was getting "Fatal Error: Allowed Memory" and so forth. Adding the above code resolved the issue.

    Thanks again.
     
  8. eagle eyes

    eagle eyes Active Member

    How can this be changed for Xenforo 1.2 to handle Kier's replacement tool?

    Edit: Never mind, it is an add-on file.
     
    Last edited: Oct 18, 2013
  9. viper357

    viper357 Active Member

    Thanks for this.
     
  10. Robert9

    Robert9 Active Member

    I have set the limit in the source code, so I always work with 20 IDs; that is enough for me.
     
