
Import from IPB 3.2: posts not parsed at all?

#1
Hi everyone,

I just purchased a XenForo licence and started the import process from my former big-ass IPB-based website (5 million posts, 300k users).

Seeing as importing threads and posts is taking a while (got 5-6 GB worth to import), I am looking at the first imported posts, with about 30% processed so far. This is what I'm seeing:



I hid a few names but that's not what I'm concerned about, lol. All the posts appear as complete raw HTML. None of the styling, bbcode tags or anything was imported.

Is this supposed to happen? I couldn't find any information on the forums, doesn't look like anyone had major issues like this while importing their IPB...

Looking into the source code of the importer--
PHP:
        $message = preg_replace('/<br( \/)?>(\s*)/si', "\n", $message);
...
 
        $search = array(
            // HTML image <img /> smilies
            "/<img\s+src='([^']+)'\s+class='bbc_emoticon'\s+alt='([^']+)'\s+\/>/siU"
                => '\2',
 
            // strip anything after a comma in [FONT]
            '/\[(font)=(\'|"|)([^,\]]+)(,[^\]]*)(\2)\]/siU'
                => '[\1=\2\3\2]'
        );
 
        return preg_replace(array_keys($search), $search, $message);
 
...
 
        return preg_replace_callback('#\[media[^\]]*\](http://.*)\[/media\]#siU', array($this, '_convertIPBoardMediaTag'), $message);
Is this all that the importer does when it comes to converting IPB posts?
I've got what it takes to do the rest of the processing on my own, but if there is a hidden/secret function that I overlooked I'd be glad if someone would let me know.
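
To make the gap concrete, here is a minimal sketch that runs the `<br>`, emoticon and [FONT] patterns quoted above against a made-up post body. The function name `convertLikeImporterExcerpt` and the sample string are assumptions for illustration, and the media-tag callback is left out because its body is not shown in the excerpt.

```php
<?php
// Hypothetical wrapper around the conversions quoted from the importer excerpt.
function convertLikeImporterExcerpt($message)
{
    // <br> or <br /> (plus any trailing whitespace) becomes a newline.
    $message = preg_replace('/<br( \/)?>(\s*)/si', "\n", $message);

    $search = array(
        // Emoticon <img> tags collapse to their alt text.
        "/<img\s+src='([^']+)'\s+class='bbc_emoticon'\s+alt='([^']+)'\s+\/>/siU"
            => '\2',

        // [FONT=a, b] keeps only the first font family.
        '/\[(font)=(\'|"|)([^,\]]+)(,[^\]]*)(\2)\]/siU'
            => '[\1=\2\3\2]'
    );

    return preg_replace(array_keys($search), $search, $message);
}

// Made-up sample in IPB's stored-HTML form; not taken from a real import.
$sample = "Hello<br />world <img src='smile.gif' class='bbc_emoticon' alt=':)' /> "
    . "[font=arial, helvetica]hi[/font] <strong>bold</strong>";

$converted = convertLikeImporterExcerpt($sample);
echo $converted, "\n";
```

In this sample the line break, the smiley and the font list are converted, but `<strong>bold</strong>` passes through untouched, which matches the raw HTML showing up in the imported posts.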

I've already seen the quick plugin developed by Kier that does a regexp search & replace in the posts, but no, thank you very much. Running a query like that on 5 million posts at once? You can't be serious. I'll have to modify the importer and create my own custom step for post processing. Having read the code a bit, it doesn't look too difficult compared with what I've done before.
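
A rough sketch of what such a custom post-processing step could look like, working through the posts in small ID-ordered batches so no single run touches all 5 million rows. Everything here is hypothetical: `convertHtmlToBbCode`, `processPostBatch`, the three tag rules and the in-memory `$posts` array (standing in for the post table) are assumptions, and the code does not use the importer's real step API.

```php
<?php
// Hypothetical converter: maps a few common HTML tags to BB code.
// The rule list is an assumption; a real IPB export will need more rules.
function convertHtmlToBbCode($message)
{
    $rules = array(
        '#<(strong|b)>(.*)</\1>#siU' => '[B]\2[/B]',
        '#<(em|i)>(.*)</\1>#siU' => '[I]\2[/I]',
        '#<a\s+href=["\']([^"\']+)["\'][^>]*>(.*)</a>#siU' => '[URL=\1]\2[/URL]',
    );

    return preg_replace(array_keys($rules), array_values($rules), $message);
}

// Converts up to $batchSize posts whose ID is greater than $lastId.
// Returns the highest post ID handled, or null when nothing is left.
function processPostBatch(array &$posts, $lastId, $batchSize)
{
    ksort($posts);
    $handled = 0;
    $maxId = null;

    foreach ($posts as $postId => $message) {
        if ($postId <= $lastId) {
            continue; // already done in an earlier batch
        }
        if ($handled >= $batchSize) {
            break;
        }
        $posts[$postId] = convertHtmlToBbCode($message);
        $maxId = $postId;
        $handled++;
    }

    return $maxId;
}

// Made-up data: post ID => message.
$posts = array(
    1 => '<strong>one</strong>',
    2 => 'plain',
    3 => '<em>three</em>',
    4 => "<a href='http://example.com'>four</a>",
    5 => '<b>five</b>',
);

$lastId = 0;
$batches = 0;
while (($next = processPostBatch($posts, $lastId, 2)) !== null) {
    $lastId = $next; // remember where to resume
    $batches++;
}
```

Tracking the last processed ID rather than an offset means a run that times out can resume from where it stopped.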

Thanks for reading this!
 

Jake Bunce

XenForo moderator
Staff member
#2
Yeah. Kier's Post Content Find / Replace works well but it doesn't do batch processing which would definitely be a problem with 5 million posts. You could modify it to limit the number of posts:

library/PostReplace/ControllerAdmin/PostReplace.php

Code:
	public function actionReplace()
	{
		$this->_assertPostOnly();

		$input = $this->_input->filter(array(
			'find' => XenForo_Input::STRING,
			'regex' => XenForo_Input::STRING,
			'replace' => XenForo_Input::STRING,
			'commit' => XenForo_Input::UINT,
			'page' => XenForo_Input::UINT,
		));

		// The limit/offset options cap each run at 5000 posts matching the Quick Find value
		$posts = $this->_getPRPostModel()->getPostsContaining($input['find'], array(
			'limit' => 5000,
			'offset' => 0
		));

		foreach ($posts AS $postId => &$post)
		{
			if (preg_match_all($input['regex'], $post['message'], $matches))
			{
				// ... (excerpt ends here)
That should work. It limits the set of matching posts to 5000. You can potentially go much higher depending on your server limits. The idea here is that you would run each replacement multiple times until it matches zero posts, which means it has done them all.

Programming your replacements into the import itself would be faster. But it's kind of nice using the replacement tool because it allows you to supervise the process to make sure the results are what you want.
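
The "run it until it matches zero posts" idea can be sketched roughly like this. It is a simulation only: `findPostsContaining` is a hypothetical stand-in for the add-on's `getPostsContaining()` and works on an in-memory array, and `replaceUntilDone` is an assumed helper name. It relies on the replacement removing the Quick Find string, so that already-fixed posts drop out of the next run even though the offset stays at 0.

```php
<?php
// Hypothetical stand-in for getPostsContaining(): returns up to $limit posts
// whose message contains $find. The real model queries the database instead.
function findPostsContaining(array $posts, $find, $limit)
{
    $matches = array();
    foreach ($posts as $postId => $message) {
        if (strpos($message, $find) !== false) {
            $matches[$postId] = $message;
            if (count($matches) >= $limit) {
                break;
            }
        }
    }
    return $matches;
}

// Repeats one replacement in capped batches until the Quick Find string
// no longer selects any post. Returns the number of runs that changed posts.
function replaceUntilDone(array &$posts, $find, $regex, $replace, $limit)
{
    $runs = 0;
    while (true) {
        $batch = findPostsContaining($posts, $find, $limit);
        if (!$batch) {
            break; // zero matches: every post has been handled
        }

        $changed = 0;
        foreach ($batch as $postId => $message) {
            $new = preg_replace($regex, $replace, $message);
            if ($new !== $message) {
                $posts[$postId] = $new;
                $changed++;
            }
        }

        if ($changed === 0) {
            break; // selected posts never match the regex; stop instead of looping forever
        }
        $runs++;
    }
    return $runs;
}

// Made-up data: post ID => message.
$posts = array(
    1 => '<br>a',
    2 => 'b',
    3 => 'c<br>',
    4 => '<br>d<br>',
    5 => 'e<br>',
);

$runs = replaceUntilDone($posts, '<br>', '#<br>#i', "\n", 2);
```

With a limit of 2, the four affected posts here are cleared in two runs and the third lookup comes back empty.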
 
#3
Thank you for the prompt reply Jake, much appreciated.

I am going to proceed with my own methods. If I can come up with something more conclusive, I'll come back and paste the source.
 

Renegade

Well-known member
#4
Hey Jake, I tried this but set the number of records to 10 to test the expression given by EQnoble. It worked for two runs but now it doesn't give any results.

I think I have to set (increase) this limit manually after every run. Is that so?
 

Jake Bunce

XenForo moderator
Staff member
#5
Set it higher. The first query searches for the Quick Find value, not the Regular Expression value. It's possible that none of the selected records will match the regex, so you need to select more (increase the number) or refine your Quick Find value. 10 is way low.
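
A small simulation of why a low limit can come back empty: the limit is applied to the posts selected by the Quick Find value, and only afterwards is the regex tested on each of them. The function `countRegexMatches` and the sample posts are assumptions for illustration, not the add-on's code.

```php
<?php
// Returns array(selected, matched): how many posts the Quick Find value
// selected (capped at $limit) and how many of those the regex also matched.
function countRegexMatches(array $posts, $find, $regex, $limit)
{
    $selected = 0;
    $matched = 0;

    foreach ($posts as $message) {
        if ($selected >= $limit) {
            break; // the cap applies to Quick Find hits, not regex hits
        }
        if (strpos($message, $find) === false) {
            continue;
        }
        $selected++;
        if (preg_match($regex, $message)) {
            $matched++;
        }
    }

    return array($selected, $matched);
}

// Made-up posts: two ordinary images, one plain post, one emoticon image.
$posts = array(
    1 => "<img src='photo1.jpg' />",
    2 => "<img src='photo2.jpg' />",
    3 => "no images here",
    4 => "<img src='smile.gif' class='bbc_emoticon' alt=':)' />",
);
$regex = '/<img\s+[^>]*class=\'bbc_emoticon\'[^>]*>/i';

$lowLimit  = countRegexMatches($posts, '<img', $regex, 2);
$highLimit = countRegexMatches($posts, '<img', $regex, 10);
$refined   = countRegexMatches($posts, 'bbc_emoticon', $regex, 2);
```

Here a limit of 2 selects only the two ordinary images, so the regex matches nothing. Raising the limit or searching for `bbc_emoticon` instead of `<img` both reach the emoticon post.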
 

Renegade

Well-known member
#6
I am increasing it by 5K in each run. It finds around 3K posts, hits the 30-second timeout, and finishes in 3-4 runs. I think it will take 2 months to complete the correction, assuming I do 6 batches of 5K each day.
 
#8
In reply to Jake Bunce's post #2 above:
How can this be changed for XenForo 1.2 to handle Kier's replacement tool?

Edit: Never mind, it is an add-on file.
 

viper357

Active member
#9
In reply to Jake Bunce's post #2 above:
Thanks for this.