Fixed horribly slow import of vB posts when a thread has an asinine amount of posts

mazdas247

Member
On my board the members amuse themselves trying to make the longest useless thread ever, enter some wonderful gems:

"The Longest Thread II": 333,175 replies
"The Longest Thread": 130,808 replies

PHP memory limit side, the latest vb4 importer for XF2 blows up spectacularly on this, as for each thread it loads *all the posts* in a single query. By the time this select has completed, one post is imported (or more depending on dateline) and then the import scripts exits and starts with the next post because it exceeded 8 sec. It was hilarious to see the import running at 1 post per 14 seconds, I knew there was no way this could be right. Importing "The longest thread II" would have taken 53 days LOL...

Below is my patch that works around this. I strongly suggest you implement something to this effect. With this patch 5,778,400 posts imported in 7 hours, 8 minutes which is 229 posts per second, a very acceptable rate considering the mysql to 2 different database servers.

Diff:
diff --git a/htdocs/xenforo/src/addons/XFI/Import/Importer/vBulletin.php b/htdocs/xenforo/src/addons/XFI/Import/Importer/vBulletin.php
index 54c1da3..3ccc063 100644
--- a/htdocs/xenforo/src/addons/XFI/Import/Importer/vBulletin.php
+++ b/htdocs/xenforo/src/addons/XFI/Import/Importer/vBulletin.php
@@ -18,6 +18,8 @@ class vBulletin extends AbstractForumImporter
         */
        protected $userFields;

+       protected const MAX_POSTS = 500;
+
        public static function getListInfo()
        {
                return [
@@ -3073,6 +3075,21 @@ class vBulletin extends AbstractForumImporter

                        $this->lookup('user', $this->pluck($posts, ['userid', 'edituserid']));

+                       $continueSameThread = false;
+                       if (count($posts)==vBulletin::MAX_POSTS)
+                       {
+                               $continueSameThread = true;
+                               $lastDateline = $posts[vBulletin::MAX_POSTS-1]["dateline"];
+                               while (count($posts) && ($posts[count($posts)-1]["dateline"]==$lastDateline))
+                               {
+                                       // since we limited the retrieved posts, we don't know
+                                       // if there are further posts in the database with
+                                       // the same dateline, so drop posts until we find one
+                                       // with an earlier dateline.
+                                       array_pop($posts);
+                               }
+                       }
+
                        foreach ($posts AS $i => $post)
                        {
                                $state->extra['postDateStart'] = $post['dateline'];
@@ -3126,6 +3143,10 @@ class vBulletin extends AbstractForumImporter
                                }
                        }

+                       if ($continueSameThread) {
+                               break; // calls resumeIfNeeded and fetches the rest of the thread
+                       }
+
                        $state = $this->setStateNextThread($state, $oldThreadId);

                        if ($timer->limitExceeded())
@@ -3151,6 +3172,8 @@ class vBulletin extends AbstractForumImporter

        protected function getPosts($threadId, $startDate)
        {
+               $MAX_POSTS = vBulletin::MAX_POSTS;
+
                return $this->sourceDb->fetchAll($this->prepareImportSql($this->prefix, "
                        SELECT post.*,
                                IF(user.username IS NULL, post.username, user.username) AS username,
@@ -3167,7 +3190,7 @@ class vBulletin extends AbstractForumImporter
                                editlog ON (editlog.postid = post.postid)
                        WHERE post.threadid = ?
                        AND post.dateline > ?
-                       ORDER BY post.dateline
+                       ORDER BY post.dateline limit 0,$MAX_POSTS
                "), [$threadId, $startDate]);
        }
 
Ha, I was just coming here to post this although I'm importing from vB5. At a certain point in the import posts process, it slows way the hell down to like 1 post per 10 seconds and disk usage by MySQL skyrockets. I bet it's the 123k post chit chat thread that exists on my forums.

Total run time
143.65 hours

Posts
2,462,581 (143 hours, 37 minutes)
 
Last edited:
Thank you for posting your patch :) I'll be importing from vB3, also have 100,000k post threads, and if I see the same behaviour I'll now know what might be causing it. Cheers.
 
Just a note, maybe it helps: my longest thread was more than 300 000 posts and was imported from 3.8 without problems, although that was with the big board importer.
 
Just a note, maybe it helps: my longest thread was more than 300 000 posts and was imported from 3.8 without problems, although that was with the big board importer.

Big board importer isn't comparable in how they work. Though as above, limiting the query for the batch on the stock importer is a good idea.
 
It still using a single core and taking sooooo long :sleep::(

1530209498462.webp

1530209839358.webp

Step 10 of 17: Threads - [1,881] 0.11%

It takes 7 seconds to import a single thread. I have more than 1 million threads to go.
oh boy... life is short...
3RGSIJ6.jpg
 
Oh man i laughed out loud, respect for having gone through the trouble. But more seriously, I'd suggest you use the command line importer running inside a linux "screen" command, you get better feedback from there, and you don't have the overhead of the HTTP connections.
 
Last edited:
This is exactly the reason @alexD built us a custom CLI importer (for custom forumsoftware) for XF2.
That importer is WAY faster than the GUI XF1 version he previously built.

It's not cheap but (for us) necessary to make the migration actually possible. (It helps that the CLI importer supports incremental imports ;) )
 
@duderuud Your 20M+ posts and hundreds of GBs of Media Gallery items is a beast of another level!
If someone happens to be able to invest on a incremental importer like yours but for vBulletin->XF2 (instead of React->XF2), I'll be happy to undertake it if I'm available.

--

I was recently hired to do a vBulletin->XF2 migration for some much fewer posts (4M) using the official beta XFI in CLI mode.

I have the same problem as the OP: Long threads are importing very slowly.
I had to increase the php memory limit so it doesn't throw errors anymore, but the apache+php server started swapping like crazy.
No visible progress was happening, so my client decided to upgrade their RDS instance and have me retry the import on that.
Does anybody have experience on how much beefed up the php app and mysql servers have to be in cases like these?
 
Freshly installed XF v2.0.10 and XFI 1.0.2
and it still uses only one single core :(
XF v1 Importer is better i guess, it uses multiple cores tho, I was able to import all the forum in two days with some big hardware.
(I had to edit the default $limits in vBulletin.php)

Is it okay to install XF1 first, and then import from vBulletin, then again upgrade to XenForo2?
but still I have to wait this time:
Only 81 days to go!
Because, I have to run the XF2 importer again, right?

Oh man i laughed out loud, respect for having gone through the trouble. But more seriously, I'd suggest you use the command line importer running inside a linux "screen" command, you get better feedback from there, and you don't have the overhead of the HTTP connections.
But it still uses 1 single core. Look:
MSH0hKPl.png

I'm running this test forum, and the importer for 3 days.
 
Is it okay to install XF1 first, and then import from vBulletin, then again upgrade to XenForo2?

That is fine.

I'm running this test forum, and the importer for 3 days.

If its only got through 14k threads in 56 hours I would suggest you have a server config problem at play. With a forum of your size I would be expecting even the default importer to be done within 24 hours.

The big board importer should only give you a few hours downtime as another option.
 
Top Bottom