XF 2.2 Migration from vB3 to XenForo: 23 days and 10 hours to complete 😅

Rsync

Member
Hello,
We conducted a test to migrate a large vB3 forum to XenForo. Using the multithreaded CLI importer, it took over 23 days of running around the clock to complete o_O

[Attached screenshot: Captura de pantalla 2023-10-23 a les 10.43.23.webp]
 
This is unusual.

This is a large import, but I wouldn't expect it to take anywhere near this amount of time. Even the 25 hours for the private messages strikes me as unusual.

This would strongly imply there is something in the setup which isn't performing optimally.

Can you explain more about the server hardware and software involved? Is MySQL running on the same server as the XenForo import?

I think it's worth checking too - is it expected that there are 0 avatars and attachments imported?
 
Hi Chris, we are the tech team behind that import. I can share more details by private message; I'll collect some technical information and send you a message asap. We can share the cause and results in this thread to help others (though I think not many people are dealing with importing 400M+ posts 😅). Thanks!
 
There may be ways to speed it up, but 221 posts per second passing through all the validators and entities is not that terrible (about 75 minutes per million posts). While I don't have one for XF2, you might want to consider building something along the same lines as this one I made for XF1:


It effectively bypassed the entire application layer and everything that goes along with it, such as entities (data writers in XF1), validation, and the other things the application does to incoming records. Instead, it did the conversion with faster tools at the operating system level and then simply imported the raw tables straight into MySQL with LOAD DATA INFILE.
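To make the idea concrete, here is a minimal Python sketch of the "write a flat file, then bulk-load it" approach. It is not the XF1 tool mentioned above: the table name `example_post`, the column names, the file path, and the helper names are all made up for illustration. It only assumes MySQL's default LOAD DATA text format (tab-separated fields, newline-terminated lines, backslash escapes, `\N` for NULL).

```python
import io

def escape_field(value) -> str:
    """Escape one value for LOAD DATA's default text format."""
    if value is None:
        return r"\N"  # default NULL marker
    text = str(value)
    # Escape the backslash first, then the field and line delimiters.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def write_rows(rows, out) -> int:
    """Write rows as tab-separated lines; return the number of rows written."""
    count = 0
    for row in rows:
        out.write("\t".join(escape_field(v) for v in row) + "\n")
        count += 1
    return count

def load_statement(path: str, table: str, columns: list) -> str:
    """Build the bulk-load SQL for a file produced by write_rows."""
    cols = ", ".join(columns)
    return f"LOAD DATA INFILE '{path}' INTO TABLE {table} ({cols})"

# Hypothetical converted rows: (post_id, user_id, message)
rows = [(1, 42, "Hello\tworld"), (2, None, "line one\nline two")]
buffer = io.StringIO()  # stands in for the real dump file
write_rows(rows, buffer)
print(buffer.getvalue(), end="")
print(load_statement("/tmp/posts.tsv", "example_post",
                     ["post_id", "user_id", "message"]))
```

The trade-off is the one discussed in this thread: nothing in this path validates the data, so the conversion step has to produce rows that are already correct for the target schema.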

That link has the stats, but to summarize, it took 33 minutes to do a vBulletin 4 -> XenForo import that included 18M posts and 650k users (74.5M database records across all tables that were converted).

Passing 439M posts individually through the application-level importer is never going to be "blazingly fast"... even if you manage to speed it up 10x to 2,210 posts per second (which probably isn't going to happen), you are still on the scale of "days" for an import with that many posts.
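The arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes a constant import rate and counts posts alone (no users, threads, or private messages), and `import_days` is a name made up for illustration.

```python
SECONDS_PER_DAY = 86_400

def import_days(total_posts: int, posts_per_second: float) -> float:
    """Days needed to import total_posts at a constant rate."""
    return total_posts / posts_per_second / SECONDS_PER_DAY

posts = 439_000_000  # post count mentioned in this thread

for rate in (221, 2_210):
    print(f"{rate:>5} posts/s -> {import_days(posts, rate):.1f} days")
```

At 221 posts/s this comes out at roughly 23 days, and even at ten times that rate it is still a bit over 2 days.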
 
Wow...
Congratulations!!!

I had vB 3.8 with 70,000,000 posts, and the import took ~36 hours.

I'd like to offer some advice.
XenForo provides only a few PHP files that support old links. I recommend creating the remaining files and folders yourself. For example, ...

profile.php

PHP:
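<?php
// Redirect old vBulletin profile links (profile.php?u=123) to the XenForo member page.
$user_id = $_GET['u'] ?? null;
if (is_string($user_id) && ctype_digit($user_id)) {
    // Digits only, so the value is safe to put in the Location header.
    // 301 marks the move as permanent, matching the .htaccess rule below.
    header("Location: https://www.domain.com/members/$user_id/", true, 301);
} else {
    // Missing or malformed ID: send the visitor to the home page.
    header("Location: https://www.domain.com/");
}
exit();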

In the goto folder, create an .htaccess file:

Code:
RewriteEngine On
RewriteCond %{QUERY_STRING} ^id=(\d+)$
RewriteRule ^post$ /posts/%1/? [R=301,L]

It's better to redirect the remaining files to the main domain.
If this is not done, there will be a huge number of 404 errors that Google will complain about.
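Before going live, it can help to check the redirect rules against a list of old URLs taken from the server logs. The Python sketch below roughly mirrors the two rules above (profile links and goto/post links) and sends everything else to the home page. The function name `new_url` is made up, `www.domain.com` is the same placeholder domain as in the examples, and the real behaviour still depends on the PHP and .htaccess files actually deployed.

```python
from urllib.parse import urlsplit, parse_qs

BASE = "https://www.domain.com"  # placeholder domain from the examples above

def new_url(old_url: str) -> str:
    """Return the XenForo URL that an old vBulletin URL should redirect to."""
    parts = urlsplit(old_url)
    query = parse_qs(parts.query)

    def numeric(name):
        # Accept a parameter only if it appears once and is ASCII digits.
        values = query.get(name, [])
        if len(values) == 1 and values[0].isascii() and values[0].isdigit():
            return values[0]
        return None

    if parts.path == "/profile.php" and numeric("u"):
        return f"{BASE}/members/{numeric('u')}/"
    if parts.path == "/goto/post" and numeric("id"):
        return f"{BASE}/posts/{numeric('id')}/"
    return f"{BASE}/"  # everything else falls back to the home page

for url in ("https://www.domain.com/profile.php?u=123",
            "https://www.domain.com/goto/post?id=456",
            "https://www.domain.com/calendar.php"):
    print(url, "->", new_url(url))
```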
 
There may be ways to speed it up, but 221 posts per second passing through all the validators and entities is not that terrible (about 75 minutes per million posts).
That's certainly a lot slower than it needs to be - I imported ~ 3.7M posts today in 65 minutes with the standard importer framework on a "not-that-powerful" VM (3 Cores, 4 GB), which works out to ~ 900 posts/s.

With beefier hardware I don't see a reason why it shouldn't be possible to do a few thousand posts/second, maybe not 10K+ though on a single machine.

If the importer code itself is the bottleneck I'd try to scale out to multiple nodes instead of developing an entirely new importer.
I did that with XF 1.5/2.0 and it worked pretty well (but took quite some effort). It should be a lot easier by now, as the importer framework is already capable of parallel processing - it just needs a little plumbing to run the processes on other nodes instead of all locally.

But before doing that I'd investigate MySQL / MariaDB config, check if there are slow queries and fix those as necessary.
 
The command

Code:
php cmd.php xf:import --processes 12

speeds things up, 12 being your available cores, although I would recommend using half.
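As a small illustration of the "half the cores" rule of thumb, the Python sketch below derives a process count from the machine it runs on and prints the matching command. The helper names are made up, and the right number in practice depends on where the database runs and how much of the time is spent waiting on the network.

```python
import os

def suggested_processes(cores=None) -> int:
    """Half of the available cores, but never fewer than one."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, cores // 2)

def import_command(processes: int) -> str:
    """Command line from the post above, with the process count filled in."""
    return f"php cmd.php xf:import --processes {processes}"

print(import_command(suggested_processes()))
```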
 
Would love to hear the stats on the server... But yeah, 400M+ posts is unreal. Largest I've dealt with is the 50M mark.
We had 4 CPU cores and ran the importer with 20 processes. CPU usage was 60-70% because of network overhead. See the replies below for more details.

We're exploring some ways to speed things up, and bypassing the application layer would definitely do it. Our database is much larger (Gb scale). However, using the application layer gives us control over what changes. We'll first try the import through the application layer; once we have a fully optimized setup, if it is still too slow we'll consider bypassing it.

Is your migration related to the closure of the vbulletin.org website?
We didn't know about that, but it's one more reason to migrate. vB has become very hard to work on, as it does not fit well with dev teams and new infrastructure scaling options. And vB 3 is aging: no good PHP 8 support, hard to migrate to newer vB versions...

The vB database and the new XF database are on the same server, but the importer runs on another machine. It's an infrastructure design choice. We've identified network overhead as the probable bottleneck. When importing posts with 20 processes, it did ~1000 posts/s. That's fine for smaller forums, but we need at least 10x that speed to do the migration in a reasonable amount of time. The importer has a hard-coded batch size limit of 500; we'll raise that limit, taking database timeouts and other constraints into account, to reduce network overhead, and see if that improves things.
We'll also do a test run on a server that hosts both the databases and the importer to see what the best-case scenario is.
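A rough way to reason about batch size is a toy cost model: each batch pays one network round trip, and each row pays a fixed processing cost. The Python sketch below uses that model with entirely made-up numbers (50 ms per round trip, 0.5 ms per row); it is not a measurement of this import and ignores parallel processes, so it only shows the shape of the trade-off.

```python
import math

def import_seconds(rows, batch_size, round_trip_s, per_row_s):
    """Estimated time: one network round trip per batch plus per-row work."""
    batches = math.ceil(rows / batch_size)
    return batches * round_trip_s + rows * per_row_s

rows = 1_000_000
for batch_size in (500, 5_000):
    # round_trip_s and per_row_s are hypothetical values, not measurements.
    t = import_seconds(rows, batch_size, round_trip_s=0.05, per_row_s=0.0005)
    print(f"batch {batch_size:>5}: {t:.0f} s")
```

With these assumed numbers, going from 500 to 5,000 rows per batch only moves the estimate from about 600 s to about 510 s, because per-row work dominates. Larger batches pay off mainly when the round trip is expensive relative to the per-row cost, which is why measuring both on the real setup matters.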

Of course, we did use the --processes option; when running single-threaded, the import rate was very, very low.

Thank you all for your comments! 🙌 Just to clarify, this was our first test run, meant to set the benchmark to beat and to check for bottlenecks. We'll optimize the infrastructure and code to get a faster import. As you can imagine, 400M+ posts is a large forum and we cannot do a standard import. We knew that from the start, and we'll work on a migration process that suits our needs. We'll share the improvements here in case anybody else has to import a large forum like ours.
 