XF 2.2 Parallel import tapers near the end

jasonm

Member
Hi all,

I am importing a large phpBB forum into XenForo and have done a few test runs to keep downtime to a minimum. With a very large EC2 instance and a very large RDS database, I can start the import with between 256 and 512 processes. This goes super fast in the beginning, but processes begin dropping off soon after. I can get through 20M of 28M posts in less than an hour, but the remainder takes forever, eventually tapering down to just a couple of processes for the last few million posts. Stopping and restarting the import yields the same result: by the time the importer catches up to rows that still need to be moved, my 512 processes are down to a measly 6. Starting with a smaller number of processes eventually leads to the same result. It always tapers to nothing at the end.
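
For scale, here is the back-of-the-envelope arithmetic behind those figures, as a minimal Python sketch. It treats "less than an hour" as exactly one hour, which is an assumption, so the rate it prints is a lower bound and the time estimate is an upper bound.

```python
# Rough throughput from the figures in this post.
TOTAL_POSTS = 28_000_000
FAST_POSTS = 20_000_000   # posts imported before the taper sets in
FAST_SECONDS = 60 * 60    # assumption: "less than an hour" taken as exactly one hour

fast_rate = FAST_POSTS / FAST_SECONDS                # posts per second in the fast phase
remaining = TOTAL_POSTS - FAST_POSTS                 # posts left when the taper starts
ideal_total_minutes = TOTAL_POSTS / fast_rate / 60   # whole import at the fast-phase rate

print(f"fast-phase rate: at least {fast_rate:,.0f} posts/s")
print(f"posts left at taper: {remaining:,}")
print(f"full import at that rate: about {ideal_total_minutes:.0f} minutes")
```

In other words, if the process count held steady, the whole import should finish in roughly an hour and a half at most, and the last 8M posts are where all the extra time goes.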

I think I've located the source of the issue in the ParallelProcessManager class, specifically the constant DEFAULT_MAX_PER_PROCESS = 3000, but I haven't tried tinkering with anything there yet.

The point of this post is to see if anyone has any hot tips on keeping the number of processes stable throughout the import. It's the difference between 1 hour and several excruciating hours spent watching the posts/second drop precipitously toward the finish line. The chart below shows the downward curve of DB connections, which correlates with the number of PHP processes running on the machine. You can see where I tried restarting the process at the second spike.

Any insights would be greatly appreciated!

[Attachment: 1667576364174.png]
 

jasonm

Member
Just a quick update for anyone else who might be running into a similar situation. I got much, much better performance by exporting the data from the remote database (serverless RDS in this case) and doing all the manipulation on a large machine running its own MySQL server. The import went from hours to under 20 minutes for a 28M-post forum. Even factoring in the export and the reload of the data back to the remote database, it saves many hours.
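
For anyone who wants to try the same approach, here is a rough Python sketch of the export and local-load step. It only builds and prints the command lines, it does not run them. The host, user, database, and file names are hypothetical placeholders, and the mysqldump options are just ones I would assume are sensible for a big InnoDB dump, not necessarily the exact ones to use in every setup.

```python
import shlex

# Hypothetical placeholders: substitute your own endpoint, user, schema, and file.
REMOTE_HOST = "my-rds-endpoint.example.com"
DB_USER = "importer"
SOURCE_DB = "phpbb"
DUMP_FILE = "phpbb_dump.sql"


def dump_command(host, user, database, out_file):
    """Build a mysqldump call that exports one database from a remote host."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--password",             # no value given, so the client prompts for it
        "--single-transaction",   # consistent InnoDB snapshot without locking tables
        "--quick",                # stream rows instead of buffering whole tables
        f"--result-file={out_file}",
        database,
    ]


def load_command(user, database):
    """Build a mysql client call against the local server; the dump is fed on stdin."""
    return ["mysql", f"--user={user}", "--password", database]


if __name__ == "__main__":
    # Print shell-ready commands for review before running anything.
    print(shlex.join(dump_command(REMOTE_HOST, DB_USER, SOURCE_DB, DUMP_FILE)))
    print(shlex.join(load_command(DB_USER, SOURCE_DB)) + " < " + shlex.quote(DUMP_FILE))
```

Once the import has finished locally, pushing the result back to the remote database is the same pair of commands with the source and target swapped.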
 