XF 2.0 VB4.2.x to XF2.0.10 - Import comes to a near-screeching halt at posts import

bctrainers

Member
Hi all,

I've been in the slow process of importing a VB4 database to Xenforo 2 so that folks at the company I'm with can demo XF2 via an old snapshot. However, for the past couple of weeks, I've been running into an issue on my local development environment.

On the importer, I used the general setup steps via the admin panel to begin the import process, and continued it over on SSH with `screen`. I opted to not include the attachments table on my most recent import trial/test, so have 25 steps. At 16 of 25 steps, I am on the posts import segment. Right around 4,065,000-ish posts is when the importer comes to a near halt. Posts are no longer imported by the thousands or hundreds, but by one.... every 11 seconds. Every step prior to this particular step has been quite fast.

VB4 Post Table rows: 7,107,461.
VB4 Post Table size: 7.4GB. Index is 1.4GB, with 6GB of Data.

Current trial run is via a non-virtual machine, and directly on a Fresh Debian installation.

Machine #1 - Virtual Machine Host, VMWare. Front-end only, not in use with the current trial-run.
- Intel i7-6800k @ 3.4GHz.
- 64GB DDR4 RAM
- Samsung 970 EVO 1TB m2
- Ethernet: Dual nic bonded to 2Gbps.
- VM installation for the front end consists of the following applications only: nginx, php7.0-fpm (and associated php7 modules).
- VM's "hardware" was set to the following: 16GB RAM, 4 processors, 50GB NVMe preallocated, Bridged Ethernet to lan, no VM logging.



Machine #2 - mySQL/MariaDB, nginx, php7.0-fpm, and associated modules. No VM's, direct on server.

CPU:
Code:
root@box004:/var/www/xf2# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 158
Model name:            Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
Stepping:              9
CPU MHz:               4400.207
CPU max MHz:           4500.0000
CPU min MHz:           800.0000
BogoMIPS:              8400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
RAM: 32GB DDR4, 2400MHz. RAM can't be expanded past this due to motherboard limits (mini itx).
Code:
root@box004:/var/www/xf2# free
              total        used        free      shared  buff/cache   available
Mem:       32835636      581860      532828       29304    31720948    31759016
Swap:      33439740           0    33439740
Storage: Samsung 860 PRO 1TB 2.5 Inch SATA III Internal SSD (MZ-76P1T0BW)
All on the Gigabyte Z270N-WIFI motherboard.

Current output from the terminal...
Code:
- Step  4 of 25: Avatars                   00:30:52 [22,467] 99.28%
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
libpng warning: iCCP: known incorrect sRGB profile
- Step  4 of 25: Avatars                   00:31:06 [22,591]
- Step  5 of 25: Buddy and ignore lists    00:01:11 [10,177]
- Step  6 of 25: Paid subscriptions        00:00:00 [0]
- Step  7 of 25: Custom BB codes           00:00:00 [10]
- Step  8 of 25: Private messages          00:28:48 [276,044]
- Step  9 of 25: Profile comments          00:00:54 [16,324]
- Step 10 of 25: Forums                    00:00:05 [315]
- Step 11 of 25: Moderators                00:00:01 [63]
- Step 12 of 25: Thread prefixes           00:00:05 [478]
- Step 13 of 25: RSS feed sources          00:00:00 [0]
- Step 14 of 25: Threads                   00:26:35 [428,409]
- Step 15 of 25: Tags                      00:02:50 [15,162]
- Step 16 of 25: Posts                     10:34:23 [4,069,043] 55.25%

Importer on left, SQL on right.
MobaXterm_2018-10-17_04-07-45.webp

At this point, it seems as if mySQL has consumed up mostly-all RAM as cache, which might be resulting in this slow process? Or is there something more going on here with how the importer is importing content and not freeing memory?

On mySQL Workbench, this doesn't seem all that healthy either...
MySQLWorkbench_2018-10-17_04-13-32.webp
 
Last edited:
As an update to my earlier post... It appears as if with every post parsed from the VB4 database and inserted to the XF2 install, the Reads per second increases exponentially.

iotop doesn't appear to be showing disk throttling (this SSD is in use for the SQL), shows most of this is happening in the RAM/Memory space of the mySQL. And even then, this is less than or equal to 1MB/sec.

htop shows most of the CPU time is being consumed by the mySQL process. Eight threads constantly running to all eight cores of the CPU.

The cmd.php screen session for the import process isn't using any CPU time (0.0%) like it was at the get go, which was at about 5 to 7%.

Monitoring the connections to and from the cmd.php between vb4 and xf2, I am constantly seeing "SELECT post.*,..." active in the vb4 database, as a sending data state. While the xf2 database is being held open in a Sleep state. Right now, looking at 8-10 seconds between each query restart of the same "SELECT post.*..."

Is this process actually full scanning the post table to add one by one?
 
It certainly shouldn't be, though the exact query would be helpful. Can you run a SHOW FULL PROCESSLIST; and either show the query here (or send it to me via a conversation if it has private info)?

It may be helpful to run an EXPLAIN version of that query as well, in case it's doing something odd that we aren't expecting it to do.
 
Im testing the importer today on my 10million-posts-forum
Im using a local machine.
I can confirm importing slows down the more posts have been imported.

first 10% took 20mins
next 10% took 30mins
next 10% took 50mins
next 10% (still working...)
...
 
The issue here was mostly related to MySQL picking a bad index, particularly for long threads.

There is a workaround involving a query edit, though it's still not as efficient as possible. The most efficient workaround should be to add an index to the post table on the (threadid, dateline) columns (as in one index on both of those columns).
 
Hi Claudio, tried both web based and CLI both were taking hours. I've read somewhere about a MariaDB bug that slows things down so looking for a solution and guidance on how to implement it.
 
Back
Top Bottom