vb 3.8 -> XF (via CLI) Importer discrepancies.

MGSteve

Well-known member
In the vB database, we have 429,387 threads and 4,723,436 posts. After import in the XF database we have 280,712 posts and 3,035,844 posts.

The VB data was tided up before, all soft deleted threads were removed, all orphaned threads & posts were removed, so as far as I'm concerned, the two numbers should more or less match.

Any suggestions as to why they don't - incase it matters, we did use the CLI Importer & the threads it stated as importing was along the same lines as the VB count - i.e. 400,000+
 
I've run into this before and I've got data to try to track it down, but haven't yet.

The CLI importer is alpha/beta for this reason. You may just need to use the web importer for now.
 
Yup, quite a few Mike. Do you want me to post it or email it over?

Ah, just checked, its 6.2mb! Probably best not to post it!

e.g.

import.log said:
01:41:18 #2 Starting
01:41:18 #4 Starting
01:41:18 #7 Starting
01:41:18 #3 Starting
01:41:18 #6 Starting
01:41:18 #8 Starting
01:41:18 #0 Starting
01:41:18 #5 Starting
01:41:18 #11 Starting
01:41:18 #9 Starting
01:41:18 #10 Starting
01:41:18 #13 Starting
01:41:18 #12 Starting
01:41:19 #1 Starting
01:41:19 #14 Starting
01:41:19 #15 Starting

01:41:24 #11 @ DEADLOCK while importing starting from 854. Retrying...

01:41:24 #11 @ FAILED to import 100 threads starting from thread ID 854. Retrying...
Mysqli statement execute error : Deadlock found when trying to get lock; try restarting transaction

01:41:24 #15 @ DEADLOCK while importing starting from 1613. Retrying...

01:41:24 #15 @ FAILED to import 100 threads starting from thread ID 1613. Retrying...
Mysqli statement execute error : Deadlock found when trying to get lock; try restarting transaction

01:41:24 #9 @ DEADLOCK while importing starting from 959. Retrying...

Actually, I didn't use 16 threads last night - I take it the log isn't truncated with each import run?

Edit: I located the start of the last run and its full of the same deadlock errors anyway Mike.
 
Bah, I see a bug. In library/XFCliImporter/import.php, change:
Code:
if (++$failed == 5)
to:
Code:
else if (++$failed == 5)

It's supposed to just keep retrying if it gets deadlocks. That said, trying to track down deadlock causes can be helpful. You can get info on the last deadlock with this query:

SHOW INNODB STATUS;
 
Hope this makes sense to you Mike ;)

Code:
=====================================
120215 17:33:34 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 8 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 221691, signal count 185494
Mutex spin waits 0, rounds 14232546, OS waits 131489
RW-shared spins 62666, OS waits 28090; RW-excl spins 67050, OS waits 39498
------------------------
LATEST DETECTED DEADLOCK
------------------------
120214 22:18:19
*** (1) TRANSACTION:
TRANSACTION 0 1144322, ACTIVE 2 sec, process no 27586, OS thread id 1195571520 inserting
mysql tables in use 1, locked 1
LOCK WAIT 45 lock struct(s), heap size 6752, undo log entries 383
MySQL thread id 1010049, query id 185684478 localhost orgf update
INSERT INTO `xf_post` (`post_id`, `thread_id`, `user_id`, `username`, `post_date`, `message`, `attach_count`, `message_state`, `position`, `ip_id`, `likes`, `like_users`, `warning_id`, `warning_message`) VALUES ('3963150', '375854', '52690', 'Gummibear1986', '1280610763', '[QUOTE=LORDPSK;3961616]Sounds like the auto adjuster on the clutch cable to me[/QUOTE]\r\n\r\nI have a similar problem. If it is the adjuster how easy is this to fix?', '0', 'visible', '3', '0', '0', 'a:0:{}', '0', '')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 438 n bits 112 index `PRIMARY` of table `orgf_forums/xf_post` trx id 0 1144322 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
 
*** (2) TRANSACTION:
TRANSACTION 0 1144321, ACTIVE 7 sec, process no 27586, OS thread id 1213143360 setting auto-inc lock
mysql tables in use 1, locked 1
226 lock struct(s), heap size 30704, undo log entries 3972
MySQL thread id 1010051, query id 185684482 localhost orgf update
INSERT INTO `xf_post` (`post_id`, `thread_id`, `user_id`, `username`, `post_date`, `message`, `attach_count`, `message_state`, `position`, `ip_id`, `likes`, `like_users`, `warning_id`, `warning_message`) VALUES ('3963016', '376022', '57935', 'Pete27011', '1280606602', 'I would start with the spark plug electrode gaps and work backwards.check everything for damage or wear and replace, a missfire is definately something electrical . Ht leads often look ok and are actually damaged inside. with ignition parts i always buy the best,cheap inferior parts like 3 quid for a moto factors rotor arm never pays off.', '0', 'deleted', '1', '0', '0', 'a:0:{}', '0', '')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 438 n bits 112 index `PRIMARY` of table `orgf_forums/xf_post` trx id 0 1144321 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
 
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `orgf_forums/xf_post` trx id 0 1144321 lock mode AUTO-INC waiting
*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 0 1439824
Purge done for trx's n:eek: < 0 1439824 undo n:eek: < 0 0
History list length 2
Total number of lock structs in row lock hash table 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 27586, OS thread id 1220598080
MySQL thread id 2297507, query id 234454497 localhost 127.0.0.1 root
SHOW INNODB STATUS
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
44919 OS file reads, 3626227 OS file writes, 512535 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 5, seg size 7,
7343 inserts, 7343 merged recs, 1801 merges
Hash table size 6424837, used cells 5326439, node heap has 16733 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 8 863900298
Log flushed up to 8 863900298
Last checkpoint at 8 863900298
0 pending log writes, 0 pending chkp writes
378885 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 3627964234; in additional pool allocated 1048576
Buffer pool size 198144
Free buffers 1
Database pages 181410
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 60522, created 216708, written 5142094
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 27586, id 1175603520, state: waiting for server activity
Number of rows inserted 19608649, updated 10966681, deleted 79953, read 786624100
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
 
Bah, I see a bug. In library/XFCliImporter/import.php, change:
Code:
if (++$failed == 5)
to:
Code:
else if (++$failed == 5)

It's supposed to just keep retrying if it gets deadlocks. That said, trying to track down deadlock causes can be helpful. You can get info on the last deadlock with this query:

SHOW INNODB STATUS;
bah, i didn't see your bug until I re-read the post. I'll make the change and re-run it tonight.

Any suggestions for avoiding the deadlocks though? I can post the INNODB config if that would help?
 
bah, I've only got 5.0.77 installed and it needs to be 5.1.22 to use that automatic lock mode.

Yikes, upgrading mysql on the live machine isn't something I really want to do without testing 5.1.22 first! 5.0.95 is the latest version available via YUM, so I'd have to install 5.1.22 manually :(
 
Well, see what happens with the bug fix. You'll still run into the deadlocks, but it should just ignore them and continue trying to import until it can get it through. This will slow it down though obviously.
 
Just to confirm, I've restarted the process and so far I haven't seen a failure to import, just loads of deadlocks again. Will confirm the number of posts & threads when its completed.
 
Sorry to report Mike, that code change somehow made it worse, its now lost over 4m posts :(

It only actually imported around 600,000 posts and 63,000 threads.

If it is mySQL version related, quite alot of people may well have issues upgrading mySQL - e.g. the server I've got is running centos 5.6 - all that is available for this release is 5.0.95 without having to manually install the later version, which some hosting providers may well not be in favour of.

Looks like its back to the default import method for me :(

if you haven't already, it may well be worth testing the CLI on the 5.0 version of mySQL. (I'm on 5.0.22)
 
Great, now my sql backup won't restore due to a 'Got a packet bigger than 'max_allowed_packet' bytes' error.

-edit, I'll start a new thread, just found an issue that may be a problem to some people...
 
As far as I can read, deadlocking shouldn't happen there. Depending on your MySQL version, you may be able to set innodb_autoinc_lock_mode to 1 via my.cnf (http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html) and that may get rid of many of the deadlocks.
I've just tested mySQL 5.0.45 and 5.1.42 on my dev box at home and they both have deadlock issues, so the innodb_autoinc_lock_mode setting made no difference.

am I right in thinking that the posts are imported in transactions of 100 posts? I presume its having the problem in that its having to wait for 1 block to finish before it creates the other.

I seem to recall, but can't find the reference now that one possible way to solve it may be to change post_id from an auto increment and set the post_id in the query.

Don't know if that's worth a try? - Although looking through the importer code, it appears you're doing that anyway?

Perhaps it could be worth importing fewer rows in bulk?
 
Top Bottom