CLI importer handles 3rd party import models?

Marcus

I heard there is an unofficial CLI importer used by some members. Would this importer also handle third-party import models, like the vB 4 importer from Paul M? I'm planning to go live today, and since I ran into some problems during my first import attempt (3-4 hours), a faster import method would definitely help :)
 
It's included in the .zip package under /library/XFCliImporter.

However, it's not recommended for use yet, as it isn't ready.

Kier has just confirmed that the B3 version includes some changes to improve stability.
 
(Though I would not recommend that anyone actually do that yet, as the CLI importer is still at the alpha stage and is subject to major changes.)
 
Are there plans to extend the CLI importer to more than just posts/threads? For me, I'm thinking users and private messages. 600,000 users took about 6 hours with the normal importer. We also have 1.5M PMs (we even auto-delete them after 90 days), which I didn't even bother trying.

A couple of things I would love to see with it... have the process monitoring done by name instead of PID. That would make it easier to extend across multiple servers. Just "ssh hostname nohup ..." instead of just "nohup ..." Would be interesting to see what it could do if you spread 40 threads across 10 servers.
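For anyone curious what spreading the work out might look like, here's a rough sketch (Python for illustration; the cli-import.php name, its flags, and the host layout are all made-up placeholders) of splitting an ID range into chunks and wrapping each chunk in the "ssh hostname nohup ..." style invocation described above:

```python
def partition(start_id, end_id, workers):
    # Split the inclusive ID range into roughly equal chunks, one per worker.
    total = end_id - start_id + 1
    size, extra = divmod(total, workers)
    chunks, lo = [], start_id
    for i in range(workers):
        hi = lo + size - 1 + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi + 1
    return chunks

def ssh_commands(hosts, threads_per_host, start_id, end_id):
    # Wrap each chunk in the "ssh hostname nohup ..." invocation style
    # described above. cli-import.php and its flags are hypothetical.
    cmds = []
    chunks = partition(start_id, end_id, len(hosts) * threads_per_host)
    for i, (lo, hi) in enumerate(chunks):
        cmds.append('ssh %s "nohup php cli-import.php --posts --from=%d --to=%d &"'
                    % (hosts[i % len(hosts)], lo, hi))
    return cmds
```

Monitoring by name instead of PID would then be as simple as `pgrep -f cli-import.php` on each host.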

The other thing would be a "retention" mode you could select for retaining important IDs from the import (obviously it would only work if you were importing into a clean install). postid, threadid, forumid and userid are the important ones for me. An import without retaining those isn't really even an option for me, unfortunately... I have a ton of custom stuff I've built that references those IDs from non-XF/vB tables. But even aside from that, I'm not really willing to keep a 25 million record database forever that cross-references the old IDs to the new ones. It would also make my "ultra fast" import for certain types of data a lot easier if I don't need to go in after the fact and reset the new IDs in all the tables back to the old IDs.

As an added bonus, it would make importing things after the fact (like maybe you want to import warnings from the infraction system) infinitely easier.

BTW - this is how I'm doing the post table for my fast mode import, in case it helps anyone...

Code:
exec('mysql -h' . $slave_db_host . ' ' . $source_db . ' -N -e "SET @position = -1, @previous = 0;SELECT post.postid AS post_id, post.threadid AS thread_id, post.userid AS user_id, post.username, post.dateline AS post_date, post.pagetext AS message, post.postid AS ip_id, IF(post.visible = 0, \'moderated\', IF(post.visible = 2, \'deleted\', \'visible\')) AS message_state, attach AS attach_count, IF(@previous = post.threadid, @position:=@position+1, @position:=(0 * @previous:=post.threadid)) AS position, (SELECT COUNT(*) FROM reputation WHERE reputation.postid = post.postid AND reputation > 0) AS likes, \'a:0:{}\' AS like_users, COALESCE(infraction.infractionid, 0) AS warning_id, \'\' AS warning_message FROM post AS post LEFT JOIN infraction ON(infraction.postid = post.postid) WHERE post.threadid ORDER BY post.threadid, post.postid" | sed -r \'s/\[quote="?([^]";]*?);([^]";]*?)"?\]/[quote="\1, post: \2"]/ig;s/\[noparse\]/\[plain\]/ig;s/\[\/noparse\]/\[\/plain\]/ig;s/\[strike\]/\[s\]/ig;s/\[\/strike\]/\[\/s\]/ig;s/\[video\]http\:\/\/www.youtube.*v=([^&[]*?).*\[\/video\]/\[media=youtube\]\1\[\/media\]/ig\' > /tmp/postdump.txt');

Just dumping the results of a query that formats everything for XF directly, then piping the results through sed to rewrite certain BBCode as needed on the fly... quote, strike, noparse and video.
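For anyone who'd rather do the BBCode rewriting in a script instead of sed, the same rules can be sketched in Python (regexes adapted by hand from the sed expression above; treat it as a sketch, not a drop-in replacement):

```python
import re

# The sed rewrites from the pipeline above, expressed as Python regexes.
RULES = [
    # vB [quote=Name;12345] -> XF [quote="Name, post: 12345"]
    (re.compile(r'\[quote="?([^]";]*?);([^]";]*?)"?\]', re.I),
     r'[quote="\1, post: \2"]'),
    (re.compile(r'\[noparse\]', re.I), '[plain]'),
    (re.compile(r'\[/noparse\]', re.I), '[/plain]'),
    (re.compile(r'\[strike\]', re.I), '[s]'),
    (re.compile(r'\[/strike\]', re.I), '[/s]'),
    # vB [video]http://www.youtube...v=ID...[/video] -> [media=youtube]ID[/media]
    (re.compile(r'\[video\]http://www\.youtube.*?v=([^&[]*).*?\[/video\]', re.I),
     r'[media=youtube]\1[/media]'),
]

def convert_bbcode(text):
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```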

You end up with a dump of posts ready for direct import into the table, the only exception being that you need to spin through the posts that received a like to build the like_users column.
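For that like_users column: XenForo 1.x stores it as a PHP-serialized array of user_id/username entries, so a spin-through script needs to emit that format. A minimal sketch of the serialization (assuming that structure; it handles only this exact shape, nothing else):

```python
def php_serialize_like_users(likers):
    """Serialize [(user_id, username), ...] the way PHP's serialize() would
    for XenForo 1.x's like_users column. Minimal sketch for this structure."""
    parts = []
    for i, (user_id, username) in enumerate(likers):
        # PHP string lengths are byte lengths, hence the UTF-8 encode.
        byte_len = len(username.encode('utf-8'))
        parts.append('i:%d;a:2:{s:7:"user_id";i:%d;s:8:"username";s:%d:"%s";}'
                     % (i, user_id, byte_len, username))
    return 'a:%d:{%s}' % (len(likers), ''.join(parts))
```

An empty list serializes to a:0:{}, which is exactly the 'a:0:{}' default the query above emits for like_users.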
 
Sorry to beat a dead horse here, but just a follow-up on another reason I think being able to optionally retain a few key primary keys is important... it would make importing stuff into third-party addons easier (for example, one of those blog addons I saw)... or even "official" stuff that may not have been supported before, but is now (people who still have their old backups could have imported infractions into warnings without too much headache).

But the biggest thing for me is just the fact that we have built a ton of stuff around our old framework... stuff that references things like userid. It's all stuff I need to rebuild on the front-end for XF, but if all the keys are changing, needing to rebuild the backend too just makes it all that much more difficult.
 
I'm going to look into making a 'retain keys' option for the importer system. Obviously, this will only work for an import into a clean install with no existing data.
 
So I *finally* got around to working on my "really fast importer" that just bypasses the data writers (and all done via CLI). There is still a lot of work to do, but it ended up being A LOT faster than I had hoped... Was able to import posts (with their parent threads) at a rate of about 1,000,000 posts per minute (18M posts and 2.6M threads in slightly less than 18 minutes). Since I have multiple DB servers, I think I'm going to do each major content type on different servers in parallel (users, posts, threads, private messages), and build a custom daemon for the servers that listens for commands on what to do from the master controller.
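A minimal sketch of what such a listener daemon could look like (Python for illustration; the one-line "IMPORT type lo hi" command format is entirely hypothetical):

```python
import socket
import threading

def handle(conn):
    # Read one newline-terminated command such as "IMPORT posts 1 1000",
    # acknowledge it, and close. A real daemon would kick off the importer
    # for that content type and ID range here.
    with conn:
        verb, ctype, lo, hi = conn.recv(1024).decode().strip().split()
        assert verb == "IMPORT"
        conn.sendall(("OK %s %s-%s\n" % (ctype, lo, hi)).encode())

def serve_once(srv):
    # Accept a single connection; a real daemon would loop on accept()
    # and spawn a thread per connection, reporting progress to the master.
    conn, _addr = srv.accept()
    handle(conn)
```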

It seems I'm only bottlenecked by how fast data can be read/written to the disks. It would be pretty sexy to be able to do a complete conversion of everything in under an hour.
 
This is turning out to be quite a fun little side project. :)

Able to import 615,852 users in 3 minutes... lol. Retaining Facebook connectivity for existing users who use it, contacts turning into follows, our nearly 400k infractions into warnings, etc. :)

(screenshot attachment: Image 2012.07.19 5:15:05 PM.png)
 