7 days to upgrade Spacebattles

Xenforo Importer Daemon mode ?
Maybe for very large forums, xenforo could have a daemon importer feature,
  1. you setup a separate mysql slave server which does replication only the listening to master vb database name and replicating to slave vB database name.
  2. you setup xenforo destination database name to connect from mysql slave server
  3. xenforo daemon mode continuously imports (ssh cli script what Floren is suggesting) reading from mysql slave server's vB database name into xenforo database on the slave
  4. live vB forum would be still online for the duration of the import as all updates/writes and reads would come from mysql master server as you don't configure vB config.php for slave usage at all.
  5. the live vB forum data gets replicated to mysql slave which xenforo daemon mode continuously listens and imports from into it's own xenforo database.

You would still end up bottlenecking at the DataWriters... since essentially you are spinning every individual content type and associated records through DataWriters. You certainly could speed it up exponentially by running that stuff in parallel across multiple machines... but still wouldn't be fast enough imo. Once I'm done rewriting all the core stuff I have, I'm gonna see about making a really, really fast importer (I'm shooting for a sub-2 hour downtime to import ~20M posts, 3M threads, 750k users, etc.... including necessary cache rebuilding in that time frame).

I've mucked with it a little bit, and was able to do 15M posts (just raw posts, not threads and things) in about 15 minutes, so... We'll see what I can come up with when I really have time to work on it.
 
I've mucked with it a little bit, and was able to do 15M posts (just raw posts, not threads and things) in about 15 minutes, so... We'll see what I can come up with when I really have time to work on it.

Is this doing the database level changes you hinted at?
 
Is this doing the database level changes you hinted at?
Yeah... the biggest issue is dealing with a source DB that has incorrect collation (how vBulletin uses Latin collation for unicode data). But I figured out how to fix that without importing/exporting everything.

The rest is really just doing raw SQL dumps and manipulating the data as it comes out of MySQL. For example, this is what I was using for the post table... you end up with a .txt file you can load direct into the XF table:

PHP:
exec('mysql -h' . $slave_db_host . ' ' . $source_db . ' -N -e "SET @position = -1, @previous = 0;SELECT post.postid AS post_id, post.threadid AS thread_id, post.userid AS user_id, post.username, post.dateline AS post_date, post.pagetext AS message, post.postid AS ip_id, IF(post.visible = 0, \'moderated\', IF(post.visible = 2, \'deleted\', \'visible\')) AS message_state, attach AS attach_count, IF(@previous = post.threadid, @position:=@position+1, @position:=(0 * @previous:=post.threadid)) AS position, (SELECT COUNT(*) FROM reputation WHERE reputation.postid = post.postid AND reputation > 0) AS likes, \'a:0:{}\' AS like_users, COALESCE(infraction.infractionid, 0) AS warning_id, \'\' AS warning_message FROM post AS post LEFT JOIN infraction ON(infraction.postid = post.postid) WHERE post.threadid ORDER BY post.threadid, post.postid" | sed -r \'s/\[quote="?([^]";]*?);([^]";]*?)"?\]/[quote="\1, post: \2"]/ig;s/\[noparse\]/\[plain\]/ig;s/\[\/noparse\]/\[\/plain\]/ig;s/\[strike\]/\[s\]/ig;s/\[\/strike\]/\[\/s\]/ig;s/\[video\]http\:\/\/www.youtube.*v=([^&[]*?).*\[\/video\]/\[media=youtube\]\1\[\/media\]/ig\' > /tmp/postdump.txt');
 
You would still end up bottlenecking at the DataWriters... since essentially you are spinning every individual content type and associated records through DataWriters. You certainly could speed it up exponentially by running that stuff in parallel across multiple machines... but still wouldn't be fast enough imo. Once I'm done rewriting all the core stuff I have, I'm gonna see about making a really, really fast importer (I'm shooting for a sub-2 hour downtime to import ~20M posts, 3M threads, 750k users, etc.... including necessary cache rebuilding in that time frame).

I've mucked with it a little bit, and was able to do 15M posts (just raw posts, not threads and things) in about 15 minutes, so... We'll see what I can come up with when I really have time to work on it.
wow Shawn to the rescue :D(y)
 
Top Bottom