
How to use the CLI Importer

MGSteve

Well-known member
This is a quick guide on how to use the CLI importer with your forum. Currently it's in Alpha and only for vBulletin importers.

This guide is written for Linux, but it should be easy enough for Windows people to follow.

You will need SSH access to your server for this. CLI means command line interface, so you need to log into your server via SSH. A great client, if you haven't got one, is PuTTY - just google it.

Ok, first thing - install the add-on into XF. It's located at <xf_install_root>/library/XFCliImporter/addon-XFCliImporter.xml

Now start your import as usual; just make sure you select the 'multithread' option when choosing the source you're importing from.

Currently the only thing that is multi-threaded is importing the threads and posts. Once you get to that step, you'll be asked a few questions:

xf_cli_1.webp

Now you need to know how many CPU cores you have! A good way is to run cat /proc/cpuinfo, which will output all the CPU info.

Code:
processor      : 15
vendor_id      : GenuineIntel
cpu family      : 6
model          : 44
model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
stepping        : 2
cpu MHz        : 2400.145
cache size      : 12288 KB
physical id    : 1
siblings        : 8
core id        : 9
cpu cores      : 4
apicid          : 51
fpu            : yes
fpu_exception  : yes
cpuid level    : 11
wp              : yes
flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips        : 4800.11
clflush size    : 64
cache_alignment : 64
address sizes  : 40 bits physical, 48 bits virtual
power management: [8]
Example output.
Now, I know I've got 2 quad-core Xeons, which makes 8 cores, or 16 with hyper-threading. In the output above you can see it lists 'processor 15'; as the first one is 0, that's 16 processors as far as Linux is concerned.
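If you don't fancy scrolling through the whole output, counting the 'processor' entries gives the same number. This is just a small sketch; the function name count_processors is mine and not part of the importer.

```shell
# Count logical processors by counting the "processor" entries.
# The file argument is optional and defaults to /proc/cpuinfo.
count_processors() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}
```

Running count_processors with no arguments on the server above should print 16, matching 'processor 15' being the last entry.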
So, to go back to the page...
xf_cli_1.webp
You need to click the 'Use Command Line Interface' option and then enter the path to the PHP binary. This can be tricky, as I found out - I appear to have two installs of PHP on the machine, but only one of them is used by Apache.
So, I just ran 'whereis php' and got this reply:

php: /usr/bin/php /etc/php.ini /etc/php.d /usr/local/bin/php /usr/local/lib/php /usr/share/man/man1/php.1.gz
So, it's found it in /usr/bin/php and /usr/local/bin/php, in addition to the ini file, the config directory and the lib directory.

So it's either /usr/bin/php or /usr/local/bin/php. I ran both with the -v parameter to get the version so I could compare them, and it turns out /usr/local/bin/php is the right one in my case.
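To compare the candidates in one go, something like the sketch below prints the first line of the -v output for each path you give it. The function name php_versions is my own; the paths are simply whatever whereis reported on your machine.

```shell
# Print the first line of "-v" output for each candidate binary.
# Paths that are missing or not executable are reported and skipped.
php_versions() {
    for bin in "$@"; do
        if [ -x "$bin" ]; then
            printf '%s: %s\n' "$bin" "$("$bin" -v 2>/dev/null | head -n 1)"
        else
            printf '%s: not executable\n' "$bin"
        fi
    done
}
```

On my box that would be php_versions /usr/bin/php /usr/local/bin/php, and then you pick the one whose version matches the PHP your site actually runs on.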

So put your path to php into the box.

The next question determines the number of processes to run. Ideally you want one per CPU core, so I entered 16 in there.

Now, if you have taskset on your system, using it will help ensure that the processes don't bounce from core to core: taskset tells Linux to run a process on a specific core and not to move it. This value should also be the number of cores in your system, so again I set it to 16.
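To give an idea of what pinning looks like in general, here is a dry-run sketch that prints one taskset line per core. It is not the command the importer generates (that is shown to you after you click import); the function name and the worker command are placeholders I made up for illustration.

```shell
# Dry run: print one pinned command per core instead of launching anything.
# "worker_cmd" is a placeholder, not the importer's real command.
pinned_commands() {
    cores="$1"
    worker_cmd="$2"
    i=0
    while [ "$i" -lt "$cores" ]; do
        # taskset -c N restricts the process to logical CPU N
        echo "taskset -c $i $worker_cmd"
        i=$((i + 1))
    done
}
```

For example, pinned_commands 16 "php worker.php" prints 16 lines, using CPU numbers 0 to 15, the same numbering you see in /proc/cpuinfo.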
How do you know if you have taskset installed? Run it, of course. Type taskset -v into the shell and you should see something like:

taskset (util-linux 2.13-pre7)
usage: taskset [options] [mask | cpu-list] [pid | cmd [args...]]
set or get the affinity of a process
-p, --pid operate on existing given pid
-c, --cpu-list display and specify cpus in list format
-h, --help display this help
-v, --version output version information
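As far as I know, newer util-linux releases use -V rather than -v for the version flag, so if you only want to know whether taskset is there at all, asking the shell is less fiddly. A small sketch (have_command is just a name I picked):

```shell
# Report whether a command is on the PATH without running it.
have_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}
```

Call it as have_command taskset; if it says 'not found', install taskset from your distribution's packages first.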
Once you've set all that, simply click the import button. Next you'll see the command you need to paste into the shell window.
Then all you do is sit back and relax! You'll see the world tick by in the form of updates every 30 secs:
00:08:54 Approximately 248,600 threads remaining to import.
00:09:24 Approximately 246,300 threads remaining to import.
00:09:54 Approximately 244,100 threads remaining to import.
00:10:24 Approximately 242,300 threads remaining to import.
00:10:54 Approximately 240,900 threads remaining to import.
00:11:24 Approximately 238,900 threads remaining to import.
If you get the last line repeated time and time again, then the processes have probably stopped running. I had this initially because I was running the wrong PHP version. In this case you need to check the import log, located in /tmp/import.log, and then your usual PHP error logs to find out what's gone wrong.
Once it's completed, just click the button on the page to continue the import.
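A quick way to look at the end of that log is sketched below. The default path is the /tmp/import.log mentioned above; the function name and the line-count argument are my own additions.

```shell
# Show the last lines of the import log.
# $1: log path (defaults to /tmp/import.log), $2: number of lines (defaults to 20).
show_import_log() {
    log="${1:-/tmp/import.log}"
    if [ -r "$log" ]; then
        tail -n "${2:-20}" "$log"
    else
        echo "cannot read $log"
    fi
}
```

Run show_import_log on its own for the defaults. It's also worth running ps aux | grep php to see whether any of the import processes are still alive.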

It's not really that hard to use, to be honest, but I thought it might help some people to see what's involved before they start on it.
 
Furthermore, I did a bit of an experiment to see if we could speed it up further.

We can - use a ram drive :)

I set up a ram drive and mounted it on the folder for the MySQL database for XF. By default, of course, only a few tables are MyISAM; the rest are InnoDB. This negates the use of a ram drive, as no data for InnoDB tables is stored in the folder used by the ram drive, so I converted a few tables to MyISAM format.

xf_users, xf_user_field_val, xf_post, xf_thread, xf_ip & the conversation tables as well.
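For anyone wondering what mounting a ram drive over the database folder involves, here is a dry-run sketch that only prints the commands. It assumes a tmpfs-backed ram drive, a MySQL user and group both called mysql, and a database folder and size that you supply; none of those details come from my actual setup notes, so treat them as placeholders.

```shell
# Dry run only: prints the commands, runs nothing.
# Assumes a tmpfs-backed ram drive and a "mysql" user/group (both assumptions).
# Stop MySQL before running the printed commands and start it again afterwards.
ramdrive_plan() {
    db_dir="$1"   # hypothetical, e.g. /var/lib/mysql/xenforo
    size="$2"     # hypothetical, e.g. 4g; must be big enough for the table files
    printf 'cp -a %s %s.bak\n' "$db_dir" "$db_dir"
    printf 'mount -t tmpfs -o size=%s tmpfs %s\n' "$size" "$db_dir"
    printf 'cp -a %s.bak/. %s/\n' "$db_dir" "$db_dir"
    printf 'chown -R mysql:mysql %s\n' "$db_dir"
}
```

Bear in mind that anything on a ram drive is gone after a reboot or unmount, so copy the files back to disk once the import has finished.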

Now, if you were going to do it yourself, based on what I found out - don't bother with the conversation tables unless you have a massive PM table in vB. We have around 82,000 messages in ours and it made no difference whether it was in InnoDB or MyISAM.

The key ones to put into MyISAM would be xf_thread, xf_post and xf_ip.
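Converting a table is a single ALTER TABLE ... ENGINE=... statement per table. The sketch below just prints those statements so you can check them before piping them into mysql; the function name engine_sql is mine.

```shell
# Print one ALTER TABLE statement per table for the given engine.
# Usage: engine_sql ENGINE table [table...]
engine_sql() {
    engine="$1"
    shift
    for table in "$@"; do
        printf 'ALTER TABLE %s ENGINE=%s;\n' "$table" "$engine"
    done
}
```

For example, engine_sql MyISAM xf_thread xf_post xf_ip | mysql -u USER -p DATABASE, where USER and DATABASE are placeholders for your own details. The same call with InnoDB in place of MyISAM produces the statements to switch the tables back once the import is done.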

These were the normal times, without using a ram drive:

xf_imp_2.webp

And this is using a ramdrive...
xf_imp_2ram.webp

That's a hell of a speed difference! That's 415,800 threads and 4.5m posts. As you can see, most of the other items were almost exactly the same timings, so differences in server load can be ruled out. Bear in mind that the normal importer (i.e. not the CLI version) takes around 3 hours to import the threads & posts.

Just a quick note re: the attachments - during the time the attachments were imported, I converted the post table which promptly stalled the attachments import, so you can really take around 15 mins off that time.

You do of course have to factor in the time it takes MySQL to convert the tables back to InnoDB afterwards, which is not insignificant. The IPs table, with around 4.5m records and 640MB, took around 10 mins for MySQL to convert, and the post table, which was 1.7GB, took around 15 minutes; it's still a decent time saving anyway. The thread table only took a couple of minutes to do, so that wasn't too bad. All in all, around 20-25 mins for the table conversion. So even with that in mind, there's still about an hour's time saved.
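Once the tables are converted back, it doesn't hurt to double-check that nothing was left behind in MyISAM. A sketch that prints a query against information_schema for a given database name (the function name and the database name are placeholders of mine):

```shell
# Print a query listing every table in the given database that is not InnoDB.
non_innodb_sql() {
    printf "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = '%s' AND ENGINE <> 'InnoDB';\n" "$1"
}
```

Pipe the output into mysql the same way as any other statement. The few tables that are MyISAM by default will show up in the list too; only the ones you converted yourself need switching back.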
 