
How to use the CLI Importer

Discussion in 'Tips and Guides [Archive]' started by MGSteve, Oct 19, 2011.

  1. MGSteve

    MGSteve Well-Known Member

    This is a quick guide on how to use the CLI importer with your forum. Currently it's in alpha and only works with the vBulletin importers.

    This guide is written for Linux, but it should be easy enough for Windows users to follow.

    You will need SSH access to your server for this. CLI means command line interface, so you need to log into your server via SSH. If you haven't got a client, PuTTY is a great one; just google it.

    OK, first thing: install the add-on into XF. It's located at <xf_install_root>/library/XFCliImporter/addon-XFCliImporter.xml

    Now start your import as usual; just make sure you select the 'multithread' option when selecting where you're importing from.

    Currently the only thing that is multi-threaded is importing the threads and posts. Once you get to that step, you'll be asked a few questions:

    xf_cli_1.png

    Now you need to know what your CPU is! A good way is to run
    cat /proc/cpuinfo, which will output all the CPU info.

    Code:
    processor      : 15
    vendor_id      : GenuineIntel
    cpu family      : 6
    model          : 44
    model name      : Intel(R) Xeon(R) CPU          E5620  @ 2.40GHz
    stepping        : 2
    cpu MHz        : 2400.145
    cache size      : 12288 KB
    physical id    : 1
    siblings        : 8
    core id        : 9
    cpu cores      : 4
    apicid          : 51
    fpu            : yes
    fpu_exception  : yes
    cpuid level    : 11
    wp              : yes
    flags          : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
    bogomips        : 4800.11
    clflush size    : 64
    cache_alignment : 64
    address sizes  : 40 bits physical, 48 bits virtual
    power management: [8]
    
    Example output.
    Now, I know I've got 2 quad-core Xeons, which makes 8 cores, or 16 with Hyper-Threading. The output above ends at 'processor 15', and as the first one is 0, that's 16 processors as far as Linux is concerned.
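If you don't fancy scrolling through all that output, here is a minimal sketch that just counts the entries. The function name count_logical_cpus is made up for this example, and the getconf fallback is only an assumption for systems without /proc/cpuinfo.

```shell
# Count the logical processors Linux reports
# (one "processor : N" entry per logical CPU in /proc/cpuinfo).
count_logical_cpus() {
    if [ -r /proc/cpuinfo ]; then
        grep -c '^processor' /proc/cpuinfo
    else
        # Fallback for systems without /proc/cpuinfo (assumption).
        getconf _NPROCESSORS_ONLN
    fi
}

count_logical_cpus
```

On the box described above this should come out as 16, since the entries run from processor 0 to processor 15.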
    So, to go back to the page...
    xf_cli_1.png
    You need to click the 'Use Command Line Interface' option and then enter the path to PHP (the php executable). This can be tricky, as I found out: I appear to have two installs of PHP on the machine, but only one of them is used by Apache.
    So, I just ran 'whereis php' and got this reply:

    php: /usr/bin/php /etc/php.ini /etc/php.d /usr/local/bin/php /usr/local/lib/php /usr/share/man/man1/php.1.gz
    So, it's found PHP in /usr/bin/php and /usr/local/bin/php, in addition to the ini file, the config directory and the lib directory.

    So it's either /usr/bin/php or /usr/local/bin/php. I ran both with the -v parameter to get the version so I could compare them, and it turns out /usr/local/bin/php is the right one in my case.
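As a rough sketch, the comparison can be done in one go. The helper name show_php_versions is made up for this example, and the two paths are simply the ones whereis reported on my server, so swap in your own.

```shell
# Print the first line of "php -v" for each candidate binary
# so the versions can be compared side by side.
show_php_versions() {
    for php_bin in "$@"; do
        if [ -x "$php_bin" ]; then
            echo "== $php_bin"
            "$php_bin" -v | head -n 1
        else
            echo "== $php_bin (not found or not executable)"
        fi
    done
}

# Paths taken from the whereis output above; adjust for your server.
show_php_versions /usr/bin/php /usr/local/bin/php
```

Whichever one matches the version your web server reports is most likely the one to enter in the box.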

    So put your path to php into the box.

    The next question determines the number of processes to run. Ideally you want one per CPU core, so I entered 16 in there.

    Now, if you have taskset on your system, using it will help ensure that the processes don't bounce from core to core: taskset tells Linux to run a process on a specific core and not to move it. So this also wants to be the number of cores in your system; again, I set this to 16.
    How do you know if you have taskset installed? Run it, of course. Type taskset -v into the shell; if it's installed you'll get a response from taskset itself rather than a 'command not found' error. (On newer util-linux builds the version flag is -V.)
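Another way to check, which doesn't depend on which version flag your build accepts, is to ask the shell whether taskset is on the PATH at all. This is only a sketch, and the function name have_taskset is made up for this example.

```shell
# Report whether taskset is on the PATH; returns non-zero when it is missing.
have_taskset() {
    if command -v taskset >/dev/null 2>&1; then
        echo "taskset found at: $(command -v taskset)"
    else
        echo "taskset not found"
        return 1
    fi
}

have_taskset || true
```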

    Once you've set all that, simply click the import button. Next you'll see the command you need to paste into the shell window.
    Then all you do is sit back and relax! You'll see the world tick by in the form of updates every 30 secs.
    If you get the last line repeated time and time again, then the processes have probably stopped running. I had this initially because I was running the wrong PHP version. In this case you need to check the import log, located in /tmp/import.log, and then your usual PHP error logs to find out what's gone wrong.
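A quick way to peek at the end of that log is sketched below. The function name show_import_log is made up for this example; the default path is the one mentioned above, and 20 lines is an arbitrary choice.

```shell
# Show the last lines of the import log, or say so if the file is missing.
show_import_log() {
    log_file="${1:-/tmp/import.log}"
    if [ -f "$log_file" ]; then
        tail -n 20 "$log_file"
    else
        echo "No log at $log_file"
    fi
}

show_import_log
```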
    Once it's completed, just click the button on the page to continue the import.

    It's not really that hard to use, to be honest, but I thought it might help some people to see what's involved before they start on it.
     
  2. MGSteve

    MGSteve Well-Known Member

    Furthermore, I did a bit of an experiment to see if we could speed it up further.

    We can - use a ram drive :)

    I set up a RAM drive and mounted it in the folder for the MySQL database for XF. By default, of course, only a few tables are in MyISAM; the rest are in InnoDB. That negates the use of a RAM drive, as no data for InnoDB tables is stored in the folder the RAM drive is mounted on, so I converted a few tables to MyISAM format:

    xf_users, xf_user_field_val, xf_post, xf_thread, xf_ip and the conversation tables as well.

    If you're going to do this, then based on what I found out, don't bother with the conversation tables unless you have a massive PM table in vB. We have around 82,000 messages in ours and it made no difference whether they were in InnoDB or MyISAM.

    The key ones to put into MyISAM would be xf_thread, xf_post and xf_ip.
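For anyone unsure how to do the conversion, here is a minimal sketch that just prints the ALTER TABLE statements for the three key tables. The function name engine_sql is made up for this example, and it only generates the SQL; nothing is run against your database.

```shell
# Emit one ALTER TABLE statement per table for the requested storage engine.
engine_sql() {
    engine="$1"
    shift
    for table in "$@"; do
        echo "ALTER TABLE $table ENGINE=$engine;"
    done
}

engine_sql MyISAM xf_thread xf_post xf_ip
```

You could paste the output into your MySQL client of choice. Calling it again with InnoDB as the first argument gives you the statements for converting the tables back once the import is done.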

    These were the times normally, without using a RAM drive:

    xf_imp_2.png

    And this is using a RAM drive:
    xf_imp_2ram.png

    That's a hell of a speed difference! That's 415,800 threads and 4.5m posts. As you can see, most of the other items were almost exactly the same timings, so differences in server load can be ruled out. Bear in mind that the normal importer (i.e. not the CLI version) takes around 3 hours to import the threads & posts.

    Just a quick note re: the attachments. While the attachments were being imported, I converted the post table, which promptly stalled the attachments import, so you can really take around 15 mins off that time.

    You do of course have to factor in the time it takes MySQL to convert the tables to InnoDB afterwards, which is not insignificant. The IPs table, with around 4.5m records and 640MB, took around 10 mins for MySQL to convert, and the post table, which was 1.7GB, took around 15 minutes. It's still a decent time saving anyway. The thread table only took a couple of minutes, so that wasn't too bad. All in all, around 20-25 mins for the table conversion. So even with that in mind, there's still about an hour's time saved.
     
