I read a post at some point yesterday where someone asked about importing MediaWiki content. I can't find it now, but thought I'd post this anyway. I have a need to import our existing MediaWiki, so I'm giving it a go. So far, I've gotten basic content imported.
I did this by creating a MySQL view from the MW tables, which produces the structure of the XC pages table, as follows.
0) Back everything up. Do it now. You Have Been Warned. Here Be Table Eating Dragons.
1) Create a new column in the EWRcarta_pages table called mw_page_id, type INT, allow NULL, default NULL. This is a temporary modification; we only need it for the re-mapping in step 7.
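If you'd rather do that in SQL than in a GUI tool, something like this should do it:

Code:
ALTER TABLE EWRcarta_pages
    ADD COLUMN mw_page_id INT NULL DEFAULT NULL;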
2) (Optional) Copy the four mw_ tables referenced in the query in step 3 to your XF database. This isn't essential, you could just run the query on your MW database. But I'm just nervous about doing anything to my live wiki tables, even if in theory it is perfectly safe.
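If you go the copy route and both databases live on the same MySQL server (and your user can read both), one quick way is CREATE TABLE ... SELECT per table. The wiki_db / xf_db names here are placeholders for your own database names:

Code:
-- repeat for mw_pagelinks, mw_revision and mw_text
CREATE TABLE xf_db.mw_page AS
    SELECT * FROM wiki_db.mw_page;

Note this copies data but not indexes, which is fine for a one-off import.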
3) Either create a view with the following query, or just run the query and export the results as your format of choice. Depends what database tool you run. I use Navicat, so using stored queries is easier. If I was working in phpMyAdmin I'd probably create a view.
Code:
SELECT
    mp.page_id AS mw_page_id,
    mp.page_title AS page_slug,
    mp.page_title AS page_name,
    UNIX_TIMESTAMP(STR_TO_DATE(mp.page_touched, '%Y%m%d%H%i%s')) AS page_date,
    'bbcode' AS page_type,
    CONVERT(mwt.old_text USING utf8) AS page_content,
    mpl.pl_from AS page_parent,
    '0' AS page_index,
    '0' AS page_protect,
    '1' AS page_sidebar,
    '1' AS page_sublist,
    '0' AS page_likes,
    '' AS page_like_users,
    mp.page_counter AS page_views,
    '0' AS thread_id,
    '' AS page_groups,
    '' AS page_users,
    '' AS page_admins
FROM mw_page AS mp
LEFT JOIN mw_pagelinks AS mpl ON mp.page_title = mpl.pl_title
INNER JOIN mw_revision AS mr ON mp.page_latest = mr.rev_id
INNER JOIN mw_text AS mwt ON mr.rev_text_id = mwt.old_id
WHERE mpl.pl_from IS NOT NULL
GROUP BY mp.page_title
ORDER BY mp.page_title;
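If you do want it as a view, just wrap the query above. The view name here is only a placeholder:

Code:
-- 'mw_export' is a placeholder name, call it whatever you like
CREATE OR REPLACE VIEW mw_export AS
    -- paste the full SELECT from above here
    SELECT ...;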
4) Sanity check the results. If it looks like your full set of most recent revisions for each page which has an active link to it, you are good to go.
5) Export the view / query results from step 3 as CSV or .sql, or whatever you find easiest to work with.
6) Import it into your EWRcarta_pages table. Up to you whether you append or empty the XC table first. I appended.
7) Run this query once, which re-maps page_parent from the old MW page id to the new XC page id.
Code:
UPDATE EWRcarta_pages AS e
LEFT JOIN EWRcarta_pages AS p ON p.mw_page_id = e.page_parent
SET e.page_parent = p.page_id
WHERE e.mw_page_id IS NOT NULL;
8) Check your XenCarta front end. It should now have your MW content, although most likely with the index / link structure kinda messed up.
9) Optionally drop the mw_page_id column added in step 1.
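Again, if you'd rather do that in SQL:

Code:
ALTER TABLE EWRcarta_pages
    DROP COLUMN mw_page_id;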
The problem I'm having, which may not be solvable, is establishing "the" (singular) parent of any given page.
MediaWiki uses a many-to-many relationship, with no concept of which of those relationships between pages is "the" parent. XenCarta, on the other hand, has a much simpler one-to-many link structure, so each page has a single parent.
So the query in step 3 doesn't address that, and just uses whichever link id pops out of the "GROUP BY" that condenses the multiple link <-> page relationships down to a single row. I'm playing around to see if I can improve this, but I'm at the point where it'll just be quicker to do it by hand in the XC front end, as I only have about 250 pages.
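One small improvement I'm considering, which doesn't actually know which link is "the" parent but at least makes the choice deterministic and repeatable: aggregate the linking page id instead of letting GROUP BY pick one at random. In the step 3 SELECT, that would mean swapping the page_parent line for:

Code:
    MIN(mpl.pl_from) AS page_parent,

i.e. always take the lowest-numbered page that links to it. Arbitrary, but at least stable across runs.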
[BTW one benefit of doing this is it has gotten rid of several thousand 'orphaned' spam pages laying around in our MW tables]
The next step is obviously the wikitext-to-BBCode issue (MediaWiki markup is wikitext, not Markdown, which narrows the search somewhat). There doesn't seem to be an existing conversion utility, just a few code fragments scattered around the Googlewebz. So I guess I'll go ahead and have a go at writing one myself. The basic formatting stuff will be easy enough; it's the images and links which will require some work, cross-referencing the existing MW image and page tables.
I'll follow up soon(ish) with news on that, and maybe some code.
-- hugh