Partial fix User merging loses content

Xon

Well-known member
The summary table xf_thread_user_post becomes out of sync if you merge two users who have posted in the same thread.

A number of new tables are also skipped:
  • xf_draft
  • xf_email_bounce_log
  • xf_email_bounce_soft
  • xf_notice_dismissed
  • xf_forum_watch
  • xf_forum_read
  • xf_thread_read
  • xf_upgrade_log
  • xf_user_field_value
  • xf_user_follow
  • xf_user_ignored
  • xf_user_tfa
  • xf_user_tfa_trusted
 
Some of these omissions are intentional, though for varying reasons.

As an example, the email bounce tables are generally tied to the email first and foremost. The user ID then follows from that. In the log table, the user ID is explicitly a snapshot in time. When merging users, we don't change the email address so it doesn't necessarily make sense to attach this record to a user that never had the email address. The soft bounce table uses user ID, but it's an extension of the main log table.

The TFA elements are similar. Indeed merging that could entirely break someone's account. It would be equivalent to updating their password with the source user's. Custom user fields also fit this pattern as we don't merge profile content.

In other cases, there's some debate as to whether we should be moving data over and it may vary depending on the specific use case. I think notice dismissal, drafts and read records follow here. I think with merging we still need to recognize that the users have been separate and a merge isn't always going to be identical to the same user having taken every single action/page view.

However, given that we move thread watch records, forum watch records should come over as well. Upgrade log is a reasonable change as well. We do actually move follow and ignore records, so this could just require a cache rebuild. The thread user post summary could be handled better.
 
That definitely sounds reasonable.

Hopefully in XF2, user merging will use the Job system to do it in chunks. Updating tens of thousands of rows with a single string-replace query is horrible for everyone else on the forum (i.e. Like data changes).
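The chunked idea can be sketched roughly like this (hypothetical code, not XenForo's actual Job system; the table and column names are illustrative, and sqlite3 stands in for MySQL just to make it runnable). Each batch reassigns at most `batch_size` rows and commits, so locks are released between batches instead of one giant UPDATE holding everything:

```python
import sqlite3

def merge_user_likes(db, source_user_id, target_user_id, batch_size=1000):
    """Reassign like records from source to target user in small batches,
    so no single UPDATE touches tens of thousands of rows at once."""
    while True:
        cur = db.execute(
            "SELECT like_id FROM liked_content WHERE like_user_id = ? LIMIT ?",
            (source_user_id, batch_size),
        )
        ids = [row[0] for row in cur.fetchall()]
        if not ids:
            break  # nothing left to move
        placeholders = ",".join("?" * len(ids))
        db.execute(
            f"UPDATE liked_content SET like_user_id = ? "
            f"WHERE like_id IN ({placeholders})",
            [target_user_id, *ids],
        )
        db.commit()  # release locks between batches

# demo with an in-memory database
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE liked_content (like_id INTEGER PRIMARY KEY, like_user_id INTEGER)"
)
db.executemany(
    "INSERT INTO liked_content (like_user_id) VALUES (?)",
    [(1,)] * 2500 + [(2,)] * 10,
)
merge_user_likes(db, source_user_id=1, target_user_id=2, batch_size=1000)
print(db.execute(
    "SELECT COUNT(*) FROM liked_content WHERE like_user_id = 2"
).fetchone()[0])  # prints 2510
```

In a real job system each batch would be one job run, with the last-processed position saved so the job can resume.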

Additionally, the xf_thread_user_post table needs special consideration, as it gets out of date if you merge two users who have posted in the same thread. That's probably beyond this bug report, though.
 
XF2 already uses a job system for merges and user renames that allows the queries to all be split up, though we don't currently split up the like and conversation queries (they could be though).

I've updated the areas I mentioned. I have updated the thread user post records as well. Ironically, XF2 was already handling this table, though the query was written differently; it was doing a self join on the update. MySQL has some issues with self joins in updates, though it may be version/context specific. I wrote the XF1 version with a sub-query (to a derived table) which may be safer on older versions of MySQL. It's:
Code:
UPDATE xf_thread_user_post AS target
INNER JOIN (
   SELECT s.thread_id, s.post_count
   FROM xf_thread_user_post AS s
   WHERE s.user_id = ? 
) AS source ON (source.thread_id = target.thread_id)
SET target.post_count = target.post_count + source.post_count
WHERE target.user_id = ?
versus:
Code:
UPDATE xf_thread_user_post AS source, xf_thread_user_post AS target
SET target.post_count = target.post_count + source.post_count
WHERE source.user_id = ?
   AND source.thread_id = target.thread_id
   AND target.user_id = ?
 
Actually, it looks like self joins may be preferred, so I'll go with that. Self sub-queries do have issues, though as long as they get materialized to a derived table, they work. That behaviour is more likely to change unexpectedly between versions.
 
MySQL's optimizer can do wonky things sometimes, especially on self-joins.

I attempted to implement something like this (deferrable changes across tables and slices of tables) for my XF1 User Essentials add-on (unreleased version), but it was tricky avoiding some bizarre optimizer behaviour where each update would trigger a table scan to find the sub-selected rows to update.

I ended up creating a queue table, with the content IDs inserted in reverse order. Paging through that queue table and applying the matching IDs to the content being updated worked very efficiently.
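A minimal sketch of that queue-table pattern (the names are my own, not from the add-on, and sqlite3 again stands in for MySQL; I page in ascending order here for simplicity, where the add-on inserted in reverse): snapshot the matching IDs into a queue once, then page through the queue by primary key, so each batch is a cheap keyed lookup instead of a fresh table scan:

```python
import sqlite3

def queue_and_apply(db, old_user_id, new_user_id, page_size=1000):
    """Snapshot matching post IDs into a queue table once, then page
    through the queue updating by primary key, avoiding the repeated
    table scans a correlated sub-select can trigger."""
    db.execute("CREATE TEMP TABLE merge_queue (post_id INTEGER PRIMARY KEY)")
    db.execute(
        "INSERT INTO merge_queue (post_id) "
        "SELECT post_id FROM post WHERE user_id = ?",
        (old_user_id,),
    )
    last_id = 0
    while True:
        rows = db.execute(
            "SELECT post_id FROM merge_queue WHERE post_id > ? "
            "ORDER BY post_id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            break  # queue exhausted
        ids = [r[0] for r in rows]
        placeholders = ",".join("?" * len(ids))
        db.execute(
            f"UPDATE post SET user_id = ? WHERE post_id IN ({placeholders})",
            [new_user_id, *ids],
        )
        db.commit()
        last_id = ids[-1]  # resume position for the next page

# demo with an in-memory database
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE post (post_id INTEGER PRIMARY KEY, user_id INTEGER)")
db.executemany("INSERT INTO post (user_id) VALUES (?)", [(10,)] * 7 + [(20,)] * 3)
queue_and_apply(db, old_user_id=10, new_user_id=20, page_size=3)
print(db.execute(
    "SELECT COUNT(*) FROM post WHERE user_id = 20"
).fetchone()[0])  # prints 10
```

Because the queue is written once up front, rows added or changed mid-merge don't perturb the paging, and each UPDATE hits only indexed primary keys.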
 