mattrogowski
Well-known member
Affected version: 2.2.16
On a large forum with some 22 million posts and very high reaction usage, coupled with a high volume of username changes (mostly from a high volume of account deletions), the process to update the reaction user cache on content is painfully inefficient and leads to frequent table locks, which then cause performance and stability issues on the site.
We have an addon that allows users to delete their own accounts; these deletions are queued and fed in small batches (3 at a time, once an hour) so as not to overwhelm the database. However, even with queuing 3 at a time, we were seeing table locks every time it ran.
The query that seems to always be causing table locks as part of UserDeleteCleanUp is this, from XF\Reaction\AbstractHandler::updateRecentCacheForUserChange:
SQL:
UPDATE (
    SELECT content_id FROM xf_reaction_content
    WHERE content_type = ?
        AND reaction_user_id = ?
) AS temp
INNER JOIN xf_post AS reaction_table ON (reaction_table.`post_id` = temp.content_id)
SET reaction_table.`reaction_users` = REPLACE(reaction_table.`reaction_users`, ?, ?)
WHERE reaction_table.`reaction_users` LIKE ?
This was recently updated following this report from @Xon: https://xenforo.com/community/threa...andler-updaterecentcacheforuserchange.217245/
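For reference, the placeholders in that query end up bound to the content type, the reacting user's ID, and a find/replace pair for the serialized reaction_users cache. The values below are purely illustrative (hypothetical user ID and cache fragments, not the exact strings XenForo builds), but they show the shape of the statement, and in particular that the LIKE is a leading-wildcard substring match, so it can't use an index and every joined row's reaction_users value has to be scanned:
SQL:
-- Illustrative values only; the real find/replace strings are fragments of the
-- serialized reaction_users cache, not necessarily in this exact format.
UPDATE (
    SELECT content_id FROM xf_reaction_content
    WHERE content_type = 'post'
        AND reaction_user_id = 12345
) AS temp
INNER JOIN xf_post AS reaction_table ON (reaction_table.`post_id` = temp.content_id)
SET reaction_table.`reaction_users` = REPLACE(reaction_table.`reaction_users`, 'OldUsername', 'NewUsername')
WHERE reaction_table.`reaction_users` LIKE '%OldUsername%'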
However, while this query might work on smaller forums, on large ones the number of rows returned by the join, coupled with the seemingly expensive LIKE condition, means that it frequently causes table locks. We had 650+ instances of this in one week alone, and it causes slowdowns and errors that make for a negative experience for users.

Would it not be better to not try to do this in one query, with a sub-query and finding rows where reaction_users contains a string, and instead update based on content ID? I think it would be more efficient to query the content that needs updating and then target the rows to update by ID. Even if there were several thousand IDs, chunking them into batches and running the query multiple times would be more efficient.

As a proof of concept I changed it to this:
PHP:
// fetch the IDs of all content of this type the user has reacted to
$contentIds = \XF::db()->fetchAllColumn("
    SELECT content_id FROM xf_reaction_content
    WHERE content_type = ?
        AND reaction_user_id = ?
", [$this->contentType, $newUserId], 'content_id');

// update in batches of 500, targeting rows by primary key instead of a LIKE scan
foreach (array_chunk($contentIds, 500) as $ids) {
    $ids = implode(',', $ids);

    \XF::db()->query("
        UPDATE {$table}
        SET `{$recentField}` = REPLACE(`{$recentField}`, ?, ?)
        WHERE `{$primaryKey}` IN ({$ids})
    ", [$oldFind, $newReplace]);
}
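For post content (where $table, $recentField and $primaryKey resolve to xf_post, reaction_users and post_id), each chunk then boils down to a straightforward primary-key update, roughly like this (the content IDs are illustrative):
SQL:
-- One batch of the chunked update; the IDs here are illustrative.
-- Rows are targeted by primary key rather than by a LIKE scan.
UPDATE xf_post
SET `reaction_users` = REPLACE(`reaction_users`, ?, ?)
WHERE `post_id` IN (101, 102, 103 /* ... up to 500 IDs per batch */)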
I’m not sure you actually need the LIKE at all, as the query only selects content IDs we already know the user has reacted to, and now that the WHERE condition is simply a subset of content IDs it shouldn’t need to limit the query scope any further.

This seems to work as expected, and some 72 hours after making the change, with account deletions being processed at a rate of 5 every 5 minutes instead of 3 every 60 minutes, I have seen exactly 0 table locks logged by this new query. I think it’s a case of more queries being better than fewer if they’re all more efficient.