PaulB
Well-known member
- Affected version
- 2.2.7 Patch 1
We recently upgraded from XF1.5.x to XF2.2.7 Patch 1. This was quite an ordeal, as we had a lot of custom add-ons to port, but it's gone relatively well so far. That being said, xf_session_activity has become an issue. Today I woke up to 18 pages of error logs, and that's been happening consistently since we upgraded.
As a disclaimer, we're not using the MEMORY engine, which seems to be the default for xf_session_activity. We can't use it: except under very limited circumstances, MEMORY can't be used in conjunction with replication:
It also shouldn't matter--in fact, in our case, it's quite likely that using the MEMORY engine would exacerbate the issue:
Here's the problematic query:
And the schema:
Differences from vanilla:
xf_session_activity doesn't need to be perfect, especially for guests. If someone loads multiple pages simultaneously, it matters very little which ends up running the query last--since it runs post-dispatch and doesn't check that the new timestamp is ahead of the old timestamp, requests that take longer to serve can already overwrite data from newer requests that didn't take as long to execute.
This issue could likely be resolved by running three separate queries outside of a transaction--there's no need for this to be atomic:
Additional debugging info:
As a disclaimer, we're not using the MEMORY engine, which seems to be the default for xf_session_activity. We can't use it: except under very limited circumstances, MEMORY can't be used in conjunction with replication:
When a replica server shuts down and restarts, its MEMORY tables become empty. This causes the replica to be out of synchrony with the source and may lead to other failures...
It also shouldn't matter--in fact, in our case, it's quite likely that using the MEMORY engine would exacerbate the issue:
Despite the in-memory processing for MEMORY tables, they are not necessarily faster than InnoDB tables on a busy server, for general-purpose queries, or under a read/write workload. In particular, the table locking involved with performing updates can slow down concurrent usage of MEMORY tables from multiple sessions.
Here's the problematic query:
SQL:
-- XFDB=noForceAllWrite
INSERT INTO xf_session_activity
(`user_id`, `unique_key`, `ip`, `controller_name`, `controller_action`, `view_state`, `params`, `view_date`, `robot_key`)
VALUES
(?, ?, ?, ?, ?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE ip = VALUES(ip),
controller_name = VALUES(controller_name),
controller_action = VALUES(controller_action),
view_state = VALUES(view_state),
params = VALUES(params),
view_date = VALUES(view_date),
robot_key = VALUES(robot_key)
And the schema:
SQL:
CREATE TABLE `xf_session_activity` (
`user_id` int unsigned NOT NULL,
`unique_key` varbinary(16) NOT NULL,
`ip` varbinary(16) NOT NULL DEFAULT '',
`controller_name` varchar(100) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`controller_action` varchar(75) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`view_state` enum('valid','error') CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`params` varbinary(100) NOT NULL,
`view_date` int unsigned NOT NULL,
`robot_key` varbinary(25) NOT NULL DEFAULT '',
PRIMARY KEY (`user_id`,`unique_key`) USING BTREE,
KEY `view_date` (`view_date`) USING BTREE,
KEY `user_id_view_date` (`user_id`,`view_date`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin
Differences from vanilla:
- Engine is InnoDB
- Collation, which shouldn't really affect anything here other than reducing the size of each row.
- An additional index
ON DUPLICATE KEY UPDATE
, REPLACE INTO
, and INSERT IGNORE
lately. ON DUPLICATE KEY UPDATE seems to be affected more than the other two. The current behavior is still probably better than the full table locks used by MEMORY, but it's far from ideal for a large site. There has been quite a bit of thrashing: in one MySQL release the locks will be more aggressive; people complain and the fix gets reverted, which leads to insufficient locking, and the cycle repeats.xf_session_activity doesn't need to be perfect, especially for guests. If someone loads multiple pages simultaneously, it matters very little which ends up running the query last--since it runs post-dispatch and doesn't check that the new timestamp is ahead of the old timestamp, requests that take longer to serve can already overwrite data from newer requests that didn't take as long to execute.
This issue could likely be resolved by running three separate queries outside of a transaction--there's no need for this to be atomic:
- SELECT to check whether it already exists. As part of this query, retrieve
view_date
. - If
view_date
is less than a second ago, don't bother performing any sort of write. Stop here. - If the record exists and
view_date
is old and close to the pruning cutoff, use REPLACE INTO. - If the record exists and
view_date
is not close to the pruning cutoff, use UPDATE. Unlike REPLACE INTO, this is not affected by the locking bug in MySQL, and this will be the most common write performed on most large sites. - If the record doesn't exist, use INSERT IGNORE.
Additional debugging info:
- We're running MySQL 8.0.26.
- We've ruled out issues with other queries that interact with xf_session_activity. The locks that are causing problems are the exclusive result of the query pasted above running multiple times in parallel with the same primary key.