Fixed ALLOW_DEADLOCK_RERUN combined with Entity#save

PaulB

Well-known member
Affected version
2.2.8
Several calls to AbstractAdapter#executeTransaction in the UserAlert repository pass the ALLOW_DEADLOCK_RERUN option, but that option can't be combined with entities unless forceSet is set on the affected entities. Any attempt to re-run the closure in response to a deadlock will result in a LogicException, thereby masking the underlying deadlock and making it difficult to debug. This is especially painful given how difficult it is to debug deadlocks in the first place.
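XenForo's actual option and exception names aside, the failure mode can be mimicked in a few lines of Python (all names here are illustrative stand-ins, not XenForo code): the deadlock-triggered rerun calls `set` on an entity that the first run already saved, so the rerun dies with a "set" error instead of surfacing the deadlock.

```python
class Entity:
    """Toy stand-in for an ORM entity that forbids setting saved fields."""
    def __init__(self):
        self._saved = False
        self._values = {}

    def set(self, key, value):
        if self._saved:
            # Mirrors the LogicException raised when setting a field on a
            # saved entity without forceSet.
            raise RuntimeError(f"Attempted to set {key} on a saved entity")
        self._values[key] = value

    def save(self):
        self._saved = True


class DeadlockError(Exception):
    pass


def execute_transaction(closure, allow_deadlock_rerun=False):
    """Run closure; on a (simulated) deadlock, optionally re-run it once."""
    try:
        return closure()
    except DeadlockError:
        if not allow_deadlock_rerun:
            raise
        return closure()  # the rerun hits the already-saved entity


alert = Entity()
attempts = 0

def mark_viewed():
    global attempts
    attempts += 1
    alert.set("view_date", 12345)
    alert.save()
    if attempts == 1:
        raise DeadlockError()  # first attempt deadlocks after the save

try:
    execute_transaction(mark_viewed, allow_deadlock_rerun=True)
except RuntimeError as e:
    print(e)  # the "set" error masks the underlying deadlock
```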

There are safe uses of ALLOW_DEADLOCK_RERUN elsewhere in the vanilla XenForo codebase, such as in the Forum repository.

Note that although this is related to https://xenforo.com/community/threads/logicexception-attempted-to-set-alerts_unviewed.188223/, it's not the cause of that issue, and fixing this bug will not fix that bug; it will just change the error message.
 
Most of the calls inside explicit transactions for user alerts can be easily rewritten as direct update statements, which avoid the transaction's read/modify/write cycle.

This would reduce both deadlocks and lock hold times.
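For illustration (generic Python/SQLite, not XenForo's database layer; the table mirrors xf_user in name only), the difference between the read/modify/write cycle and a direct update looks something like this:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE xf_user (user_id INTEGER PRIMARY KEY,"
    " alerts_unviewed INTEGER NOT NULL)"
)
db.execute("INSERT INTO xf_user VALUES (1, 5)")

# Read/modify/write: two statements, a round trip between them, and the
# value read can go stale before the write lands:
#   row = db.execute("SELECT alerts_unviewed FROM xf_user"
#                    " WHERE user_id = 1").fetchone()
#   db.execute("UPDATE xf_user SET alerts_unviewed = ?"
#              " WHERE user_id = 1", (row[0] - 2,))

# Direct update: one statement, the arithmetic happens in SQL, no stale
# read, and the row lock is held for a much shorter window.
db.execute(
    "UPDATE xf_user SET alerts_unviewed = MAX(0, alerts_unviewed - ?)"
    " WHERE user_id = ?",
    (2, 1),
)
print(db.execute(
    "SELECT alerts_unviewed FROM xf_user WHERE user_id = 1"
).fetchone()[0])  # 3
```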
 
I haven't dived too deep into everything, but looking at the outstanding issues as a whole, it seems like we'll need to:
  • Run SELECT user_id FROM xf_user WHERE user_id = ? FOR UPDATE first to make lock orders more consistent
  • Replace entity updates with update queries to avoid race conditions (and just call Entity::setAsSaved after)
 
Something like how I've implemented markAlertIdsAsReadAndViewed in my Alert Improvements add-on could be used as inspiration.

Basically, decompose into the following steps:
  • If in a transaction:
    • Use FOR UPDATE on the xf_user record to hold a lock.
  • Update xf_user_alert.view_date (or read_date), and capture the changed-row count.
  • Update the xf_user record (if needed!) with an update statement driven by that changed-row count.
    • If not needed, refresh the alert counts?
  • Update in-memory objects via Entity::setAsSaved (this may include alert entities).
All the steps can be done without any explicit transactions, and the result is very resistant to deadlocking.
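The steps above could be sketched like this (a hedged Python/SQLite illustration with simplified tables; SQLite has no SELECT ... FOR UPDATE, so the locking step appears only as a comment):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE xf_user (user_id INTEGER PRIMARY KEY,
                      alerts_unviewed INTEGER NOT NULL);
CREATE TABLE xf_user_alert (alert_id INTEGER PRIMARY KEY,
                            alerted_user_id INTEGER,
                            view_date INTEGER NOT NULL DEFAULT 0);
INSERT INTO xf_user VALUES (1, 3);
INSERT INTO xf_user_alert (alerted_user_id, view_date)
VALUES (1, 0), (1, 0), (1, 99);
""")

def mark_alerts_viewed(db, user_id, now):
    # Step 1 (MySQL only, and only when already inside a transaction):
    #   SELECT user_id FROM xf_user WHERE user_id = ? FOR UPDATE
    # Step 2: mark the alerts and capture how many rows actually changed.
    cur = db.execute(
        "UPDATE xf_user_alert SET view_date = ?"
        " WHERE alerted_user_id = ? AND view_date = 0",
        (now, user_id),
    )
    changed = cur.rowcount
    # Step 3: apply exactly that delta to the cached counter, floored at 0.
    if changed:
        db.execute(
            "UPDATE xf_user SET alerts_unviewed = MAX(0, alerts_unviewed - ?)"
            " WHERE user_id = ?",
            (changed, user_id),
        )
    # Step 4 would call Entity::setAsSaved on any in-memory entities here.
    return changed

print(mark_alerts_viewed(db, 1, 12345))  # 2 (two alerts were unviewed)
print(db.execute(
    "SELECT alerts_unviewed FROM xf_user WHERE user_id = 1"
).fetchone()[0])  # 1
```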
 
Sadly there are a few functions, not just the reported one, which need updating to follow the same pattern and lock ordering:
  • markUserAlertsViewed
  • markUserAlertsRead
  • markUserAlertViewed
  • markUserAlertRead
  • markSpecificUserAlertsRead
  • markUserAlertsReadForContent
  • markUserAlertUnread
Implementing these in my Alert Improvements add-on basically stopped deadlocks from the alert feature.
 
In 2.3.3, we've made several changes to alert read/view marking:
  • User counts are updated with direct queries rather than via the entity system
  • Marking particular alerts read or unread does not happen inside a transaction
  • When marking all alerts read/unread, or already inside a transaction, the user table is locked or updated first to help ensure a consistent locking order
Many thanks to @Xon and Alert Improvements for the inspiration :) It's possible we can still make further improvements here, so feel free to open up a new report if so.
 
From memory, deleteAlertsInternal (especially when called by pruneViewedAlerts / pruneUnviewedAlerts) is rather deadlock-prone, as it deletes the alerts and then updates many xf_user records in the same transaction.
 
Yeah, I have that report on my radar:

 
Yeah, fixing the design flaws for that is going to require a job (or 2) I feel.

Alert Improvements implements it something like this:
  1. Adjust the various alert-fetching queries to ignore alerts which are too old.
    • This is important! It prevents those alerts from ever being touched by row locks or ending up inside a transaction.
  2. Dump users who have too-old alerts into an alert-total rebuild queue (insert ignore; on deadlock, try again a little later).
  3. Via a job, batch-delete alerts which are too old.
    • Limit deletes to about 50k rows at a time.
    • If a deadlock occurs, halve the batch size and reschedule the job for +1 second.
  4. Drain the alert-total rebuild queue in a job.
    • Each user is handled in a separate transaction.
This is likely an overly sensitive design around deadlock handling, but it has successfully scaled SpaceBattles up to handle 560k-600k alerts per day.
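Step 3's adaptive batch delete could be sketched like so (a Python/SQLite stand-in: SQLite doesn't deadlock the way MySQL does, so sqlite3.OperationalError plays the part of a deadlock error, and the table and function names are illustrative):

```python
import sqlite3

def prune_old_alerts(db, cutoff, batch_size=50_000, min_batch=1_000):
    """Delete alerts older than cutoff in bounded batches."""
    while True:
        try:
            cur = db.execute(
                "DELETE FROM xf_user_alert WHERE alert_id IN ("
                " SELECT alert_id FROM xf_user_alert"
                " WHERE event_date < ? LIMIT ?)",
                (cutoff, batch_size),
            )
            db.commit()
            if cur.rowcount < batch_size:
                return  # last (partial) batch: nothing left to delete
        except sqlite3.OperationalError:
            db.rollback()
            # On deadlock: halve the batch size and, in a real job system,
            # reschedule for +1 second instead of looping immediately.
            batch_size = max(min_batch, batch_size // 2)

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE xf_user_alert (alert_id INTEGER PRIMARY KEY,"
    " event_date INTEGER)"
)
db.executemany(
    "INSERT INTO xf_user_alert (event_date) VALUES (?)",
    [(d,) for d in range(100)],
)
prune_old_alerts(db, cutoff=90, batch_size=25)
print(db.execute("SELECT COUNT(*) FROM xf_user_alert").fetchone()[0])  # 10
```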

I really need to move push notifications into a queue so I can throttle them, since they are repeatedly hitting 429 (too many requests) responses from the providers during peak.
 
FWIW 2.3 will retry rate-limited push notifications for ~10 hours before giving up, though maybe your needs are more robust.
 