Fixed ALLOW_DEADLOCK_RERUN combined with Entity#save

PaulB

Well-known member
Affected version
2.2.8
Several calls to AbstractAdapter#executeTransaction in the UserAlert repository pass the ALLOW_DEADLOCK_RERUN option, but that option can't be combined with entities unless forceSet is set on the affected entities. Any attempt to re-run the closure in response to a deadlock will result in a LogicException, thereby masking the underlying deadlock and making it difficult to debug. This is especially painful given how difficult it is to debug deadlocks in the first place.
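XenForo's actual option and exception names aside, the failure mode can be mimicked in a few lines of Python (all names here are illustrative stand-ins, not XenForo code): the deadlock-triggered rerun calls `set` on an entity that the first run already saved, so the rerun dies with a "set" error instead of surfacing the deadlock.

```python
class Entity:
    """Toy stand-in for an ORM entity that forbids setting saved fields."""
    def __init__(self):
        self._saved = False
        self._values = {}

    def set(self, key, value):
        if self._saved:
            # Mirrors the LogicException raised when setting a field on a
            # saved entity without forceSet.
            raise RuntimeError(f"Attempted to set {key} on a saved entity")
        self._values[key] = value

    def save(self):
        self._saved = True


class DeadlockError(Exception):
    pass


def execute_transaction(closure, allow_deadlock_rerun=False):
    """Run closure; on a (simulated) deadlock, optionally re-run it once."""
    try:
        return closure()
    except DeadlockError:
        if not allow_deadlock_rerun:
            raise
        return closure()  # the rerun hits the already-saved entity


alert = Entity()
attempts = 0

def mark_viewed():
    global attempts
    attempts += 1
    alert.set("view_date", 12345)
    alert.save()
    if attempts == 1:
        raise DeadlockError()  # first attempt deadlocks after the save

try:
    execute_transaction(mark_viewed, allow_deadlock_rerun=True)
except RuntimeError as e:
    print(e)  # the "set" error masks the underlying deadlock
```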

There are safe uses of ALLOW_DEADLOCK_RERUN elsewhere in the vanilla XenForo codebase, such as in the Forum repository.

Note that although this is related to https://xenforo.com/community/threads/logicexception-attempted-to-set-alerts_unviewed.188223/, it's not the cause of that issue, and fixing this bug will not fix that bug; it will just change the error message.
 
Most of the calls inside explicit transactions for user alerts can be easily rewritten as direct update statements, which avoid the transaction's read/modify/write cycle.

This would reduce both deadlocks and lock hold times.
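For illustration (generic Python/SQLite, not XenForo's database layer; the table mirrors xf_user in name only), the difference between the read/modify/write cycle and a direct update looks something like this:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE xf_user (user_id INTEGER PRIMARY KEY,"
    " alerts_unviewed INTEGER NOT NULL)"
)
db.execute("INSERT INTO xf_user VALUES (1, 5)")

# Read/modify/write: two statements, a round trip between them, and the
# value read can go stale before the write lands:
#   row = db.execute("SELECT alerts_unviewed FROM xf_user"
#                    " WHERE user_id = 1").fetchone()
#   db.execute("UPDATE xf_user SET alerts_unviewed = ?"
#              " WHERE user_id = 1", (row[0] - 2,))

# Direct update: one statement, the arithmetic happens in SQL, no stale
# read, and the row lock is held for a much shorter window.
db.execute(
    "UPDATE xf_user SET alerts_unviewed = MAX(0, alerts_unviewed - ?)"
    " WHERE user_id = ?",
    (2, 1),
)
print(db.execute(
    "SELECT alerts_unviewed FROM xf_user WHERE user_id = 1"
).fetchone()[0])  # 3
```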
 
I haven't dived too deep into everything, but looking at the outstanding issues as a whole, it seems like we'll need to:
  • Run SELECT user_id FROM xf_user WHERE user_id = ? FOR UPDATE first to make lock orders more consistent
  • Replace entity updates with update queries to avoid race conditions (and just call Entity::setAsSaved after)
 
Something like how I've implemented markAlertIdsAsReadAndViewed in my Alert Improvements add-on could be used as inspiration.

Basically, decompose into the following steps:
  • If in a transaction:
    • Use FOR UPDATE on the xf_user record to hold a lock.
  • Update xf_user_alert.view_date (or read_date), and capture the changed-row count.
  • Update the xf_user record (if needed!) with an update statement driven by that changed-row count.
    • If not needed, refresh the alert counts?
  • Update in-memory objects via Entity::setAsSaved (this may include alert entities).
All the steps can be done without any explicit transactions, and the result is very resistant to deadlocking.
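The steps above could be sketched like this (a hedged Python/SQLite illustration with simplified tables; SQLite has no SELECT ... FOR UPDATE, so the locking step appears only as a comment):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE xf_user (user_id INTEGER PRIMARY KEY,
                      alerts_unviewed INTEGER NOT NULL);
CREATE TABLE xf_user_alert (alert_id INTEGER PRIMARY KEY,
                            alerted_user_id INTEGER,
                            view_date INTEGER NOT NULL DEFAULT 0);
INSERT INTO xf_user VALUES (1, 3);
INSERT INTO xf_user_alert (alerted_user_id, view_date)
VALUES (1, 0), (1, 0), (1, 99);
""")

def mark_alerts_viewed(db, user_id, now):
    # Step 1 (MySQL only, and only when already inside a transaction):
    #   SELECT user_id FROM xf_user WHERE user_id = ? FOR UPDATE
    # Step 2: mark the alerts and capture how many rows actually changed.
    cur = db.execute(
        "UPDATE xf_user_alert SET view_date = ?"
        " WHERE alerted_user_id = ? AND view_date = 0",
        (now, user_id),
    )
    changed = cur.rowcount
    # Step 3: apply exactly that delta to the cached counter, floored at 0.
    if changed:
        db.execute(
            "UPDATE xf_user SET alerts_unviewed = MAX(0, alerts_unviewed - ?)"
            " WHERE user_id = ?",
            (changed, user_id),
        )
    # Step 4 would call Entity::setAsSaved on any in-memory entities here.
    return changed

print(mark_alerts_viewed(db, 1, 12345))  # 2 (two alerts were unviewed)
print(db.execute(
    "SELECT alerts_unviewed FROM xf_user WHERE user_id = 1"
).fetchone()[0])  # 1
```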
 
Sadly there are a few functions, not just the reported one, which need updating to follow the same pattern and lock ordering:
  • markUserAlertsViewed
  • markUserAlertsRead
  • markUserAlertViewed
  • markUserAlertRead
  • markSpecificUserAlertsRead
  • markUserAlertsReadForContent
  • markUserAlertUnread
Implementing these in my Alert Improvements add-on basically stopped deadlocks from the alert feature.
 
In 2.3.3, we've made several changes to alert read/view marking:
  • User counts are updated with direct queries rather than via the entity system
  • Marking particular alerts read or unread does not happen inside a transaction
  • When marking all alerts read/unread, or already inside a transaction, the user table is locked or updated first to help ensure a consistent locking order
Many thanks to @Xon and Alert Improvements for the inspiration :) It's possible we can still make further improvements here, so feel free to open up a new report if so.
 
From memory, deleteAlertsInternal (especially when called by pruneViewedAlerts / pruneUnviewedAlerts) is rather deadlock-prone, as it deletes the alerts and then updates many xf_user records in the same transaction.
 
Yeah, I have that report on my radar:

 
Yeah, fixing the design flaws for that is going to require a job (or 2) I feel.

Alert Improvements implements it something like this:
  1. Adjust the various alert-fetching queries to ignore alerts which are too old.
    • This is important! It prevents those alerts from ever being touched by row locks or ending up inside a transaction.
  2. Dump users who have too-old alerts into an alert-total rebuild queue (insert ignore; on deadlock, try again a little later).
  3. Via a job, batch-delete alerts which are too old.
    • Limit deletes to about 50k rows at a time.
    • If a deadlock occurs, halve the batch size and reschedule the job for +1 second.
  4. Drain the alert-total rebuild queue in a job.
    • Each user is handled in a separate transaction.
This is likely an overly sensitive design around deadlock handling, but it has successfully scaled SpaceBattles up to handle 560k-600k alerts per day.
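Step 3's adaptive batch delete could be sketched like so (a Python/SQLite stand-in: SQLite doesn't deadlock the way MySQL does, so sqlite3.OperationalError plays the part of a deadlock error, and the table and function names are illustrative):

```python
import sqlite3

def prune_old_alerts(db, cutoff, batch_size=50_000, min_batch=1_000):
    """Delete alerts older than cutoff in bounded batches."""
    while True:
        try:
            cur = db.execute(
                "DELETE FROM xf_user_alert WHERE alert_id IN ("
                " SELECT alert_id FROM xf_user_alert"
                " WHERE event_date < ? LIMIT ?)",
                (cutoff, batch_size),
            )
            db.commit()
            if cur.rowcount < batch_size:
                return  # last (partial) batch: nothing left to delete
        except sqlite3.OperationalError:
            db.rollback()
            # On deadlock: halve the batch size and, in a real job system,
            # reschedule for +1 second instead of looping immediately.
            batch_size = max(min_batch, batch_size // 2)

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE xf_user_alert (alert_id INTEGER PRIMARY KEY,"
    " event_date INTEGER)"
)
db.executemany(
    "INSERT INTO xf_user_alert (event_date) VALUES (?)",
    [(d,) for d in range(100)],
)
prune_old_alerts(db, cutoff=90, batch_size=25)
print(db.execute("SELECT COUNT(*) FROM xf_user_alert").fetchone()[0])  # 10
```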

I really need to move push notifications into a queue so I can throttle them, since they are repeatedly hitting 429 (too many requests) responses from the providers during peak.
 
FWIW 2.3 will retry rate-limited push notifications for ~10 hours before giving up, though maybe your needs are more robust.
 