Design issue Soft-deleting posts in active threads is deadlock prone

Xon

Well-known member
I've got a case where a user is number of soft-deleting posts in a busy thread which reliably causing a dozen or so deadlocks.

Stack trace:
Code:
#0 /var/www/html/library/Zend/Db/Statement.php(297): Zend_Db_Statement_Mysqli->_execute(Array)
#1 /var/www/html/library/Zend/Db/Adapter/Abstract.php(479): Zend_Db_Statement->execute(Array)
#2 /var/www/html/library/Zend/Db/Adapter/Abstract.php(632): Zend_Db_Adapter_Abstract->query('UPDATE `xf_post...', Array)
#3 /var/www//html/library/XenForo/DataWriter.php(1654): Zend_Db_Adapter_Abstract->update('xf_post', Array, '(post_id = 1726...')
#4 /var/www/html/library/XenForo/DataWriter.php(1623): XenForo_DataWriter->_update()
#5 /var/www/html/library/XenForo/DataWriter.php(1419): XenForo_DataWriter->_save()
#6 /var/www/html/library/SV/DeadlockAvoidance/XenForo/DataWriter/DiscussionMessage/Post.php(11): XenForo_DataWriter->save()
#7 /var/www/html/library/XenForo/Model/Post.php(1216): SV_DeadlockAvoidance_XenForo_DataWriter_DiscussionMessage_Post->save()
#8 /var/www/html/library/XenForo/ControllerPublic/Post.php(351): XenForo_Model_Post->deletePost(17263383, 'soft', Array, Array)
#9 /var/www/html/library/XenForo/FrontController.php(351): XenForo_ControllerPublic_Post->actionDelete()
#10 /var/www/html/library/XenForo/FrontController.php(134): XenForo_FrontController->dispatch(Object(XenForo_RouteMatch))
#11 /var/www/html/index.php(13): XenForo_FrontController->run()
#12 {main}

Note; SV/DeadlockAvoidance just defers _afterPostSaveTransaction() till after the transaction if the DataWriter is in a nested another DataWriter, and isn't doing anything in this code path.

My guess is there is another delete operation running at the same time as this delete operation is updating the post. Besides rate limiting deleting posts, I'm not sure how to reduce the chance of these errors happening with the DataWriters and how much stuff runs in a Post's _postSave()/_messagePostSave() methods.

Moving search indexing & deleting alerts out of _postSave to _postSaveAfterTransaction would reduce the size and scope of what is pulled inside the transaction.

This might just get marked as a design issue, but I thought I'ld raise it as I've seen it consistently being generated.
 
Ah, there is not assertNotFlooding() call in the XenForo_ControllerPublic_Post::actionDelete(), despite deleting posts in a large thread being slow.
 
I think this does need to be marked as a design issue as it's not really viable to make sweeping changes to how a lot of this works at this point. There are things we have in mind for the future that can make it easy to move components out of specific transactions though... :whistle:

That said, if it happens again, can you get the deadlock details from InnoDB? I'm curious which queries are holding locks (though it can be difficult to analyze).
 
That said, if it happens again, can you get the deadlock details from InnoDB? I'm curious which queries are holding locks (though it can be difficult to analyze).
I haven't managed to build a reproduction test case, but after looking at the usage pattern I believe this is the result of a long running update against the xf_post.position for virtually all the posts in a very long thread causing the xf_post.position update to cause lock timeout which is raised as a deadlock exception.

A further complication is I have the xf_post table compressed for nearly 70-80% disk space savings, which increases the cost of large updates to the posts table.

The only real fix is to batch update positions to reduce the transaction size, and just allow the transient post positions to be viewed incorrectly.
 
I haven't managed to build a reproduction test case, but after looking at the usage pattern I believe this is the result of a long running update against the xf_post.position for virtually all the posts in a very long thread causing the xf_post.position update to cause lock timeout which is raised as a deadlock exception
FWIW, I believe this should manifest itself as a lock_wait_timeout error. I can certainly understand the fairly slow update (particularly if there's compression involved), I'm just not clear what the initial xf_post update is doing to trigger a deadlock on its first access, though there are some non-atomic things that happen within InnoDB. If you happen to notice it reasonably soon after it happens, checking the InnoDB engine status should still show it (it shows the last deadlock).
 
Back
Top Bottom