1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

Design Issue Soft-deleting posts in active threads is deadlock prone

Discussion in 'Resolved Bug Reports' started by Xon, Jan 21, 2016.

  1. Xon

    Xon Well-Known Member

    I've got a case where a user is number of soft-deleting posts in a busy thread which reliably causing a dozen or so deadlocks.

    Stack trace:
    #0 /var/www/html/library/Zend/Db/Statement.php(297): Zend_Db_Statement_Mysqli->_execute(Array)
    #1 /var/www/html/library/Zend/Db/Adapter/Abstract.php(479): Zend_Db_Statement->execute(Array)
    #2 /var/www/html/library/Zend/Db/Adapter/Abstract.php(632): Zend_Db_Adapter_Abstract->query('UPDATE `xf_post...', Array)
    #3 /var/www//html/library/XenForo/DataWriter.php(1654): Zend_Db_Adapter_Abstract->update('xf_post', Array, '(post_id = 1726...')
    #4 /var/www/html/library/XenForo/DataWriter.php(1623): XenForo_DataWriter->_update()
    #5 /var/www/html/library/XenForo/DataWriter.php(1419): XenForo_DataWriter->_save()
    #6 /var/www/html/library/SV/DeadlockAvoidance/XenForo/DataWriter/DiscussionMessage/Post.php(11): XenForo_DataWriter->save()
    #7 /var/www/html/library/XenForo/Model/Post.php(1216): SV_DeadlockAvoidance_XenForo_DataWriter_DiscussionMessage_Post->save()
    #8 /var/www/html/library/XenForo/ControllerPublic/Post.php(351): XenForo_Model_Post->deletePost(17263383, 'soft', Array, Array)
    #9 /var/www/html/library/XenForo/FrontController.php(351): XenForo_ControllerPublic_Post->actionDelete()
    #10 /var/www/html/library/XenForo/FrontController.php(134): XenForo_FrontController->dispatch(Object(XenForo_RouteMatch))
    #11 /var/www/html/index.php(13): XenForo_FrontController->run()
    #12 {main}
    Note; SV/DeadlockAvoidance just defers _afterPostSaveTransaction() till after the transaction if the DataWriter is in a nested another DataWriter, and isn't doing anything in this code path.

    My guess is there is another delete operation running at the same time as this delete operation is updating the post. Besides rate limiting deleting posts, I'm not sure how to reduce the chance of these errors happening with the DataWriters and how much stuff runs in a Post's _postSave()/_messagePostSave() methods.

    Moving search indexing & deleting alerts out of _postSave to _postSaveAfterTransaction would reduce the size and scope of what is pulled inside the transaction.

    This might just get marked as a design issue, but I thought I'ld raise it as I've seen it consistently being generated.
    semprot likes this.
  2. Xon

    Xon Well-Known Member

    Ah, there is not assertNotFlooding() call in the XenForo_ControllerPublic_Post::actionDelete(), despite deleting posts in a large thread being slow.
  3. Mike

    Mike XenForo Developer Staff Member

    I think this does need to be marked as a design issue as it's not really viable to make sweeping changes to how a lot of this works at this point. There are things we have in mind for the future that can make it easy to move components out of specific transactions though... :whistle:

    That said, if it happens again, can you get the deadlock details from InnoDB? I'm curious which queries are holding locks (though it can be difficult to analyze).
    Divvens and NixFifty like this.
  4. Xon

    Xon Well-Known Member

    I haven't managed to build a reproduction test case, but after looking at the usage pattern I believe this is the result of a long running update against the xf_post.position for virtually all the posts in a very long thread causing the xf_post.position update to cause lock timeout which is raised as a deadlock exception.

    A further complication is I have the xf_post table compressed for nearly 70-80% disk space savings, which increases the cost of large updates to the posts table.

    The only real fix is to batch update positions to reduce the transaction size, and just allow the transient post positions to be viewed incorrectly.
  5. Mike

    Mike XenForo Developer Staff Member

    FWIW, I believe this should manifest itself as a lock_wait_timeout error. I can certainly understand the fairly slow update (particularly if there's compression involved), I'm just not clear what the initial xf_post update is doing to trigger a deadlock on its first access, though there are some non-atomic things that happen within InnoDB. If you happen to notice it reasonably soon after it happens, checking the InnoDB engine status should still show it (it shows the last deadlock).
    Xon likes this.

Share This Page