Massive performance issue in banning user with many messages in approval queue

Affected version
v2.3.7
We recently got a new pattern of spammers that causes some annoying fallout when trying to spam clean them.

They will register and then continuously post spam messages, while these land all in the approval queue (good!) it gets relatively big, well, not really that big, just close to triple digits.
When one then tries to use the spam cleaner that just times out, or maybe just needs extremely long to run in the backend.
This then leads to any further action trying to manually ban the user to run into a transaction lock timeout, like:

Code:
XF\Db\Exception: Job XF\Job\ApprovalQueueProcess: MySQL query error [1205]: Lock wait timeout exceeded; try restarting transaction src/XF/Db/AbstractStatement.php:230

INSERT  INTO [ICODE]xf_user_ban[/ICODE] ([ICODE]user_id[/ICODE], [ICODE]ban_user_id[/ICODE], [ICODE]end_date[/ICODE], [ICODE]triggered[/ICODE], [ICODE]user_reason[/ICODE], [ICODE]ban_date[/ICODE]) VALUES (?, ?, ?, ?, ?, ?)
------------

#0 src/XF/Db/Mysqli/Statement.php(207): XF\Db\AbstractStatement->getException('MySQL query err...', 1205, 'HY000')
#1 src/XF/Db/Mysqli/Statement.php(83): XF\Db\Mysqli\Statement->getException('MySQL query err...', 1205, 'HY000')
#2 src/XF/Db/AbstractAdapter.php(96): XF\Db\Mysqli\Statement->execute()
#3 src/XF/Db/AbstractAdapter.php(219): XF\Db\AbstractAdapter->query('INSERT  INTO `x...', Array)
#4 src/XF/Mvc/Entity/Entity.php(1578): XF\Db\AbstractAdapter->insert('xf_user_ban', Array, false)
#5 src/XF/Mvc/Entity/Entity.php(1306): XF\Mvc\Entity\Entity->_saveToSource()
#6 src/XF/Repository/BanningRepository.php(61): XF\Mvc\Entity\Entity->save(false)
#7 src/XF/Spam/Cleaner.php(89): XF\Repository\BanningRepository->banUser(Object(ThemeHouse\UIX\XF\Entity\User), 0, 'Spam. Please co...', NULL)
#8 src/XF/ApprovalQueue/AbstractHandler.php(265): XF\Spam\Cleaner->banUser()
#9 src/XF/ApprovalQueue/ThreadHandler.php(45): XF\ApprovalQueue\AbstractHandler->_spamCleanInternal(Object(ThemeHouse\UIX\XF\Entity\User))
#10 src/XF/ApprovalQueue/AbstractHandler.php(201): XF\ApprovalQueue\ThreadHandler->actionSpamClean(Object(XF\Entity\Thread))
#11 src/XF/Job/ApprovalQueueProcess.php(71): XF\ApprovalQueue\AbstractHandler->performAction('spam_clean', Object(XF\Entity\Thread))
#12 src/XF.php(906): XF\Job\ApprovalQueueProcess->XF\Job\{closure}()
#13 src/XF/Job/ApprovalQueueProcess.php(57): XF::asVisitor(Object(ThemeHouse\UIX\XF\Entity\User), Object(Closure))
#14 src/XF/Job/Manager.php(275): XF\Job\ApprovalQueueProcess->run(8)
#15 src/XF/Job/Manager.php(205): XF\Job\Manager->runJobInternal(Array, 8)
#16 src/XF/Job/Manager.php(121): XF\Job\Manager->runJobEntry(Array, 8)
#17 job.php(25): XF\Job\Manager->runByIds(Array, 8)
#18 {main}

The lock timeout of the DB is 50s, which should be more than enough to process the spam cleaning of a single user in the DB.

The rest of the system responds fast, the load on server and pressure stall information is far from being high, we host this ourselfes and also see that the underlying dedicated server is rather far from being overloaded.
So this is definitively not the result of an overloaded DB or server, rather it seems that you start a transaction early when banning a user, and then do something that scales really badly if there are multiple messages of that users being in the approval queue.
I did not check closely, but it might do some recursively check/do something during the started DB transaction that then has some bad big-O notation scaling and thus causes massive delays if a User got high double-digits messages in the approval queue; maybe some steps are also done multiple times when they could be done only once (like IP checking, but now I'm just guessing around here to be honest)

FWICT, the transaction timeout lock errors are then not an issue per se, they are just symptoms of what is a way to long transaction for spam cleaning a single user with a high double-digit message count in the approval queue.

You know your code best, but IMO it might be good to set a user to being banned in the DB very early when spam cleaning and ideally outside the transaction (or as separate transaction), ensuring that this user cannot cause any new messages to be posted while the spam cleaner is running.
Then it might be worth into looking if the spam cleaner has some (nested) loops that it should not have, if the user is marked as banned early, the processing could be also done in smaller transaction batches, which would ensure that some progress is made when cleaning up a user.
Orthogonal to that it might be nice to block a user from posting any new message if they have over (rolls dice) 5 messages already queued for approval, as in such cases there's likely a spam problem with that user anyway.

In any case, I'd appreciate it if you could look into the architecture of the spam cleaner to see if there are some design problems that might cause a much higher runtime than possible, especially if there are many messages from the user one tries to spam-clean in the approval queue.
 
Oh, one thing that might be relevant here too, we configured the spam cleaner to move threads into a hidden forum to still be able to restore threads while ordinary moderator users not getting their forum list views crammed with the soft-deleted spam messages, which is really annoying for all our staff to wade through.

When checking the DB log for slow queries, I saw that a UPDATE `xf_thread` SET `node_id` = 'XX', `discussion_state` = 'visible' WHERE `thread_id` = YYY; statement required long time to run, so I figured that it might be stemming from that "Spam cleaner thread action" option. I changed it to "permanently delete" and restarted the DB to ensure all pending transactions are cleared. After that, using the spam cleaner was much quicker (<< 3s, maybe even less).

So this might be only really an issue in combination with the "Move to forum" spam cleaner thread action option.
 
Back
Top Bottom