tlamprecht
Member
- Affected version
- v2.3.7
We recently got a new pattern of spammers that causes some annoying fallout when trying to spam clean them.
They will register and then continuously post spam messages, while these land all in the approval queue (good!) it gets relatively big, well, not really that big, just close to triple digits.
When one then tries to use the spam cleaner that just times out, or maybe just needs extremely long to run in the backend.
This then leads to any further action trying to manually ban the user to run into a transaction lock timeout, like:
The lock timeout of the DB is 50s, which should be more than enough to process the spam cleaning of a single user in the DB.
The rest of the system responds fast, the load on server and pressure stall information is far from being high, we host this ourselfes and also see that the underlying dedicated server is rather far from being overloaded.
So this is definitively not the result of an overloaded DB or server, rather it seems that you start a transaction early when banning a user, and then do something that scales really badly if there are multiple messages of that users being in the approval queue.
I did not check closely, but it might do some recursively check/do something during the started DB transaction that then has some bad big-O notation scaling and thus causes massive delays if a User got high double-digits messages in the approval queue; maybe some steps are also done multiple times when they could be done only once (like IP checking, but now I'm just guessing around here to be honest)
FWICT, the transaction timeout lock errors are then not an issue per se, they are just symptoms of what is a way to long transaction for spam cleaning a single user with a high double-digit message count in the approval queue.
You know your code best, but IMO it might be good to set a user to being banned in the DB very early when spam cleaning and ideally outside the transaction (or as separate transaction), ensuring that this user cannot cause any new messages to be posted while the spam cleaner is running.
Then it might be worth into looking if the spam cleaner has some (nested) loops that it should not have, if the user is marked as banned early, the processing could be also done in smaller transaction batches, which would ensure that some progress is made when cleaning up a user.
Orthogonal to that it might be nice to block a user from posting any new message if they have over (rolls dice) 5 messages already queued for approval, as in such cases there's likely a spam problem with that user anyway.
In any case, I'd appreciate it if you could look into the architecture of the spam cleaner to see if there are some design problems that might cause a much higher runtime than possible, especially if there are many messages from the user one tries to spam-clean in the approval queue.
They will register and then continuously post spam messages, while these land all in the approval queue (good!) it gets relatively big, well, not really that big, just close to triple digits.
When one then tries to use the spam cleaner that just times out, or maybe just needs extremely long to run in the backend.
This then leads to any further action trying to manually ban the user to run into a transaction lock timeout, like:
Code:
XF\Db\Exception: Job XF\Job\ApprovalQueueProcess: MySQL query error [1205]: Lock wait timeout exceeded; try restarting transaction src/XF/Db/AbstractStatement.php:230
INSERT INTO [ICODE]xf_user_ban[/ICODE] ([ICODE]user_id[/ICODE], [ICODE]ban_user_id[/ICODE], [ICODE]end_date[/ICODE], [ICODE]triggered[/ICODE], [ICODE]user_reason[/ICODE], [ICODE]ban_date[/ICODE]) VALUES (?, ?, ?, ?, ?, ?)
------------
#0 src/XF/Db/Mysqli/Statement.php(207): XF\Db\AbstractStatement->getException('MySQL query err...', 1205, 'HY000')
#1 src/XF/Db/Mysqli/Statement.php(83): XF\Db\Mysqli\Statement->getException('MySQL query err...', 1205, 'HY000')
#2 src/XF/Db/AbstractAdapter.php(96): XF\Db\Mysqli\Statement->execute()
#3 src/XF/Db/AbstractAdapter.php(219): XF\Db\AbstractAdapter->query('INSERT INTO `x...', Array)
#4 src/XF/Mvc/Entity/Entity.php(1578): XF\Db\AbstractAdapter->insert('xf_user_ban', Array, false)
#5 src/XF/Mvc/Entity/Entity.php(1306): XF\Mvc\Entity\Entity->_saveToSource()
#6 src/XF/Repository/BanningRepository.php(61): XF\Mvc\Entity\Entity->save(false)
#7 src/XF/Spam/Cleaner.php(89): XF\Repository\BanningRepository->banUser(Object(ThemeHouse\UIX\XF\Entity\User), 0, 'Spam. Please co...', NULL)
#8 src/XF/ApprovalQueue/AbstractHandler.php(265): XF\Spam\Cleaner->banUser()
#9 src/XF/ApprovalQueue/ThreadHandler.php(45): XF\ApprovalQueue\AbstractHandler->_spamCleanInternal(Object(ThemeHouse\UIX\XF\Entity\User))
#10 src/XF/ApprovalQueue/AbstractHandler.php(201): XF\ApprovalQueue\ThreadHandler->actionSpamClean(Object(XF\Entity\Thread))
#11 src/XF/Job/ApprovalQueueProcess.php(71): XF\ApprovalQueue\AbstractHandler->performAction('spam_clean', Object(XF\Entity\Thread))
#12 src/XF.php(906): XF\Job\ApprovalQueueProcess->XF\Job\{closure}()
#13 src/XF/Job/ApprovalQueueProcess.php(57): XF::asVisitor(Object(ThemeHouse\UIX\XF\Entity\User), Object(Closure))
#14 src/XF/Job/Manager.php(275): XF\Job\ApprovalQueueProcess->run(8)
#15 src/XF/Job/Manager.php(205): XF\Job\Manager->runJobInternal(Array, 8)
#16 src/XF/Job/Manager.php(121): XF\Job\Manager->runJobEntry(Array, 8)
#17 job.php(25): XF\Job\Manager->runByIds(Array, 8)
#18 {main}
The lock timeout of the DB is 50s, which should be more than enough to process the spam cleaning of a single user in the DB.
The rest of the system responds fast, the load on server and pressure stall information is far from being high, we host this ourselfes and also see that the underlying dedicated server is rather far from being overloaded.
So this is definitively not the result of an overloaded DB or server, rather it seems that you start a transaction early when banning a user, and then do something that scales really badly if there are multiple messages of that users being in the approval queue.
I did not check closely, but it might do some recursively check/do something during the started DB transaction that then has some bad big-O notation scaling and thus causes massive delays if a User got high double-digits messages in the approval queue; maybe some steps are also done multiple times when they could be done only once (like IP checking, but now I'm just guessing around here to be honest)
FWICT, the transaction timeout lock errors are then not an issue per se, they are just symptoms of what is a way to long transaction for spam cleaning a single user with a high double-digit message count in the approval queue.
You know your code best, but IMO it might be good to set a user to being banned in the DB very early when spam cleaning and ideally outside the transaction (or as separate transaction), ensuring that this user cannot cause any new messages to be posted while the spam cleaner is running.
Then it might be worth into looking if the spam cleaner has some (nested) loops that it should not have, if the user is marked as banned early, the processing could be also done in smaller transaction batches, which would ensure that some progress is made when cleaning up a user.
Orthogonal to that it might be nice to block a user from posting any new message if they have over (rolls dice) 5 messages already queued for approval, as in such cases there's likely a spam problem with that user anyway.
In any case, I'd appreciate it if you could look into the architecture of the spam cleaner to see if there are some design problems that might cause a much higher runtime than possible, especially if there are many messages from the user one tries to spam-clean in the approval queue.