Fixed Thread view counter update causing poor performance on galera cluster

Erik P. · Jan 23, 2016

The use of TRUNCATE TABLE in XenForo_Model_Thread::updateThreadViews is causing seizures on our galera cluster leading to poor performance. I replaced 'TRUNCATE TABLE xf_thread_view' with 'DELETE FROM xf_thread_view' and attached a graph of the results. The point where my change was deployed can be seen. The change above is the only one I made.

Screen Shot 2016-01-23 at 1.16.30 PM.webp

and a few moments later...

Screen Shot 2016-01-23 at 1.36.56 PM.webp

TRUNCATE TABLE causes a table lock which stalls the cluster. DELETE FROM locks only the rows its deleting and doesn't block new rows from being inserted at the same time. I recommend removing all uses of truncate table to make Xenforo scale better in the future.

Also, SELECT ... FOR UPDATE causes a table lock leading to a cascading failure when a bunch of people try to vote on a poll. That should also be refactored to avoid that lock.

Cheers.

Xon · Jan 24, 2016

@HWS
I think the TRUNCATE TABLE may also be what is responsible for cause deadlocks when inserting thread view counter rows.

FOR UPDATE should only be locking the rows that it touches assuming it is using an index, but I guess this behaves differently in a cluster.

Mike · Feb 12, 2016

Erik P. said:
TRUNCATE TABLE causes a table lock which stalls the cluster. DELETE FROM locks only the rows its deleting and doesn't block new rows from being inserted at the same time. I recommend removing all uses of truncate table to make Xenforo scale better in the future

It seems that the performance in Galera is effectively opposite of what you'd see in InnoDB, where truncate is documented to be much faster (https://dev.mysql.com/doc/innodb/1.1/en/innodb-other-changes-truncate.html). As this would be a delete without a where, the implication is that we'd have a table level lock in InnoDB with that approach. From http://dev.mysql.com/doc/refman/5.7/en/innodb-locks-set.html, "If you have no indexes suitable for your statement and MySQL must scan the entire table to process the statement, every row of the table becomes locked, which in turn blocks all inserts by other users to the table."

That said, what I've written isn't totally accurate in our case because the default approach is actually a MEMORY table. Conceptually though, truncation is usually expected to be faster than needing to do row-wise work on the table.

The situation in general is difficult if you're using different table types than the expected (I know you can't use a memory table with the cluster) that means different performance considerations than we designed for. When the performance seems to inverted in this situation, it makes it doubly difficult. This is quite possibly a situation where, since you're using a non-standard setup, your needs would need to be resolved with environment-specific changes.

I'd be curious if anyone using the standard memory setup has noticed any performance blips from thread view updates that would be down to the truncate table statement?

Erik P. said:
Also, SELECT ... FOR UPDATE causes a table lock leading to a cascading failure when a bunch of people try to vote on a poll. That should also be refactored to avoid that lock.

As noted by Xon, there's no way this should be triggering a table lock. The specific query has a where matching the primary key, so it should only be locking the row (or the page) and maybe the gaps around it. A lock here is explicitly needed for consistency (across several tables) when voting. It should ensure that poll votes are serialized in a consistent way. I don't think there's really a way around that.

Erik P. · Feb 12, 2016

@Mike thanks for your reply. The issue with FOR UPDATE is one of write consistency in the cluster. It's explained in detail in this article: https://www.mirantis.com/openstack-...anding-reservations-concurrency-locking-nova/

The crux is the FOR UPDATE write intent locks are an InnoDB feature unknown to the cluster. Under stress only one node can update the row, the others fail with a deadlock error. FOR UPDATE is effectively ignored by the cluster but still causes an issue with cluster stalls which do not occur when the FOR UPDATE cause is removed from the code. The time to fail to gain certification from the cluster is non-trivial, especially under load.

Alternative strategies for coding these queries to avoid the penalty of deadlock are offered in that article. Hopefully it helps.

digitalpoint · Feb 21, 2016

Personally I'd love to see a setting for this (to use TRUNCATE or DELETE FROM to purge a table). See this suggestion: https://xenforo.com/community/threads/method-for-truncating-tables.96268/

In my situation, we use ndbcluster storage engine which doesn't allow schema changes while a hot backup is being performed. Internally, ndbcluster does a drop table and recreates it when you truncate a table (which is a schema change). Long story short is you end up with a transaction/table lock if you try to truncate a table while a backup is happening. I've worked around it by making an addon that replaces all the truncate table queries with DELETE FROM.

Erik P. · Mar 4, 2016

digitalpoint said:
Personally I'd love to see a setting for this (to use TRUNCATE or DELETE FROM to purge a table). See this suggestion: https://xenforo.com/community/threads/method-for-truncating-tables.96268/

In my situation, we use ndbcluster storage engine which doesn't allow schema changes while a hot backup is being performed. Internally, ndbcluster does a drop table and recreates it when you truncate a table (which is a schema change). Long story short is you end up with a transaction/table lock if you try to truncate a table while a backup is happening. I've worked around it by making an addon that replaces all the truncate table queries with DELETE FROM.

We also worked around the cluster compatibility issues with an addon.

Mike · Oct 9, 2017

2.0 has a config.php option (db-emptyMethod: truncate/delete) that controls what style of query is used when we want to empty a table.

Fixed Thread view counter update causing poor performance on galera cluster

Erik P.

Member

Xon

Well-known member

Mike

XenForo developer

Erik P.

Member

digitalpoint

Well-known member

Erik P.

Member

Mike

XenForo developer

We value your privacy