XF 1.3 Elasticsearch + Locked DB Strange Errors

Wesker · Apr 7, 2016

Please note that one of the well-known third-party custom development teams on XenForo has asked us to post a ticket here, as they are stuck on this issue.

Earlier today, we had an issue with a disk drive reaching its maximum capacity, causing the server to go down briefly before we truncated the DB disk drive.

Since the server has been restored, everything appears to be running fine except the server error log is going crazy with errors.

Error 1: Elasticsearch

Elasticsearch is installed and appears to be running:

ES running
[root@server config]# curl http://127.0.0.1:9200
{
  "status" : 200,
  "name" : "Arclight",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.4.5",
    "build_hash" : "2aaf797f2a571dcb779a3b61180afe8390ab61f9",
    "build_timestamp" : "2015-04-27T08:06:06Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}

However, the server error log is being bombarded with pages and pages of this error:

XenForo_Exception: Elasticsearch server returned no response. Is it running? Elasticsearch indexing failed - library/XenES/Search/SourceHandler/ElasticSearch.php:833
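
A 200 from the root endpoint only shows that the node answers HTTP; it does not say whether the indices came back healthy after the disk filled up. As a rough sketch, assuming the same host and port as the curl call above and that the `_cluster/health` endpoint is reachable, the cluster status could be checked like this. The helper names `health_status` and `check_cluster` are made up for illustration.

```shell
#!/usr/bin/env bash
# Extract the "status" value (green, yellow or red) from cluster health JSON
# read on stdin. health_status is a hypothetical helper, not part of ES.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Ask the local node for cluster health and print only the status colour.
# Host and port are taken from the curl call earlier in this post.
check_cluster() {
    curl -s 'http://127.0.0.1:9200/_cluster/health?pretty' | health_status
}
```

Running `check_cluster` on the server prints one word. `red` means at least one primary shard is unassigned, which would fit failed indexing; `yellow` on a single-node setup usually just means replicas cannot be allocated.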

After this 3rd party team reviewed the

Error 2: Mysqli statement execute error : Lock wait timeout exceeded; try restarting transaction

The next issue is that the DB keeps hitting the same SQL error over and over. They have informed me the server looks fine.

[2016-04-07 14:49:26,269][DEBUG][action.index ] [Arclight] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]

Also note I'm still using 1.3 (will upgrade to 1.5 once one more mod has been fixed to support 1.5).

Wesker · Apr 7, 2016

Members are having a difficult time posting new threads and posts. Slowness and lock-ups are the biggest problem.

Mike · Apr 7, 2016

Both of these make me think there's significant server load. What are the load averages?

Is there anything in Elasticsearch's log?

In terms of MySQL, what is showing in the process list (SHOW FULL PROCESSLIST;)? Running queries? Long-running connections? How about InnoDB (SHOW ENGINE INNODB STATUS;)?

If this all happened due to a disk issue, then the side effects of that are likely the issue. This could be, for example, data corruption in either location, which may be causing various issues as a knock on.
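
For readers following along, the checks asked about in this post could be run roughly as follows. This is a sketch, assuming a Linux host and a `mysql` client that can log in without prompting (for example via `~/.my.cnf`); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the 1-, 5- and 15-minute load averages next to the CPU core count.
# Reads /proc/loadavg by default; another file can be passed for testing.
show_load() {
    local file="${1:-/proc/loadavg}" load1 load5 load15 rest
    read -r load1 load5 load15 rest < "$file"
    echo "load: $load1 $load5 $load15 cores: $(getconf _NPROCESSORS_ONLN)"
}

# Dump running queries and InnoDB lock/transaction state.
# Assumes the mysql client picks up credentials on its own.
show_mysql_state() {
    mysql -e 'SHOW FULL PROCESSLIST;'
    # The older SHOW INNODB STATUS form was removed in MySQL 5.5.
    mysql -e 'SHOW ENGINE INNODB STATUS\G'
}
```

Calling `show_load` and then `show_mysql_state` on the server gives the three pieces of information requested. A load average well above the core count would support the "significant server load" theory; a normal load with long lock waits points elsewhere.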

Wesker · Apr 7, 2016

Mike said:
Both of these make me think there's significant server load. What are the load averages?

Will check on this in a moment. It doesn't appear we're being DDoSed. The server is running fine; it's just processing DB queries.

Mike said:
Is there anything in Elasticsearch's log?

Elasticsearch log = /admin.php?brsql-log/ correct?

If so, yes, many searches. We can still load Elasticsearch, but there are errors everywhere, so I assume others may be getting errors.

Mike said:
In terms of MySQL, what is showing in the process list (SHOW FULL PROCESSLIST;)? Running queries? Long-running connections? How about InnoDB (SHOW ENGINE INNODB STATUS;)?

Long running connections. Let me check on the rest for you.

Mike said:
If this all happened due to a disk issue, then the side effects of that are likely the issue. This could be, for example, data corruption in either location, which may be causing various issues as a knock on.

This is probably our best lead. What do you recommend we do here so we can quickly check this?

Wesker · Apr 7, 2016

By the way Mike I just want to say thank you for replying. You guys do a great job support wise.

MattW · Apr 7, 2016

Wesker said:
Elasticsearch log = /admin.php?brsql-log/ correct?

Try /var/log/elasticsearch/ on the server itself, not via the XenForo ACP.

Wesker · Apr 7, 2016

MattW said:
Try /var/log/elasticsearch/ on the server itself, not via the XenForo ACP.

[root@server ~]# /var/log/elasticsearch/
-bash: /var/log/elasticsearch/: is a directory

MattW · Apr 7, 2016

That's the directory where the logs are stored.
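
Since the path is a directory rather than a command, the files inside it have to be listed and read. A minimal sketch, assuming the log directory named above and the bracketed log-line layout seen in the Elasticsearch line quoted in the first post; `latest_log` and `recent_problems` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print the most recently modified *.log file in the Elasticsearch log
# directory (defaults to the path mentioned above).
latest_log() {
    ls -t "${1:-/var/log/elasticsearch}"/*.log 2>/dev/null | head -n 1
}

# Show the last N (default 50) WARN or ERROR lines from a log file.
# The pattern allows for a space-padded level field such as "[WARN ]".
recent_problems() {
    grep -E '\]\[(WARN|ERROR) *\]' "$1" | tail -n "${2:-50}"
}
```

On the server this could be used as `recent_problems "$(latest_log)"`, which narrows a large log down to the entries most likely to explain the failed indexing.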

Wesker · Apr 7, 2016

Okay, waiting for more logs to accumulate here. The file was accidentally removed. Will send that in a bit.

Main issue though is the db locks. It could all be connected to one thing.

Wesker · Apr 8, 2016

1) Still no luck so far on the lock/deadlock in MySQL.
2) I have yet to experience any issues with the search. We're getting errors, but everything seems to be running smoothly with it.

#1 is obviously the biggest issue.

Wesker · Apr 8, 2016

The server load is normal.

Wesker · Apr 8, 2016

Where can I send the InnoDB status and process list to you guys?

Mike · Apr 8, 2016

You can send it in a conversation.

You'll also want to look at any MySQL logs, particularly from any point after the disk issue. They'll be in /var/log most likely.

Wesker · Apr 8, 2016

Sent list to your conversation inbox
Notified 3rd party team about reviewing sql logs

Wesker · Apr 8, 2016

Per developers:

"We have already looked into the sql logs as requested, but nothing there was helpful in resolving the issue."

Wesker · Apr 8, 2016

Just to confirm there is no data corruption.

Wesker · Apr 8, 2016

Is there any way I can pay a fee to get this expedited by the XF team?

Mike · Apr 8, 2016

This isn't really something that's officially covered by our support (server config). The fact that everything happened after a hardware/disk issue rather than any change to XenForo certainly points to that as the underlying problem. I don't know what knock on effects it could have had to the data stores of Elasticsearch and MySQL, or even to the general functioning of the server.

Based on the logs you sent me, everything is just running very slowly. I see evidence of queries waiting 30 seconds for a particular lock to be released. It's a trivial query itself, though I can't necessarily know what else the potential query is doing. There are a couple of transactions running over 30 seconds (without showing them waiting in MySQL). The vast majority of XF pages would be under a second in normal operation.

Unfortunately, I don't have a great recommendation. It's worth pointing out that in general, the issues seem to be with writes. They're slow in MySQL, and your Elasticsearch error was failed indexing. That might even point to something with the underlying file system. You could try restarting the services, maybe even wiping your Elasticsearch index and rebuilding it (you could do something similar by dumping and restoring the MySQL DB as well), though I'm not convinced that would help if there's no data corruption.

It certainly won't solve the issue, but you probably want to turn off your add-on that shows the online user status in posts. It will exacerbate the issues a bit.
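
Before trying anything destructive, the two observations in this post (slow writes and transactions open for more than 30 seconds) can be checked without changing data. The following is only a sketch: it assumes MySQL 5.5 or later, where `information_schema.innodb_trx` is available, and a `mysql` client that logs in via `~/.my.cnf`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the used-space percentage (digits only) of the filesystem holding
# the given path. POSIX "df -P" keeps each filesystem on a single line.
disk_use_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List InnoDB transactions open for longer than 30 seconds, oldest first.
# Assumes MySQL 5.5+ and client credentials picked up from ~/.my.cnf.
long_transactions() {
    mysql -e "SELECT trx_mysql_thread_id, trx_state, trx_started, trx_query
              FROM information_schema.innodb_trx
              WHERE trx_started < NOW() - INTERVAL 30 SECOND
              ORDER BY trx_started;"
}
```

For example, `disk_use_pct /var/lib/mysql` (that data directory is an assumption; use whatever `datadir` is set to) shows whether the volume is close to full again, and the `trx_mysql_thread_id` values from `long_transactions` can be matched against the `Id` column of `SHOW FULL PROCESSLIST` to see which connection is holding a lock.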

Wesker · Apr 8, 2016

This is resolved now.
