Cache Rebuild Error - ES Stopped Working, Again

Anthony Parsons · Apr 19, 2012

ES has been having issues on and off for the past few days, even though it is running on the server every single time I check. Now ES has completely stopped working.

The server software has not been updated, XF has not been updated... yet all sites suddenly have issues with ES. At present I've had to revert search back to MySQL on my main sites, though I'm using a small forum to play around with and test ES. So far nothing has fixed this issue, and any help is certainly welcome.

I have read through many threads and posts here about this same issue and tried many of the suggestions (adding limits, changing configurations, etc.)... still to no avail.

Every time I go to rebuild the cache it gives the error: No response returned from Elasticsearch. Is it running?

A server error on a smaller site gives:

PHP:

Error Info
 
XenForo_Exception: Elasticsearch server returned no response. Is it running? Elasticsearch indexing failed for post- - library/XenES/Search/SourceHandler/ElasticSearch.php:721
Generated By: Anthony, 17 minutes ago
 
Stack Trace
 
#0 /home/mycombat/public_html/library/XenES/Search/SourceHandler/ElasticSearch.php(748): XenES_Search_SourceHandler_ElasticSearch->_logSearchResponseError(false, true, 'Elasticsearch i...')
#1 /home/mycombat/public_html/library/XenES/Search/SourceHandler/ElasticSearch.php(67): XenES_Search_SourceHandler_ElasticSearch->_assertIndexSuccessful(false, 'post')
#2 /home/mycombat/public_html/library/XenForo/Search/Indexer.php(125): XenES_Search_SourceHandler_ElasticSearch->finalizeRebuildSet()
#3 /home/mycombat/public_html/library/XenForo/CacheRebuilder/SearchIndex.php(93): XenForo_Search_Indexer->finalizeRebuildSet()
#4 /home/mycombat/public_html/library/XenForo/ControllerHelper/CacheRebuild.php(26): XenForo_CacheRebuilder_SearchIndex->rebuild(0, Array, NULL)
#5 /home/mycombat/public_html/library/XenForo/ControllerAdmin/Tools.php(78): XenForo_ControllerHelper_CacheRebuild->rebuildCache(Array, 'http://www.myco...', 'admin.php?tools...', true)
#6 /home/mycombat/public_html/library/XenForo/FrontController.php(310): XenForo_ControllerAdmin_Tools->actionCacheRebuild()
#7 /home/mycombat/public_html/library/XenForo/FrontController.php(132): XenForo_FrontController->dispatch(Object(XenForo_RouteMatch))
#8 /home/mycombat/public_html/admin.php(13): XenForo_FrontController->run()
#9 {main}
 
Request State
 
array(3) {
  ["url"] => string(57) "http://www.mycombatptsd.com/admin.php?tools/cache-rebuild"
  ["_GET"] => array(1) {
    ["tools/cache-rebuild"] => string(0) ""
  }
  ["_POST"] => array(5) {
    ["process"] => string(1) "1"
    ["caches"] => string(63) "[["SearchIndex",{"content_type":"","batch":"500","delay":"5"}]]"
    ["position"] => string(1) "0"
    ["redirect"] => string(51) "http://www.mycombatptsd.com/admin.php?tools/rebuild"
    ["_xfToken"] => string(53) "1,1334789763,1581f0874c143b78b236b8d0ea26fdb852d380e7"
  }
}

I have no idea what is wrong with this thing or why suddenly now.

Anyone with ideas would be greatly appreciated.

I have restarted it over and over, it is running from the server.

Screen Shot 2012-04-19 at 9.09.49 AM.webp
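The situation described above (the ES process is running, yet XenForo reports no response) can be checked independently of XenForo by sending a plain HTTP request to ES. The following is a minimal sketch, not something posted in the thread. It assumes ES is listening on `http://localhost:9200/`, which is the Elasticsearch default but may not match this server; the function name `es_responds` is made up for illustration.

```python
import http.client
import urllib.request


def es_responds(base_url="http://localhost:9200/", timeout=5.0):
    """Return True if an HTTP GET to base_url comes back with status 200.

    base_url is an assumption (the Elasticsearch default HTTP port);
    change it to match the host and port configured for the site.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, URLError and HTTPError are all
        # OSError subclasses; malformed replies raise HTTPException.
        return False
```

Calling `es_responds()` on the server itself returns `False` when the process is up but not answering, which is the same symptom XenForo reports during the cache rebuild.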

Slavik · Apr 19, 2012

Have your elasticsearch logs picked anything up?

Anthony Parsons · Apr 19, 2012

What would I run to see the logs?

Slavik · Apr 19, 2012

I can't remember the exact paths, but it would be in one of the folders in /elasticsearch

Anthony Parsons · Apr 19, 2012

It seems to have been getting a lot of this recently:

[2012-04-17 08:49:15,519][WARN ][netty.channel.socket.nio.NioServerSocketPipelineSink] Failed to accept a connection.
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:163)
at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink$Boss.run(NioServerSocketPipelineSink.java:244)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
[2012-04-17 08:49:16,031][WARN ][index.shard.service ] [Shiver Man] [mycombat_***][4] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException: [mycombat_***][4] Refresh failed
at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:789)
at org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:412)
at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:699)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.FileNotFoundException: /var/elasticsearch/elasticsearch/nodes/0/indices/mycombat_***/4/index/_rb.prx (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:441)
at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:306)
at org.elasticsearch.index.store.Store$StoreDirectory.createOutput(Store.java:416)
at org.elasticsearch.index.store.Store$StoreDirectory.createOutput(Store.java:388)
at org.apache.lucene.index.FormatPostingsPositionsWriter.<init>(FormatPostingsPositionsWriter.java:43)
at org.apache.lucene.index.FormatPostingsDocsWriter.<init>(FormatPostingsDocsWriter.java:57)
at org.apache.lucene.index.FormatPostingsTermsWriter.<init>(FormatPostingsTermsWriter.java:33)
at org.apache.lucene.index.FormatPostingsFieldsWriter.<init>(FormatPostingsFieldsWriter.java:51)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:113)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:70)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:60)
at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:581)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3623)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3588)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:452)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:401)
at org.apache.lucene.index.DirectoryReader.doOpenFromWriter(DirectoryReader.java:428)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:448)
at org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:396)
at org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:520)
at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:764)
... 5 more

CTXMedia · Apr 19, 2012

These stand out:

Rich (BB code):

[2012-04-17 08:49:15,519][WARN ][netty.channel.socket.nio.NioServerSocketPipelineSink] Failed to accept a connection.
java.io.IOException: Too many open files
Caused by: java.io.FileNotFoundException: /var/elasticsearch/elasticsearch/nodes/0/indices/mycombat_***/4/index/_rb.prx (Too many open files)

Unless they are normal for ES?
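To see how often these lines occur, the log can be scanned for the "Too many open files" message. This is a rough sketch only: it assumes the log uses the `[YYYY-MM-DD HH:MM:SS,mmm]` header format shown in the excerpt above, and the log file path is passed on the command line because the exact path is not given in this thread. The function name `count_fd_errors` is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the timestamp at the start of an ES log entry, e.g.
# "[2012-04-17 08:49:15,519]", and captures the date part.
LOG_HEADER = re.compile(r"^\[(\d{4}-\d{2}-\d{2}) [\d:,]+\]")


def count_fd_errors(lines):
    """Count lines containing "Too many open files", grouped by the
    date of the most recent log header seen before each line."""
    counts = Counter()
    current_date = "unknown"
    for line in lines:
        header = LOG_HEADER.match(line)
        if header:
            current_date = header.group(1)
        if "Too many open files" in line:
            counts[current_date] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_fd_errors.py <path to the ES log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for date, total in sorted(count_fd_errors(log).items()):
            print(date, total)
```

A count that climbs day by day would suggest the file limit is being hit repeatedly rather than as a one-off.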

Slavik · Apr 19, 2012

That's a "Too many open files" error. However, that's peculiar, because usually that error gets passed along to XenForo and shows up in the logs.

Take a look here, I posted the fix, and hopefully it will help you.

http://xenforo.com/community/threads/elastic-search-building-cache-error.29729/#post-341357

Anthony Parsons · Apr 19, 2012

Yer, tried that... no such luck.

Anthony Parsons · Apr 19, 2012

For example, I just rebooted the server, then tried to connect. ES is running, but I get the same error on the XenForo sites when trying a cache rebuild.

Slavik · Apr 19, 2012

Anthony Parsons said:
Yer, tried that... no such luck.

What does

Code:

lsof | wc -l

return?
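Note that `lsof | wc -l` counts entries across the whole system, while the open-file limit applies per process. As a complementary check, the sketch below counts the file descriptors held by a single process by listing `/proc/<pid>/fd`. It assumes a Linux server with `/proc` mounted and enough permission to read that directory; the function name `open_fd_count` is made up for illustration, and the PID of the ES Java process has to be supplied by hand.

```python
import os
import sys


def open_fd_count(pid):
    """Return the number of open file descriptors for pid, or None if
    /proc/<pid>/fd cannot be read (non-Linux system, missing process,
    or insufficient permissions)."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python open_fd_count.py <pid of the ES Java process>
    print(open_fd_count(int(sys.argv[1])))
```

Comparing this number with the limit that applies to the ES user shows how close the process is to running out of descriptors.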

Anthony Parsons · Apr 19, 2012

Slavik · Apr 19, 2012

I have an idea what it may be. Unfortunately it's late and I'm about to go to bed, but it may be that the transport client is hanging on a corrupt ES node.

For now I would suggest (if you haven't already) completely killing the ES and Java PIDs.

Ensure the raised file limits are correctly set for your Java/ES users.

Then update both Java and ES, clear your current indexes, and re-index everything.
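One way to confirm that the raised file limits actually took effect is to read the open-file limit from a process started as the ES user. This is only a sketch under that assumption: it reports the limits of the process running the script, so it is meaningful only when run as the same user (and through the same login path) that starts ES. The helper names are hypothetical; `resource.getrlimit` and `RLIMIT_NOFILE` are from the Python standard library on Unix.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit:", describe(soft))
    print("hard limit:", describe(hard))
```

If the soft limit printed here is still the old value, the new limit is probably not being applied to that user's sessions.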

x4rl · Apr 19, 2012

Anthony Parsons said:
It has been getting a lot of this it seems recently:

[2012-04-17 08:49:15,519][WARN ][netty.channel.socket.nio.NioServerSocketPipelineSink] Failed to accept a connection.
java.io.IOException: Too many open files

Try raising the limit in the config.

Post about it here

Slavik · Apr 19, 2012

Issue resolved now (hopefully).

It appears one of the shards/nodes had been corrupted, so clearing them down resolved the issue.

Anthony Parsons · Apr 19, 2012

Yep, Slavik fixed this up for me, and I am extremely grateful for his very fast assistance. Wiredtree are good, but their techs have pretty much no real experience with or knowledge of this specific software, so they couldn't provide much assistance... so again, thank you very much, Slavik, for your help in rectifying this on my sites.

CTXMedia · Apr 19, 2012

Good call Slavik - nice one!

webroxau · Jun 4, 2012

What did you do to resolve this?

I had a site throwing similar errors yesterday. Luckily it was a test site, so no big deal. BUT at the same time as it was saying ES wasn't running, I was able to re-cache a 1 million post board on the same server.

I ended up removing the ES folder, installing a new copy of the latest ES, and reindexing. Issues gone.

Slavik · Jun 4, 2012

graham_w said:
What did you do to resolve this?

I had a site throwing similar errors yesterday. Luckily it was a test site, so no big deal. BUT at the same time as it was saying ES wasn't running, I was able to re-cache a 1 million post board on the same server.

I ended up removing the ES folder, installing a new copy of the latest ES, and reindexing. Issues gone.

Pretty much what you've done, clear down the old node and re-create it.
