Always rebuild search index after Elasticsearch crash?

DeltaHF

Well-known member
Elasticsearch crashed on my server and was down for about 12 hours before I was able to restart the service. The restart fixed the issue, though I still don't know what caused the crash.

Of course, content added to the site while ES was down does not appear to be in the index. Do I need to rebuild the entire search index to get this missing content back into it?
 
Thanks, Andy.
I'm running 1.4.2. Looking back, the crash happened around the time a 5GB tar file was being made; my theory is that this consumed too much memory and caused ES to fail. This is not, however, the first time I have made such a large tar file, and ES has been running for months without issue.

Are there any log files I should check? The ones in /var/log/elasticsearch don't show anything useful.
 
I thought search index operations were queued when an entry could not be indexed? @Mike, can you comment on this? It's not unusual for Elasticsearch to fall over now and then, and re-indexing every time it does seems pretty extreme, especially on a forum with 10 million posts. If a full re-index really is needed, then adding failed items to a retry queue triggered via cron might overcome this in future updates to Enhanced Search.
 
They are queued, with increasing delays of 1, 2, 4, 8 and 16 hours between attempts. Unless you see errors in the server error log indicating that indexing failed more than 5 times and was skipped, the data should eventually appear, but it could be 8 to 16 hours later. A reindex would bring it in immediately.
 
I thought that was the case. So, to fail 5 times, let's do the math: the Elasticsearch instance would need to be offline for 31 hours straight (1 + 2 + 4 + 8 + 16) for indexing to be missed. That's a pretty nice window, @DeltaHF.
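To make the arithmetic concrete, here is a rough Python sketch of that retry schedule. It only models the delays quoted in this thread; the function names are made up for illustration and are not part of XenForo or Enhanced Search, and it assumes the outage starts at the moment an item first fails to index.

```python
# Delays between retries in hours, as quoted in the thread (not read from XFES itself).
RETRY_DELAYS_HOURS = [1, 2, 4, 8, 16]


def retry_offsets(delays):
    """Return the hours after the first failure at which each retry fires."""
    offsets = []
    elapsed = 0
    for delay in delays:
        elapsed += delay
        offsets.append(elapsed)
    return offsets


def survives_outage(outage_hours, delays=RETRY_DELAYS_HOURS):
    """True if at least one retry lands after the outage has ended.

    Assumes the outage begins when the item first fails to index.
    """
    return any(offset > outage_hours for offset in retry_offsets(delays))


print(retry_offsets(RETRY_DELAYS_HOURS))  # [1, 3, 7, 15, 31]
print(survives_outage(12))                # True: the retry at hour 15 goes through
print(survives_outage(31))                # False: every retry hit the outage
```

For a 12-hour outage like the one in the first post, the retries at hours 1, 3 and 7 would fail and the one at hour 15 would succeed, so under these assumptions the content shows up a few hours after the restart without a rebuild.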
 
In that case your ES_HEAP_SIZE should be 10GB if I understand correctly.
Nah, I've got ~16 million posts in ~1.5 GB of RAM for a 3-node Elasticsearch cluster on some Linode VPSs, and it works absolutely fine.

SSDs offer a massive performance benefit for Elasticsearch, as it can simply trade IOPS for memory usage. And the modern SSDs that Linode and similar hosts use have IOPS to spare.

https://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/

Economically, copious amounts of RAM does not make sense. Yes, you guessed it, this is about Solid State Drives.
  • Their price is 1/10 of RAM (or 1/5 if you want RAID 1)
  • They suffer a lot less from the cleared disk cache problem
  • They can be easily RAIDed for TB-scale
  • They even draw less power than the same amount of RAM
...
Conclusion
Using SSDs as storage for search delivers near maximum performance at a fraction of the cost of an equivalent RAM solution.

Throwing more memory at Elasticsearch isn't always desirable: https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing

To begin with, do not use a very large java heap if you can help it: set it only as large as is necessary (ideally no more than half of the machine's RAM) to hold the overall maximum working set size for your usage of Elasticsearch
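As a rough illustration of that rule of thumb, the Python sketch below picks a heap size that covers the working set but never exceeds half of the machine's RAM. The function name and the example figures are hypothetical; the working set size is something you would have to measure on your own cluster, and the result would then be applied through ES_HEAP_SIZE.

```python
def suggested_heap_mb(total_ram_mb, working_set_mb):
    """Return a heap size in MB: the working set, capped at half of total RAM.

    working_set_mb is an estimate you supply; this function does not measure it.
    """
    half_ram_mb = total_ram_mb // 2
    return min(working_set_mb, half_ram_mb)


# An 8 GB machine with a ~1.5 GB working set: use the working set as-is.
print(suggested_heap_mb(8192, 1536))   # 1536

# A 4 GB machine where someone asks for 10 GB: capped at half the RAM.
print(suggested_heap_mb(4096, 10240))  # 2048
```

The cap leaves the other half of the RAM to the operating system's disk cache, which is what the SSD argument above relies on.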
 
I haven't actually changed over to ES v2 (or XF 1.5.3 & XFES 1.1.3) yet due to time constraints.

I understand, I'm in the same situation, planning to switch later this week. Whenever you do eventually switch, if you experiment with heap sizes I'd be curious to hear the results.
 