How Does Indexing Work?

sajal

We have around 5M posts, and I've been rebuilding the search index for 5 hours now. We're using Elasticsearch with the ES plugin.

I can't understand how it works. While watching the "Rebuilding XXXX" message, where XXXX is the post ID, the number kept increasing. Now it seems to have suddenly started over from the beginning: the current XXXX ID is much lower than it was earlier.

Also, under the Elasticsearch statistics I can see "10 843 399 (11,7 GB)" records indexed, while we only have 5.2M records in total. Why does the index show almost double the number of records? I assumed that if the same record is indexed again, it gets updated rather than inserted a second time.
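For reference, Elasticsearch's index operation is keyed on the document `_id`: indexing a document whose `_id` already exists replaces the stored document (and increments its `_version`) instead of adding a second copy. The toy model below mimics that behaviour with a plain dict and does not talk to a real cluster. It assumes the plugin derives a stable `_id` from the content type and content ID; the `make_doc_id` helper and its ID format are hypothetical, not the plugin's actual scheme.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """In-memory stand-in for one Elasticsearch index (illustration only)."""
    docs: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)

    def index(self, doc_id: str, source: dict) -> str:
        # Like the ES index API: create if the _id is new, otherwise overwrite it.
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = source
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return result

    def count(self) -> int:
        # One entry per distinct _id, no matter how often it was indexed.
        return len(self.docs)


def make_doc_id(content_type: str, content_id: int) -> str:
    # Hypothetical ID scheme: one stable _id per piece of content.
    return f"{content_type}-{content_id}"


idx = ToyIndex()
# A first rebuild attempt indexes three posts, then "fails".
for post_id in (1, 2, 3):
    idx.index(make_doc_id("post", post_id), {"message": f"post {post_id}"})
# A second, complete rebuild re-indexes the same posts plus two new ones.
for post_id in (1, 2, 3, 4, 5):
    idx.index(make_doc_id("post", post_id), {"message": f"post {post_id}"})

print(idx.count())                            # 5, not 8
print(idx.versions[make_doc_id("post", 1)])   # 2
```

Under that assumption, re-indexing the same content, even after a failed run, should not by itself inflate the document count.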

Can someone please explain how exactly this indexing works? At the moment I have no idea when my indexing will finish.
 
OK, the indexing has completed, and I think I understand why the XXXX number suddenly dropped. It was probably indexing the content types one at a time (first posts, then pages), so the ID shown in the progress message started again from a low value for each new type.
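The explanation above can be pictured with a small model of a per-type rebuild. This is a hypothetical sketch, not the plugin's code: the `rebuild_progress` helper, the content-type order, the batch size and the sample IDs are all assumptions. It only shows why a progress counter that restarts for each content type looks like the ID "going backwards".

```python
from typing import Dict, Iterator, List, Tuple


def rebuild_progress(
    ids_by_type: Dict[str, List[int]], batch_size: int
) -> Iterator[Tuple[str, int]]:
    """Yield (content_type, last_id_in_batch) as a per-type rebuild might report it.

    Each content type is processed on its own, in ascending ID order, so the
    reported ID starts from the lowest ID again whenever the type changes.
    """
    for content_type, ids in ids_by_type.items():
        ordered = sorted(ids)
        for start in range(0, len(ordered), batch_size):
            batch = ordered[start:start + batch_size]
            # ... the batch would be sent to Elasticsearch here ...
            yield content_type, batch[-1]


# Made-up IDs, purely for illustration.
sample = {"post": [1, 2, 3, 4, 5, 6], "thread": [1, 2, 3]}
for content_type, last_id in rebuild_progress(sample, batch_size=2):
    print(f"Rebuilding {content_type}: {last_id}")
```

The printed IDs climb to 6 for posts and then drop back to 2 when threads start, which matches the behaviour described in this post.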

But I'm still wondering about the number of records shown as indexed: 10 843 399

Our total record counts across post, profile_post, thread, profile_post_comment and page are:

xf_page ==> 0
xf_post ==> 5218023
xf_profile_post ==> 102
xf_thread ==> 548785
xf_profile_post_comment ==> 7

So the total is under 6M, yet the index reports almost 10.8M records.
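Adding up the counts listed above makes the gap concrete. This is plain arithmetic on the numbers in this post and assumes each row in these tables should become exactly one document in the index.

```python
# Row counts reported above, per XenForo table.
db_counts = {
    "xf_page": 0,
    "xf_post": 5_218_023,
    "xf_profile_post": 102,
    "xf_thread": 548_785,
    "xf_profile_post_comment": 7,
}
reported_in_index = 10_843_399  # figure from the Elasticsearch statistics screen

expected = sum(db_counts.values())
ratio = reported_in_index / expected
print(f"expected {expected:,}, reported {reported_in_index:,}, ratio {ratio:.2f}")
# expected 5,766,917, reported 10,843,399, ratio 1.88
```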

Could this be because we had several failed indexing attempts earlier, which may have left some stale records in the index?
 
@duderuud I've also found something interesting: the Elasticsearch admin screen shows twice as many records as it should:

Total records that should be in the index, per the DB: 5718698
Records the admin screen shows: 11437396 (exactly double)

So, for some reason, the AWS Elasticsearch instance reports twice the number of records actually indexed: if you index 100 articles, the admin side shows 200 articles indexed :)

Pretty interesting!
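One likely explanation, not confirmed against the plugin's code: Elasticsearch's index stats API (`GET /<index>/_stats`) reports document counts both for primary shards only (`_all.primaries.docs.count`) and for every shard copy including replicas (`_all.total.docs.count`). If the admin screen shows the `total` figure and the index has one replica per primary (`number_of_replicas: 1`, Elasticsearch's default), every document is counted twice. The sketch below reads both figures from a trimmed-down stats response; the `doc_counts` helper is illustrative, and the response dict is hand-built from the two numbers in this post (taking the DB figure as the primaries count), not a real API response.

```python
from typing import Tuple


def doc_counts(stats: dict) -> Tuple[int, int, float]:
    """Return (primaries, total, copies) from an index stats response."""
    primaries = stats["_all"]["primaries"]["docs"]["count"]
    total = stats["_all"]["total"]["docs"]["count"]
    # total counts each assigned shard copy, so total / primaries is
    # roughly 1 + number_of_replicas.
    return primaries, total, total / primaries


# Hand-built example using the figures quoted above; a real response has many more fields.
sample_stats = {
    "_all": {
        "primaries": {"docs": {"count": 5_718_698}},
        "total": {"docs": {"count": 11_437_396}},
    }
}

primaries, total, copies = doc_counts(sample_stats)
print(primaries, total, copies)  # 5718698 11437396 2.0
```

If this is what is happening, the primaries figure is the one to compare against the DB counts.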
 