How to bulk load? #127
Replies: 1 comment
Indeed, for large datasets it's easiest just to chunk the data into blocks and upload them batch by batch. As a rule of thumb, while indexing, lnx should be using the threads allocated to the indexer at 90-100% CPU usage, i.e. if I allocate 8 threads to the indexer on an 8-core machine, I would aim to see 90-100% usage across those 8 cores while indexing for maximum throughput. If it's using less than that, you're probably not uploading documents to it as fast as it can ingest them.
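A minimal sketch of what that can look like on the client side, assuming a Python loader using requests, and assuming the /documents and /commit routes mentioned below live under /indexes/<index-name>/ (the base URL, index name, batch size, and worker count are illustrative assumptions, not values from this thread):

```python
# Hypothetical bulk loader: chunk the documents and upload the chunks in
# parallel so the server's indexer threads stay close to fully utilised.
import json
from concurrent.futures import ThreadPoolExecutor

import requests

LNX_URL = "http://localhost:8000"  # assumed lnx address
INDEX = "my-index"                 # assumed index name
BATCH_SIZE = 10_000                # tune while watching server CPU usage
WORKERS = 8                        # roughly match the indexer's thread count

def batches(docs, size):
    """Yield successive chunks of `size` documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def upload(batch):
    resp = requests.post(f"{LNX_URL}/indexes/{INDEX}/documents", json=batch)
    resp.raise_for_status()

def bulk_load(path):
    with open(path, "r", encoding="utf-8") as f:
        docs = json.load(f)  # for multi-GB files, stream instead (see below)

    # Upload chunks in parallel. If the server's CPU sits well below 90-100%,
    # the client is the bottleneck: try more workers or larger batches.
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        list(pool.map(upload, batches(docs, BATCH_SIZE)))

    # A single commit once everything is uploaded.
    requests.post(f"{LNX_URL}/indexes/{INDEX}/commit").raise_for_status()

if __name__ == "__main__":
    bulk_load("data.json")
```

The single commit at the end is only one choice; committing every N batches is the other obvious option, and how often to do it is exactly what the question below asks.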
I have a 4.5 GB JSON file with about 34 million records that I would like to index with lnx. The README describes a 27 million record dataset that is 18 GB indexed. How did you load this data into lnx? Lnx currently doesn't have a bulk loading feature: #35. I'm guessing you chunked the input file and then did multiple POST requests to /documents. My main question is how often you made a POST to /commit, whether that has an impact on speed, and what you recommend.
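For a file this size, one way to produce those chunks without holding all 34 million records in memory is to stream the input. A sketch, assuming the file can be read as newline-delimited JSON (one object per line); a single 4.5 GB JSON array would need an incremental parser such as ijson instead:

```python
# Hypothetical chunker: stream a newline-delimited JSON file and yield
# fixed-size batches suitable for posting to /documents.
import json
from typing import Iterator, List

def iter_batches(path: str, batch_size: int = 10_000) -> Iterator[List[dict]]:
    batch: List[dict] = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            batch.append(json.loads(line))
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # trailing partial batch
```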