Better encapsulate locking logic in HnswGraphBuilder #14016
base: main
This PR moves the locking logic from `HnswConcurrentMergeBuilder` to `HnswGraphBuilder`, which automatically picks the single-threaded vs. concurrent searcher based on whether a lock is used. This makes it possible to use concurrent graph building outside of the context of a merge process. This PR is a pure refactoring.
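To illustrate the idea described above, here is a minimal sketch of a builder that decides between an unlocked and a locked read path based on whether a lock was supplied. It is not the code from this PR: `ToyGraphBuilder`, `NeighborReader`, and all method names are hypothetical, and the real `HnswGraphBuilder` deals with graph searchers rather than plain adjacency lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; these are not Lucene classes.
final class ToyGraphBuilder {
    // Stand-in for the "searcher": the strategy used to read neighbors.
    interface NeighborReader {
        List<Integer> neighbors(int node);
    }

    private final List<List<Integer>> adjacency = new ArrayList<>();
    private final ReentrantReadWriteLock lock; // null means single-threaded mode
    private final NeighborReader reader;

    ToyGraphBuilder(int numNodes, ReentrantReadWriteLock lock) {
        for (int i = 0; i < numNodes; i++) {
            adjacency.add(new ArrayList<>());
        }
        this.lock = lock;
        // The builder picks the read strategy itself, based on the lock.
        this.reader = (lock == null) ? this::readUnlocked : this::readLocked;
    }

    boolean isConcurrent() {
        return lock != null;
    }

    // Single-threaded path: no synchronization, returns the live list.
    private List<Integer> readUnlocked(int node) {
        return adjacency.get(node);
    }

    // Concurrent path: take the read lock and return a snapshot copy.
    private List<Integer> readLocked(int node) {
        lock.readLock().lock();
        try {
            return new ArrayList<>(adjacency.get(node));
        } finally {
            lock.readLock().unlock();
        }
    }

    // Adds an undirected edge, locking only when a lock was supplied.
    void addEdge(int from, int to) {
        if (lock == null) {
            link(from, to);
            return;
        }
        lock.writeLock().lock();
        try {
            link(from, to);
        } finally {
            lock.writeLock().unlock();
        }
    }

    private void link(int from, int to) {
        adjacency.get(from).add(to);
        adjacency.get(to).add(from);
    }

    List<Integer> neighbors(int node) {
        return reader.neighbors(node);
    }
}
```

With this shape, a caller that passes `null` gets the cheap single-threaded behavior, while a caller that passes a lock can share one builder across threads without a separate wrapper class owning the locking.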
I'm not sure we need this because we guarantee that an index cannot be written while it is being read under normal circumstances, and I think it's actually good to expose less functionality if there is no purpose for it, to avoid traps. Did you have a use case in mind?
Our use case is to speed up indexing of larger segments. We want to build fewer segments, so it makes sense to build them on multiple cores. We build the segments directly, rather than building smaller segments first and then merging them. I understand this PR is for a non-standard use of the API and you might not care, and that the current structure works for your case. But I think the new structure is more logical anyway.
I see, thanks for the explanation: it makes more sense to me now. We want to be able to use concurrency while indexing the initial segment, even before flushing, sure.
Even though this change does not alter the public API, I don't think it is a pure refactoring. Won't it change the execution plan used during indexing? Or -- maybe it would actually be a no-op because the IndexWriter creates a unique segment for each of its threads, so indexing a single segment is always single-threaded. Have you actually tried this out and seen some improvement?
As far as I'm aware, it doesn't.
Creating a new segment is always single-threaded in the current version. It can be parallel when merging segments: to enable it, you need to pass a value >=1 to [...]. I checked the existing tests and didn't find one that exercises concurrent merging. However, I tested it in my code, and it seems to work...