
Should knn query rewrite parallelize on slices like ordinary search? #13952

Open
javanna opened this issue Oct 24, 2024 · 0 comments
Labels
discussion Discussion

javanna commented Oct 24, 2024

Search concurrency creates one task per slice. A slice is a collection of one or more segment partitions (although segments are not split into partitions by default just yet). The slicing mechanism allows users to plug in their own slicing logic via the IndexSearcher#slices method. Commonly, the slicing depends on the type of operation and the number of targeted documents (e.g. grouping a minimum number of documents into the same slice, as too many small tasks will not increase overall performance), as well as on the number of threads available (it makes little sense to create more tasks than there are threads to execute them).
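To make the grouping idea concrete, here is a minimal standalone sketch (not Lucene's actual IndexSearcher#slices implementation; class name, threshold, and representing segments as plain doc counts are all illustrative) of batching segments into slices so each slice covers at least a minimum number of documents:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of doc-count-based slicing: group per-segment doc counts
// into slices holding at least MIN_DOCS_PER_SLICE documents, so that many tiny
// segments don't each become their own task. Threshold value is illustrative.
public class SliceSketch {
    static final int MIN_DOCS_PER_SLICE = 250_000;

    static List<List<Integer>> slices(List<Integer> segmentDocCounts) {
        List<List<Integer>> slices = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int docsInCurrent = 0;
        for (int docCount : segmentDocCounts) {
            current.add(docCount);
            docsInCurrent += docCount;
            if (docsInCurrent >= MIN_DOCS_PER_SLICE) {
                // This slice is big enough; close it and start a new one.
                slices.add(current);
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
        }
        if (!current.isEmpty()) {
            slices.add(current); // leftover small segments form a final slice
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> segments = List.of(100_000, 80_000, 90_000, 400_000, 10_000);
        // Five segments collapse into three slices.
        System.out.println(slices(segments).size());
    }
}
```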

The knn query rewrite parallelizes execution as well via TaskExecutor, but it does not rely on the slicing mechanism. The idea is that, due to the nature of the workload, its cost is determined by the number of segments rather than the number of docs. This has been discussed in the past, see #12385. A direct consequence of this decision is that knn query rewrite does not provide any way of limiting the number of tasks created: either users don't provide an executor to the searcher, in which case there is no concurrency anywhere, or knn query rewrite creates one task per segment. As a result, it does not take many parallel knn query rewrite operations running for tasks to queue up.
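The one-task-per-segment shape can be sketched as follows (this is plain java.util.concurrent, not Lucene's TaskExecutor; the class and method names are illustrative) to show that the number of submitted tasks tracks segment count, not thread count:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of per-segment task creation: 50 segments produce
// 50 queued tasks regardless of the pool having only 4 threads, so a handful
// of concurrent queries can quickly fill the executor's queue.
public class PerSegmentTasks {
    static int runPerSegment(int numSegments) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < numSegments; i++) {
            final int segment = i;
            // One Callable per segment, with no cap tied to thread count.
            futures.add(executor.submit(() -> segment));
        }
        int completed = 0;
        for (Future<Integer> f : futures) {
            f.get();
            completed++;
        }
        executor.shutdown();
        return completed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runPerSegment(50)); // 50 tasks for 50 segments
    }
}
```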

Especially with the recent changes made in #13472 and its work stealing approach, when segment level tasks queue up, their work is in practice executed by the caller thread, and queued up items become no-ops once their work has been stolen, yet they still contribute to the queue size until they are "executed" and removed from the queue. This is fine for situations where a separate executor is used for parallel execution, as that may have an unbounded queue, yet it is apparent that we potentially end up creating hundreds of tasks for no gain. With a single executor model, you may incur search rejections more often than you'd want, due to the no-op tasks queueing up: although segment level tasks are never rejected but rather executed on the caller thread upon rejection (see #13622), calls to IndexSearcher#search from the same executor will get rejected.
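The "stolen work leaves a no-op in the queue" behavior can be sketched like this (again a hypothetical standalone model, not the #13472 implementation): a compare-and-set guard ensures the work runs exactly once, whichever thread gets there first, while the queued copy still occupies a queue slot until the executor drains it:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of work stealing: the caller thread runs the work
// directly; if the pool thread reaches the queued task afterwards, the task
// is a no-op, yet it counted toward queue size until then.
public class StolenTaskSketch {
    static int runOnce() throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        AtomicInteger executions = new AtomicInteger();
        AtomicBoolean started = new AtomicBoolean(false);
        Runnable work = () -> {
            // Only the first thread to flip the flag performs the work.
            if (started.compareAndSet(false, true)) {
                executions.incrementAndGet(); // the actual work
            } // else: no-op, the work was already stolen
        };
        Future<?> queued = executor.submit(work);
        work.run();   // caller thread "steals" and runs the work
        queued.get(); // the queued copy completes, possibly as a no-op
        executor.shutdown();
        return executions.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce()); // the work executes exactly once
    }
}
```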

I think that there should be a way to cap the number of created tasks for all parallelizable operations, at the very least based on the number of available threads: why create more tasks than there are threads to execute them? That may be a good reason to move the knn query rewrite back to relying on slicing, as that allows for more flexibility. Yet there are other scenarios where TaskExecutor is used to parallelize operations, and it seems like a generally good idea to limit the number of tasks created for all those potential use cases that don't necessarily fit into the slicing logic. The next question though is: if we limit the number of created tasks without relying on slicing, how do we batch together different segments? Do we just create one task per segment until we reach the threshold, and then execute whatever remains sequentially within the tasks we have?
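One possible shape for such a cap, sketched here purely as an assumption (not an existing Lucene API; names and the round-robin assignment are illustrative), is to batch segments into at most `maxTasks` groups, where `maxTasks` would be something like the executor's thread count:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of capping task count: instead of one task per segment,
// distribute segments round-robin into at most maxTasks batches, so the task
// count never exceeds the number of threads available to run them.
public class CappedBatching {
    static List<List<Integer>> batch(List<Integer> segments, int maxTasks) {
        int numTasks = Math.min(maxTasks, segments.size());
        List<List<Integer>> batches = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            batches.add(new ArrayList<>());
        }
        for (int i = 0; i < segments.size(); i++) {
            // Each batch would be one task, processing its segments sequentially.
            batches.get(i % numTasks).add(segments.get(i));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> segments = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(batch(segments, 4).size()); // 4 tasks for 10 segments
    }
}
```

A design like this answers the "how do we batch" question mechanically, but it sidesteps the open trade-off the issue raises: unlike slicing, it ignores per-segment cost, so batches can be badly unbalanced.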

@javanna javanna added the discussion Discussion label Oct 24, 2024