Optimize DenseSet::Grow #3870

romange · 2024-10-05T09:43:35Z

DenseSet::Grow is dominated by memory latency and has good potential for optimization.

Introduce a resharding algorithm that moves items to their new buckets after the resize, processing them in chunks of kBatchSize.

Similar to #3863

Add benchmarking code to string_set_test to evaluate the performance.
The benchmark must measure only Grow, which is not a simple thing to do. To achieve that, it should add, say, 2^15 items without measuring and then use one additional Add to trigger the Grow event (we grow when we cross 2^k). Only that additional Add must be under measurement. See BM_AddMany for an example of how we control timing with PauseTiming/ResumeTiming.
Once the benchmark exists, you can run it with ./string_set_test --bench --benchmark_filter=.*BM_Grow. This should be done in opt mode.
Now it is possible to improve the implementation of Grow: introduce a state machine that takes a batch of buckets and iteratively reshards all the elements in them (GrowBatch). Grow should iterate over all old buckets and call GrowBatch.
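
To make the Grow/GrowBatch split concrete, here is a minimal sketch on a toy chained hash set. It is not DenseSet's actual layout or API: ToySet, its Node struct, the value of kBatchSize and the two-pass structure (prefetch the batch, then relink it) are all assumptions for illustration, and the two passes are a simplification of the state machine described above.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy chained hash set for illustration only; NOT Dragonfly's DenseSet.
class ToySet {
 public:
  static constexpr std::size_t kBatchSize = 8;  // hypothetical batch size

  ToySet() : buckets_(4, nullptr) {}
  ToySet(const ToySet&) = delete;
  ToySet& operator=(const ToySet&) = delete;
  ~ToySet() {
    for (Node* head : buckets_) {
      while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
      }
    }
  }

  // Returns false if the key is already present.
  bool Add(const std::string& key) {
    const std::size_t hash = std::hash<std::string>{}(key);
    if (Find(key, hash) != nullptr) return false;
    if (size_ == buckets_.size()) Grow();  // bucket count stays a power of two
    Node*& head = buckets_[hash & (buckets_.size() - 1)];
    head = new Node{key, hash, head};
    ++size_;
    return true;
  }

  bool Contains(const std::string& key) const {
    return Find(key, std::hash<std::string>{}(key)) != nullptr;
  }
  std::size_t Size() const { return size_; }
  std::size_t BucketCount() const { return buckets_.size(); }

 private:
  struct Node {
    std::string key;
    std::size_t hash;  // cached so resharding does not rehash the key
    Node* next;
  };

  const Node* Find(const std::string& key, std::size_t hash) const {
    for (const Node* n = buckets_[hash & (buckets_.size() - 1)]; n != nullptr; n = n->next)
      if (n->hash == hash && n->key == key) return n;
    return nullptr;
  }

  // Reshards old buckets [begin, end) in place.
  void GrowBatch(std::size_t begin, std::size_t end) {
    // Pass 1: request the head node of every bucket in the batch, so the
    // cache misses overlap instead of being paid one bucket at a time.
#if defined(__GNUC__) || defined(__clang__)
    for (std::size_t i = begin; i < end; ++i) __builtin_prefetch(buckets_[i]);
#endif
    // Pass 2: relink. After doubling, a node from old bucket i lands either
    // in bucket i or in bucket i + old_count, so batches never interfere.
    const std::size_t mask = buckets_.size() - 1;
    for (std::size_t i = begin; i < end; ++i) {
      Node* node = buckets_[i];
      buckets_[i] = nullptr;
      while (node != nullptr) {
        Node* next = node->next;
        Node*& head = buckets_[node->hash & mask];
        node->next = head;
        head = node;
        node = next;
      }
    }
  }

  // Doubles the bucket array, then walks the old buckets batch by batch.
  void Grow() {
    const std::size_t old_count = buckets_.size();
    buckets_.resize(old_count * 2, nullptr);
    for (std::size_t b = 0; b < old_count; b += kBatchSize)
      GrowBatch(b, std::min(b + kBatchSize, old_count));
  }

  std::vector<Node*> buckets_;
  std::size_t size_ = 0;
};
```

The sketch prefetches only the bucket heads. A fuller state machine would presumably also keep several chain walks in flight at once, which is where most of the latency hiding would come from on long chains.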

I would expect ~50% CPU reduction for large sets


romange added the enhancement and important labels Oct 5, 2024

romange assigned BorysTheDev Oct 5, 2024

BorysTheDev linked a pull request Oct 9, 2024 that will close this issue

DenseSet::Grow optimization #3894

Open
