
Optimize pool #36

Open · wants to merge 3 commits into main
Conversation

@qRoC (Contributor) commented Sep 2, 2024

  1. The global pool contains only vectors with capacity.
  2. The thread-local pool contains vectors taken from the global pool (with capacity) OR freshly created ones without capacity.
  3. Vectors are returned to the global pool only if they have capacity.
  4. Because of point 3, the REUSABLE flag is no longer needed (see the sketch below).
  5. Also added a stub method, which avoids touching the pool when attributes are not used (events, impl Log, etc.).
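A minimal sketch of the drop rule described above (GlobalVecPool and ReusableVec mirror the names used in this PR, but the fields and signatures here are assumptions for illustration only, not the crate's actual API):

use std::sync::Mutex;

pub struct GlobalVecPool<T> {
    // Invariant: every stored vector has non-zero capacity.
    storage: Mutex<Vec<Vec<T>>>,
}

pub struct ReusableVec<'a, T> {
    pool: &'a GlobalVecPool<T>,
    vec: Vec<T>,
}

impl<T> Drop for ReusableVec<'_, T> {
    fn drop(&mut self) {
        // Only vectors that own an allocation are worth the mutex round-trip;
        // zero-capacity vectors are simply dropped, so threads that never
        // push attributes never synchronize.
        if self.vec.capacity() > 0 {
            let mut vec = std::mem::take(&mut self.vec);
            vec.clear();
            self.pool.storage.lock().unwrap().push(vec);
        }
    }
}

impl<'a, T> ReusableVec<'a, T> {
    // stub() wraps Vec::new() (zero capacity, no allocation), so taking and
    // dropping it never touches the pool.
    pub fn stub(pool: &'a GlobalVecPool<T>) -> Self {
        ReusableVec { pool, vec: Vec::new() }
    }
}

Benchmark results: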
compare/fastrace/1      time:   [304.18 ns 305.64 ns 307.36 ns]
                        change: [-20.521% -19.831% -19.062%] (p = 0.00 < 0.05)
                        Performance has improved.

compare/fastrace/10     time:   [772.85 ns 778.27 ns 784.07 ns]
                        change: [-17.300% -16.547% -15.815%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
Vec::with_capacity/object-pool/1
                        time:   [2.4597 ns 2.4679 ns 2.4767 ns]
                        change: [-20.663% -20.296% -19.938%] (p = 0.00 < 0.05)
                        Performance has improved.
Vec::with_capacity/alloc/1
                        time:   [6.4638 ns 6.5043 ns 6.5405 ns]
                        change: [+3.4161% +4.4540% +5.4969%] (p = 0.00 < 0.05)
                        Performance has regressed.

Vec::with_capacity/object-pool/10
                        time:   [2.4700 ns 2.4789 ns 2.4874 ns]
                        change: [-21.023% -20.561% -20.111%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
Vec::with_capacity/object-pool/100
                        time:   [2.4590 ns 2.4682 ns 2.4777 ns]
                        change: [-19.639% -19.239% -18.845%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
Vec::with_capacity/alloc/100
                        time:   [23.783 ns 23.911 ns 24.035 ns]
                        change: [+2.1886% +2.8602% +3.5167%] (p = 0.00 < 0.05)
                        Performance has regressed.
                        
Vec::with_capacity/object-pool/1000
                        time:   [2.5000 ns 2.5112 ns 2.5226 ns]
                        change: [-22.018% -21.560% -21.106%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
Vec::with_capacity/object-pool/10000
                        time:   [2.4538 ns 2.4601 ns 2.4668 ns]
                        change: [-19.391% -19.038% -18.683%] (p = 0.00 < 0.05)
                        Performance has improved.

Vec::with_capacity/alloc/10000
                        time:   [28.905 ns 29.037 ns 29.179 ns]
                        change: [-2.5027% -2.0210% -1.5147%] (p = 0.00 < 0.05)
                        Performance has improved.

trace_wide_raw/10       time:   [1.1964 µs 1.2004 µs 1.2051 µs]
                        change: [-15.850% -15.476% -15.088%] (p = 0.00 < 0.05)
                        Performance has improved.

trace_wide_raw/100      time:   [11.310 µs 11.360 µs 11.408 µs]
                        change: [-7.1389% -6.7624% -6.3728%] (p = 0.00 < 0.05)
                        Performance has improved.

trace_wide_raw/1000     time:   [112.11 µs 112.49 µs 112.91 µs]
                        change: [-7.0186% -6.4420% -5.8764%] (p = 0.00 < 0.05)
                        Performance has improved.

@andylokandy (Collaborator) commented Sep 2, 2024

> Because of point 3, the REUSABLE flag is no longer needed.

When a Reusable<T> is dropped on a foreground thread, it should not be reused, because of the synchronization overhead.

> The global pool contains only vectors with capacity.

Sounds reasonable.

> The thread-local pool contains vectors taken from the global pool (with capacity) OR freshly created ones without capacity.

The thread pool can follow the same rule as the global pool: only vectors with capacity.

> Also added a stub method, which avoids touching the pool when attributes are not used (events, impl Log, etc.).

Instead of ReusableVec::stub(), I recommend creating a Vec::new(), which is zero-overhead, and wrapping it in Reusable.
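Using the hypothetical types from the sketch earlier in this thread (names and fields are assumptions), that suggestion would look roughly like:

fn empty_reusable<'a, T>(pool: &'a GlobalVecPool<T>) -> ReusableVec<'a, T> {
    // Vec::new() performs no allocation, so constructing this wrapper is
    // zero-overhead, and its zero capacity means drop never takes the pool mutex.
    ReusableVec { pool, vec: Vec::new() }
}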

@qRoC (Contributor, Author) commented Sep 2, 2024

> When a Reusable<T> is dropped on a foreground thread, it should not be reused, because of the synchronization overhead.

Since the global pool now holds only vectors with capacity, the trade-off is between a small mutex overhead and the overhead of allocations/reallocations.

Looking at #224 and foyer-rs/foyer#571, the main problem there was that when a thread does not use attributes, a lot of time is spent on synchronization. With this change such threads behave as they would with the REUSABLE flag: their vectors have zero capacity, so they never synchronize.

> The thread pool can follow the same rule as the global pool: only vectors with capacity.

Yes, but then attributes would have to be wrapped in Option, and the main problem is that the current implementation does not require TrustedLen in the API.

> Instead of ReusableVec::stub(), I recommend creating a Vec::new(), which is zero-overhead, and wrapping it in Reusable.

stub does exactly that.

API changes may indeed improve performance, but they would break large projects.

@qRoC (Contributor, Author) commented Sep 3, 2024

> The thread pool can follow the same rule as the global pool: only vectors with capacity.

If the global pool is empty, then every time we take from the local pool we would have to retry the global pool (with mutex synchronization):

impl<T> LocalVecPool<T> {
    pub fn take(&mut self) -> ReusableVec<T> {
        if self.storage.is_empty() {
            // Local pool is empty: try to refill it from the global pool.
            // While the global pool is also empty, this takes the global
            // mutex on every single take().
            if self
                .global_pool
                .fill_empty_local(self.capacity, &mut self.storage)
            {
                return ReusableVec::new(self.global_pool, self.storage.pop().expect("not empty"));
            }
        }

        // Fall back to a fresh zero-capacity vector (no allocation).
        ReusableVec::new(self.global_pool, Vec::new())
    }
}

Maybe that is fine, but it needs to be tested in real-world cases.

@andylokandy (Collaborator) commented

> The trade-off is between a small mutex overhead and the overhead of allocations/reallocations.

I don't have numbers on the mutex overhead under heavy contention, and it may cause a long tail in the worst case. By contrast, dropping a vector without reusing it is rare, and reallocating a large vector when collecting a large number of spans is acceptable and will not cause a long tail.

> Looking at #224 and foyer-rs/foyer#571, the main problem there was that when a thread does not use attributes, a lot of time is spent on synchronization. With this change such threads behave as they would with the REUSABLE flag: their vectors have zero capacity, so they never synchronize.

That is not entirely correct. What actually happens is that when LocalParentGuard drops, it drops the Reusable and causes synchronization. In the new version, LocalParentGuard always submits the Reusable to the global collector, so the vectors are reused on the global collector thread -- resulting in no synchronization overhead.

In short, with the REUSABLE flag the sync should never happen; reuse should only take place on the global collector thread.
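A hedged sketch of the behaviour described above: on drop, the guard hands its span buffer to the global collector thread instead of returning it to a shared pool, so the foreground thread never takes a pool mutex. LocalParentGuard is named after the type mentioned in this thread, but the fields and channel here are assumptions for illustration only:

use std::sync::mpsc::Sender;

struct SpanRecord;

struct LocalParentGuard {
    spans: Vec<SpanRecord>,
    collector_tx: Sender<Vec<SpanRecord>>,
}

impl Drop for LocalParentGuard {
    fn drop(&mut self) {
        // Ship the whole buffer to the collector thread; the collector drains
        // it and keeps the still-allocated vector in its own pool, so reuse
        // happens only on that thread.
        let spans = std::mem::take(&mut self.spans);
        let _ = self.collector_tx.send(spans);
    }
}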

Does this PR resolve that problem in a different way?
