
Optimize pool #36

Closed · wants to merge 3 commits

Conversation

@qRoC (Contributor) commented Sep 2, 2024

  1. The global pool contains only vectors with capacity.
  2. The thread pool contains vectors from the global pool (with capacity) OR vectors without capacity.
  3. Vectors are returned to the global pool only if they have capacity.
  4. Because of p.3, the REUSABLE flag is no longer needed.
  5. Also added a stub method, which avoids touching the pool at all when attributes are not used (events, impl Log, etc). See the sketch after this list.
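
A simplified sketch of the return path from points 3-5 (type and method names follow this discussion; see the actual diff for details):

use std::sync::Mutex;

pub struct GlobalVecPool<T> {
    storage: Mutex<Vec<Vec<T>>>,
}

pub struct ReusableVec<'a, T> {
    pool: &'a GlobalVecPool<T>,
    vec: Vec<T>,
}

impl<'a, T> ReusableVec<'a, T> {
    pub fn new(pool: &'a GlobalVecPool<T>, vec: Vec<T>) -> Self {
        ReusableVec { pool, vec }
    }

    // Zero-capacity vector: its Drop never touches the pool, so threads that
    // do not use attributes pay no synchronization cost (point 5).
    pub fn stub(pool: &'a GlobalVecPool<T>) -> Self {
        ReusableVec::new(pool, Vec::new())
    }
}

impl<T> Drop for ReusableVec<'_, T> {
    fn drop(&mut self) {
        // Return to the global pool only if the vector has capacity (point 3).
        // Zero-capacity stubs are dropped for free, which is why the REUSABLE
        // flag is no longer needed (point 4).
        if self.vec.capacity() > 0 {
            let mut vec = std::mem::take(&mut self.vec);
            vec.clear();
            self.pool.storage.lock().unwrap().push(vec);
        }
    }
}

Benchmark results: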
compare/fastrace/1      time:   [304.18 ns 305.64 ns 307.36 ns]
                        change: [-20.521% -19.831% -19.062%] (p = 0.00 < 0.05)
                        Performance has improved.

compare/fastrace/10     time:   [772.85 ns 778.27 ns 784.07 ns]
                        change: [-17.300% -16.547% -15.815%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
Vec::with_capacity/object-pool/1
                        time:   [2.4597 ns 2.4679 ns 2.4767 ns]
                        change: [-20.663% -20.296% -19.938%] (p = 0.00 < 0.05)
                        Performance has improved.
Vec::with_capacity/alloc/1
                        time:   [6.4638 ns 6.5043 ns 6.5405 ns]
                        change: [+3.4161% +4.4540% +5.4969%] (p = 0.00 < 0.05)
                        Performance has regressed.

Vec::with_capacity/object-pool/10
                        time:   [2.4700 ns 2.4789 ns 2.4874 ns]
                        change: [-21.023% -20.561% -20.111%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
Vec::with_capacity/object-pool/100
                        time:   [2.4590 ns 2.4682 ns 2.4777 ns]
                        change: [-19.639% -19.239% -18.845%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
Vec::with_capacity/alloc/100
                        time:   [23.783 ns 23.911 ns 24.035 ns]
                        change: [+2.1886% +2.8602% +3.5167%] (p = 0.00 < 0.05)
                        Performance has regressed.
                        
Vec::with_capacity/object-pool/1000
                        time:   [2.5000 ns 2.5112 ns 2.5226 ns]
                        change: [-22.018% -21.560% -21.106%] (p = 0.00 < 0.05)
                        Performance has improved.
                        
Vec::with_capacity/object-pool/10000
                        time:   [2.4538 ns 2.4601 ns 2.4668 ns]
                        change: [-19.391% -19.038% -18.683%] (p = 0.00 < 0.05)
                        Performance has improved.

Vec::with_capacity/alloc/10000
                        time:   [28.905 ns 29.037 ns 29.179 ns]
                        change: [-2.5027% -2.0210% -1.5147%] (p = 0.00 < 0.05)
                        Performance has improved.

trace_wide_raw/10       time:   [1.1964 µs 1.2004 µs 1.2051 µs]
                        change: [-15.850% -15.476% -15.088%] (p = 0.00 < 0.05)
                        Performance has improved.

trace_wide_raw/100      time:   [11.310 µs 11.360 µs 11.408 µs]
                        change: [-7.1389% -6.7624% -6.3728%] (p = 0.00 < 0.05)
                        Performance has improved.

trace_wide_raw/1000     time:   [112.11 µs 112.49 µs 112.91 µs]
                        change: [-7.0186% -6.4420% -5.8764%] (p = 0.00 < 0.05)
                        Performance has improved.

@andylokandy (Collaborator) commented Sep 2, 2024

> Because of p.3, the REUSABLE flag is no longer needed.

When the Reusable<T> is dropped in a foreground thread, it should not be reused, because of the synchronization overhead.

> The global pool contains only vectors with capacity.

Sounds reasonable.

> The thread pool contains vectors from the global pool (with capacity) OR vectors without capacity.

The thread pool can be the same as the global pool: only with capacity.

> Also added a stub method, which avoids touching the pool at all when attributes are not used (events, impl Log, etc).

Instead of ReusableVec::stub(), I recommend creating a Vec::new(), which is zero-overhead, and wrapping it in Reusable.
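
In terms of the sketch in the PR description, that suggestion would amount to something like this (illustrative only):

// Vec::new() performs no allocation, so an "empty" Reusable built through the
// ordinary constructor is already zero-overhead; no dedicated stub() is needed.
fn empty_attributes<T>(pool: &GlobalVecPool<T>) -> ReusableVec<'_, T> {
    ReusableVec::new(pool, Vec::new())
}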

@qRoC (Contributor, Author) commented Sep 2, 2024

> When the Reusable<T> is dropped in a foreground thread, it should not be reused, because of the synchronization overhead.

Since the global pool now only holds vectors with capacity, we are choosing between a small mutex overhead and the overhead of allocations/reallocations.

Looking at #224 and foyer-rs/foyer#571, the main problem there was that when a thread does not use attributes, we spend a lot of time on synchronization. But now such threads behave as if they had the REUSABLE flag: having zero capacity, they never synchronize.

> The thread pool can be the same as the global pool: only with capacity.

Yes, but then attributes must be wrapped in an Option, and the main problem is that the current implementation does not require TrustedLen in the API.

> Instead of ReusableVec::stub(), I recommend creating a Vec::new(), which is zero-overhead, and wrapping it in Reusable.

stub does exactly that.

API changes may indeed improve performance, but they would break large projects.

@qRoC (Contributor, Author) commented Sep 3, 2024

> The thread pool can be the same as the global pool: only with capacity.

If the global pool is empty, then every take from the local pool also has to try the global pool (with mutex synchronization):

impl<T> LocalVecPool<T> {
    pub fn take(&mut self) -> ReusableVec<T> {
        if self.storage.is_empty() {
            // Local pool is empty: try to refill it from the global pool.
            // While the global pool is also empty, this locks the global
            // mutex on every call and still gets nothing back.
            if !self
                .global_pool
                .fill_empty_local(self.capacity, &mut self.storage)
            {
                return ReusableVec::new(self.global_pool, Vec::new());
            }
        }

        ReusableVec::new(self.global_pool, self.storage.pop().expect("not empty"))
    }
}
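
For reference, the global-pool side of that call would have to take the mutex on every such miss; a hypothetical fill_empty_local (assuming the parameter is the number of vectors the local pool can hold) could look like:

impl<T> GlobalVecPool<T> {
    // Hand at most `capacity` pooled vectors to the caller's local storage.
    // Returns false when the global pool has nothing to give; the caller then
    // falls back to Vec::new(), but the lock was already taken.
    fn fill_empty_local(&self, capacity: usize, local: &mut Vec<Vec<T>>) -> bool {
        let mut storage = self.storage.lock().unwrap(); // locked on every local miss
        if storage.is_empty() {
            return false;
        }
        let n = storage.len().min(capacity);
        let start = storage.len() - n;
        local.extend(storage.drain(start..));
        true
    }
}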

Maybe that's fine, but it needs testing on real workloads.

@andylokandy (Collaborator) commented:

> we are choosing between a small mutex overhead and the overhead of allocations/reallocations

I don't have numbers on the mutex overhead under heavy contention, and it may cause a long tail in the worst case. In contrast, dropping the vector without reuse is rare, and reallocating a large vector when collecting a large number of spans is acceptable and will not cause a long tail.

> Looking at #224 and foyer-rs/foyer#571, the main problem there was that when a thread does not use attributes, we spend a lot of time on synchronization. But now such threads behave as if they had the REUSABLE flag: having zero capacity, they never synchronize.

That's not entirely correct. The real case is that when LocalParentGuard drops, it drops the Reusable and causes synchronization. In the new version, LocalParentGuard always submits the Reusable to the global collector, so the vectors are reused in the global collector thread, with no synchronization overhead.

In short, with the REUSABLE flag the sync should never happen. Reuse should only take place in the global collector thread.
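
Schematically, the flow described above looks like this (a minimal standalone sketch; the channel and the Span type are illustrative, not fastrace's actual internals):

use std::sync::mpsc;
use std::thread;

struct Span; // placeholder for the real span data

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<Span>>();

    // Foreground thread: when the guard drops, it only sends its buffer to
    // the collector; no pool mutex is touched on this side.
    let producer = thread::spawn(move || {
        let spans: Vec<Span> = Vec::with_capacity(16);
        tx.send(spans).unwrap();
    });

    // Global collector thread (here, the main thread): the only place where
    // buffers are recycled, so reuse needs no cross-thread synchronization
    // beyond the channel itself.
    let mut pool: Vec<Vec<Span>> = Vec::new();
    for mut spans in rx {
        // ... report the spans ...
        spans.clear();
        pool.push(spans); // reuse happens only here
    }

    producer.join().unwrap();
}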

Does this PR resolve this problem in another way?

@andylokandy (Collaborator) commented:

Will reopen if any progress is made on this thread.
