-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Is Counter.Add() thread-safe? #1642
Comments
Hey there, @notrobpike! I'm here to help you out with any issues or questions you have. I can assist you with bug fixes, problem-solving, and even becoming a contributor. Let me know how I can assist you today! The Regarding your concern about potential starvation when multiple threads update the same counter, the current implementation does not include an adaptive locking mechanism. While no issues have been reported, your suggestion to introduce a mechanism like Solaris' adaptive lock could enhance fairness among threads. This would involve transitioning from a spinlock to a sleep state after a certain number of attempts, potentially using an atomic counter to manage waiters. However, if no problems have been experienced, it might not be necessary to implement such changes. |
Those methods are indeed thread-safe. But, you (and our GenAI dosubot friend) are right that you could have contention on extreme amount of writers. However, this only happens for adding You can watch an excellent talks explaining the details from @beorn7
Thanks for offering you help, but I would wait for any user actually hitting this problem in the real application, before committing to more complexity 🤗 What do you think? |
If there are provable better ways of handling thread contention here, I would like to hear (with experimental evidence). My conclusion many years ago was that we needed access to concepts like "core-local variables" (for lack of a better term, I'm sure there is one used by the actual experts). My vague understanding of the Go runtime is that it uses those concepts under the hood (for sync.Map etc.) but it doesn't expose those features in a way that it would be useable by Go code. I can imagine that this happens in the future, though, and then we could optimize the code for hardware with a lot of cores. |
@bwplotka I'm surprised you'd need thousands of goroutines but I'll take your word for it. Agreed this problem should be incredibly rare in practice and not worth the complexity. I only disagree with dosubot's characterization as a starvation issue. @beorn7 caveat: I'm speaking as very much a non-expert. I don't believe sync.Map uses anything exotic as you suggest -- it just expects concurrent threads to write disjoint keys and therefore not contend for the same mutex. At a quick glance, it does its magic by using atomic CAS operations just like the prom client does here. In this way threads writing to disjoint keys always succeed the first time on the CAS for their kv data. However, the docs state that if keys overlap by writing threads, than better performance is to be had using more conventional mutex/rwmutex. This tells me that when there is contention it's better to stack up waiters on a mutex than to let multiple threads run all-out competing on CAS. Whereas here, if there are multiple writing threads, that's exactly counter (get it? counter? heh) to the sync.Map use case recommendation. It should be easy to generate a benchmark for the prom client, or refer to existing benchmarks for sync.Map. I do believe the common (by far) use case would be a single or limited number of threads updating the same counter and therefore the current implementation using atomics is spot on. |
Actually this problem was a little bit reproduced by Cockroach DB folks and improved in this PR #1661 for context. Perhaps they might have an opinion on further improvement in this place (: |
https://github.com/prometheus/client_golang/blob/main/prometheus/counter.go#L137
Technically speaking there is no data race here. But if there are many threads trying to update the same counter, I can imagine that this never makes progress. Is it worth it to add a mechanism similar to Solaris' adaptive lock here? (Start from spinlock and after a time sleep on a lock?)
The impl here could be to loop some number of times, say 256, and then start sleeping a random amount of time between each loop. It's simple. The downside is that if the threads are stacking up then new threads -- that haven't reached the 256 limit yet -- will get serviced first and long-waiting threads get starved out. So to prevent that we add an atomic which is the number of waiters (anyone over 256 that is in the delayed loop) and if that is non-zero on entry then new threads also queue up right away. There's still some random factor and they aren't serviced in strict order but there's more fairness.
Same goes for
Gauge.Add()
.I haven't experienced any problem myself. I was just looking into the thread safety as I decided I want to call
Add()
concurrently in my program. So I mean if no one has ever complained about this it might not be worth any effort.The text was updated successfully, but these errors were encountered: