Don't spin on the main mutex while waiting for new work #8433
Open: abadams wants to merge 16 commits into main from abadams/cond_spin_wait
Commits on Oct 8, 2024
-
Don't spin on the main mutex while waiting for new work
This is one solution to an issue identified by Marcos, opened for discussion. Here's the full description of the issue:

Once they run out of work to do, Halide worker threads spin for a bit checking if new work has been enqueued before calling cond_wait, which puts them to sleep until signaled. Job owners also spin waiting for their job to complete before going to sleep on a different condition variable. I hate this, but all previous attempts I have made at removing or reducing the spinning have made things slower.

One problem with this approach is that spinning is done by releasing the work queue lock, yielding, reacquiring the work queue lock, and doing the somewhat involved check to see if there's something useful for this thread to do, either because new work was enqueued, the last item on a job completed, or a semaphore was released. This hammering of the lock by idle worker threads can starve the thread that actually completed the last task, delaying its ability to tell the job owner the job is done. It can also starve the job owner, causing it to take extra time to realize the job is all done and return back into Halide code. So this adds some wasted time at the end of every parallel for loop.

This PR gets these idle threads to spin off the main mutex. I did this by adding a counter to each condition variable. Any time a condition variable is signaled, its counter is atomically incremented. Before they first release the lock, the idlers atomically capture the value of this counter. Then in cond_wait they spin for a bit, doing atomic loads of the counter in between yields, until either the counter changes, in which case they grab the lock and return, or they reach the spin count limit, in which case they go to sleep.

This improved performance quite a bit over main for the blur app, which is a fast pipeline (~100us) with fine-grained parallelism. The speed-up was 1.2x! Not much effect on the more complex apps.

I'm not entirely sure it's correct. I think the counter has to be incremented with the lock held, so that it serializes correctly with the idlers capturing the value of the counter before releasing the lock, but you can call cond_signal/broadcast without holding the mutex (though we don't do that currently). It also has the unfortunate effect of waking up all spinning threads when you signal, instead of just one of them. However, we never actually call signal, just broadcast. It also increases the size of a cond var, which might be considered a breaking change in the Halide runtime API.
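The counter scheme described above could be sketched roughly as follows. This is a minimal illustration built on the C++ standard library, not the actual Halide runtime code: `SpinCond`, `signals`, `spin_limit`, and `demo` are hypothetical names, the default spin limit of 40 is an arbitrary placeholder, and the sketch assumes `broadcast` is only ever called with the mutex held. The predicate passed to `cv.wait` is an addition of this sketch, so that a broadcast arriving between the end of the spin and the sleep is not missed.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical wrapper for illustration; not the Halide runtime's cond var type.
struct SpinCond {
    std::condition_variable cv;
    std::atomic<std::uint64_t> signals{0};  // incremented on every broadcast

    // Caller must hold `lock`. Returns with `lock` held again.
    // Callers should still re-check their own condition in a loop.
    void wait(std::unique_lock<std::mutex> &lock, int spin_limit = 40) {
        // Capture the counter before releasing the lock for the first time.
        const std::uint64_t seen = signals.load(std::memory_order_acquire);
        lock.unlock();
        // Spin on the counter only; the mutex is not touched while spinning.
        for (int i = 0; i < spin_limit; i++) {
            if (signals.load(std::memory_order_acquire) != seen) {
                lock.lock();
                return;
            }
            std::this_thread::yield();
        }
        // Spin limit reached: sleep, unless a broadcast arrived in the meantime.
        lock.lock();
        cv.wait(lock, [&] {
            return signals.load(std::memory_order_acquire) != seen;
        });
    }

    // Assumption: the caller holds the same mutex that waiters use.
    void broadcast() {
        signals.fetch_add(1, std::memory_order_release);
        cv.notify_all();
    }
};

// Usage sketch: one idle worker waits for a job owner to publish work.
inline int demo() {
    std::mutex m;
    SpinCond work_available;
    int pending = 0;
    int done = 0;
    std::thread worker([&] {
        std::unique_lock<std::mutex> lock(m);
        while (pending == 0) {
            work_available.wait(lock);
        }
        done = pending;
    });
    {
        std::unique_lock<std::mutex> lock(m);
        pending = 3;
        work_available.broadcast();
    }
    worker.join();
    return done;
}
```

Because every broadcast changes the counter, all spinning waiters wake at once, which matches the caveat above about signal behaving like broadcast for spinners. Incrementing under the mutex is what lets the captured value serialize against the broadcast; a variant that signals without the lock would need a separate argument for correctness.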
SHA: 79197e3
- 44a1cd7
Commits on Oct 9, 2024
- c7ebdf9
- 6d46367
Commits on Oct 10, 2024
- 945e889
Commits on Oct 28, 2024
- edeae97
- 4b25a9e
- 3c15e7f
- 299293f
- 34a7cef
Commits on Oct 29, 2024
- c5a8bdc
- 45cfbe1
Commits on Oct 30, 2024
- f0df1c7
- 3b9b6a7
- 6eb29b9
- Merge remote-tracking branch 'origin/abadams/less_noisy_performance_scenarios' into abadams/cond_spin_wait (6a93dd6)