Implement automatic warmup timing #135
This looks convenient, but I'm concerned that it will not work well for some JITs and some benchmarks.
For instance, if there is a lot to compile, the performance might look stable for some time without being peak performance.
For a practical example let's look at this warmup plot:
(from this blog post)
We can see multiple regions where the curve is flat, yet still far from peak performance.
This is also what papers like https://arxiv.org/pdf/1602.00602.pdf found:
"Kalibera and Jones 2013 convincingly show the limitations of such approaches, presenting instead a manual approach to determining if and when a steady state has been reached."
In other words, in the general case it is not possible to detect warmup reliably.
On a given set of benchmarks and Ruby engines it might be, but on unknown benchmarks it's not.
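To make the concern concrete, here is a minimal sketch of how a "within 1% of the previous run" check can stop on a plateau. The series is made up for illustration (it is not a measurement), and `first_stable_index` is a hypothetical helper, not part of benchmark-ips.

```ruby
# Made-up iterations-per-window values, NOT measurements: a flat stretch
# while the JIT is still compiling, then a jump to a higher steady rate.
series = [100, 140, 150, 150.5, 150.8, 220, 300, 301, 300.5]

# Hypothetical helper: index of the first sample that is within
# `tolerance` of the sample before it, or nil if there is none.
def first_stable_index(series, tolerance)
  series.each_cons(2).with_index do |(prev, cur), i|
    # Larger over smaller, so the ratio is always >= 1.0.
    ratio = [prev, cur].max.to_f / [prev, cur].min
    return i + 1 if ratio - 1.0 <= tolerance
  end
  nil
end

idx = first_stable_index(series, 0.01)
puts "declared warm at index #{idx} (#{series[idx]}), best seen later: #{series.max}"
```

With this series the check stops on the early plateau, well before the later, higher rates appear.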
    prev / per
    end

    if diff - 1.0 <= 0.1
Suggested change:

    - if diff - 1.0 <= 0.1
    + if diff - 1.0 <= 0.01

For 1%, the threshold needs to be 0.01; 0.1 accepts a 10% difference.
Auto-detecting warmup is always tempting and certainly convenient when it works; for instance, I wrote this (PR) for yjit-bench. So I think the safest option would be not to add auto warmup to benchmark-ips, because it will fail in some cases and would then be misleading.
@eregon Good context. Perhaps another way to look at this: a simple, unsophisticated autowarmup by default is better than the current default (strictly 2 seconds), without making people overly confident that it's always correct. Folks testing JITs, etc. are always going to want to specify the warmup; I don't think we could come up with a general-purpose algorithm that would work both for those folks and for folks using benchmark-ips just to test out 2 different chunks of code. This would be aimed at that latter group, where the autowarmup can catch cases where the default warmup wasn't sufficient.
I think these two groups overlap quite a bit, e.g., folks just using benchmark-ips and running on YJIT. For the CRuby interpreter without a JIT, autowarmup is not needed and the default 2 seconds is fine. I think we'd need to look at some bigger benchmarks and see whether the autowarmup is better than the default 2 seconds.
Plenty of people use benchmark-ips to test application-level code, which may take ~0.5-1 second per iteration. I like autowarmup because it will automatically give these users longer, more useful warmups.
The idea is to run the item until the cycles per 100ms timing
is within 1% of the previous run. This means that the default is still
2 seconds like it was originally, but now if those 2 seconds didn't
yield runs that were close enough together, the warmup will continue to
run.
It will run for a maximum of 30 seconds.
There are undoubtedly more sophisticated ways to terminate the warmup, but this is a simple metric that is easy for end users to understand.
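The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `stable?` and `auto_warmup` are hypothetical names, the keyword defaults simply mirror the numbers in the description (2 second minimum, 30 second cap, 100ms windows, 1% tolerance), and the sketch compares iteration rates rather than raw cycle counts.

```ruby
# Hypothetical helper: true when two rates differ by at most `tolerance`
# (0.01 means 1%). Larger over smaller keeps the ratio >= 1.0.
def stable?(prev, cur, tolerance)
  ratio = [prev, cur].max.to_f / [prev, cur].min
  ratio - 1.0 <= tolerance
end

# Hypothetical sketch of the warmup loop; returns the seconds spent warming up.
# Assumption: the rate is measured as block calls per second within a window.
def auto_warmup(min_time: 2.0, max_time: 30.0, window: 0.1, tolerance: 0.01)
  now = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  start = now.call
  prev_rate = nil

  loop do
    # Run the block repeatedly for roughly one window and count the calls.
    cycles = 0
    window_start = now.call
    while now.call - window_start < window
      yield
      cycles += 1
    end
    rate = cycles / (now.call - window_start)

    elapsed = now.call - start
    # Give up once the cap is reached, stable or not.
    return elapsed if elapsed >= max_time
    # Only stop early after the minimum warmup time has passed.
    return elapsed if prev_rate && elapsed >= min_time && stable?(prev_rate, rate, tolerance)

    prev_rate = rate
  end
end
```

For example, `auto_warmup { some_work }` (where `some_work` stands in for the code under test) would warm up for at least 2 seconds and keep going, up to 30 seconds, until two consecutive windows agree to within 1%.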
Fixes #92