Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement automatic warmup timing #135

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Implement automatic warmup timing #135

wants to merge 1 commit into from

Conversation

evanphx
Copy link
Owner

@evanphx evanphx commented Jan 13, 2024

The idea is to run the item until the cycles per 100ms timing
is within 1% of the previous run. This means that the default is still
2 seconds like it was originally, but now if those 2 seconds didn't
yield runs that were close enough together, the warmup will continue to
run.

It will run for a maximum of 30 seconds.

There are undoubtedly more sophisticated ways to terminate the warmup, but this is a simple metric that is easy for end users to understand.

Fixes #92

@evanphx evanphx requested a review from nateberkopec January 13, 2024 00:52
Copy link
Contributor

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks convenient but I'm worried/concerned that it will not work well for some JITs and some benchmarks.
For instance if there is a lot to compile the performance might look stable for some time but might not be peak performance.

For a practical example let's look at this warmup plot:
plot
(from this blog post)
We can see multiple regions where it's flat but yet it's far from peak performance.

This is also what papers like https://arxiv.org/pdf/1602.00602.pdf found:

Kalibera and Jones 2013 convincingly show the limitations of such approaches, presenting instead a manual approach to determining if and when a steady state has been reached

that in the general case it is not possible to detect warmup reliably.
On a given set of benchmarks and Ruby engines it might be, but on unknown benchmarks it's not.

prev / per
end

if diff - 1.0 <= 0.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if diff - 1.0 <= 0.1
if diff - 1.0 <= 0.01

For 1%

@eregon
Copy link
Contributor

eregon commented Jan 13, 2024

Auto-detecting warmup is always tempting and certainly convenient when it works and for instance I wrote this (PR) for yjit-bench.
It starts with a 0.1% goal and uses the median absolute deviation over all iterations so far.
This means it has a better idea of warmup because it considers all iterations and not just the last two (the current approach in this PR).
It also increases the goal by 0.01% per iteration so that it allows more margin for more noisy benchmarks (and gives up after 20 minutes).
It's not perfect though and e.g. can fail to find steady state if the benchmark is quite noisy or if there is a very long flat region (i.e. approximately as long as half of all iterations so far) which is not actually peak.

So I think the safest would be to not add auto warmup to benchmark-ips, because it will fail in some cases and then it would be misleading.
If we still want to add it then I think we should try to detect when it fails and warn in such a case and check it on many benchmarks and Ruby implementations.

@evanphx
Copy link
Owner Author

evanphx commented Jan 13, 2024

@eregon Good context. Perhaps another way to look at this is, by having a simple autowarm by default that isn't sophisticated, it's better than the default (strictly 2 seconds) without making people overly confident that it's always correct.

Folks testing JITs, etc are always going to want to specify the warmup, I don't we could come up with a general purpose algorithm that would work for those folks and folks using benchmark-ips to just test out 2 different chunks of code.

This would be aimed at that later group, where the warmup can catch potential areas where their default warmup wasn't sufficiant.

@eregon
Copy link
Contributor

eregon commented Jan 15, 2024

I think these two groups overlap quite a bit, e.g., folks just using benchmark-ips and running on YJIT.
Or of course folks just using benchmark-ips and running on JRuby/TruffleRuby (although I guess less frequent).

For CRuby interpreter no-JIT, then autowarmup is not needed, and the default 2 seconds is fine.
For YJIT it's less clear, we would need to look at warmup curves on bigger benchmarks and see if autowarmup detects it correctly there.
On small benchmarks, the 2 seconds should be enough for YJIT and probably for TruffleRuby Native too, while it might not be enough for TruffleRuby JVM and JRuby (where it's likely most difficult to autowarmup correctly).

I think we'd need to look at some bigger benchmarks and see if the autowarmup is better than the default 2 seconds.

@nateberkopec
Copy link
Collaborator

  1. Should there me a message printed if you hit the full 30 seconds? It's kind of a "failure" in some sense.
  2. 1% feels aggressive to me. The only time I see 1% variance in a benchmark-ips result is when I'm testing the most braindead low-level Ruby stuff... certainly any Rails bench involving database interaction at all usually ends up with a few % in either direction. Maybe changing the goal over time (as eregon suggested) is the solution.

CRuby interpreter no-JIT, then autowarmup is not needed, and the default 2 seconds is fine.

Plenty of people use benchmark-ips to test application-level code which may be ~0.5-1second per iteration. I like autowarmup because it will automatically give these users longer, more useful warmups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fully Automatic Mode
3 participants