-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Profile-Guided Optimization page #55
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it a lot, yet still have many suggestions.
Give them a thought and I hope they will help you to make this even more polished and worthwhile for the reader.
explanation/performance/perf-pgo.md
Outdated
|
||
* Overall, the performance gain/loss was minimal. Most of the time it was less than 1%. | ||
|
||
* There were some huge outliers, though. For example, `sha512` showed a gain of more than 13% in one of the runs, but also showed a loss of 11% in another run! This is likely noise due to external factors, even though we tried to isolate our test environment as much as possible. Still, it was surprising to see such discrepancies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the noise as an excuse is not really helpful, but we can make it helpful.
For example we can show "but by disabling this and that in background" and "by running it N times excluding the outliers" and ... we have found this to be ...
And then, as you did well in touching but not explaining all elude to more like "but this is about geting reliable statistical data out of measurements which shall not be the topic here in all its detail"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be helpful to suggest an order of magnitude number of tests that could be needed to provide a decent signal-to-noise ratio. I assume you'll be reporting the number of tests you ran in your blog post, but still I think it's worth reporting on what could be considered a statistically useful number of tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cpaelzer and @s-makin, thanks for the suggestions.
@cpaelzer, would you have some time to look deeper into this with me? I'd like your input regarding the statistical side of things, especially when it comes to determining a good number of tests to be run.
Unfortunately I have already released the machine I was using for testing this, so if more tests are needed they will take some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the discussion - go for:
- positive tone
- explain that non-improvement can happen even if doing right
- hint at any general statistic trick to help like what sally and I mentioned (multi run, exclude outlier, background load, not picking the most optimized assembly, ...) - This helps people, it is a mistake they could otherwise make too
- link/hint at blog post for an example of the same with improvements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cpaelzer . I've updated the last sections, please take a look when you can.
explanation/performance/perf-pgo.md
Outdated
|
||
We obtained some interesting results while performing the tests. Here are the highlights: | ||
|
||
* Overall, the performance gain/loss was minimal. Most of the time it was less than 1%. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That as the first message is a bit non-inspiring.
We might want to mention that this is due to these hot code paths already being some that are highly optimized (they are) and that you've found with more complex software (in your other tests much more than a hand optimized hot loop is active) you've seen usually 5%-15%
Without framing it this way of "we chosen this for simplicity of the test, but thereby accepting we won't see much - yet you can have much we are shooting the messaging and helpfulness in the foot".
Once the blog exists which is more reporting what we did vs showing how one could recreate the same you can even point to it from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone is following along, won't their results depend on their hardware? If you're presenting a blog with the results of your testing, I agree that those should go here as an example of what could be seen. Since during the article you're using we/our throughout, it's not necessarily clear that "our results" here actually refers to the results of your testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice article! I agree with all the points cpaelzer has already raised. Added a few extra suggestions as well.
explanation/performance/perf-pgo.md
Outdated
|
||
We obtained some interesting results while performing the tests. Here are the highlights: | ||
|
||
* Overall, the performance gain/loss was minimal. Most of the time it was less than 1%. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone is following along, won't their results depend on their hardware? If you're presenting a blog with the results of your testing, I agree that those should go here as an example of what could be seen. Since during the article you're using we/our throughout, it's not necessarily clear that "our results" here actually refers to the results of your testing
explanation/performance/perf-pgo.md
Outdated
|
||
* Overall, the performance gain/loss was minimal. Most of the time it was less than 1%. | ||
|
||
* There were some huge outliers, though. For example, `sha512` showed a gain of more than 13% in one of the runs, but also showed a loss of 11% in another run! This is likely noise due to external factors, even though we tried to isolate our test environment as much as possible. Still, it was surprising to see such discrepancies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be helpful to suggest an order of magnitude number of tests that could be needed to provide a decent signal-to-noise ratio. I assume you'll be reporting the number of tests you ran in your blog post, but still I think it's worth reporting on what could be considered a statistically useful number of tests
c20a183
to
344f9fd
Compare
BTW, after reading all your comments I've been thinking whether using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional review on the parts that have been changed
As I said, yes it isn't the most catchy example. But many readers might do the same mistake by picking an already highly optimized case and then be sad to not see benefit. PGO has the biggest potential if there was not 25 years of optimization yet. Hence you can use your mistake to the benefit of others - explain that it can be normal to not see a benefit if e.g. the workload is not good for it (similar if things are entriely I/O bound). And for motivation then Refer to your blog post which has examples where it really helped despite the case being quite complex, too complex for manual tuning for example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've before replied suggestions and opinion on a few threads that have been open between you and Sally. Then I've done another full pass leaving a few hints.
But these are polishing, this is already awesome and hence you get my Approval even before considering my next barrage of suggestions. If you implement them, fine - if not for good reason this is still good to go and help the world that want to experiment with it.
explanation/performance/perf-pgo.md
Outdated
|
||
## More information | ||
|
||
LINK TO BLOG POST HERE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to link :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Since the blog post will be available later, I have temporarily removed this subsection and will reintroduce it when I have the final link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :) thanks for all your work on this!
I have taken the liberty of adding a few suggestions on adjusted header levels, and also I noticed in the rendered RTD page that the bash code blocks (when applied to the output) lead to some weird and distracting colours being applied. You can either split up the command + output, or change the code block language to "text" to avoid this. I'll leave it up to you to decide which you prefer :)
I'll let @cpaelzer take the approving review, I'm happy for it to be merged once he's happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :)
Signed-off-by: Sergio Durigan Junior <[email protected]>
This adds a page explaining what Profile-Guided Optimization (PGO) is. It also provides an example of an application being PGO'd.
I tried to hit a balance between explaining everything important in detail while not going into a deep dive on things that I don't consider worth it. I realize that this is a complex topic, so I would appreciate a review. Please let me know if there's anything that could be improved.
Thanks!