Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Profile-Guided Optimization page #55

Merged
merged 1 commit into from
Nov 7, 2024
Merged

Conversation

sergiodj
Copy link
Contributor

This adds a page explaining what Profile-Guided Optimization (PGO) is. It also provides an example of an application being PGO'd.

I tried to hit a balance between explaining everything important in detail while not going into a deep dive on things that I don't consider worth it. I realize that this is a complex topic, so I would appreciate a review. Please let me know if there's anything that could be improved.

Thanks!

Copy link
Contributor

@cpaelzer cpaelzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it a lot, yet still have many suggestions.

Give them a thought and I hope they will help you to make this even more polished and worthwhile for the reader.

explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved

* Overall, the performance gain/loss was minimal. Most of the time it was less than 1%.

* There were some huge outliers, though. For example, `sha512` showed a gain of more than 13% in one of the runs, but also showed a loss of 11% in another run! This is likely noise due to external factors, even though we tried to isolate our test environment as much as possible. Still, it was surprising to see such discrepancies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the noise as an excuse is not really helpful, but we can make it helpful.
For example we can show "but by disabling this and that in background" and "by running it N times excluding the outliers" and ... we have found this to be ...
And then, as you did well in touching but not explaining all elude to more like "but this is about geting reliable statistical data out of measurements which shall not be the topic here in all its detail"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be helpful to suggest an order of magnitude number of tests that could be needed to provide a decent signal-to-noise ratio. I assume you'll be reporting the number of tests you ran in your blog post, but still I think it's worth reporting on what could be considered a statistically useful number of tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpaelzer and @s-makin, thanks for the suggestions.

@cpaelzer, would you have some time to look deeper into this with me? I'd like your input regarding the statistical side of things, especially when it comes to determining a good number of tests to be run.

Unfortunately I have already released the machine I was using for testing this, so if more tests are needed they will take some time.

Copy link
Contributor

@cpaelzer cpaelzer Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the discussion - go for:

  1. positive tone
  2. explain that non-improvement can happen even if doing right
  3. hint at any general statistic trick to help like what sally and I mentioned (multi run, exclude outlier, background load, not picking the most optimized assembly, ...) - This helps people, it is a mistake they could otherwise make too
  4. link/hint at blog post for an example of the same with improvements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cpaelzer . I've updated the last sections, please take a look when you can.


We obtained some interesting results while performing the tests. Here are the highlights:

* Overall, the performance gain/loss was minimal. Most of the time it was less than 1%.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That as the first message is a bit non-inspiring.

We might want to mention that this is due to these hot code paths already being some that are highly optimized (they are) and that you've found with more complex software (in your other tests much more than a hand optimized hot loop is active) you've seen usually 5%-15%

Without framing it this way of "we chosen this for simplicity of the test, but thereby accepting we won't see much - yet you can have much we are shooting the messaging and helpfulness in the foot".

Once the blog exists which is more reporting what we did vs showing how one could recreate the same you can even point to it from here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone is following along, won't their results depend on their hardware? If you're presenting a blog with the results of your testing, I agree that those should go here as an example of what could be seen. Since during the article you're using we/our throughout, it's not necessarily clear that "our results" here actually refers to the results of your testing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpaelzer Good suggestion. I'll add some text expanding on the reason why we've seen such a low performance gain.

@s-makin The results will depend on a bunch of things, including their hardware, yes. I still have to prepare the blog post, but I'll certainly link it here once it's done.

Copy link
Collaborator

@s-makin s-makin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice article! I agree with all the points cpaelzer has already raised. Added a few extra suggestions as well.

explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance.rst Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved

We obtained some interesting results while performing the tests. Here are the highlights:

* Overall, the performance gain/loss was minimal. Most of the time it was less than 1%.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone is following along, won't their results depend on their hardware? If you're presenting a blog with the results of your testing, I agree that those should go here as an example of what could be seen. Since during the article you're using we/our throughout, it's not necessarily clear that "our results" here actually refers to the results of your testing


* Overall, the performance gain/loss was minimal. Most of the time it was less than 1%.

* There were some huge outliers, though. For example, `sha512` showed a gain of more than 13% in one of the runs, but also showed a loss of 11% in another run! This is likely noise due to external factors, even though we tried to isolate our test environment as much as possible. Still, it was surprising to see such discrepancies.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be helpful to suggest an order of magnitude number of tests that could be needed to provide a decent signal-to-noise ratio. I assume you'll be reporting the number of tests you ran in your blog post, but still I think it's worth reporting on what could be considered a statistically useful number of tests

@sergiodj sergiodj force-pushed the pgo-doc branch 2 times, most recently from c20a183 to 344f9fd Compare October 1, 2024 21:15
@sergiodj
Copy link
Contributor Author

sergiodj commented Oct 1, 2024

@cpaelzer @s-makin Hi, I've finally force-pushed my branch with almost all of the modifications requested by you. Could you please take another look? There's one specific change that looks more involved, so I left a comment asking for some help.

Thanks!

@sergiodj sergiodj requested review from cpaelzer and s-makin October 1, 2024 21:17
@sergiodj
Copy link
Contributor Author

sergiodj commented Oct 1, 2024

BTW, after reading all your comments I've been thinking whether using openssl speed as an example was indeed a good idea, given that it did not provide a concrete example of optimization generated by PGO. WDYT?

Copy link
Collaborator

@s-makin s-makin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional review on the parts that have been changed

explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
@cpaelzer
Copy link
Contributor

cpaelzer commented Oct 8, 2024

BTW, after reading all your comments I've been thinking whether using openssl speed as an example was indeed a good idea, given that it did not provide a concrete example of optimization generated by PGO. WDYT?

As I said, yes it isn't the most catchy example. But many readers might do the same mistake by picking an already highly optimized case and then be sad to not see benefit. PGO has the biggest potential if there was not 25 years of optimization yet. Hence you can use your mistake to the benefit of others - explain that it can be normal to not see a benefit if e.g. the workload is not good for it (similar if things are entriely I/O bound).

And for motivation then Refer to your blog post which has examples where it really helped despite the case being quite complex, too complex for manual tuning for example.

Copy link
Contributor

@cpaelzer cpaelzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've before replied suggestions and opinion on a few threads that have been open between you and Sally. Then I've done another full pass leaving a few hints.

But these are polishing, this is already awesome and hence you get my Approval even before considering my next barrage of suggestions. If you implement them, fine - if not for good reason this is still good to go and help the world that want to experiment with it.

explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Show resolved Hide resolved
explanation/performance/perf-pgo.md Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved

## More information

LINK TO BLOG POST HERE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to link :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Since the blog post will be available later, I have temporarily removed this subsection and will reintroduce it when I have the final link.

@sergiodj
Copy link
Contributor Author

sergiodj commented Nov 7, 2024

@cpaelzer @s-makin Thanks for all your feedback. I have addressed them and force-pushed the results. I believe this is ready to be published now, but I will let you take another look just in case.

Copy link
Collaborator

@s-makin s-makin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :) thanks for all your work on this!

I have taken the liberty of adding a few suggestions on adjusted header levels, and also I noticed in the rendered RTD page that the bash code blocks (when applied to the output) lead to some weird and distracting colours being applied. You can either split up the command + output, or change the code block language to "text" to avoid this. I'll leave it up to you to decide which you prefer :)

I'll let @cpaelzer take the approving review, I'm happy for it to be merged once he's happy.

explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
explanation/performance/perf-pgo.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@s-makin s-makin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

Signed-off-by: Sergio Durigan Junior <[email protected]>
@sergiodj sergiodj merged commit 0258a6b into canonical:main Nov 7, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants