Add Profile-Guided Optimization page #55

sergiodj · 2024-09-26T01:10:25Z

This adds a page explaining what Profile-Guided Optimization (PGO) is. It also provides an example of an application being PGO'd.

I tried to strike a balance between explaining everything important in detail and not diving deep into things that I don't consider worth it. I realize that this is a complex topic, so I would appreciate a review. Please let me know if there's anything that could be improved.

Thanks!

cpaelzer

I like it a lot, yet still have many suggestions.

Give them a thought and I hope they will help you to make this even more polished and worthwhile for the reader.

cpaelzer · 2024-09-26T06:54:22Z

explanation/performance/perf-pgo.md

+
+* Overall, the performance gain/loss was minimal.  Most of the time it was less than 1%.
+
+* There were some huge outliers, though.  For example, `sha512` showed a gain of more than 13% in one of the runs, but also showed a loss of 11% in another run!  This is likely noise due to external factors, even though we tried to isolate our test environment as much as possible.  Still, it was surprising to see such discrepancies.


Citing noise as an excuse is not really helpful, but we can make it helpful.
For example, we can show "but by disabling this and that in the background" and "by running it N times and excluding the outliers" and so on, "we have found this to be ..."
And then, as you did well in touching on but not explaining everything, allude to more, like "but this is about getting reliable statistical data out of measurements, which shall not be the topic here in all its detail".

It could be helpful to suggest an order-of-magnitude number of tests that could be needed to provide a decent signal-to-noise ratio. I assume you'll be reporting the number of tests you ran in your blog post, but I still think it's worth reporting on what could be considered a statistically useful number of tests.
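
As a rough illustration of how such an order-of-magnitude figure could be derived, here is a minimal sketch using the standard two-sample normal approximation. It is not what was used for the tests in this PR; the function name `runs_needed`, the 5% run-to-run noise, and the 1% expected gain are all assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist


def runs_needed(noise_sd, effect, alpha=0.05, power=0.8):
    """Estimate runs per build needed to detect a difference in means.

    noise_sd and effect use the same unit (here: fraction of the baseline
    mean). Assumes independent, roughly normally distributed runs with the
    same spread for both builds, and a two-sided test.
    """
    if noise_sd <= 0 or effect <= 0:
        raise ValueError("noise_sd and effect must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired detection rate
    return math.ceil(2 * ((z_alpha + z_power) * noise_sd / effect) ** 2)


# Hypothetical inputs: 5% run-to-run noise, hoping to detect a 1% gain.
print(runs_needed(noise_sd=0.05, effect=0.01))
# The same assumed noise with a 10% gain needs far fewer runs.
print(runs_needed(noise_sd=0.05, effect=0.10))
```

With these assumed inputs the first estimate lands in the hundreds of runs per build, which suggests why a sub-1% gain is hard to separate from noise with only a handful of runs.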

@cpaelzer and @s-makin, thanks for the suggestions.

@cpaelzer, would you have some time to look deeper into this with me? I'd like your input regarding the statistical side of things, especially when it comes to determining a good number of tests to be run.

Unfortunately I have already released the machine I was using for testing this, so if more tests are needed they will take some time.

Thanks for the discussion. Go for:

* positive tone
* explain that non-improvement can happen even when doing everything right
* hint at any general statistical tricks that help, like what Sally and I mentioned (multiple runs, excluding outliers, controlling background load, not picking the most optimized assembly, ...). This helps people; it is a mistake they could otherwise make too
* link/hint at the blog post for an example of the same with improvements
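
The multi-run and outlier-exclusion ideas could be sketched roughly as follows. This is only an illustration, not the method used for the page: the helper names `robust_summary` and `relative_gain` are hypothetical, the Tukey fence (1.5 times the interquartile range) is one common convention among several, and the throughput numbers are made up.

```python
import statistics


def robust_summary(samples, k=1.5):
    """Drop runs outside the Tukey fences, then summarise the rest."""
    if len(samples) < 4:
        raise ValueError("need at least 4 runs to estimate quartiles")
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [s for s in samples if low <= s <= high]
    return {
        "runs": len(samples),
        "kept": len(kept),
        "median": statistics.median(kept),
        "mean": statistics.fmean(kept),
    }


def relative_gain(baseline, pgo):
    """Relative change in median throughput (positive means PGO is faster)."""
    base = robust_summary(baseline)["median"]
    opt = robust_summary(pgo)["median"]
    return (opt - base) / base


# Made-up throughput samples (higher is better); each set has one stray run.
baseline_runs = [100, 101, 99, 100, 102, 98, 100, 150]
pgo_runs = [105, 106, 104, 105, 107, 103, 105, 60]

print(robust_summary(baseline_runs))
print(f"gain: {relative_gain(baseline_runs, pgo_runs):.1%}")
```

Comparing medians after trimming keeps a single disturbed run from dominating the result, but it does not replace running enough iterations on a quiet machine in the first place.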

Thanks @cpaelzer . I've updated the last sections, please take a look when you can.

cpaelzer · 2024-09-26T06:56:43Z

explanation/performance/perf-pgo.md

+
+We obtained some interesting results while performing the tests.  Here are the highlights:
+
+* Overall, the performance gain/loss was minimal.  Most of the time it was less than 1%.


That as the first message is a bit uninspiring.

We might want to mention that this is because these hot code paths are already highly optimized (they are), and that with more complex software (in your other tests, much more than a hand-optimized hot loop is active) you've usually seen 5%-15%.

Without framing it as "we chose this for simplicity of the test, thereby accepting that we won't see much, yet you can gain much", we are shooting the messaging and helpfulness in the foot.

Once the blog post exists (which is more about reporting what we did than showing how one could recreate the same), you can even point to it from here.

If someone is following along, won't their results depend on their hardware? If you're presenting a blog with the results of your testing, I agree that those should go here as an example of what could be seen. Since you're using we/our throughout the article, it's not necessarily clear that "our results" here actually refers to the results of your testing.

@cpaelzer Good suggestion. I'll add some text expanding on the reason why we've seen such a low performance gain.

@s-makin The results will depend on a bunch of things, including their hardware, yes. I still have to prepare the blog post, but I'll certainly link it here once it's done.

s-makin

Really nice article! I agree with all the points cpaelzer has already raised. Added a few extra suggestions as well.

sergiodj · 2024-10-01T21:16:58Z

@cpaelzer @s-makin Hi, I've finally force-pushed my branch with almost all of the modifications requested by you. Could you please take another look? There's one specific change that looks more involved, so I left a comment asking for some help.

Thanks!

sergiodj · 2024-10-01T21:18:29Z

BTW, after reading all your comments I've been thinking whether using openssl speed as an example was indeed a good idea, given that it did not provide a concrete example of optimization generated by PGO. WDYT?

s-makin

Additional review on the parts that have been changed

explanation/performance/perf-pgo.md

cpaelzer · 2024-10-08T09:06:21Z

> BTW, after reading all your comments I've been thinking whether using openssl speed as an example was indeed a good idea, given that it did not provide a concrete example of optimization generated by PGO. WDYT?

As I said, yes, it isn't the most catchy example. But many readers might make the same mistake by picking an already highly optimized case and then be sad not to see a benefit. PGO has the biggest potential where there have not already been 25 years of optimization. Hence you can use your mistake to the benefit of others: explain that it can be normal not to see a benefit if, e.g., the workload is not a good fit for it (similarly if things are entirely I/O bound).

And for motivation, then refer to your blog post, which has examples where it really helped despite the case being quite complex (too complex for manual tuning, for example).

cpaelzer

I've previously replied with suggestions and opinions on a few threads that were open between you and Sally. Then I did another full pass, leaving a few hints.

But these are polishing; this is already awesome, and hence you get my approval even before considering my next barrage of suggestions. If you implement them, fine; if you don't, for good reason, this is still good to go and will help those who want to experiment with it.

s-makin · 2024-11-06T09:53:03Z

explanation/performance/perf-pgo.md

+
+## More information
+
+LINK TO BLOG POST HERE


Don't forget to link :D

Thanks! Since the blog post will be available later, I have temporarily removed this subsection and will reintroduce it when I have the final link.

sergiodj · 2024-11-07T00:20:24Z

@cpaelzer @s-makin Thanks for all your feedback. I have addressed it and force-pushed the results. I believe this is ready to be published now, but I will let you take another look just in case.

s-makin

LGTM :) thanks for all your work on this!

I have taken the liberty of adding a few suggestions on adjusted header levels, and also I noticed in the rendered RTD page that the bash code blocks (when applied to the output) lead to some weird and distracting colours being applied. You can either split up the command + output, or change the code block language to "text" to avoid this. I'll leave it up to you to decide which you prefer :)

I'll let @cpaelzer take the approving review, I'm happy for it to be merged once he's happy.

explanation/performance/perf-pgo.md

s-makin

LGTM :)

Signed-off-by: Sergio Durigan Junior <[email protected]>

sergiodj requested review from s-makin and cpaelzer September 26, 2024 01:12

cpaelzer reviewed Sep 26, 2024

s-makin reviewed Sep 26, 2024

sergiodj force-pushed the pgo-doc branch 2 times, most recently from c20a183 to 344f9fd Compare October 1, 2024 21:15

sergiodj requested review from cpaelzer and s-makin October 1, 2024 21:17

s-makin reviewed Oct 7, 2024

sergiodj force-pushed the pgo-doc branch from d2d0cfd to d4cafdc Compare October 8, 2024 03:23

cpaelzer approved these changes Oct 8, 2024

s-makin reviewed Nov 6, 2024

sergiodj force-pushed the pgo-doc branch from d4cafdc to e4e0927 Compare November 7, 2024 00:19

s-makin reviewed Nov 7, 2024

s-makin approved these changes Nov 7, 2024

Add Profile-Guided Optimization page

06747db

sergiodj force-pushed the pgo-doc branch from 7229b53 to 06747db Compare November 7, 2024 17:31

sergiodj merged commit 0258a6b into canonical:main Nov 7, 2024
1 check passed
