Add Blog Post on Autotuning for Kokkos with APEX #130
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,37 @@ | ||||||
--- | ||||||
authors: ["kokkos-team"] | ||||||
title: "Kokkos Releases New Autotuning Features" | ||||||
Be more specific, this title will look confusing in 1y, even more so in 3.
I used your suggestion for the more specific title. Perhaps we can say "...Introduces New Online Auto-Tuning Features" to be even more specific. As described in the blog, this Online Auto-Tuning relies on Kokkos Tools infrastructure. I wonder if that makes it too long. @khuck Feel free to suggest a different title. |
||||||
date: 2024-12-20 | ||||||
tags: ["blog"] | ||||||
thumbnail: img/blog/apex-kokkos.png | ||||||
--- | ||||||
|
||||||
# Motivation | ||||||
|
||||||
By default, the internal parameters of Kokkos execution spaces are hand-tuned, empirically or heuristically, to fixed values that provide "one size fits most" performance. The goal is to minimize abstraction overhead and approximate the performance of an optimized, lower-level implementation. Can these parameters instead be tuned for a particular application and architecture, so that programmers can easily pursue further performance gains? The Kokkos Tools APEX auto-tuning connector [5], a git submodule in Kokkos Tools with a stable version released in Kokkos 4.5 [4], offers an answer. | ||||||
Is the git-submodule part important? I would not mention it. |
||||||
|
||||||
# How it Works | ||||||
|
||||||
Kokkos includes a Tuning API (TuningInterface) that can be used to construct a tuning context around a computational kernel, declare input variables that define the context state, declare output variables to be tuned, and request values for the output variables when the kernel is executed. The Kokkos Tools infrastructure provides integrated support for using this API during Kokkos application execution, i.e., online rather than offline [2]. APEX uses Kokkos Tools and the Tuning API together to tune, at runtime, the parameters of Kokkos kernels running in any combination of execution space and execution policy. This auto-tuning capability also allows for (a) switching between Kokkos execution spaces (e.g., choosing between Serial and OpenMP depending on the problem size) or execution policies, and (b) auto-tuning any arbitrary parameter within an application that uses Kokkos: solver choices, algorithmic parameters, tolerances, etc. | ||||||
What is "TuningInterface" referring to?
I wonder if we can come up with a code snippet illustrating how one declares tunable variables and whether it would help here. Thoughts?
Yes, I think we could show the matrix multiplication code that is being experimented with. The code and setup with the Tuning API is a bit long (about 50 lines of code) and all of it is needed. @khuck and I discussed it but decided it does not need to be shown here - maybe we can go into more detail in that Wiki Post we linked in the Outcomes Section.
This is what @khuck and I used to refer to the set of functions in Kokkos::Tools that helps declare tuning parameter variables and tuning contexts in which those variables should be tuned, e.g., the function Kokkos::Tools::declareTuningVariable() is part of that interface. I think the text TuningInterface could actually be taken out and we can link to the Wiki Post for more information. Also, maybe this can be documented better in Kokkos or Kokkos Tools right now - this GitHub issue is probably the best documentation for its use: kokkos/kokkos-tools#90
Well... that's one of the issues with the current API, it is kind of complicated to create a variable... |
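To make the context / output-variable flow from the "How it Works" paragraph concrete, here is a minimal, self-contained sketch of the idea. It is a toy model only and does not use the Kokkos Tools Tuning API: `ToyTuner`, `request_value`, `report_time`, and `run_tuned` are hypothetical names invented for illustration, and the search strategy (try every candidate once, then keep the fastest) is an assumption, not a description of how APEX searches.

```cpp
// Toy model of online auto-tuning. NOT the Kokkos Tools API: every name in
// this block is hypothetical and exists only to illustrate the flow.
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

class ToyTuner {
 public:
  // "Declare" one output variable by listing its candidate values.
  explicit ToyTuner(std::vector<int> candidates)
      : candidates_(std::move(candidates)),
        best_time_(std::numeric_limits<double>::infinity()) {
    assert(!candidates_.empty());
  }

  // Begin a context: return the value the kernel should use this time.
  // While untested candidates remain, explore; afterwards, reuse the best.
  int request_value() {
    current_ = (next_ < candidates_.size()) ? next_++ : best_;
    return candidates_[current_];
  }

  // End a context: report how long the kernel took with the requested value.
  void report_time(double seconds) {
    if (seconds < best_time_) {
      best_time_ = seconds;
      best_ = current_;
    }
  }

  bool converged() const { return next_ >= candidates_.size(); }
  int best_value() const { return candidates_[best_]; }

 private:
  std::vector<int> candidates_;
  std::size_t next_ = 0;     // next candidate to explore
  std::size_t current_ = 0;  // candidate handed out by the last request
  std::size_t best_ = 0;     // fastest candidate seen so far
  double best_time_;         // time measured for the fastest candidate
};

// Run `invocations` kernel launches under the tuner and return the value it
// settled on. `timed_kernel` runs the kernel with the given value and returns
// the elapsed time in seconds.
int run_tuned(ToyTuner& tuner, const std::function<double(int)>& timed_kernel,
              int invocations) {
  for (int i = 0; i < invocations; ++i) {
    const int value = tuner.request_value();  // begin context
    tuner.report_time(timed_kernel(value));   // end context
  }
  return tuner.best_value();
}
```

In this sketch the tuner only sees an opaque integer and a measured time, which is why the same loop could in principle tune a block size, a solver choice, or any other application parameter. The real API additionally distinguishes input variables that describe the context state, which this toy omits.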
||||||
|
||||||
# Outcomes | ||||||
|
||||||
Our experiments have shown that, in most cases, a run that is actively tuning still outperforms the default, untuned configuration despite the overhead of exploring the search space. Figure 1, below, shows how the Kokkos Tools APEX auto-tuning connector adjusts the occupancy for a Kokkos parallel_for in a Kokkos benchmark [3] via APEX’s auto-tuning capabilities. In the figure, the tuned parameter converges to its best-performing value about halfway through the Kokkos application’s execution. | ||||||
|
||||||
{{< image src="img/blog/2024/APEX-tuning.jpeg" style="float: center; height=10">}} | ||||||
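One way to see why an actively tuning run can still come out ahead is to treat the search as a one-time cost that is amortized over the remaining kernel invocations. The sketch below is a back-of-the-envelope model under stated assumptions, not a reproduction of the experiments: the function names are hypothetical, each candidate is assumed to be tried exactly once, and any timings passed in are made-up inputs rather than measurements.

```cpp
// Simple amortization model for online tuning. All names are hypothetical
// and the model ignores noise and tuner bookkeeping costs.
#include <algorithm>
#include <cassert>
#include <vector>

// Total time of a run that tries each candidate once (exploration) and then
// uses the fastest candidate for all remaining iterations.
double tuned_total(const std::vector<double>& candidate_times, int iterations) {
  assert(!candidate_times.empty());
  assert(iterations >= static_cast<int>(candidate_times.size()));
  double explore = 0.0;
  for (double t : candidate_times) explore += t;
  const double best =
      *std::min_element(candidate_times.begin(), candidate_times.end());
  const int remaining = iterations - static_cast<int>(candidate_times.size());
  return explore + best * remaining;
}

// Total time of a run that always uses one fixed default setting.
double default_total(double default_time, int iterations) {
  return default_time * iterations;
}
```

With illustrative per-invocation times of 4, 2, 1, and 3 units and a default of 2 units, this model favors tuning over 100 iterations but not over only 4, where exploration dominates. Whether a real application crosses that break-even point depends on how many times the kernel runs and how far the default is from the best setting.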
|
||||||
For an in-depth example of how to use the Kokkos Tools runtime auto-tuning API with the APEX performance measurement and runtime adaptation tool, see the Wiki post at [https://github.com/UO-OACISS/apex/wiki/Kokkos-Runtime-Auto-Tuning-with-APEX](https://github.com/UO-OACISS/apex/wiki/Kokkos-Runtime-Auto-Tuning-with-APEX). | ||||||
You mean for the URL to appear rather than making "Wiki post" a hyperlink? |
||||||
|
||||||
The Kokkos team welcomes users to try the Kokkos Tools APEX auto-tuning capabilities and to provide feedback based on their auto-tuning needs. The team is actively working on new auto-tuning features, including a new flag for Kokkos executables, ML-guided auto-tuning, per-MPI-process auto-tuning, and the use of feedback from performance monitoring software such as LDMS. | ||||||
|
||||||
# References | ||||||
|
||||||
[1] Kokkos Tools library: [https://github.com/kokkos/kokkos-tools](https://github.com/kokkos/kokkos-tools) | ||||||
|
||||||
[2] GPTune for Kokkos Albany: [https://linkinghub.elsevier.com/retrieve/pii/S0377042723001668](https://linkinghub.elsevier.com/retrieve/pii/S0377042723001668) | ||||||
|
||||||
[3] Kokkos Occupancy Tuning Benchmark: [https://github.com/khuck/apex-kokkos-tuning/blob/main/tests/occupancy.cpp](https://github.com/khuck/apex-kokkos-tuning/blob/main/tests/occupancy.cpp) | ||||||
|
||||||
[4] Kokkos 4.5 Release Briefing: [https://github.com/kokkos/kokkos-tutorials/blob/main/Other/ReleaseBriefings/release-45.pdf](https://github.com/kokkos/kokkos-tutorials/blob/main/Other/ReleaseBriefings/release-45.pdf) | ||||||
|
||||||
[5] Autonomic Performance Environment for eXascale (APEX): [https://github.com/UO-OACISS/apex](https://github.com/UO-OACISS/apex) |
You added a whole lot of duplicated images with various quality and format. Was it intentional?