Enable GPU execution of atm_recover_large_step_variables via OpenACC #1220
NOTE: the changes in this PR slightly change answers. The changes seem to come from a couple of loops. In my regional test case, if the calculations of
First this loop
And in this loop
Although there isn't anything in particular that looks suspect to me, it does seem like the differences in results after the first timestep of a 12-km regional simulation comparing with the current
I also get a core dump at the end of the simulation on Derecho when using the following modules:
It may be worth taking a second look at the changes in this PR to see if we can find anything that's subtly off.
This includes splitting a long do loop on iCell so an if condition on rk_step is outside the do loop. Though this condition shouldn't lead to warp divergence, it can still be helpful to do branch evaluation on the device best suited for it (the CPU). Since there was also limited data re-use in the long do loop, it may help to reduce kernel launch overhead. Also line up whitespace so the bdyMaskCell if condition is apparent and clean up some lines that end with whitespace.
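The restructuring described in this commit can be sketched as follows. This is a minimal illustration in C with OpenACC pragmas, not the MPAS Fortran source: the function names, the arrays `a`, `b`, `c`, the arithmetic, and the test `rk_step == 3` are all hypothetical placeholders. The pragmas are ignored by a compiler without OpenACC support, so the code also runs on the CPU.

```c
#include <stddef.h>

/* Hypothetical "before" shape: one long loop over cells, with a branch on
 * rk_step evaluated inside the device loop for every cell. */
void update_fused(size_t n_cells, int rk_step, const double *a,
                  double *b, double *c)
{
    #pragma acc parallel loop copyin(a[0:n_cells]) copy(b[0:n_cells], c[0:n_cells])
    for (size_t i = 0; i < n_cells; ++i) {
        b[i] = 2.0 * a[i];
        if (rk_step == 3) {          /* same outcome for every cell */
            c[i] = c[i] + a[i];
        }
    }
}

/* Hypothetical "after" shape: the loop is split in two and the rk_step
 * test is evaluated once on the host, outside any device loop. */
void update_split(size_t n_cells, int rk_step, const double *a,
                  double *b, double *c)
{
    #pragma acc parallel loop copyin(a[0:n_cells]) copy(b[0:n_cells])
    for (size_t i = 0; i < n_cells; ++i) {
        b[i] = 2.0 * a[i];
    }

    if (rk_step == 3) {              /* host-side branch */
        #pragma acc parallel loop copyin(a[0:n_cells]) copy(c[0:n_cells])
        for (size_t i = 0; i < n_cells; ++i) {
            c[i] = c[i] + a[i];
        }
    }
}
```

Both versions produce the same values. The split form only changes where the branch is evaluated and how the work is divided into kernels.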
…work This commit adds an initial port of this routine using OpenACC. More changes are needed to improve performance.
This commit ensures the invariant fields used during this work routine are present on the device from model startup to model shutdown. It builds on the changes in PR MPAS-Dev#1176 to copyin invariant fields during mpas_atm_dynamics_init and delete them from the device during mpas_atm_dynamics_finalize.
These changes ensure that the other, non-invariant, fields are available on the device during this routine. Some fields that are overwritten are only created at the beginning, while others are copied in. Any fields that were unmodified right-hand side fields are deleted at the end and modified left-hand side fields are copied out. Timing for these transfers is reported in the output log file in the new timer: 'atm_recover_large_step_variables [ACC_data_xfer]'.
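The transfer pattern described in this commit might look roughly like the sketch below. It is written in C with OpenACC pragmas for illustration only. The function name `recover_sketch`, the arrays `rhs`, `lhs`, and `scratch`, and the arithmetic are assumptions and do not come from the MPAS source. Without OpenACC support the pragmas are ignored and the loop runs on the host.

```c
#include <stddef.h>

/* rhs:     read-only input      -> copied in, deleted afterwards
 * lhs:     read and modified    -> copied in, copied out afterwards
 * scratch: fully overwritten    -> only created, copied out afterwards */
void recover_sketch(size_t n, const double *rhs, double *lhs, double *scratch)
{
    /* In the real routine, a data-transfer timer would bracket this. */
    #pragma acc enter data copyin(rhs[0:n], lhs[0:n]) create(scratch[0:n])

    #pragma acc parallel loop present(rhs[0:n], lhs[0:n], scratch[0:n])
    for (size_t i = 0; i < n; ++i) {
        /* scratch was created, not copied in, so its device contents are
         * undefined until written; every element is assigned here. */
        scratch[i] = 0.5 * rhs[i];
        lhs[i] = lhs[i] + scratch[i];
    }

    #pragma acc exit data copyout(lhs[0:n], scratch[0:n]) delete(rhs[0:n])
}
```

Using `create` instead of `copyin` for an overwritten array avoids a host-to-device transfer, but it is only safe when the kernel assigns every element that is later read or copied out.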
168d6e3 to 4a923ad
@mgduda, this is ready for another round of review! There are slight answer differences, but isolated just to
@gdicker1 If I use the current HEAD of the
and
around lines 2860 and 2873 (in this PR branch) in order to get identical results with the baseline. If you agree that we should be initializing the
I agree with those changes, and will get them pushed. I would assume we were just getting lucky before this point!
…bles_work Since rw is created on the device in this PR, all values need to be updated to get correct results.
This PR enables GPU calculation of post-acoustic step variable reconstitution by adding OpenACC directives to the atm_recover_large_step_variables_work routine.

Timing information for the OpenACC data transfers in this routine is captured in the log file by a new timer: 'atm_recover_large_step_variables [ACC_data_xfer]'.

Invariant fields used in the work routine are copied in during mpas_atm_dynamics_init and deleted in mpas_atm_dynamics_finalize by building off the fields already handled in previous PRs.