-
Notifications
You must be signed in to change notification settings - Fork 329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable GPU execution of atm_compute_vert_imp_coefs via OpenACC #1241
base: develop
Are you sure you want to change the base?
Enable GPU execution of atm_compute_vert_imp_coefs via OpenACC #1241
Conversation
@abishekg7 Can you clean up the commits and add a description to this PR? |
Initial OpenACC port of atm_compute_vert_imp_coefs. - Separating the computation of cofrz into a separate parallel clause is required for correctness. - Variables b_tri and c_tri are declared private in lieu of extending the storage to two dimensions, like a_tri.
bd7bbdd
to
da32982
Compare
Thanks. Done! |
@@ -1894,32 +1894,46 @@ subroutine atm_compute_vert_imp_coefs_work(nCells, moist_start, moist_end, dts, | |||
real (kind=RKIND) :: dtseps, c2, qtotal, rcv | |||
real (kind=RKIND), dimension( nVertLevels ) :: b_tri, c_tri | |||
|
|||
MPAS_ACC_TIMER_START('atm_compute_vert_imp_coefs_work [ACC_data_xfer]') | |||
!$acc enter data copyin(rdzw, zz, fzm, fzp, rdzu, cqw, p, t, qtot, rb, & | |||
!$acc rtb, rt, pb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the fields here are invariant (across timesteps) and should be handled in the mpas_atm_dynamics_{init,finalize}
routines instead.
Generally these invariant fields are those contained in the "invariant stream" (L421-L524 in the atm Registry.xml) or those accessed through the "mesh" mpas_pool
(L1222-L1516 in the atm Registry.xml)
|
||
! set coefficients | ||
dtseps = .5*dts*(1.+epssm) | ||
rcv = rgas/(cp-rgas) | ||
c2 = cp*rcv | ||
|
||
!$acc parallel | ||
!$acc loop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to be explicit with loop
directives (e.g. loop gang worker
here).
|
||
!$acc exit data copyout(cofrz, cofwr, cofwz, coftz, cofwt, a_tri, b_tri, & | ||
!$acc c_tri, alpha_tri, gamma_tri) | ||
!$acc exit data delete(rdzw, zz, fzm, fzp, rdzu, cqw, p, t, qtot, rb, rtb, rt, pb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides handling fields like fzm
in mpas_atm_dynamics_finalize
instead, these directives should be surrounded by MPAS_ACC_TIMER_{START,STOP}
calls.
! MGD bad to have all threads setting this variable? | ||
do k=1,nVertLevels | ||
cofrz(k) = dtseps*rdzw(k) | ||
end do | ||
|
||
!$acc end parallel | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empty line contains extra spaces that should be removed. Might be good to double check for any trailing whitespace in the lines you're adding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR may need a little more work. I'm getting differences in 2 fields in the history file when comparing the regional testcase. Restart files are reported as matching.
Output from compare_netcdf.py:
Comparing files: f1 - f2
f1=../2024Dec31_MPASAPR1223_OpenACC/baseline_acc_1735844488/history.2019-09-01_00.06.00.nc
f2=test_att1735695005/history.2019-09-01_00.06.00.nc
Variable Min Max
=========================================
initial_time is not a numeric field and will not be compared
xtime is not a numeric field and will not be compared
depv_dt_fric -32.448421 33.315609
depv_dt_fric_pv -32.448421 30.517973
mminlu is not a numeric field and will not be compared
Some suggested edits as well.
Initial OpenACC port of atm_compute_vert_imp_coefs.
cofrz
into a separate parallel clause is required for correctness.b_tri
andc_tri
are declared private in lieu of extending the storage to two dimensions, likea_tri
.alpha_tri
andgamma_tri
needs to be sequential