
HR4 GWD update for FV3 #2290

Merged: 40 commits merged into ufs-community:develop on Jul 11, 2024


Qingfu-Liu
Collaborator

@Qingfu-Liu Qingfu-Liu commented May 19, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR (#2290) depends on fv3atm PR#836 and ccpp-physics PR#207. Together, these PRs update the gravity wave drag (GWD) code.

Commit Message:

  * FV3 - 
    * ccpp-physics - 
  * NOAHMP - 

Priority:

  • High: this PR is needed for HR4 and updates the GWD code.

Git Tracking

UFSWM:

  • None

Sub component Pull Requests:

  • NOAA-EMC/fv3atm#836
  • ufs-community/ccpp-physics#207

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Updates/Changes Baselines.

Input data Changes:

  • This PR requires a new set of orographic data, described in global-workflow PR#2670:
    HR4 GWD update NOAA-EMC/global-workflow#2670
    PR#2670 changes four scripts and adds the new orographic data.
    The new orographic data is temporarily stored at:
    /scratch1/NCEPDEV/global/Qingfu.Liu/git/GWD_SHong/APR2023_SSO
    and should replace the old data in: /scratch1/NCEPDEV/global/glopara/fix/ugwd/20231027
  • New orographic input files on Hera:
    • /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/FV3_input_data/INPUT_L127_gfsv17
    • /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/FV3_input_data48/INPUT_L127_gfsv17
    • /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/FV3_input_data192/INPUT_L127_gfsv17
    • /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501/FV3_input_data384/INPUT_L127_gfsv17
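Before running the affected regression tests on a platform, it may help to confirm that the new INPUT_L127_gfsv17 directories have actually been synced there. The following is a minimal bash sketch; the function name check_gwd_inputs is hypothetical, and it only checks that each directory exists, not that its contents match the data from PR#2670.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that <root>/<subdir>/INPUT_L127_gfsv17 exists
# for each resolution subdirectory given. Prints one OK/MISSING line per
# subdirectory and returns the number of missing directories (0 = all present).
check_gwd_inputs() {
  local root="$1"
  shift
  local missing=0 sub
  for sub in "$@"; do
    if [ -d "${root}/${sub}/INPUT_L127_gfsv17" ]; then
      echo "OK      ${sub}"
    else
      echo "MISSING ${sub}"
      missing=$((missing + 1))
    fi
  done
  return "${missing}"
}
```

For example, on Hera: check_gwd_inputs /scratch2/NAGAPE/epic/UFS-WM_RT/NEMSfv3gfs/input-data-20240501 FV3_input_data FV3_input_data48 FV3_input_data192 FV3_input_data384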

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@grantfirl
Collaborator

All (@Qingfu-Liu @barlage @JongilHan66 @mdtoyNOAA @cenlinhe @BoYang-NOAA), please look over this list, copied from the test_changes.list produced by the regression tests, and confirm that all of these tests are expected to change results given the changes in NOAA-EMC/fv3atm#836 and ufs-community/ccpp-physics#207.

cpld_control_p8_mixedmode intel
cpld_control_gfsv17 intel
cpld_control_gfsv17_iau intel
cpld_restart_gfsv17 intel
cpld_mpi_gfsv17 intel
cpld_control_sfs intel
cpld_debug_gfsv17 intel
cpld_control_p8 intel
cpld_control_p8.v2.sfc intel
cpld_restart_p8 intel
cpld_control_qr_p8 intel
cpld_restart_qr_p8 intel
cpld_2threads_p8 intel
cpld_decomp_p8 intel
cpld_mpi_p8 intel
cpld_control_ciceC_p8 intel
cpld_control_c192_p8 intel
cpld_restart_c192_p8 intel
cpld_bmark_p8 intel
cpld_restart_bmark_p8 intel
cpld_s2sa_p8 intel
cpld_control_noaero_p8 intel
cpld_control_nowave_noaero_p8 intel
cpld_debug_p8 intel
cpld_debug_noaero_p8 intel
cpld_control_noaero_p8_agrid intel
cpld_control_c48 intel
cpld_warmstart_c48 intel
cpld_restart_c48 intel
cpld_control_p8_faster intel
cpld_control_pdlib_p8 intel
cpld_restart_pdlib_p8 intel
cpld_mpi_pdlib_p8 intel
cpld_debug_pdlib_p8 intel
control_CubedSphereGrid intel
control_CubedSphereGrid_parallel intel
control_latlon intel
control_wrtGauss_netcdf_parallel intel
control_c48 intel
control_c192 intel
control_c384 intel
control_p8 intel
control_p8.v2.sfc intel
control_p8_ugwpv1 intel
control_restart_p8 intel
control_noqr_p8 intel
control_restart_noqr_p8 intel
control_decomp_p8 intel
control_2threads_p8 intel
control_p8_lndp intel
control_p8_rrtmgp intel
control_p8_mynn intel
merra2_thompson intel
control_p8_faster intel
control_CubedSphereGrid_debug intel
control_wrtGauss_netcdf_parallel_debug intel
control_diag_debug intel
control_debug_p8 intel
rrfs_v1beta_debug intel
control_p8_atmlnd_sbs intel
control_p8_atmlnd intel
control_restart_p8_atmlnd intel
control_p8_atmlnd_debug intel
atmwav_control_noaero_p8 intel
atmaero_control_p8 intel
atmaero_control_p8_rad intel
atmaero_control_p8_rad_micro intel
control_c48 gnu
control_p8 gnu
control_p8_ugwpv1 gnu
control_diag_debug gnu
rrfs_v1beta_debug gnu
control_debug_p8 gnu
cpld_control_nowave_noaero_p8 gnu
cpld_control_pdlib_p8 gnu
cpld_debug_pdlib_p8 gnu
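When reviewing a list like this, a per-compiler tally makes it quicker to see how many Intel and GNU tests are expected to change. A minimal bash sketch, assuming each line of test_changes.list is "<test_name> <compiler>" as shown above; count_by_compiler is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally test_changes.list entries per compiler.
# Assumes each non-empty line is "<test_name> <compiler>".
count_by_compiler() {
  # Count every line with at least two fields under its second field
  # (the compiler), then print "compiler count" pairs sorted by name.
  awk 'NF >= 2 { n[$2]++ } END { for (c in n) print c, n[c] }' "$1" | sort
}
```

Usage: count_by_compiler test_changes.list prints one line per compiler, such as "gnu <count>" and "intel <count>".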

@mdtoyNOAA
Contributor

Looks good

@grantfirl
Collaborator

@Qingfu-Liu Here is the PR into your branch to add RT logs: Qingfu-Liu#1 (also updates FV3 submodule pointer)

update RT logs and FV3 submodule pointer
@grantfirl
Collaborator

@Qingfu-Liu Can we move this out of draft and fill out the description?

@Qingfu-Liu Qingfu-Liu marked this pull request as ready for review June 6, 2024 12:42
@Qingfu-Liu
Collaborator Author

@grantfirl Thank you very much. I just updated the description and changed the PR from draft to ready for review.

@jkbk2004
Collaborator

@Qingfu-Liu can you sync up the branch? Then we can start working on this PR.

@grantfirl
Collaborator

@jkbk2004 Thanks very much for syncing this. I didn't have permissions to do this on Qingfu's branch. I'm still out of the country until mid next week, although I'll be available off and on throughout the day to merge if necessary if no one else is. Dustin is at the same conference as me.

@jkbk2004
Collaborator

jkbk2004 commented Jun 20, 2024

@Qingfu-Liu @grantfirl @dustinswales @mdtoyNOAA can you check the following error message about MNTVAR (subgrid orographic statistics data)?

111: forrtl: severe (408): fort: (2): Subscript #2 of the array MNTVAR has value 15 which is greater than the upper bound of 14

It crashes in rrfs_v1beta_debug intel, control_p8_atmlnd_sbs intel, control_p8_atmlnd intel, control_p8_atmlnd_debug intel, and rrfs_v1beta_debug gnu. Output from one experiment is available at /scratch2/NAGAPE/epic/Jong.Kim/stmp2/Jong.Kim/FV3_RT/rt_3529436/rrfs_v1beta_debug_intel/err
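To see which tests in a run directory hit this bounds-check failure, the err files can be searched. A minimal bash sketch, assuming the <run_dir>/<test_name>/err layout suggested by the path above; find_bounds_failures is a hypothetical name. It matches the Intel forrtl message only; GNU builds word bounds errors differently, so the pattern would need adjusting for rrfs_v1beta_debug gnu.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every test directory under a run
# directory whose "err" file contains an Intel Fortran bounds-check abort
# (forrtl: severe (408)). Assumes the layout <run_dir>/<test_name>/err.
find_bounds_failures() {
  local run_dir="$1" f
  for f in "${run_dir}"/*/err; do
    [ -f "$f" ] || continue   # skip if the glob matched nothing
    if grep -qF 'forrtl: severe (408)' "$f"; then
      basename "$(dirname "$f")"
    fi
  done
}
```

Usage: find_bounds_failures /scratch2/NAGAPE/epic/Jong.Kim/stmp2/Jong.Kim/FV3_RT/rt_3529436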

@Qingfu-Liu
Collaborator Author

Qingfu-Liu commented Jun 20, 2024

@jkbk2004 There is a new set of orographic data related to this PR, described in global-workflow PR#2670. Can you run the regression tests using the new data? Thank you very much.
NOAA-EMC/global-workflow#2670
PR#2670 changes four scripts and adds the new orographic data.
The new orographic data is temporarily stored at:
/scratch1/NCEPDEV/global/Qingfu.Liu/git/GWD_SHong/APR2023_SSO
and should replace the old data in: /scratch1/NCEPDEV/global/glopara/fix/ugwd/20231027

@BrianCurtis-NOAA
Collaborator

If there are any anticipated changes to data in the input-data-YYYYMMDD directories, it should be very specifically addressed in the template. Please make changes there.

@Qingfu-Liu
Collaborator Author

OK, thanks. After looking at the error, I am not sure the failed tests are related to the data change. I have added Jongil, who works on this code, to see if he knows the cause.

@Qingfu-Liu
Collaborator Author

@JongilHan66 Can you take a look at the errors @jkbk2004 mentioned in this PR? Thanks

@JongilHan66
Collaborator

Currently we use gwd_opt=2 (unified UGWP GWD), which is set in config.fcst. With this option, the dimension of MNTVAR increases from 10 to 24, as defined in GFS_typedefs.F90.

@JongilHan66
Collaborator

@Qingfu-Liu If you haven't yet, please update the config.fcst files with those in the directories of gfs & gefs in /lfs/h2/emc/physics/noscrub/jongil.han/git_HR4_gwd/global-workflow/parm/config

@Qingfu-Liu
Collaborator Author

@JongilHan66 Those files are updated in workflow PR#2670:
https://github.com/NOAA-EMC/global-workflow/pull/2670/files

@JongilHan66
Collaborator

@Qingfu-Liu Did you also update the "parsing_namelists_FV3.sh" and "parsing_namelists_FV3_nest.sh" in the ush directory?

@Qingfu-Liu
Collaborator Author

@JongilHan66 Yes, both files are updated in PR#2670, which updates four files in total.

@JongilHan66
Collaborator

@Qingfu-Liu If gwd_opt=2 in config.fcst, the dimension of MNTVAR is 24, so the model should not complain about an upper bound of 14, which applies to gwd_opt=1.
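This diagnosis can be checked per test by reading gwd_opt from the namelist the model actually ran with and comparing the implied MNTVAR dimension against the bound in the error. A minimal bash sketch; expected_mntvar_dim is hypothetical, and the mapping (24 for gwd_opt=2, 14 for gwd_opt=1) is taken from this thread rather than from GFS_typedefs.F90, which remains the authoritative source. It assumes the namelist contains a line of the form "gwd_opt = N".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MNTVAR dimension implied by gwd_opt in a
# namelist file, using the values quoted in this thread (1 -> 14, 2 -> 24).
# Any other value is reported as unknown instead of guessed; returns 1 then.
expected_mntvar_dim() {
  local nml="$1" opt
  # Take the first "gwd_opt = N" line and keep only the integer after "=".
  opt=$(grep -iE '^[[:space:]]*gwd_opt[[:space:]]*=' "$nml" | head -n 1 \
        | sed -E 's/^[^=]*=[[:space:]]*([0-9]+).*/\1/')
  case "$opt" in
    1) echo 14 ;;
    2) echo 24 ;;
    *) echo "unknown (gwd_opt='${opt}')"; return 1 ;;
  esac
}
```

For example, running it on the input.nml in a failing test's run directory and getting 14 would suggest that test is still configured with gwd_opt=1 while reading 24-field orographic data.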

@Qingfu-Liu
Collaborator Author

@JongilHan66 Thanks. I think I understand the problem.

@DeniseWorthen
Collaborator

DeniseWorthen commented Jul 8, 2024

Why would the gnu version of these tests fail comparison but not the intel version?

@zach1221
Collaborator

zach1221 commented Jul 8, 2024

Let me check the baselines again. It could be an issue with the baselines I synced.

@DeniseWorthen
Collaborator

Thanks @BrianCurtis-NOAA. Does anyone know if those additional tests are expected to be failing comparison?

@grantfirl
Collaborator

@DeniseWorthen @BrianCurtis-NOAA Something seems amiss between the test_changes.list and RegressionTests_hera.log. They don't show the same failures. FWIW, the failures listed in RegressionTests_hera.log match the failures that I had the last time I tested this PR there.
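One way to pin down such a mismatch is to diff the two test lists directly. A minimal bash sketch; diff_test_lists is a hypothetical helper, and it assumes both inputs are already plain "<test_name> <compiler>" lists. Extracting the failures from RegressionTests_hera.log is not shown, since that log's format is not given in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the entries that appear in only one of two
# line-oriented test lists. comm requires sorted input, so both lists are
# sorted (and de-duplicated) into temporary files first.
diff_test_lists() {
  local a="$1" b="$2" sa sb
  sa=$(mktemp)
  sb=$(mktemp)
  sort -u "$a" > "$sa"
  sort -u "$b" > "$sb"
  echo "only in ${a}:"
  comm -23 "$sa" "$sb"   # lines unique to the first list
  echo "only in ${b}:"
  comm -13 "$sa" "$sb"   # lines unique to the second list
  rm -f "$sa" "$sb"
}
```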

@BrianCurtis-NOAA
Collaborator

@grantfirl Apologies for not adding that; here are some other stats from that run_dir:

[18:54:25]Brian.Curtis@hfe03:/scratch1/NCEPDEV/nems/Brian.Curtis/git/Qingfu-Liu/ufs-weather-model/tests
-->grep -ril "NOT IDENTICAL" logs/log_hera/rt_* | wc -l
88

[18:54:35]Brian.Curtis@hfe03:/scratch1/NCEPDEV/nems/Brian.Curtis/git/Qingfu-Liu/ufs-weather-model/tests
-->cat test_changes.list | wc -l
106

[18:55:22]Brian.Curtis@hfe03:/scratch1/NCEPDEV/nems/Brian.Curtis/git/Qingfu-Liu/ufs-weather-model/tests
-->grep "aborted" rte.out | wc -l
18

The "aborted" count covers tests that ecFlow aborted because their parent test failed. The numbers add up: 88 NOT IDENTICAL + 18 aborted = 106 entries in test_changes.list.
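The cross-check above can be wrapped into a single function that reports the three counts and fails when they do not add up. A minimal bash sketch built from the same grep/wc commands as the transcript; check_rt_counts and its machine argument are hypothetical, and it assumes the logs/log_<machine>/rt_* layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from a ufs-weather-model tests/ directory, verify that
#   (# rt_* logs containing "NOT IDENTICAL") + (# "aborted" lines in rte.out)
# equals the number of lines in test_changes.list. Returns 0 if they match.
check_rt_counts() {
  local tests_dir="$1" machine="${2:-hera}"
  local not_identical aborted listed
  not_identical=$(grep -ril "NOT IDENTICAL" "${tests_dir}/logs/log_${machine}"/rt_* | wc -l)
  aborted=$(grep "aborted" "${tests_dir}/rte.out" | wc -l)
  listed=$(wc -l < "${tests_dir}/test_changes.list")
  echo "NOT IDENTICAL=$((not_identical)) aborted=$((aborted)) listed=$((listed))"
  # Arithmetic expansion strips any padding that wc may add.
  [ $((not_identical + aborted)) -eq $((listed)) ]
}
```

Usage: check_rt_counts /scratch1/NCEPDEV/nems/Brian.Curtis/git/Qingfu-Liu/ufs-weather-model/tests hera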

@grantfirl
Collaborator

@Qingfu-Liu @BrianCurtis-NOAA @DeniseWorthen @zach1221 Granted, this is not my work, but it appears to me that all additional RT failures in control_, rap_, etc. (since I ran them) are the result of the change in control_run.IN (changed input). So, in this sense, I guess that they are "expected", but whether these RTs should be using different input, I'll leave that up to others in this discussion.

@grantfirl
Collaborator

@jkbk2004 @zach1221 Are we still waiting on results from some platforms, or is there another issue that needs to be addressed with this PR?

@jkbk2004
Collaborator

jkbk2004 commented Jul 10, 2024

@grantfirl We would like to merge once the WCOSS2 tests are done.

@BrianCurtis-NOAA
Collaborator

113/189 tasks remaining with ecflow, currently.

@grantfirl
Collaborator

@BrianCurtis-NOAA @jkbk2004 Thanks for the update!

@jkbk2004
Collaborator

@BrianCurtis-NOAA @FernandoAndrade-NOAA Should we move on to merge, or wait for the Acorn and Jet test results?

@BrianCurtis-NOAA
Collaborator

Acorn can be skipped; it's down for a while.

@jkbk2004
Collaborator

I see the Jet queue is very busy. We will catch up the Jet baseline later and can start the merging process. @grantfirl @dustinswales we can merge ufs-community/ccpp-physics#207 first.

@grantfirl
Collaborator

@jkbk2004 CCPP physics and framework PRs were merged and .gitmodules/submodules were updated in NOAA-EMC/fv3atm#836

@jkbk2004 jkbk2004 merged commit 0b59ad3 into ufs-community:develop Jul 11, 2024
3 checks passed
Labels

  • Baseline Updates: Current baselines will be updated.
  • New Input Data Req'd: This PR requires new data to be synced across platforms.
  • Ready for Commit Queue: The PR is ready for the Commit Queue. All checkboxes in the PR template have been checked.