Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow diagnostic accumulation bucket to change in fv3atm integration #2128

Closed
wants to merge 18 commits into from

Conversation

junwang-noaa
Copy link
Collaborator

@junwang-noaa junwang-noaa commented Feb 8, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full RT suite (compared to current baselines) on either Hera/Derecho/Hercules AND have committed the log to my PR branch.
  • Add list of all failed regression tests in "Regression Tests" section.

PR Information

Description

The code updates are to allow diagnostic accumulation bucket to change during integration in FV3ATM.

Commit Message

Add new capability to allow diagnostic accumulation bucket to change in fv3atm integration

Priority

  • Critical Bugfix (This PR contains a critical bug fix and should be prioritized.)
  • High (This PR contains a feature or fix needed for a time-sensitive project (eg, retrospectives, implementations))
  • Normal

Blocking Dependencies

NOAA-EMC/fv3atm#775

Git Issues Fixed By This PR

Closes #1983

Changes

Subcomponent (with links)

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Input data

  • No changes are expected to input data.
  • Changes are expected to input data:
    • New input data.
    • Updated input data.

Regression Tests:

  • No changes are expected to any regression test.
  • Changes are expected to the following tests:
FAILED REGRESSION TESTS

Libraries

  • Not Needed
  • Needed
    • Create separate issue in JCSDA/spack-stack asking for update to library. Include library name, library version.
    • Add issue link from JCSDA/spack-stack following this item

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

@junwang-noaa
Copy link
Collaborator Author

@MatthewPyle-NOAA I opened a new PR here.

@MatthewPyle-NOAA
Copy link
Collaborator

Thanks @junwang-noaa

@junwang-noaa
Copy link
Collaborator Author

@WenMeng-NOAA We are outputting the non-integer bucket (e.g. fhzero=0.25), This feature is requested by RRFSv1 application. With the changes in this PR, model is able to output non-integer bucket. It looks to me that the UPP develop branch needs to be updated to allow the non-integer bucket in the UPP products. @MatthewPyle-NOAA mentioned that there is some UPP version that can do that. May I ask if you can take a look into this? Thanks

@WenMeng-NOAA
Copy link
Contributor

@WenMeng-NOAA We are outputting the non-integer bucket (e.g. fhzero=0.25), This feature is requested by RRFSv1 application. With the changes in this PR, model is able to output non-integer bucket. It looks to me that the UPP develop branch needs to be updated to allow the non-integer bucket in the UPP products. @MatthewPyle-NOAA mentioned that there is some UPP version that can do that. May I ask if you can take a look into this? Thanks

@junwang-noaa Thanks for the heads-up. Do you have sample model output in netcdf? I could test in UPP.

@junwang-noaa
Copy link
Collaborator Author

@WenMeng-NOAA You can see some sample model output on hera at:

/scratch1/NCEPDEV/stmp2/Jun.Wang/FV3_RT/rt_237670/regional_control_intel

Thanks

@junwang-noaa
Copy link
Collaborator Author

The code is synced with the latest develop branch. Now with fhzero data type changed from integer to real, all the RT tests except datm tests and no quilt test are failed as expected.

[Jun.Wang@hfe02 logs]$ grep "PASS -- TEST" RegressionTests_hera.log |more
PASS -- TEST 'regional_noquilt_intel' [07:42, 05:03](1365 MB)
PASS -- TEST 'hafs_regional_datm_cdeps_intel' [23:58, 16:09](1201 MB)
PASS -- TEST 'datm_cdeps_control_cfsr_intel' [10:17, 02:44](1103 MB)
PASS -- TEST 'datm_cdeps_restart_cfsr_intel' [03:17, 01:41](1069 MB)
PASS -- TEST 'datm_cdeps_control_gefs_intel' [09:16, 02:42](1000 MB)
PASS -- TEST 'datm_cdeps_iau_gefs_intel' [09:17, 02:39](1001 MB)
PASS -- TEST 'datm_cdeps_stochy_gefs_intel' [08:14, 02:38](1010 MB)
PASS -- TEST 'datm_cdeps_ciceC_cfsr_intel' [06:13, 02:46](1107 MB)
PASS -- TEST 'datm_cdeps_bulk_cfsr_intel' [05:14, 02:45](1121 MB)
PASS -- TEST 'datm_cdeps_bulk_gefs_intel' [04:15, 02:32](1001 MB)
PASS -- TEST 'datm_cdeps_mx025_cfsr_intel' [09:20, 05:59](1042 MB)
PASS -- TEST 'datm_cdeps_mx025_gefs_intel' [09:14, 06:10](1027 MB)
PASS -- TEST 'datm_cdeps_multiple_files_cfsr_intel' [04:12, 02:39](1125 MB)
PASS -- TEST 'datm_cdeps_3072x1536_cfsr_intel' [05:15, 03:53](2428 MB)
PASS -- TEST 'datm_cdeps_gfs_intel' [05:16, 03:55](2429 MB)
PASS -- TEST 'datm_cdeps_debug_cfsr_intel' [08:16, 06:07](1043 MB)
PASS -- TEST 'datm_cdeps_control_cfsr_faster_intel' [04:13, 02:40](1119 MB)
PASS -- TEST 'datm_cdeps_lnd_gswp3_intel' [02:25, 00:43](255 MB)
PASS -- TEST 'datm_cdeps_lnd_era5_intel' [02:19, 00:46](317 MB)
PASS -- TEST 'datm_cdeps_lnd_era5_rst_intel' [07:23, 00:30](316 MB)
PASS -- TEST 'datm_cdeps_control_cfsr_gnu' [08:17, 02:58](694 MB)

The log file is at:
/scratch1/NCEPDEV/nems/Jun.Wang/nems/vlab/20231122/ufs-weather-model/tests/logs/RegressionTests_hera.log

@junwang-noaa junwang-noaa marked this pull request as ready for review March 4, 2024 20:12
DusanJovic-NOAA
DusanJovic-NOAA previously approved these changes Mar 4, 2024
@zach1221 zach1221 added Baseline Updates Current baselines will be updated. Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked. labels Mar 5, 2024
@WenMeng-NOAA
Copy link
Contributor

@WenMeng-NOAA You can see some sample model output on hera at:

/scratch1/NCEPDEV/stmp2/Jun.Wang/FV3_RT/rt_237670/regional_control_intel

Thanks

@junwang-noaa @MatthewPyle-NOAA I have been testing UPP with bucket in 15 minutes. With the currentl RRFS control file postxconfig-NT-fv3lam_rrfs.txt, the UPP fails to generate some averaged/accumulated variables with correct grib2 metadata. An example is 10-m MAXUW. In UPP, this variable is hard-wired to output as hourly average.

Copy link
Collaborator

@BrianCurtis-NOAA BrianCurtis-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR was not ready yet. It's missing test_changes.list

@FernandoAndrade-NOAA
Copy link
Collaborator

FernandoAndrade-NOAA commented Mar 6, 2024

There are test cases failing during the initial BL creation run across Gaea, Hera, and Jet. A rerun on hera indicates the tests are going to keep failing due to timeouts. The tests that fail are not consistent across machines:

@zach1221 FYI

Hera:

control_c48_gnu

Gaea:

cpld_control_c192_p8 intel
control_flake intel
control_CubedSphereGrid intel
control_CubedSphereGrid_parallel intel
control_latlon intel
control_wrtGauss_netcdf_parallel intel
control_c48 intel
control_c48.v2.sfc intel
control_c192 intel
control_c384 intel
control_c384gdas intel
control_stochy intel
control_lndp intel
control_iovr4 intel
control_iovr5 intel
control_p8 intel
control_p8.v2.sfc intel
control_p8_ugwpv1 intel
control_p8_lndp intel
control_p8_rrtmgp intel
control_p8_mynn intel
merra2_thompson intel
regional_control intel
regional_netcdf_parallel intel
regional_wofs intel
rap_control intel
regional_spp_sppt_shum_skeb intel
rap_sfcdiff intel
hrrr_control intel
rrfs_v1beta intel
rrfs_v1nssl intel
rrfs_v1nssl_nohailnoccn intel
control_csawmg intel
control_csawmgt intel
control_ras intel
control_p8_faster intel
regional_control_faster intel
control_CubedSphereGrid_debug intel
control_wrtGauss_netcdf_parallel_debug intel
control_stochy_debug intel
control_lndp_debug intel
control_csawmg_debug intel
control_csawmgt_debug intel
control_ras_debug intel
control_debug_p8 intel
regional_debug intel
hrrr_control_debug intel
hrrr_gf_debug intel
hrrr_c3_debug intel
rap_diag_debug intel
rap_noah_debug intel
rap_sfcdiff_debug intel
rrfs_v1beta_debug intel
gnv1_c96_no_nest_debug intel
regional_spp_sppt_shum_skeb_dyn32_phy32 intel
rap_control_dyn32_phy32 intel
hrrr_control_dyn32_phy32 intel
conus13km_control intel
conus13km_restart_mismatch intel
rap_control_dyn64_phy32 intel
hrrr_control_debug_dyn32_phy32 intel
conus13km_debug intel
conus13km_radar_tten_debug intel
hafs_regional_atm intel
hafs_regional_atm_ocn intel
hafs_regional_atm_wav intel
hafs_regional_atm_ocn_wav intel
hafs_regional_1nest_atm intel
hafs_global_1nest_atm intel
hafs_regional_specified_moving_1nest_atm intel
hafs_regional_storm_following_1nest_atm intel
hafs_global_storm_following_1nest_atm intel
gnv1_nested intel
hafs_regional_storm_following_1nest_atm_ocn_wav intel
hafs_regional_storm_following_1nest_atm_ocn_wav_inline intel
hafs_regional_storm_following_1nest_atm_ocn_wav_mom6 intel
hafs_regional_docn intel
hafs_regional_docn_oisst intel
control_p8_atmlnd_sbs intel
control_p8_atmlnd intel
atmwav_control_noaero_p8 intel
control_atmwav intel
atmaero_control_p8 intel
atmaero_control_p8_rad intel
atmaero_control_p8_rad_micro intel
regional_atmaq_debug intel

Jet:

cpld_control_c48 intel
control_c48 intel
control_c48.v2.sfc intel
regional_control intel
regional_netcdf_parallel intel
regional_control_faster intel
regional_debug intel

@FernandoAndrade-NOAA
Copy link
Collaborator

@BrianCurtis-NOAA FYI committed test_changes.list from Jun's hera directory

@junwang-noaa
Copy link
Collaborator Author

@FernandoAndrade-NOAA May I ask the location of the RT run directory? I'd like to take a look. Thanks

@FernandoAndrade-NOAA
Copy link
Collaborator

FernandoAndrade-NOAA commented Mar 6, 2024

@FernandoAndrade-NOAA May I ask the location of the RT run directory? I'd like to take a look. Thanks

Sure thing:
/scratch1/NCEPDEV/nems/Fernando.Andrade-maldonado/regression-tests/wm/2128/ufs-weather-model/tests
&
/scratch1/NCEPDEV/stmp2/Fernando.Andrade-maldonado/FV3_RT/rt_301310/control_c48_gnu/err

@jkbk2004 jkbk2004 removed the Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked. label Mar 6, 2024
@BrianCurtis-NOAA
Copy link
Collaborator

@junwang-noaa I am getting a bunch of timeouts on WCOSS2 as well:

/lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_136929/

@junwang-noaa junwang-noaa self-assigned this Apr 12, 2024
@jkbk2004
Copy link
Collaborator

jkbk2004 commented May 2, 2024

@junwang-noaa Can you sync up branches? We can start working on this pr.

@jkbk2004
Copy link
Collaborator

jkbk2004 commented May 2, 2024

@zach1221 @FernandoAndrade-NOAA Branches are synced ok. Can you proceed to test?

@zach1221
Copy link
Collaborator

zach1221 commented May 2, 2024

@zach1221 @FernandoAndrade-NOAA Branches are synced ok. Can you proceed to test?

Sure. @BrianCurtis-NOAA

@zach1221 zach1221 added the Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked. label May 2, 2024
@zach1221
Copy link
Collaborator

zach1221 commented May 2, 2024

Does the tests changed file need to be updated? @jkbk2004 @junwang-noaa

@junwang-noaa
Copy link
Collaborator Author

@jkbk2004 @DusanJovic-NOAA and I are looking at the gnu test control_c48 that hangs on hera. The test passed on hercules. It is possible that either the version of the GNU compiler or the version of the MPI library. The line that causes the hang was identified is the line:
https://github.com/NOAA-EMC/fv3atm/pull/775/files#diff-dc3da9b9c37c068b769128e69328ab808bb6a17947cae75342a9a462cebf63ebR1187

The test also works with default mpi tasks on hera. At this point, we will report the issue to ESMF team. We suggest turning off this test on hera and we will follow up with a fix.

@zach1221 zach1221 removed the Ready for Commit Queue The PR is ready for the Commit Queue. All checkboxes in PR template have been checked. label May 2, 2024
@junwang-noaa
Copy link
Collaborator Author

@jkbk2004 @BrianCurtis-NOAA @DusanJovic-NOAA I turned off the control_c48 gnu test and updated the test_changes.list

@BrianCurtis-NOAA
Copy link
Collaborator

@junwang-noaa could you clarify why so many baselines are changing?

@junwang-noaa
Copy link
Collaborator Author

An attribute (fhzero) in the sfc file is changed from integer to real. So we expect all the tests with fv3atm will change baseline.

@junwang-noaa
Copy link
Collaborator Author

@jkbk2004 @BrianCurtis-NOAA @DusanJovic-NOAA I ran RT test on wcoss2, and also saw the hanging issue in other tests. I'd suggest skipping this PR until we work with ESMF to find a fix. Thanks

@BrianCurtis-NOAA
Copy link
Collaborator

Sounds like a good plan to me.

@jkbk2004
Copy link
Collaborator

jkbk2004 commented May 3, 2024

@junwang-noaa thanks for the note!

@jkbk2004
Copy link
Collaborator

Replacement PR #2281 was merged

@jkbk2004 jkbk2004 closed this May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Baseline Updates Current baselines will be updated.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Add the ability to handle accumulations and max/min computations over multiple time windows
8 participants