Fix workflow on Cheyenne (ufs-community#672)
## DESCRIPTION OF CHANGES: 

A couple of fixes to get the workflow running on Cheyenne.

 - Remove `module purge` from load_modules_run_task.sh. It no longer causes failures on Cheyenne thanks to the intervening PR ufs-community#650, but it should be removed anyway because it could cause problems in the future.
 - Fix the number of processors passed to the mpirun command for the weather model on Cheyenne. I am honestly not sure how this was ever working, but this change fixes nearly all of the runtime failures currently seen on Cheyenne.
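
The effect of the second fix can be seen in isolation. The sketch below is only an illustration, not the workflow's actual launch code: it assumes the single-quoted run command is expanded later, at the point where the forecast is started (shown here with `eval`), and that `PE_MEMBER01` holds the forecast's MPI task count. The numeric values and the `OLD_`/`NEW_` variable names are made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical values, chosen only to make the difference visible.
nprocs=4          # assumed: a task count left over from some other job
PE_MEMBER01=24    # assumed: MPI task count intended for the forecast

# Single quotes store the variable reference literally; nothing expands yet.
OLD_RUN_CMD_FCST='mpirun -np $nprocs'
NEW_RUN_CMD_FCST='mpirun -np ${PE_MEMBER01}'

# Expand each stored template at the point of use (assumed mechanism: eval).
old_cmd=$(eval echo "${OLD_RUN_CMD_FCST}")
new_cmd=$(eval echo "${NEW_RUN_CMD_FCST}")

echo "old: ${old_cmd}"
echo "new: ${new_cmd}"
```

Because the string is expanded late, whichever variable it names must hold the right value when the forecast task runs; if `nprocs` is unset or holds a different count at that point, the old command launches the model with the wrong number of processes.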

## TESTS CONDUCTED: 
### Cheyenne
Ran a set of WE2E tests on Cheyenne, chosen mostly at random to save core hours (I did ensure that a variety of domains were run so that several different MPI layouts were tested). Most tasks succeeded, and every failure (aside from one walltime issue) occurred in a test that also fails on Hera with the current develop branch. See issue ufs-community#673 for more details.

**Successful tests:**
 - grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
 - grid_GSD_HRRR_AK_50km_ics_RAP_lbcs_RAP_suite_GSD_SAR
 - grid_RRFS_CONUS_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
 - grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
 - grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
 - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
 - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
 - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
 - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
 - grid_RRFS_CONUS_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta

**Unsuccessful tests:**
 - All gfdlmp tests (grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp, grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp)
 - grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16
 - GST_release_public_v1
   - Hit walltime limit
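
The test names above appear to follow the pattern `grid_<grid>_ics_<ICs>_lbcs_<LBCs>_suite_<suite>`, which makes it easy to see which domains and physics suites were covered. The helper below is a hypothetical sketch (`parse_we2e_name` is not part of the workflow) that splits such a name using bash parameter expansion; names that do not fit the pattern, such as GST_release_public_v1, are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a WE2E test name into its components.
# Assumes the pattern grid_<grid>_ics_<ICs>_lbcs_<LBCs>_suite_<suite>.
parse_we2e_name() {
  local name="$1"
  local rest="${name#grid_}"     # drop the leading "grid_"
  grid="${rest%%_ics_*}"         # everything before "_ics_"
  rest="${rest#*_ics_}"
  ics="${rest%%_lbcs_*}"         # everything before "_lbcs_"
  rest="${rest#*_lbcs_}"
  lbcs="${rest%%_suite_*}"       # everything before "_suite_"
  suite="${rest#*_suite_}"       # everything after "_suite_"
}

parse_we2e_name "grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta"
echo "grid=${grid} ics=${ics} lbcs=${lbcs} suite=${suite}"
```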

### Hera, Jet, and Orion
Ran the same set of tests on Hera, Jet, and Orion, with similar results. On Hera the GST completed successfully (though it came close to the walltime limit). On Jet, a few tests (grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR, grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR, grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta) failed due to missing initial and/or lateral boundary conditions. On Orion, even more tests failed due to missing ICs and LBCs (grid_GSD_HRRR_AK_50km_ics_RAP_lbcs_RAP_suite_GSD_SAR, grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16).

**To summarize, the only test failures were ones also seen in develop, and most of them were due to missing input files on those platforms.**

## DEPENDENCIES:
This will need to be merged prior to ufs-community#206.

## ISSUE: 
ufs-community#663 has technically already been resolved, but this will fully address that specific issue.
mkavulich authored Feb 4, 2022
1 parent 3a7ff03 commit c6d9937
Showing 2 changed files with 1 addition and 3 deletions.
2 changes: 0 additions & 2 deletions ush/load_modules_run_task.sh
@@ -132,8 +132,6 @@ jjob_fp="$2"
#-----------------------------------------------------------------------
#

-module purge
-
machine=$(echo_lowercase $MACHINE)
env_fp="${SR_WX_APP_TOP_DIR}/env/${BUILD_ENV_FN}"
module use "${SR_WX_APP_TOP_DIR}/env"
2 changes: 1 addition & 1 deletion ush/machine/cheyenne.sh
@@ -56,7 +56,7 @@ FIXLAM_NCO_BASEDIR=${FIXLAM_NCO_BASEDIR:-"/needs/to/be/specified"}

RUN_CMD_SERIAL="time"
RUN_CMD_UTILS='mpirun -np $nprocs'
-RUN_CMD_FCST='mpirun -np $nprocs'
+RUN_CMD_FCST='mpirun -np ${PE_MEMBER01}'
RUN_CMD_POST='mpirun -np $nprocs'

# MET Installation Locations