-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add module purge
to beginning of orion build module, update regional_workflow hash
#206
Merged
mkavulich
merged 3 commits into
ufs-community:develop
from
mkavulich:feature/orion_module_purge
Feb 4, 2022
Merged
Add module purge
to beginning of orion build module, update regional_workflow hash
#206
mkavulich
merged 3 commits into
ufs-community:develop
from
mkavulich:feature/orion_module_purge
Feb 4, 2022
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…to updated regional_workflow hash (needs to be updated after that PR is merged)
mkavulich
added
Tested on Cheyenne
Successfully tested on Cheyenne machine
Tested on Hera
Tested successfully on Hera machine
Tested on Jet
Successfully tested on Jet machine
labels
Feb 3, 2022
gsketefian
reviewed
Feb 4, 2022
mkavulich
added a commit
to ufs-community/regional_workflow
that referenced
this pull request
Feb 4, 2022
## DESCRIPTION OF CHANGES: A couple of fixes to get the workflow running on Cheyenne. - Remove `module purge` from load_modules_run_task.sh. This no longer causes failures on Cheyenne due to intervening PR #650, but it should be removed anyway as it can cause future issues - Fixing the number of processors used in the mpirun command for the weather model on Cheyenne. I am honestly not sure how this was ever working, but this change fixes nearly all of the runtime failures currently seen on Cheyenne. ## TESTS CONDUCTED: ### Cheyenne Ran a set of WE2E tests on Cheyenne, chosen mostly at random to save core hours (I did ensure that a variety of domains were run so that several different MPI layouts were tested). Most tasks succeed, and all failures (aside from one walltime issue) are also tests that fail on Hera with the current develop branch. See issue #673 for more details. **Successful tests:** - grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 - grid_GSD_HRRR_AK_50km_ics_RAP_lbcs_RAP_suite_GSD_SAR - grid_RRFS_CONUS_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta - grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 - grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta - grid_RRFS_CONUS_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta **Unsuccessful tests:** - All gfdlmp tests (grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp, grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp) - grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16 - GST_release_public_v1 - Hit walltime limit ### Hera, Jet, and Orion Ran the same set of tests on Hera, Jet, and Orion, with similar results. On Hera the GST successfully completed (though was close to reaching the walltime limit). On Jet, a few tests (grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR, grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR, grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta) failed due to missing initial and/or lateral boundary conditions. On Orion, even more tests failed due to missing ICs and LBCs (grid_GSD_HRRR_AK_50km_ics_RAP_lbcs_RAP_suite_GSD_SAR, grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16). **To summarize, the only test failures were those that were also seen in develop, and mostly due to missing input files on those platforms.** ## DEPENDENCIES: This will need to be merged prior to ufs-community/ufs-srweather-app#206 ## ISSUE: #663 has technically already been resolved, but this will fully address that specific issue.
JeffBeck-NOAA
approved these changes
Feb 4, 2022
gsketefian
approved these changes
Feb 4, 2022
mkavulich
added a commit
to mkavulich/ufs-srweather-app
that referenced
this pull request
Aug 26, 2022
## DESCRIPTION OF CHANGES: A couple of fixes to get the workflow running on Cheyenne. - Remove `module purge` from load_modules_run_task.sh. This no longer causes failures on Cheyenne due to intervening PR ufs-community#650, but it should be removed anyway as it can cause future issues - Fixing the number of processors used in the mpirun command for the weather model on Cheyenne. I am honestly not sure how this was ever working, but this change fixes nearly all of the runtime failures currently seen on Cheyenne. ## TESTS CONDUCTED: ### Cheyenne Ran a set of WE2E tests on Cheyenne, chosen mostly at random to save core hours (I did ensure that a variety of domains were run so that several different MPI layouts were tested). Most tasks succeed, and all failures (aside from one walltime issue) are also tests that fail on Hera with the current develop branch. See issue ufs-community#673 for more details. **Successful tests:** - grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 - grid_GSD_HRRR_AK_50km_ics_RAP_lbcs_RAP_suite_GSD_SAR - grid_RRFS_CONUS_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta - grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 - grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta - grid_RRFS_CONUS_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta **Unsuccessful tests:** - All gfdlmp tests (grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp, grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp) - grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16 - GST_release_public_v1 - Hit walltime limit ### Hera, Jet, and Orion Ran the same set of tests on Hera, Jet, and Orion, with similar results. On Hera the GST successfully completed (though was close to reaching the walltime limit). On Jet, a few tests (grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR, grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR, grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta) failed due to missing initial and/or lateral boundary conditions. On Orion, even more tests failed due to missing ICs and LBCs (grid_GSD_HRRR_AK_50km_ics_RAP_lbcs_RAP_suite_GSD_SAR, grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16). **To summarize, the only test failures were those that were also seen in develop, and mostly due to missing input files on those platforms.** ## DEPENDENCIES: This will need to be merged prior to ufs-community#206 ## ISSUE: ufs-community#663 has technically already been resolved, but this will fully address that specific issue.
mkavulich
added a commit
that referenced
this pull request
Sep 8, 2022
## DESCRIPTION OF CHANGES: A couple of fixes to get the workflow running on Cheyenne. - Remove `module purge` from load_modules_run_task.sh. This no longer causes failures on Cheyenne due to intervening PR #650, but it should be removed anyway as it can cause future issues - Fixing the number of processors used in the mpirun command for the weather model on Cheyenne. I am honestly not sure how this was ever working, but this change fixes nearly all of the runtime failures currently seen on Cheyenne. ## TESTS CONDUCTED: ### Cheyenne Ran a set of WE2E tests on Cheyenne, chosen mostly at random to save core hours (I did ensure that a variety of domains were run so that several different MPI layouts were tested). Most tasks succeed, and all failures (aside from one walltime issue) are also tests that fail on Hera with the current develop branch. See issue #673 for more details. **Successful tests:** - grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 - grid_GSD_HRRR_AK_50km_ics_RAP_lbcs_RAP_suite_GSD_SAR - grid_RRFS_CONUS_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta - grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2 - grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16 - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR - grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta - grid_RRFS_CONUS_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta **Unsuccessful tests:** - All gfdlmp tests (grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp, grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp) - grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16 - GST_release_public_v1 - Hit walltime limit ### Hera, Jet, and Orion Ran the same set of tests on Hera, Jet, and Orion, with similar results. On Hera the GST successfully completed (though was close to reaching the walltime limit). On Jet, a few tests (grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR, grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR, grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta) failed due to missing initial and/or lateral boundary conditions. On Orion, even more tests failed due to missing ICs and LBCs (grid_GSD_HRRR_AK_50km_ics_RAP_lbcs_RAP_suite_GSD_SAR, grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp, grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16). **To summarize, the only test failures were those that were also seen in develop, and mostly due to missing input files on those platforms.** ## DEPENDENCIES: This will need to be merged prior to #206 ## ISSUE: #663 has technically already been resolved, but this will fully address that specific issue.
SamuelTrahanNOAA
pushed a commit
to SamuelTrahanNOAA/ufs-srweather-app
that referenced
this pull request
Sep 15, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Tested on Cheyenne
Successfully tested on Cheyenne machine
Tested on Hera
Tested successfully on Hera machine
Tested on Jet
Successfully tested on Jet machine
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
DESCRIPTION OF CHANGES:
With changes in regional_workflow necessary for fixing of some problems on the Cheyenne platform, the ufs-srweather-app hash needs to be updated, and a
module purge
command needs to be added to the beginning of the Orion platform's build module. This ensures a clean and consistent build environment and avoids tricky-to-solve environment-specific build errors.TESTS CONDUCTED:
Testing complete on Cheyenne, Orion, Hera, and Jet. All tests pass aside from those with pre-existing issues; see regional_workflow PR for more details
DEPENDENCIES:
ISSUE (optional):
Related to issue ufs-community/regional_workflow#663