[develop] Replace hpc-stack with spack-stack #913
@EdwardSnyder-NOAA is working on a fix for the METPLUS_PATH problem (issue #905).
Conflicts:
modulefiles/build_gaea_intel.lua
modulefiles/build_hera_gnu.lua
modulefiles/build_hera_intel.lua
modulefiles/build_jet_intel.lua
modulefiles/build_orion_intel.lua
modulefiles/srw_common.lua
modulefiles/srw_common_spack.lua
@RatkoVasic-NOAA - Thanks for the heads up! Are you going to include a spack-stack build for Derecho as well, or just Hercules and Gaea C5? Just wanted to check and see which platforms would be included with this initial spack-stack transition.
No, Derecho is not ready with spack-stack version 1.4, only with 1.5.0 (and the weather model is not yet ready for 1.5.0). For this commit, we will have Jet, Hera, Gaea C4 and C5, Hercules, and Orion. Once the weather model is ready for 1.5.0, we can update all platforms to that version.
I was able to successfully run vx cases on the NOAA cloud platforms (see below for details) after making a change to the
Ran a few MET vx and/or fundamental tests without issue on Gaea, Hera, Hercules, Jet, NOAA Cloud, and Derecho. Approving.
Suggesting some cleanup for Gaea and a longer walltime for the run_MET_PcpCombine_fcst_APCP* tasks, which timed out in one of my tests.
For spack-stack, intel-classic/2022.02 was used. Removing the last line, which contains "PMP_NO_PREINITIALIZE", also makes it possible to remove --mpi=pmi2 from RUN_CMD_UTILS in the gaea.yaml machine file.
Timing of the run_MET_PcpCombine_fcst_APCP* tasks in grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 test before the time increase:
Rerunning the tasks after increasing the walltime to 10 min:
Increase walltime
@natalie-perlin Done.
@RatkoVasic-NOAA - These changes look good to me! I was also able to successfully run the fundamental tests on Derecho without issue:
----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta    COMPLETE     16.34
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_  COMPLETE     22.33
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2        COMPLETE     13.00
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE     26.16
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR          COMPLETE     29.88
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0              COMPLETE     30.04
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16                COMPLETE     42.25
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE    180.00
Approving now.
@RatkoVasic-NOAA - The Hera GNU build has failed in Jenkins. The message is:
The following modules are unknown:
If you would like to see the Jenkins pipeline for this PR, please see:
Looking at the versions,
The workspace on Hera that contains the failed Jenkins test directory is:
@MichaelLueken fixed and committed.
Thanks, @RatkoVasic-NOAA! A test build using the Jenkins build scripts shows that the updated version should now build. I requeued the Hera GNU test in Jenkins.
One test failed on Hera Intel:
Manual rerun of the
Tested and successfully ran the fundamental tests on Gaea, Gaea C5 (still with hpc-stack), Orion, and Hercules.
Derecho and Gaea C5 are not yet ready for the spack-stack transition and remain on hpc-stack for now.
The Jenkins tests have successfully passed on Gaea, Gaea C5, Hercules, Jet, and Orion. I have requeued the Hera GNU tests, which failed in the
The WE2E coverage tests were successfully run on Derecho:
The Hera GNU tests have successfully passed with this morning's requeue:
With this, all tests have successfully passed. Moving forward with merging this PR now.
DESCRIPTION OF CHANGES:
Replaced use of hpc-stack with spack-stack version 1.4.1.
In the modulefiles directory, the build* and wflow* scripts are updated. srw_common_spack.lua has also been removed, and all *.lua files now call the same srw_common.lua file.
Ran fundamental tests on Hera, Jet, Gaea, and Orion.
NOTE1: MET and METplus needed to be fixed for this PR (---fixed---).
NOTE2: By using load_any, we still support machines with hpc-stack while keeping only one srw_common.lua. Once hpc-stack is removed, we can remove load_any and have a simpler srw_common.lua modulefile.
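To illustrate NOTE2, here is a minimal sketch of how a single shared modulefile could serve both stacks. Lmod's load_any() takes a list of module names and loads the first one it finds, raising an error if none is available. The module names and versions below are illustrative placeholders only and may not match the actual contents of srw_common.lua in this PR.

```lua
-- Hypothetical excerpt of a shared srw_common.lua (names/versions are placeholders).
-- load_any() is an Lmod function: it loads the first module in the list that is
-- available on the current MODULEPATH and errors out if none of them is found.

-- Assumed spack-stack name first, assumed hpc-stack name as the fallback.
load_any("netcdf-c/4.9.2", "netcdf/4.7.4")
load_any("parallelio/2.5.10", "pio/2.5.7")

-- Once every platform is on spack-stack, each call could collapse to a plain load():
-- load("netcdf-c/4.9.2")
-- load("parallelio/2.5.10")
```

The fallback order matters only on a machine where both names are visible at once; on a platform that exposes just one stack, whichever name exists is the one that gets loaded.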
Type of change
TESTS CONDUCTED:
DEPENDENCIES:
None
DOCUMENTATION:
Once all machines switch to spack-stack, documentation on hpc-stack may be removed.
ISSUE:
Resolves issues #912 and #905
CHECKLIST
LABELS:
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS:
@natalie-perlin
@EdwardSnyder-NOAA