[develop] Fix crontab bug for Cheyenne and Derecho, update PR template for new platforms #934

Merged: MichaelLueken merged 2 commits into ufs-community:develop on Oct 10, 2023

Conversation

mkavulich
Collaborator

DESCRIPTION OF CHANGES:

Creating an experiment with USE_CRON_TO_RELAUNCH=True is currently broken on Cheyenne and Derecho due to faulty Python logic. This PR fixes that issue.
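
The faulty logic itself is not shown in this conversation, so the following is not the fix from this PR. It is only a minimal sketch of the kind of platform-dependent choice a cron helper might make: picking which crontab executable to call for a given machine, then reading the current cron table. Every name here (`select_crontab_cmd`, `read_crontab`, `MACHINES_NEEDING_FULL_PATH`), the `/usr/bin/crontab` path, and the assumption that Cheyenne and Derecho need special handling when called from cron are hypothetical.

```python
import subprocess

# Hypothetical: machines assumed to need an explicit crontab path when the
# script is itself launched by cron. This set is NOT taken from the PR.
MACHINES_NEEDING_FULL_PATH = {"CHEYENNE", "DERECHO"}


def select_crontab_cmd(machine, called_from_cron):
    """Return the crontab command to use for the given machine (assumed rule)."""
    if machine.upper() in MACHINES_NEEDING_FULL_PATH and called_from_cron:
        return "/usr/bin/crontab"  # assumed path, for illustration only
    return "crontab"


def read_crontab(crontab_cmd, run=subprocess.run):
    """Return the current user's cron table as text ('' if none is readable)."""
    result = run([crontab_cmd, "-l"], capture_output=True, text=True, check=False)
    # `crontab -l` typically exits non-zero when no crontab is installed.
    return result.stdout if result.returncode == 0 else ""
```

Passing the runner as a parameter keeps the machine-selection logic testable without touching a real cron table, which is where branching mistakes of this kind are easiest to catch.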

I also took the opportunity to update the PR template to include the newly supported platforms (Derecho, Hercules, and Gaea C5).

Type of change

  • Bug fix (non-breaking change which fixes an issue)

TESTS CONDUCTED:

Ran the WE2E fundamental tests with the option --launch=cron on three platforms. The tests previously failed on Cheyenne and Derecho; they now all succeed except for the grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 test on Cheyenne, which is a pre-existing failure (see Issue #933).

  • hera.intel
  • cheyenne.intel
  • derecho.intel

DEPENDENCIES:

None

DOCUMENTATION:

None

ISSUE:

Fixes #932

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

Collaborator

@RatkoVasic-NOAA RatkoVasic-NOAA left a comment

Looks good to me.

@MichaelLueken changed the title from "Fix crontab bug for Cheyenne and Derecho, update PR template for new platforms" to "[develop] Fix crontab bug for Cheyenne and Derecho, update PR template for new platforms" on Oct 9, 2023
Collaborator

@MichaelLueken MichaelLueken left a comment

@mkavulich - These changes look good to me! I was also able to test the current develop on Derecho and saw the same issue that you reported in issue #932. The test passed successfully when run using your branch.

Approving this PR now.

@MichaelLueken added the run_we2e_coverage_tests label (Run the coverage set of SRW end-to-end tests) on Oct 9, 2023
@MichaelLueken
Collaborator

@mkavulich - I wanted to let you know that the SRW v2.2 release branch was created last Friday - release/public-v2.2.0. This update should be included in the release, so please open another PR and make these changes, at least to ush/get_crontab_contents.py, in the release/public-v2.2.0 release branch. Thanks!

@MichaelLueken
Collaborator

The Hera GNU tests failed in the Functional Workflow Task Tests. Relaunching the tests on that machine now.

@MichaelLueken
Collaborator

The WE2E coverage tests were run on Derecho and all successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_ESGgrid_IndianOcean_6km                                     COMPLETE              21.55
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot     COMPLETE              34.70
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16                COMPLETE              42.16
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR           COMPLETE              26.85
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta    COMPLETE              16.51
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR                COMPLETE              38.01
nco_grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_timeoffset_suite_  COMPLETE              22.55
pregen_grid_orog_sfc_climo                                         COMPLETE              12.77
specify_template_filenames                                         COMPLETE              13.20
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             228.30

Additionally, while the Jenkins pipeline doesn't show the status of the Test stage for Gaea, all tests successfully ran and passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community                                                          COMPLETE              32.58
custom_ESGgrid_NewZealand_3km                                      COMPLETE              82.04
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta    COMPLETE              40.09
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP              COMPLETE              48.88
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR             COMPLETE              41.41
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson  COMPLETE             432.25
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR          COMPLETE              51.55
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta     COMPLETE             391.29
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot      COMPLETE              15.88
nco_ensemble                                                       COMPLETE             122.72
nco_grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thom  COMPLETE             414.46
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1673.15

Once the Hera GNU tests successfully complete, I will be able to move forward and merge this work.

@MichaelLueken
Collaborator

The Jenkins tests continued to fail on Hera GNU in the Functional Workflow Task Tests stage: 30 minutes isn't enough time to run the community test using GNU-compiled executables. Fixes are added in PRs #935 and #936 for develop and release/public-v2.2.0, respectively. The coverage tests were manually run on Hera GNU and successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Peru_12km                                           COMPLETE              31.48
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200         COMPLETE              18.88
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS                             COMPLETE              33.88
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR             COMPLETE             233.42
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta      COMPLETE              36.70
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0              COMPLETE              23.34
long_fcst                                                          COMPLETE              81.91
MET_verification_only_vx                                           COMPLETE               0.12
MET_ensemble_verification_only_vx_time_lag                         COMPLETE               7.74
nco_grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16      COMPLETE             341.16
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             808.63

Merging this PR now.

@MichaelLueken MichaelLueken merged commit 77a81fa into ufs-community:develop Oct 10, 2023
3 of 5 checks passed
mkavulich added a commit that referenced this pull request Oct 11, 2023
…e for new platforms (#934)

The option to create an experiment with the option USE_CRON_TO_RELAUNCH=True is currently broken on Cheyenne and Derecho due to some bad python logic. This fixes that issue.

Also took the opportunity to update the PR template to include the new supported platforms (Derecho, Hercules, and Gaea C5)
Labels
run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

USE_CRON_TO_RELAUNCH option is broken on Cheyenne and Derecho
4 participants