[develop] Update Jenkinsfile to allow automated tests to run simultaneously on Orion/Hercules and Gaea/Gaea C5 #921

Merged

@MichaelLueken (Collaborator) commented Oct 3, 2023

DESCRIPTION OF CHANGES:

With the addition of Gaea C5 and Hercules to the Jenkinsfile, file contention between Gaea and Gaea C5, as well as between Orion and Hercules, caused testing to fail on one machine of each pair. Requeuing the failed machine allowed its tests to pass, but the automated tests should run on all machines simultaneously without manual intervention. The Platform team recommended wrapping the contents of each stage's steps section in the Jenkins dir step. This has been done, and the Jenkins tests now run on all machines simultaneously.
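The idea behind the fix can be sketched as follows. This is a minimal, hypothetical declarative pipeline, not the actual SRW Jenkinsfile: the stage names, agent labels, and script path are placeholders. It assumes that each pair of machines (Orion/Hercules, Gaea/Gaea C5) presumably resolves to the same workspace path on a shared filesystem, which is what produces the contention. Wrapping each stage's steps in a dir block gives every platform its own subdirectory, so parallel stages no longer write to the same files.

```groovy
// Hypothetical sketch only: stage names, agent labels, and the script path
// are placeholders, not taken from the SRW repository.
pipeline {
    agent none
    stages {
        stage('Build and Test') {
            parallel {
                stage('Orion') {
                    agent { label 'orion' }
                    steps {
                        // Run everything inside a platform-specific subdirectory
                        // of the workspace so this stage cannot collide with a
                        // stage on a machine that shares the same filesystem.
                        dir('orion') {
                            checkout scm
                            sh './ci/build_and_test.sh orion'  // hypothetical script
                        }
                    }
                }
                stage('Hercules') {
                    agent { label 'hercules' }
                    steps {
                        // Separate directory name, so the checkout and test
                        // output do not overlap with the Orion stage.
                        dir('hercules') {
                            checkout scm
                            sh './ci/build_and_test.sh hercules'  // hypothetical script
                        }
                    }
                }
            }
        }
    }
}
```

The same pattern would apply to Gaea and Gaea C5. In a matrix-based pipeline, the directory name could instead be derived from the platform axis value rather than hard-coded per stage.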

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • hercules.intel
  • gaea-c5.intel
  • gaea.intel
  • jet.intel
  • Jenkins
  • coverage test suite

ISSUE:

Fixes #920

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
    This update only changes the Jenkinsfile and the scripts used to run the automated Jenkins tests.
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

CONTRIBUTORS (optional):

@kbooker79 @BruceKropp-Raytheon

@MichaelLueken added the run_we2e_coverage_tests label (Run the coverage set of SRW end-to-end tests) on Oct 3, 2023
@MichaelLueken changed the title from "[develop] Checking to see if simultaneous Jenkins tests will run on both Orion and Hercules" to "[develop] Update Jenkinsfile to allow automated tests to run simultaneously on Orion/Hercules and Gaea/Gaea C5" on Oct 4, 2023
@MichaelLueken (Collaborator, Author) commented:

Two tests failed on Hera Intel:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Central_Asia_3km                                    COMPLETE              23.49
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200          COMPLETE               5.61
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2  COMPLETE             748.75
get_from_HPSS_ics_HRRR_lbcs_RAP                                    COMPLETE              13.60
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2        DEAD                   4.11
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot     DEAD                   3.87
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP                 COMPLETE               9.32
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2        COMPLETE               6.08
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2         COMPLETE             224.81
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16           COMPLETE             300.56
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR            COMPLETE             322.50
pregen_grid_orog_sfc_climo                                         COMPLETE               7.12
----------------------------------------------------------------------------------------------------
Total                                                              DEAD                1669.82

The two tests successfully completed when they were rerun:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2        COMPLETE               8.81
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot     COMPLETE              14.97
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE              23.78

One test failed on Hera GNU:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Peru_12km                                           COMPLETE              30.46
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200         COMPLETE              18.70
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS                             DEAD                  31.93
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR             COMPLETE             234.18
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta      COMPLETE              35.33
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0              COMPLETE              23.01
long_fcst                                                          COMPLETE              76.60
MET_verification_only_vx                                           COMPLETE               0.15
MET_ensemble_verification_only_vx_time_lag                         COMPLETE               7.66
nco_grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16      COMPLETE             341.23
----------------------------------------------------------------------------------------------------
Total                                                              DEAD                 799.25

This test appears to be failing due to CFL violations:

FATAL from PE 7: compute_qs: saturation vapor pressure table overflow, nbad= 1

Reruns are also failing with the same error. No changes were made to the component hashes in this branch, and no failures like this have been seen in previous Jenkins testing. The get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS test is failing even at the HEAD of develop. I will investigate further; issue #923 has been opened to track this.

The tests have passed on Gaea, Gaea C5, Hercules, Jet, and Orion, without any tests needing to be requeued on Gaea, Gaea C5, Orion, or Hercules.

I am wrapping up modifications that allow test-stage artifacts to be published to the S3 bucket; once those are in, this PR should be ready to be opened.

@MichaelLueken (Collaborator, Author) commented:

The Hera GNU test that failed yesterday passed this morning:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS                             COMPLETE              33.80
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE              33.80

Presumably, something went wrong with the data during yesterday's test, and rerunning at a different time and date allowed the test to pass.

…low GNU to successfully finish. The Hera GNU Functional Workflow Task Tests sometimes fail because the run_fcst test doesn't finish in 30 minutes, which causes issues in the pipeline.
@MichaelLueken merged commit e57f35d into ufs-community:develop on Oct 9, 2023
4 of 5 checks passed
@MichaelLueken deleted the feature/update_Jenkinsfile branch on October 9, 2023 15:51
Labels: enhancement (New feature or request), run_we2e_coverage_tests (Run the coverage set of SRW end-to-end tests)