CI Updates for FF on Hercules #2194

Conversation

TerrenceMcGuinness-NOAA
Collaborator

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA commented Jan 2, 2024

Description

If merged, this PR will update the CI scripts so that they run on Hercules.

Type of change

  • New feature (adds functionality)
    Updates CI to run on Hercules

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

How has this been tested?

This will be tested in place when the label CI-Hercules-Ready is applied to this PR.

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added CI/CD Issue related to CI/CD CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules labels Jan 2, 2024
@emcbot emcbot added CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules and removed CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules labels Jan 2, 2024
@emcbot

emcbot commented Jan 2, 2024

CI Update on Hercules at 01/02/24 11:28:06 AM
============================================
Cloning and Building global-workflow PR: 2194
with PID: 2781716 on host: hercules-login-1

@emcbot emcbot added CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress and removed CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules labels Jan 2, 2024
@emcbot

emcbot commented Jan 2, 2024

Automated global-workflow Testing Results:

Machine: Hercules
Start: Tue Jan  2 11:29:40 CST 2024 on hercules-login-1.hpc.msstate.edu
---------------------------------------------------
Build: Completed at 01/02/24 11:39:16 AM
Case setup: Completed for experiment C48_ATM_d40c039e
Case setup: Completed for experiment C48_S2SW_d40c039e
Case setup: Completed for experiment C48_S2SWA_gefs_d40c039e
Case setup: Skipped for experiment C96C48_hybatmDA_d40c039e
Case setup: Skipped for experiment C96_atm3DVar_d40c039e

@emcbot emcbot added CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed and removed CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress labels Jan 2, 2024
@emcbot

emcbot commented Jan 2, 2024

Experiment C48_S2SWA_gefs_d40c039e  *** FAILED *** on Hercules
Experiment C48_S2SWA_gefs_d40c039e  with 1 tasks failed at 01/02/24 11:51:07 AM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/HERCULES/PR/2194/RUNTESTS/COMROT/C48_S2SWA_gefs_d40c039e/logs/2021032312/gefswaveinit.log
+ waveinit.sh[6]: echo

+ waveinit.sh[7]: echo '=============== START TO SOURCE FV3GFS WORKFLOW MODULES ==============='
=============== START TO SOURCE FV3GFS WORKFLOW MODULES ===============
+ waveinit.sh[9]: . /work2/noaa/stmp/GFS_CI_ROOT/HERCULES/PR/2194/global-workflow/ush/load_ufswm_modules.sh
++ load_ufswm_modules.sh[4]: [[ NO == \N\O ]]
++ load_ufswm_modules.sh[5]: echo 'Loading modules quietly...'
Loading modules quietly...
++ load_ufswm_modules.sh[6]: set +x
Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: None
Lmod Warning: MODULEPATH directory:
"/work2/noaa/stmp/GFS_CI_ROOT/HERCULES/PR/2194/global-workflow/sorc/ufs_model.fd/tests"
has too many non-modulefiles (447). Please make sure that modulefiles are in
their own directory and not mixed in with non-modulefiles (e.g. source code)

End waveinit.sh at 17:45:07 with error code 1 (time elapsed: 00:00:01)
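
The Lmod warning in the log above reports that the ufs_model.fd/tests directory on MODULEPATH holds many files that are not modulefiles. A rough way to check a directory for this condition is sketched below. It assumes that Lua modulefiles end in .lua and that Tcl modulefiles begin with the #%Module cookie; the function name count_non_modulefiles is hypothetical and is not part of this PR or of Lmod, and Lmod's own count may differ.

```shell
# Hypothetical helper (not part of global-workflow or Lmod):
# count regular files in a directory that do not look like modulefiles.
count_non_modulefiles() {
    dir=$1
    count=0
    for f in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        # Assumption: Lua modulefiles carry a .lua extension.
        case "$f" in
            *.lua) continue ;;
        esac
        # Assumption: Tcl modulefiles start with the "#%Module" cookie.
        if head -c 8 "$f" | grep -q '^#%Module'; then
            continue
        fi
        count=$((count + 1))
    done
    echo "$count"
}
```

For example, `count_non_modulefiles sorc/ufs_model.fd/tests` would print a single number; a large value suggests the directory mixes modulefiles with source or test files.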

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA changed the title from "Feature/ci hercules" to "CI Updates for FF on Hercules" Jan 2, 2024
@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress and removed CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed labels Jan 2, 2024
@emcbot emcbot added CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed and removed CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress labels Jan 2, 2024
@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA removed the CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed label Jan 2, 2024
@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added the CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress label Jan 2, 2024
@NOAA-EMC NOAA-EMC deleted a comment from emcbot Jan 2, 2024
@emcbot emcbot added CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed and removed CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress labels Jan 2, 2024
@emcbot

emcbot commented Jan 2, 2024

Experiment C48_S2SWA_gefs_d40c039e  *** FAILED *** on Hercules
Experiment C48_S2SWA_gefs_d40c039e  with 1 tasks failed at 01/02/24 02:36:18 PM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/HERCULES/PR/2194/RUNTESTS/COMROT/C48_S2SWA_gefs_d40c039e/logs/2021032312/gefsefcs02.log

@TerrenceMcGuinness-NOAA
Collaborator Author

TerrenceMcGuinness-NOAA commented Jan 2, 2024

mterry (hercules-login-1) global-workflow (feature/ci_hercules) $ /bin/ln -sf /work2/noaa/stmp/GFS_CI_ROOT/HERCULES/PR/2194/RUNTESTS/COMROT/C48_S2SWA_gefs_d40c039e/gefs.20210323/12/mem002/model_data/ocean/restart/20210328.120000.MOM.res.nc /work/noaa/stmp/mterry/RUNDIRS/C48_S2SWA_gefs_d40c039e/efcs.325808/mem002/MOM6_RESTART/
/bin/ln: /work/noaa/stmp/mterry/RUNDIRS/C48_S2SWA_gefs_d40c039e/efcs.325808/mem002/MOM6_RESTART/: cannot overwrite directory
mterry (hercules-login-1) global-workflow (feature/ci_hercules) $ /bin/ln -sf /work2/noaa/stmp/GFS_CI_ROOT/HERCULES/PR/2194/RUNTESTS/COMROT/C48_S2SWA_gefs_d40c039e/gefs.20210323/12/mem002/model_data/ocean/restart/20210328.120000.MOM.res.nc /work/noaa/stmp/mterry/RUNDIRS/C48_S2SWA_gefs_d40c039e/efcs.325808/mem002/MOM6_RESTART
mterry (hercules-login-1) global-workflow (feature/ci_hercules) $ 

I noticed that I can reproduce the error when the destination path ends with an extra /, but not when the trailing / is omitted.
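
One way to make such a link step insensitive to a trailing slash on the destination is sketched below. This is only an illustration under stated assumptions, not the change made in this PR: the helper name link_into_dir is hypothetical, and it assumes the intent is to place a link with the source file's own name inside the destination directory, creating that directory if it is missing.

```shell
# Hypothetical helper (not taken from the global-workflow scripts):
# link a source file into a destination directory, with or without
# a trailing slash on the directory argument.
link_into_dir() {
    src=$1
    dest_dir=${2%/}    # drop a single trailing slash, if present
    # Assumption: the destination should be a directory; create it if needed.
    mkdir -p "$dest_dir" || return 1
    # Name the link explicitly so ln never has to guess whether the last
    # operand is a directory or the link name itself.
    ln -sf "$src" "$dest_dir/$(basename "$src")"
}
```

With this sketch, `link_into_dir "$restart_file" "$rundir/MOM6_RESTART/"` and the same call without the trailing slash produce the same link; `restart_file` and `rundir` are placeholder variable names.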

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA removed the CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed label Jan 4, 2024
@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA added the CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules label Jan 4, 2024
@emcbot emcbot added CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules and removed CI-Hercules-Ready **CM use only** PR is ready for CI testing on Hercules labels Jan 4, 2024
@emcbot

emcbot commented Jan 4, 2024

CI Update on Hercules at 01/04/24 11:00:18 AM
============================================
Cloning and Building global-workflow PR: 2194
with PID: 3073448 on host: hercules-login-1

@emcbot emcbot added CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress and removed CI-Hercules-Building **Bot use only** CI testing is cloning/building on Hercules labels Jan 4, 2024
@emcbot

emcbot commented Jan 4, 2024

Automated global-workflow Testing Results:

Machine: Hercules
Start: Thu Jan  4 11:02:09 CST 2024 on hercules-login-1.hpc.msstate.edu
---------------------------------------------------
Build: Completed at 01/04/24 11:12:00 AM
Case setup: Completed for experiment C48_ATM_42eb959f
Case setup: Completed for experiment C48_S2SW_42eb959f
Case setup: Completed for experiment C48_S2SWA_gefs_42eb959f
Case setup: Skipped for experiment C96C48_hybatmDA_42eb959f
Case setup: Skipped for experiment C96_atm3DVar_42eb959f

@emcbot emcbot added CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed and removed CI-Hercules-Running **Bot use only** CI testing on Hercules for this PR is in-progress labels Jan 4, 2024
@emcbot

emcbot commented Jan 4, 2024

Experiment C48_S2SWA_gefs_42eb959f  *** FAILED *** on Hercules
Experiment C48_S2SWA_gefs_42eb959f  with 1 tasks failed at 01/04/24 11:27:08 AM
Error logs:
/work2/noaa/stmp/GFS_CI_ROOT/HERCULES/PR/2194/RUNTESTS/COMROT/C48_S2SWA_gefs_42eb959f/logs/2021032312/gefswaveinit.log

Oh shoot, this branch does not have the update that moves the ufswm module load away from the tests directory.
@DavidHuber-NOAA has added that update to the PR that is currently running on Hercules.
@WalterKolczynski-NOAA That should work fine, because this branch, with the needed Hercules updates to the CI scripts, is also running in the CI cron scripts there now. If it runs to completion, we should then merge this PR too.

@TerrenceMcGuinness-NOAA TerrenceMcGuinness-NOAA removed the CI-Hercules-Failed **Bot use only** CI testing on Hercules for this PR has failed label Jan 8, 2024
Labels
CI/CD Issue related to CI/CD
3 participants