Update Orion modulefiles to Rocky 9 #1159

Closed
RussTreadon-NOAA opened this issue Jun 10, 2024 · 4 comments · Fixed by #1180
Comments

@RussTreadon-NOAA

Received the following from RDHPCS Management:

Orion’s Operating System (OS) and software stack is scheduled to be upgraded during a two day downtime, starting on Wednesday, June 12th and going through Thursday, June 13th. The OS on Orion will be upgraded from CentOS 7 to Rocky 9, another derivative of Red Hat Linux.

This issue is opened to document updating modulefiles/EVA/orion.lua and modulefiles/GDAS/orion.intel.lua for Rocky 9.

@RussTreadon-NOAA

@DavidHuber-NOAA notes that spack-stack #981 addresses the Orion Rocky 9 update from the module perspective.

@RussTreadon-NOAA

As a test, clone g-w develop at 5af325a6 on Orion following the Rocky 9 upgrade. This snapshot of g-w develop uses GDASApp at 368c9c5. Copy GDASApp modulefiles/GDAS/hercules.intel.lua to orion.intel.lua. Build GDASApp. Run test_gdasapp. 37 out of 48 tests pass.

77% tests passed, 11 tests failed out of 48

Label Time Summary:
gdas-utils    =  11.54 sec*proc (11 tests)
script        =  11.54 sec*proc (11 tests)

Total Test time (real) = 1321.78 sec

The following tests FAILED:
        1843 - test_gdasapp_soca_JGLOBAL_PREP_OCEAN_OBS (Failed)
        1844 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP (Failed)
        1845 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_BMAT (Failed)
        1846 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_RUN (Failed)
        1847 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_ECEN (Failed)
        1848 - test_gdasapp_soca_copy_scratch (Failed)
        1849 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_CHKPT (Failed)
        1850 - test_gdasapp_soca_JGDAS_GLOBAL_OCEAN_ANALYSIS_POST (Failed)
        1851 - test_gdasapp_soca_socahybridweights (Failed)
        1852 - test_gdasapp_soca_incr_handler (Failed)
        1853 - test_gdasapp_soca_ens_handler (Failed)

All failures except test_gdasapp_soca_copy_scratch are due to

sbatch: error: invalid partition specified: hercules
sbatch: error: Batch job submission failed: Invalid partition name specified

test/soca/gw/CMakeLists.txt sets variable MACHINE via

# Identify machine
set(MACHINE "container")
IF (IS_DIRECTORY /work2)
  IF (IS_DIRECTORY /apps/other)
    set(MACHINE "hercules")
    set(PARTITION "hercules")
  ELSE()
    set(MACHINE "orion")
    set(PARTITION "orion")
  ENDIF()
ENDIF()
IF (IS_DIRECTORY /scratch2/NCEPDEV/)
  set(MACHINE "hera")
  set(PARTITION "hera")
ENDIF()

IF (IS_DIRECTORY /lfs/h2/)
   set(MACHINE "wcoss2")
ENDIF()

Directory /apps/other exists on Orion following the Rocky 9 upgrade. Thus, on Orion the logic above now takes the hercules branch, and we wind up with MACHINE and PARTITION set to hercules. I do not know whether any directories unique to Orion or to Hercules remain after the Rocky 9 upgrade which we can use to distinguish between the machines.
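
To make the misdetection concrete, the directory checks can be restated as a small Python function. This is only a sketch: `detect_machine` and its `existing_dirs` argument are hypothetical names that do not exist in GDASApp, and the set of paths passed in stands in for the real filesystem so the logic can be exercised on any host.

```python
def detect_machine(existing_dirs):
    """Mirror the directory checks in test/soca/gw/CMakeLists.txt.

    existing_dirs: set of directory paths assumed to exist on the host
    (a stand-in for CMake's IS_DIRECTORY test; hypothetical interface).
    Returns (machine, partition); partition is None when the CMake
    logic never sets PARTITION.
    """
    machine, partition = "container", None
    if "/work2" in existing_dirs:
        if "/apps/other" in existing_dirs:
            machine = partition = "hercules"
        else:
            machine = partition = "orion"
    if "/scratch2/NCEPDEV/" in existing_dirs:
        machine = partition = "hera"
    if "/lfs/h2/" in existing_dirs:
        machine = "wcoss2"  # PARTITION is left untouched, as in the CMake code
    return machine, partition


# Case the original logic expects for Orion: /work2 without /apps/other.
print(detect_machine({"/work2"}))
# Orion after the Rocky 9 upgrade: /apps/other is now present as well.
print(detect_machine({"/work2", "/apps/other"}))
```

With only /work2 present the function returns orion, but once /apps/other is added to the set it returns hercules, which matches the invalid partition reported by sbatch.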

Test test_gdasapp_soca_copy_scratch failed due to an expected directory

/work2/noaa/da/rtreadon/git/global-workflow/develop/sorc/gdas.cd/build/gdas/test/soca/gw/testrun/testjjobs/RUNDIRS/gdas_test/gdasocnanal_12/

not being present. The absence of this directory is likely due to the failure of the soca tests which run prior to this test.

FYI @guillaumevernieres - we need to figure out how to distinguish between Orion and Hercules following the Rocky 9 upgrade.

@RussTreadon-NOAA

build.sh sets BUILD_TARGET. Use this to set MACHINE and PARTITION via the following changes:

build.sh

@@ -87,7 +87,7 @@ case ${BUILD_TARGET} in
     ;;
 esac

-CMAKE_OPTS+=" -DCLONE_JCSDADATA=$CLONE_JCSDADATA"
+CMAKE_OPTS+=" -DCLONE_JCSDADATA=$CLONE_JCSDADATA -DMACHINE=$BUILD_TARGET"

 BUILD_DIR=${BUILD_DIR:-$dir_root/build}
 if [[ $CLEAN_BUILD == 'YES' ]]; then

test/soca/gw/CMakeLists.txt

@@ -10,25 +10,14 @@ add_test(NAME test_gdasapp_soca_prep
       ENVIRONMENT "PYTHONPATH=${PROJECT_BINARY_DIR}/ush:${PROJECT_SOURCE_DIR}/../../ush/python/wxflow/src:$ENV{PYTHONPATH}")

 # Identify machine
-set(MACHINE "container")
-IF (IS_DIRECTORY /work2)
-  IF (IS_DIRECTORY /apps/other)
-    set(MACHINE "hercules")
-    set(PARTITION "hercules")
-  ELSE()
-    set(MACHINE "orion")
-    set(PARTITION "orion")
-  ENDIF()
-ENDIF()
-IF (IS_DIRECTORY /scratch2/NCEPDEV/)
-  set(MACHINE "hera")
+if (MACHINE STREQUAL "hercules")
+  set(PARTITION "hercules")
+ELSEIF (MACHINE STREQUAL "orion")
+  set(PARTITION "orion")
+ELSEIF (MACHINE STREQUAL "hera")
   set(PARTITION "hera")
 ENDIF()

-IF (IS_DIRECTORY /lfs/h2/)
-   set(MACHINE "wcoss2")
-ENDIF()
-
 # Clean-up
 add_test(NAME test_gdasapp_soca_run_clean
   COMMAND  ${CMAKE_COMMAND} -E remove_directory ${PROJECT_BINARY_DIR}/test/soca/gw/testrun/testjjobs)
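
The effect of the CMakeLists.txt change can be sketched in Python: PARTITION is now derived from the machine name handed in by build.sh through -DMACHINE, rather than from probing directories. `partition_for` is a hypothetical helper used only for illustration; it is not part of GDASApp.

```python
def partition_for(machine):
    """Return the partition for a machine name, or None if none is set.

    Mirrors the if/elseif chain in the patched CMakeLists.txt, where only
    hercules, orion and hera assign PARTITION.
    """
    partitions = {"hercules": "hercules", "orion": "orion", "hera": "hera"}
    return partitions.get(machine)


print(partition_for("orion"))   # machine name as passed from BUILD_TARGET
print(partition_for("wcoss2"))  # no partition is set for this machine
```

Because the machine name now comes from the build target instead of the filesystem, the presence of /apps/other on Orion no longer affects the result.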

A hack is also needed in g-w workflow/hosts.py. g-w issue #2695 reports a bug in hosts.py following the Orion Rocky 9 upgrade. The hack forces machine=ORION when hosts.py is executed. It is required for the GDASApp ctests which run g-w jobs.

Build GDASApp inside g-w on Orion with the hosts.py hack and the above GDASApp local changes in place. Run ctests. 48 out of 48 tests pass.

Test project /work2/noaa/da/rtreadon/git/global-workflow/develop/sorc/gdas.cd/build
      Start 1489: test_gdasapp_util_coding_norms
 1/48 Test #1489: test_gdasapp_util_coding_norms ........................   Passed    4.56 sec
      Start 1490: test_gdasapp_util_ioda_example
 2/48 Test #1490: test_gdasapp_util_ioda_example ........................   Passed   10.26 sec

...

      Start 1869: test_gdasapp_atm_jjob_ens_final
47/48 Test #1869: test_gdasapp_atm_jjob_ens_final .......................   Passed   42.23 sec
      Start 1870: test_gdasapp_aero_gen_3dvar_yaml
48/48 Test #1870: test_gdasapp_aero_gen_3dvar_yaml ......................   Passed    0.51 sec

100% tests passed, 0 tests failed out of 48

Label Time Summary:
gdas-utils    =  23.59 sec*proc (11 tests)
script        =  23.59 sec*proc (11 tests)

Total Test time (real) = 1622.32 sec

The above changes are in /work2/noaa/da/rtreadon/git/global-workflow/develop/sorc/gdas.cd

        modified:   build.sh
        modified:   modulefiles/GDAS/orion.intel.lua
        modified:   test/soca/gw/CMakeLists.txt

Note that orion.intel.lua has been updated to specify

prepend_path("MODULEPATH", '/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/modulefiles/Core')

@RussTreadon-NOAA

Changes to enable GDASApp to build and run on Orion following the Rocky 9 upgrade will be committed to RussTreadon-NOAA:feature/orion_rocky9.
