
Releases: EarthWorksOrg/EarthWorks

Version 2.3.001 Bug Fix Release

08 Aug 20:01

This release addresses an issue affecting nvhpc builds of any compset in the v2.3 release: a line left in the nvhpc.cmake file in ccs_configs caused builds to fail. That line is removed in this release.

Known Issues

The answer differences seen in the ew-rel test category in the v2.3 release are now being tracked. Compsets involving MPAS-O and MPAS-SI show answer differences when comparing forward runs to restart runs. See EarthWorksOrg/EarthWorks #66 for more information.

Please refer to the v2.3 Release Notes for the other Known Issues.

Version 2.3 Functional Release

01 Aug 23:39

The EarthWorks v2.3 release introduces these new features and fixes:

Optimizing OpenACC Data Transfers in RRTMG-P: Updates to the interface between RRTMG-P and CAM allow variables to stay resident on the GPU between timesteps. This improves performance by reducing the overhead of data transfers.

Fix Correctness Issue: In longer science runs, marked differences were found between EarthWorks v2.1 outputs and those of previous versions. This release addresses the issue by reverting MPAS-A to a version without OpenACC offloading.

Fix GPU Builds with Modified Physics Columns: Build failures during the final linking step when pcols is increased are addressed by adding the -mcmodel=medium flag during GPU builds only. Increasing the number of physics columns processed per MPI rank (and thus per GPU) reduces kernel launch overhead and provides a performance boost.
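
For example, a larger pcols value can be requested from a case directory before building. This is a minimal sketch using the xmlchange command documented in the v2.2 notes further below; the value 2048 is the Derecho/NVHPC maximum noted there, and the appropriate value is case dependent:

  # Minimal sketch: request GPU radiation and a larger pcols in an existing case;
  # the -mcmodel=medium link flag is applied automatically during GPU builds.
  ./xmlchange --append CAM_CONFIG_OPTS="-rad rrtmgp_gpu -pcols 2048"
  ./case.setup
  ./case.build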

Fix Archiving Step for MPAS-O and MPAS-SI: Some test failures from the last release are addressed by adding a file that describes which output files to archive for MPAS-O and MPAS-SI.

Description of Model Configurations (Compsets)

See EarthWorks Supported Configurations in the GitHub wiki for more details.

Testing

Tested Systems

NSF NCAR’s Derecho Supercomputer

The tests for this release were run on Derecho.

CPU-only hardware: Derecho’s CPU-only nodes consist of dual-socket, 64-core, 3rd Gen AMD EPYC™ 7763 Milan processors with 256 GB of DDR4 memory.

CPU/GPU hybrid hardware: Derecho’s GPU nodes consist of a single-socket, 64-core, 3rd Gen AMD EPYC™ 7763 Milan processor with 512 GB of DDR4 memory plus 4 NVIDIA A100 GPUs, each with 40 GB of onboard memory.

Tested Software Stacks

Compiler Versions

Derecho:

  • ifort (Intel Classic compiler version 2023.2.1)
  • ifx (Intel OneAPI compiler version 2023.2.1)
  • nvfortran (NVHPC Fortran compiler version 24.3)
  • gfortran (GNU compiler version 12.2.0)

Libraries

Derecho:

  • MPI (Cray MPICH version 8.1.27)
  • Parallel-NetCDF (version 1.12.3)
  • PIO2 (version 2.6.2)
  • ESMF (version 8.6.0)

Testing Results

Derecho create_test Results

To test this release, CPU-only tests were carried out on Derecho using the ew-pr and ew-rel categories as described in the v2.2 release.

5 Day Smoke Tests (ew-pr)

  • 120km FHS94 with GNU, Intel-OneAPI, and NVHPC (Overall: PASS)
  • 120km FKESSLER with NVHPC (Overall: PASS)
  • 120km QPC6 with NVHPC (Overall: PASS)
  • 120km F2000climoEW with NVHPC (Overall: PASS)
  • 120km FullyCoupledEW with NVHPC (Overall: PASS)
  • 120km CHAOS2000dev with NVHPC, Intel, and GNU (Overall: PASS)

All of these runs completed, and no differences were found when comparing two runs of this release. These results can serve as baselines going forward.
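
As a sketch of how such baselines might be generated and later compared with CIME’s create_test, the flag usage and the baseline name earthworks-v2.3 below are illustrative assumptions, not commands taken from the release:

  # Illustrative: generate baselines from the ew-pr tests on Derecho with NVHPC...
  ./create_test --xml-category ew-pr --xml-machine derecho --xml-compiler nvhpc \
      --generate --baseline-name earthworks-v2.3
  # ...and later compare a new set of runs against those baselines.
  ./create_test --xml-category ew-pr --xml-machine derecho --xml-compiler nvhpc \
      --compare --baseline-name earthworks-v2.3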

11 Day Exact Restart Tests (ew-rel)

  • 120km, 32L CHAOS2000dev with NVHPC and Intel (Overall: FAIL)
  • 120km, 58L CHAOS2000dev with NVHPC and Intel (Overall: FAIL)
  • 60km CHAOS2000dev with NVHPC and Intel (Overall: FAIL)
    • These tests failed during initialization because too few resources (nodes) were requested.
  • 30km CHAOS2000dev with NVHPC and Intel (Overall: FAIL)
  • 15km, 58L CHAOS2000dev with NVHPC and Intel (Overall: FAIL)
    • These tests did not progress far into the simulation; future releases will increase the resources (nodes) requested.

For the tests without a note, NVHPC runs failed during initialization (see Known Issues below) and Intel runs failed when comparing the original run to the restart run.

Known Issues

  • NVHPC compilers (tested nvfortran version 24.3; carried over from previous releases): Initializing from restart fails.
    • Additional details: Any configuration (tested with QPC6, F2000ClimoEW, FullyCoupledEW) that attempts to restart from a previous run will fail in CAM subroutine dyn_init.
    • Resolutions affected: all supported resolution/level combinations.
    • Workaround: Run without restart.

Version 2.2 Functional Release

03 Jun 20:43

The EarthWorks Version 2.2 release introduces these new features:

Multi-Platform Support: Version 2.2 is the first EarthWorks release with multi-platform support. We have added GH1, a Grace-Hopper system at the Texas Advanced Computing Center (TACC). GH1 consists of two NVIDIA Grace-Hopper nodes: one compute node and one login/compile node. The Grace (CPU) component is a 72-core Arm v9 processor; the Hopper component is an NVIDIA H100 GPU. We expect a substantially larger multi-node test system called Vista to replace GH1. Some caveats for the GH1 release include:

  • GH1 testing has been performed with the NVHPC 24.1 compiler only.
  • GH1 Grace (CPU) testing has been performed for the FHS94, FKESSLER, and QPC6 (Aquaplanet) compsets only.
  • GH1 Hopper (GPU) offload testing has been performed for the FHS94 (Held-Suarez) test case only. MPS has been verified to work in this case of GPU offload.

NSF NCAR’s Derecho supercomputer remains the principal system supported in the EarthWorks release.

Multi-Component GPU Offload: Version 2.2 is the first functional release of a multi-component GPU offload capability, including the MPAS dynamical core, PUMAS microphysics (pumas_cam-release_v1.36), and the RRTMG-P radiative transfer code. The release comes with the following caveats:

  • Basic functionality and correctness of the multi-component GPU offload have been tested on the F2000devEW compset only. We plan a more complete matrix of correctness tests for other compsets and resolutions in a later release.
  • The performance of the GPU offload version (particularly of the physics) has not been fully optimized.
  • We have not confirmed multi-component GPU offload on the Grace-Hopper platform.

Defining Compsets & Enabling create_test: The new approach provides a more “CESM-like” create, build, run test environment. This includes definitions of tests to be used with CIME’s create_test workflow, adjustments to default values (coupling intervals and component timesteps), and definitions of some commonly used EarthWorks-specific compsets. These additions will make testing EarthWorks simpler in the future and will allow generation and comparison against baselines.

The newly added compsets include:

  • F2000climoEW: An analogue to the F2000climo compset in CAM, but with the CICE prescribed mode swapped for the MPAS-SI prescribed mode.
  • F2000devEW: An analogue to the F2000dev compset in CAM, again with MPAS-SI prescribed mode instead of CICE.
  • FullyCoupledEW: A compset that has been mentioned in other releases, formalized here. It uses active MPAS components for the atmosphere, ocean, and sea ice.
  • CHAOS2000: The Coupled Hexagonal Atmosphere, Ocean, and Seaice compset. Like FullyCoupledEW, but with an active river-runoff (MOSART) component as well.
  • CHAOS2000dev: Uses “cam_dev” physics by default instead of “CAM6” physics.

The tests are defined for Derecho and grouped into the following categories:

  • ew-pr: contains tests that are expected to be run when creating a PR, to catch bugs, regressions, or changes that may affect EarthWorks. These tests try to consume a small number of core-hours, so they are not exhaustive. In this release they are 5-day “smoke tests” (forward run only) at 120 km, for each supported compset, with various compilers.
  • ew-ver: contains tests that can be run to verify the correctness of EarthWorks (especially versus CESM). In this release the only test described is a 1200-day “smoke test” of FHS94 to match what’s described in https://www.cesm.ucar.edu/models/simple/held-suarez. This group will be expanded in future releases.
  • ew-rel: contains a broader range of test cases that the EarthWorks team expects to pass (along with ew-pr) before creating a release. In this release we tested the CHAOS2000dev compset using an 11-day “exact restart” test, for a few resolutions, and for both the Intel and NVHPC compilers. These are a starting point and will be expanded in future releases (a minimal create_test invocation sketch follows this list).
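
For reference, here is a minimal sketch of running one of these categories with CIME’s create_test. The test-id and the testlist-selection flags shown are illustrative assumptions, not commands from the release:

  # Illustrative: run the ew-pr category as defined for Derecho with the NVHPC compiler.
  cd cime/scripts
  ./create_test --xml-category ew-pr --xml-machine derecho --xml-compiler nvhpc --test-id ewpr1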

New Documentation: As we create more releases we hope to grow the community around EarthWorks. These documents help set some ground rules, start guiding potential contributors, and define the development practices already in place. These guides include:

Description of Model Configurations (Compsets)

See EarthWorks Supported Configurations in the GitHub wiki for more details.

Testing

Tested Systems

NSF NCAR’s Derecho Supercomputer

The majority of tests occurred on Derecho.

CPU-only hardware: Derecho’s CPU-only nodes consist of dual-socket, 64-core, 3rd Gen AMD EPYC™ 7763 Milan processors with 256 GB of DDR4 memory.

CPU/GPU hybrid hardware: Derecho’s GPU nodes consist of a single-socket, 64-core, 3rd Gen AMD EPYC™ 7763 Milan processor with 512 GB of DDR4 memory plus 4 NVIDIA A100 GPUs, each with 40 GB of onboard memory.

TACC’s GH1 Test System

CPU/GPU hybrid hardware: GH1 has 1 login/compile node and 1 compute node with identical hardware. The Grace (CPU) component is a 72-core Arm v9 processor; the Hopper component is an NVIDIA H100 GPU.

Tested Software Stacks

Compiler Versions

Derecho:

  • ifort (Intel Classic compiler version 2023.2.1)
  • ifx (Intel OneAPI compiler version 2023.2.1)
  • nvfortran (NVHPC Fortran compiler version 24.3)
  • gfortran (GNU compiler version 12.2.0)

GH1:

  • nvfortran (NVHPC Fortran compiler version 24.1)

Libraries

Derecho:

  • MPI (Cray MPICH version 8.1.27)
  • Parallel-NetCDF (version 1.12.3)
  • PIO2 (version 2.6.2)
  • ESMF (version 8.6.0)

GH1:

  • MPI (OpenMPI version 4.1.7a1)
  • Parallel-NetCDF (version 1.12.3)
  • PIO2 (version 2.6.2)
  • ESMF (version v8.7.0b05)

Testing Results

Derecho create_test Results

To test this release, CPU-only tests were carried out on Derecho using the ew-pr and ew-rel categories as described above.

5 Day Smoke Tests (ew-pr)

  • 120km FHS94 with GNU, Intel-OneAPI, and NVHPC (Overall: PASS)
  • 120km FKESSLER with NVHPC (Overall: PASS)
  • 120km QPC6 with NVHPC (Overall: PASS)
  • 120km F2000climoEW with NVHPC (Overall: FAIL)
    • This test failed because the wrong resolution (mpasa120_mpasa120) was requested. Since this test uses MPASSI%PRES mode, it must use an oQU120 grid for the ocean and sea ice. This is corrected but untested in this release.
  • 120km FullyCoupledEW with NVHPC (Overall: FAIL)
    • Same issue as with CHAOS2000dev below.
  • 120km CHAOS2000dev with NVHPC, Intel, and GNU (Overall: FAIL)
    • These tests ran through every step successfully except the final short-term archiving step. The archiver does not know which files to copy for the MPAS-O and MPAS-SI components; this will be resolved in a future release. Message: ERROR: No archive entry found for components: ['ICE', 'OCN']

11 Day Exact Restart Tests (ew-rel)

  • 120km, 32L CHAOS2000dev with NVHPC and Intel (Overall: FAIL)
    • Failed due to errors accumulating in CLUBB routines, leading to a segmentation fault. Messages from the run show Infinity and NaN values in the array invrs_tau_xp2_zm and “Error calling advance_xp2_xpyp”.
  • 120km, 58L CHAOS2000dev with NVHPC and Intel (Overall: FAIL)
  • 60km CHAOS2000dev with NVHPC and Intel (Overall: FAIL)
  • 30km CHAOS2000dev with NVHPC and Intel (Overall: FAIL)
  • 15km, 58L CHAOS2000dev with NVHPC and Intel (Overall: FAIL)

Known Issues

  • NVHPC compilers (tested nvfortran version 23.X; carried over from previous releases): Initializing from restart fails.
    • Additional details: Any configuration (tested with QPC6, F2000ClimoEW, FullyCoupledEW) that attempts to restart from a previous run will fail in CAM subroutine dyn_init.
    • Resolutions affected: all supported resolution/level combinations.
    • Workaround: Run without restart.

Known issues by compset/compiler/resolution (CPU-only):

See the Derecho create_test Results above.

Known issues by compset/compiler/resolution (Hybrid CPU-GPU)

  • Drastic effect of Physics Columns (PCOLS) on GPU performance
    • Additional details: PCOLS sets the number of columns an MPI rank processes during the run. When running multiple physics packages on GPUs, set PCOLS to a larger value.
    • Resolutions affected: all supported resolution/level combinations that use a combination of GPUs, cam_dev physics, and rrtmgp_gpu radiation.
    • Workaround: Change PCOLS using xmlchange during the setup of a case. For example, for a case just created, use this command to request rrtmgp_gpu and set a valid PCOLS value: ./xmlchange --append CAM_CONFIG_OPTS="-rad rrtmgp_gpu -pcols 2048" (a fuller case-setup sketch follows this list).
    • NOTE: 2048 is the maximum PCOLS value we can use on Derecho with NVHPC. Any number greater than 2048 causes a build error. Numbers below 2048 result in worse performance in our ...
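
As a fuller sketch of that workaround from case creation onward, the case name, compset, and grid below are placeholders chosen from configurations mentioned in these notes; the xmlchange line is the one given above:

  # Illustrative end-to-end PCOLS workaround; the case name, compset, and grid are placeholders.
  ./create_newcase --case gpu_pcols_test --compset F2000devEW --res mpasa120_oQU120 \
      --machine derecho --compiler nvhpc
  cd gpu_pcols_test
  ./xmlchange --append CAM_CONFIG_OPTS="-rad rrtmgp_gpu -pcols 2048"
  ./case.setup
  ./case.build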

Version 2.1.001 Patch Release

18 Mar 22:57

This patch release adds new features and addresses issues in the EarthWorks version 2.1 release. Specifically:

  • Fixes compilation of the “FullyCoupled” compset on CPUs, which was broken by changes in EarthWorksOrg/mpas-framework in the v2.1 release.
  • Eliminates the need to use the CICE model in “prescribed mode” by replacing it with a similar MPAS-SeaIce model on the native MPAS grid. This change may also address a CICE compilation issue observed on Perlmutter.
    • Compiler affected: NVHPC v24.1.
    • Cases that use NVHPC compilers should use MPASSI%PRES instead of CICE%PRES in their compset requests.
    • See EarthWorksOrg/EarthWorks Issue #23 for a discussion of the compilation issue. Note: the issue has not been reproduced on NCAR’s Derecho system.
  • Adds RRTMGP as a radiation parameterization by incorporating upstream work.

Try the New Features

To try the “prescribed mode” for MPAS-SeaIce, request MPASSI%PRES in the compset value when you create a case. For example, try --compset 2000_CAM60_CLM50%SP_MPASSI%PRES_DOCN%DOM_MOSART_CISM2%NOEVOLVE_SWAV when creating a case. Note: you should also use a grid with an ocean defined, like mpasa120_oQU120, for MPAS-SeaIce to work correctly.

To try RRTMGP, you should first create a case and then (within the case directory) run ./xmlchange --append CAM_CONFIG_OPTS="-rad rrtmgp".
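
Putting both steps together, here is a minimal sketch; the case name mpassi_pres_test is a placeholder, while the compset and grid are the ones given above:

  # Create a case with the MPAS-SeaIce prescribed mode on an ocean-aware grid,
  # then switch the radiation scheme to RRTMGP (the case name is a placeholder).
  ./create_newcase --case mpassi_pres_test --res mpasa120_oQU120 \
      --compset 2000_CAM60_CLM50%SP_MPASSI%PRES_DOCN%DOM_MOSART_CISM2%NOEVOLVE_SWAV
  cd mpassi_pres_test
  ./xmlchange --append CAM_CONFIG_OPTS="-rad rrtmgp"
  ./case.setup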

Please let the EarthWorks team know if you have any problems by reporting an Issue and describing what you noticed!

Known Issues

Known issues with the EarthWorks 2.1.001 patch release include:

  • Compiling any case that uses multiple MPAS-based components (CAM-MPAS, MPAS-Ocean, MPAS-SeaIce) with GPU offload flags enabled causes the build to fail due to Multiple definition errors.
  • Cases that use CAM-MPAS (QPC6, F2000Climo, Fully Coupled) with NVHPC compilers fail during initialization of the restart run when attempting a restart.

Version 2.1 Functional Release

14 Feb 20:32

Version 2.1 of the EarthWorks Modeling System is a patch release that addresses some issues in the November 2023 Version 2.0 release. In particular, Version 2.1:

  • Patches aerosol_optics_cam.F90 for use with the NVHPC compiler suite. It is critical for the EarthWorks software to stay synchronized with the evolution of the CESM code base while maintaining the ability to compile with the NVIDIA compiler (nvhpc) for GPU offload. This patch lifts the software version in the CESM externals from CAM6.3 tag 124 to tag 145.
  • Fixes an ifx (Intel OneAPI) compiler error with MPAS-Seaice file mpas_seaice_core_interface.F.
  • First release of MPAS-7.x GPU offload. At present, testing of this capability is confined to the FHS94 and F2000Climo compsets and the 120 km, 60 km, and 30 km resolutions.

NB: All of these configurations should be considered functional releases, i.e., they have not been scientifically verified.

Description of Model Configurations (Compsets)

See EarthWorks Supported Configurations in the GitHub wiki for more details.

Testing

Tested Systems

Tests were performed on NCAR’s Derecho supercomputer.

CPU-only tests: Derecho’s CPU-only nodes consist of dual-socket, 64-core, 3rd Gen AMD EPYC™ 7763 Milan processors with 256 GB of DDR4 memory.
CPU/GPU hybrid tests: Derecho has GPU nodes consisting of a single 3rd Gen AMD EPYC™ 7763 Milan processor with 512 GB of DDR4 memory plus 4 NVIDIA A100 GPUs, each with 40 GB of onboard memory.

Tested Software Stacks

Compiler Versions

Derecho:

  • ifort (Intel Classic compiler version 2021.8)
  • ifx (Intel OneAPI compiler version 2023.0)
  • nvfortran (NVHPC Fortran compiler version 23.5)
  • gfortran (GNU compiler version 12.2)

Libraries

Derecho:

  • ESMF (version 8.6)
  • PIO2 (version 2.6.2)
  • MPI (Cray MPICH version 8.1.25)

Testing Results

Known issues by compset/compiler/resolution (CPU-only):

Supported resolution/level combinations:

  • 120km_L32

  • 60km_L32

  • 30km_L32

  • 15km_L58

  • NVHPC compilers (tested nvfortran version 23.X): Initializing from restart fails.

    • Additional details: Any configuration (tested with QPC6, F2000Climo, Fully-Coupled) that attempts to restart from a previous run will fail in CAM subroutine dyn_init.
    • Resolutions affected: all supported resolution/level combinations.
    • Workaround: Run without restart.
  • NVHPC compiler requires patched version of aerosol_optics_cam.F90 module: The module will not build or run correctly without the patched version.

    • Additional Details: The nvhpc compiler functionality with tag 145 software was accomplished by making two patches to the aerosol_optics_cam.F90 module. The first patch addresses an internal compiler error encountered in the nvhpc 23.x compilers related to the use of Fortran complex-number accessors (see ESCOMP/CAM Issue #881). The second addresses a runtime error (for the time being) by removing dynamic dispatch polymorphism in the F2003 code in one place in this module (ESCOMP/CAM Issue #945). The team is working with NVIDIA to permanently correct these issues in a future version of the compiler.
    • Resolutions affected: All supported resolution/level combinations involving the atmospheric component (CAM).
    • Workaround: Use the patched version of aerosol_optics_cam.F90 distributed in release version 2.1.
  • FullyCoupled compset (any compiler): Runs fail due to changes in the mpas-framework directory made to match the MPAS-7.x OpenACC framework.

    • Example error message: ERROR: shr_reprosum_calc ERROR: NaNs or INFs in input

Known issues by compset/compiler/resolution (Hybrid CPU-GPU)

  • F2000Climo-GPU-Offload:
    • Configuration notes: Includes PUMAS and MPAS-7.x GPU offload with OpenACC directives. Does not work with the nvfortran 24.1 compiler due to a compilation issue with the CICE model.
    • Supported resolution/level combinations:
      • 120km_L32
      • 60km_L32
      • 30km_L32
    • Tested on:
      • Derecho GPU-partition
    • Compiler Issues:
      • See: Known NVHPC compilation issues above.

FHS94-GPU-Offload:

  • Configuration notes: CAM physics GPU offload (e.g., PUMAS) is not included, so only MPAS-7.x is offloaded to GPUs.
  • Supported resolution/level combinations:
    • 120km_L32
    • 60km_L32
    • 30km_L32
  • Tested on:
    • Derecho GPU-partition

FullyCoupled Compset GPU-Offload:

  • Configuration notes: Builds fail due to redefinition of routines in mpas_dmpar.F.
  • Example error message: nvlink error : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_1d_real_acc_6481_gpu' in '${CASEDIR}/bld/lib/libocn.a:mpas_dmpar.o', first defined in '${CASEDIR}/bld/lib/libice.a:mpas_dmpar.o'

Version 2.0.001 Bug Fix Release

19 Jan 23:29

This release addresses a critical issue with the EarthWorksOrg repositories:

  • Because SVN access to GitHub was sunset on January 8th, 2024, fresh clones of EarthWorksOrg/CAM and EarthWorksOrg/EarthWorks could not complete the manage_externals/checkout_externals setup step. This release fixes the issue by merging in the change from upstream.

Known Issues

These remain the same as the Version 2.0 Release. Please refer to those release notes for Known Issues and other information.

Version 2.0 Functional Release

08 Nov 23:42

Version 2 of the EarthWorks Modeling System extends support from the Version 1 {30,60,120 km}_L32 configurations to a 15km_L58 (58-level) configuration, and introduces support for the Intel OneAPI (ifx) and NVIDIA (nvfortran) compilers. Also, for the first time, GPU offload of PUMAS model physics running with the F2000Climo compset at the 30km_L32 configuration is introduced as a technical demonstration.

All of these configurations should be considered functional releases, i.e., they have not been scientifically verified.

Description of Model Configurations (Compsets)

See EarthWorks Supported Configurations in the GitHub wiki.

Testing

Tested Systems

Unless otherwise indicated, all tests were performed on NCAR’s Derecho supercomputer.

CPU-only tests used Derecho’s CPU-only nodes, consisting of dual-socket, 64-core, 3rd Gen AMD EPYC™ 7763 Milan processors with 256 GB of DDR4 memory.

CPU/GPU hybrid tests used Derecho’s GPU nodes, consisting of a single 3rd Gen AMD EPYC™ 7763 Milan processor with 512 GB of DDR4 memory and 4 NVIDIA A100 GPUs.

Tested Software Stacks

Compiler Versions

  • ifort (Intel Classic compiler version 2021.8)
  • ifx (Intel OneAPI compiler version 2023.0)
  • nvfortran (NVHPC Fortran compiler version 23.5)
  • gfortran (GNU compiler version 12.2)

Libraries

  • ESMF (version 8.6)
  • PIO2 (version 2.6.2)

Testing Results

Known cross cutting NVHPC compilation issues

  • NVHPC compilers (tested nvfortran version 23.5): Compilation fails

    • Additional details: NVHPC compilers won’t build any of the V2 release compsets by default. Line 65 of hcoio_read_std_mod.F90 breaks the Fortran free-format line-length convention.
    • Workaround: Once the offending line is split into multiple lines, builds succeed. This has been fixed upstream and will be fixed in EarthWorks in a future release. See CAM Issue #871 for more information.
  • NVHPC compilers (tested nvfortran version 23.5): Initializing from restart fails.

    • Additional details: Any configuration (tested with QPC6, F2000Climo, Fully-Coupled) that attempts to restart from a previous run will fail in CAM subroutine dyn_init.
    • Workaround: Run without restart.

Known issues by compset/compiler/resolution (CPU-only):

  • Fully-Coupled simulations:

    • Supported resolution/level combinations:

      • 120km_L32
      • 60km_L32
      • 30km_L32
      • 15km_L58
    • NVHPC compiler (nvfortran 23.5)

      • See Known cross cutting issues section above.
    • Gnu compiler (gfortran version 12.2)

      • No known issues.
    • Intel OneAPI compiler (version 2023.0): Compilation failed with internal compiler error.

      • Resolutions affected: all supported resolution/level combinations
      • Additional details: Build log files contain error #5633: **Internal compiler error: segmentation violation signal raised** Please report this error along with the circumstances in which it occurred in a Software Problem Report. Note: File and line given may not be the explicit cause of this error. compilation aborted for mpas_seaice_core_interface.F
      • Workaround: use the ifort (Intel Classic) compiler version 2021.8 instead.
  • QPC6 (Aquaplanet):

    • Supported resolution/level combinations:

      • 120km_L32
      • 60km_L32
      • 30km_L32
      • 15km_L58
    • NVHPC compiler (nvfortran 23.5)

      • See Known cross cutting issues section above.
    • Gnu compiler (gfortran version 12.2)

      • No known issues.
    • Intel OneAPI compiler (version 2023.0):

      • No known issues.
  • F2000Climo (coupled atmosphere, land surface):

    • Supported resolution/level combinations:

      • 120km_L32
      • 60km_L32
      • 30km_L32
      • 15km_L58
    • NVHPC compiler (nvfortran 23.5)

      • See Known cross cutting issues section above.
    • Gnu compiler (gfortran version 12.2)

      • No known issues.
    • Intel OneAPI compiler (version 2023.0):

      • No known issues.
  • FHS94 (Held-Suarez) and FKESSLER (Kessler microphysics) test cases:

    • Supported resolution/level combinations:

      • 120km_L32
      • 60km_L32
      • 30km_L32
      • 15km_L32
    • NVIDIA nvhpc compiler (version 23.5):

      • No known issues.
    • Gnu compiler (gfortran version 12.2)

      • No known issues.
    • Intel OneAPI compiler (version 2023.0):

      • No known issues.

Known issues by compset/compiler/resolution (Hybrid CPU-GPU)

  • F2000Climo-GPU-PUMAS-only:
    • Supported resolution/level combinations:

      • 120km_L32
      • 60km_L32
    • NVIDIA nvhpc compiler (nvfortran version 23.5):

      • See Known cross cutting NVHPC compilation issues above.

Version 1.0 Functional Release

07 Mar 17:47

Functional release of EarthWorks with CPU-only support of 5 compsets on 120km, 60km, and 30km MPAS grids. Testing of these compsets utilized the Intel and GNU compiler suites and was conducted on NCAR's Cheyenne (Intel Broadwell CPUs) and Gust (AMD EPYC CPUs) systems.


Description of Model Configurations (Compsets):

EarthWorks supports a set of CESM component sets (compsets or test cases) of increasing complexity and realism. However, in EarthWorks, the atmosphere, ocean, and sea ice are all based on the MPAS dynamical core framework. The supported test cases range from atmospheres with idealized physics to full CAM6-MPAS atmospheres that are coupled to additional components to form Earth System Models.

The test cases include:

FHS94: The Held-Suarez test case replaces the CAM physics package with a simple relaxation of the temperature field toward a zonally symmetric equilibrium profile and simple linear drag at the lower boundary. This test case follows the specifications outlined by Held and Suarez (1994), Bull. Amer. Met. Soc., 75, 1825-1830. Because of its long run time (1200 days), this idealized case is supported in EarthWorks only at the 120 km resolution with 32 layers. The FHS94 test case and compset are described in more detail at: https://www2.cesm.ucar.edu/models/simpler-models/held-suarez.html

FKESSLER: A moist baroclinic wave test case with Kessler microphysics. The version tested here uses the CAM-MPAS model. Like FHS94, FKESSLER is a simplified configuration for validating the dynamical core and is supported in EarthWorks only at the 120 km resolution with 32 layers. The 10-day test, and the validation of its results, is described at: https://www2.cesm.ucar.edu/models/simpler-models/fkessler/index.html

F2000Climo: Present day (2000) atmosphere + (CTSM) land surface with prescribed (data) ocean. See https://www2.cesm.ucar.edu/models/cesm2/config/compsets.html for configuration details.

QPC6 (AquaPlanet): An idealized global atmospheric configuration in which the planetary surface is covered by water. Most commonly, the sea surface temperature is prescribed by an analytic zonal distribution in latitude. For an analysis of CESM (CAM5) results for the AquaPlanet test case, consult Brian Medeiros, David L. Williamson, and Jerry G. Olson, Reference Aquaplanet climate in the Community Atmosphere Model, Version 5, JAMES, Volume 8, Issue 1, March 2016 (http://dx.doi.org/10.1002/2015MS000593). See https://www2.cesm.ucar.edu/models/cesm2/config/compsets.html for compset details.

Fully Coupled: CAM-MPAS atmosphere (32 layers) / MPAS-Ocean / MPAS-SeaIce on identical icosahedral grids, plus CTSM land and stub river runoff, glacier, and wave components. The initial condition is a Jan 1 atmosphere, ocean and sea-ice restarts from one year into a spin-up from rest, and Levitus climatology.


Testing

Tested Systems

Cheyenne: NCAR’s production high-performance computing system, an SGI ICE XA cluster that achieves 5.34 peak petaflops.

  • 4032 compute nodes
    • 36 cores per node, dual-socket Intel Xeon E5-2697V4 CPUs
    • 64 GB of DDR4-2400 memory per node (a small number of nodes have 128 GB)
  • Mellanox EDR InfiniBand interconnect
  • See the NCAR CISL article about the Cheyenne supercomputer for more details

Gust: A small test system designed to prototype and mimic the hardware, software, user environment, and job execution configuration of the Derecho system, an HPE-Cray EX cluster.

Tested Software Stacks

NOTE: these are the software names and versions as they appear in NCAR modules. Users can use these lists when comparing against other releases or tests of EarthWorks.

Cheyenne:

  • gnu/10.1.0, mpt/2.25, netcdf-mpi/4.8.1, pnetcdf/1.12.2, pio/2.5.6, esmf-8.2.0b23-ncdfio-mpt-O, cmake/3.18.2, openblas/0.3.9
  • intel/19.1.1, mpt/2.22, netcdf-mpi/4.8.0, pnetcdf/1.12.2, pio/2.5.6, esmf_libs/8.2.0, esmf-8.2.0b23-ncdfio-mpt-O, cmake/3.18.2, mkl/2020.0.1

Gust: 

  • gcc/12.1.0, cray-mpich/8.1.21, netcdf-mpi/4.9.0, parallel-netcdf/1.12.3, parallelio/2.5.10, esmf/8.4.1b02, cmake, cray-libsci/22.11.1.2
  • intel/2021.7.1, cray-mpich/8.1.24, netcdf-mpi/4.9.1, parallel-netcdf/1.12.3, parallelio/2.5.10, esmf/8.5.0b17, cmake, mkl/2022.2.1

Testing Results

Cheyenne Summary:

  • 30 possible tests, 22 attempted, 8 triaged, 0 not attempted
  • 21/22 tests attempted passed
  • 1/22 tests failed, see below for more information

Gust Summary:

  • 30 possible tests, 14 attempted, 8 triaged, 8 not attempted
  • 10/14 tests attempted passed
  • 4/14 tests attempted failed, see below for more information

With 5 compsets, 3 grids, 2 systems, and 2 compiler suites, there were a total of 60 tests considered. Of those, runs of the FHS94 and FKESSLER compsets at 60 and 30 km resolutions were “triaged” (ignored) since they are simpler atmosphere-only test cases.

Results as Tables

Table and figure: testing coverage with the GNU compiler suite on the Cheyenne and Gust systems

Table and figure: testing coverage with the Intel compiler suite on the Cheyenne and Gust systems

These figures demonstrate that:

  • FHS94 and FKESSLER were “triaged” at 60 km and 30 km resolutions.
  • When built with GNU compilers on Cheyenne, runs of the Fully Coupled compset at 30 km fail when writing monthly mean diagnostic MPAS-Ocean output. This issue is currently being debugged, and the fix will be included in a future release.
  • On Gust, Fully Coupled builds fail with both the Intel and GNU compilers, pointing toward build issues in MPAS-Ocean. We believe this build error is caused by some system configuration/software that is enabled on Cheyenne but not on Gust. We are currently tracking it down.
  • Due to the configuration issue above, input file issues, and the fact that Gust is a test system, the 30 km runs weren’t attempted for this release.

Known Software Issues

  • To run FKESSLER, the requested output variables need to be reduced. Refer to this section of our FKESSLER script on Cheyenne for help (a user_nl_cam sketch follows this list).
  • 30 km runs of the Fully Coupled compset with GNU fail at run time when writing monthly diagnostic output.
  • One file of MPAS-SI must be built without optimization or model runs including it will fail in the first timestep due to "failing State validation checks." See EarthWorksOrg/cime PR#1 and PR#2 for more context.
  • Fully Coupled code fails to build on the Gust system with both the Intel and GNU compilers. This issue is specific to the Gust system, and we are tracking down the software and system configuration differences between the Gust and Cheyenne systems.
  • Building with the cray-libsci libraries and Intel compilers on Cray systems (Gust) causes a run-time error. This configuration throws a segmentation fault due to an incompatibility between libsci and Intel; use the Intel MKL library instead.
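
For the FKESSLER output-reduction item above, here is a minimal sketch of one way to trim the requested history output through CAM's user_nl_cam in the case directory. The variable list and output frequency are illustrative assumptions, not the settings from the referenced Cheyenne script:

  # Illustrative user_nl_cam entries: drop the default history fields and request
  # only a small set of variables, written daily.
  cat >> user_nl_cam << 'EOF'
  empty_htapes = .true.
  fincl1 = 'T', 'U', 'V', 'PS'
  nhtfrq = -24
  EOF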