
Version 2.1 Release

@gdicker1 gdicker1 released this 14 Feb 20:32
· 82 commits to main since this release

Version 2.1 Functional Release

Version 2.1 of the EarthWorks Modeling System is a patch release that addresses some issues in the November 2023 Version 2.0 release. In particular, Version 2.1:

  • Patches aerosol_optics_cam.F90 for use with the nvhpc compiler suite. It is critical for EarthWorks software to stay synchronized with the evolution of the CESM code base while maintaining the ability to compile with the NVIDIA compiler (nvhpc) for GPU offload. This patch raises the software version in the CESM externals from CAM6.3 tag 124 to tag 145.
  • Fixes an ifx (Intel OneAPI) compiler error in the MPAS-Seaice file mpas_seaice_core_interface.F.
  • Provides the first release of MPAS-7.x GPU offload. At present, testing of this capability is confined to the FHS94 and F2000Climo compsets at the 120 km, 60 km, and 30 km resolutions.

NB: All of these configurations should be considered functional releases, i.e., they have not been scientifically verified.

Description of Model Configurations (Compsets)

See EarthWorks Supported Configurations in the GitHub wiki for more details.

Testing

Tested Systems

Tests were performed on NCAR’s Derecho supercomputer.

CPU-only tests: Derecho’s CPU-only nodes consist of dual 64-core 3rd Gen AMD EPYC™ 7763 Milan processors with 256 GB of DDR4 memory.
CPU/GPU hybrid tests: Derecho’s GPU nodes consist of a single 3rd Gen AMD EPYC™ 7763 Milan processor with 512 GB of DDR4 memory plus 4 NVIDIA A100 GPUs, each with 40 GB of onboard memory.

Tested Software Stacks

Compiler Versions

Derecho:

  • ifort (Intel Classic compiler version 2021.8)
  • ifx (Intel OneAPI version 2023.0)
  • nvfortran (NVHPC Fortran compiler version 23.5)
  • gfortran (GNU Fortran compiler version 12.2)

Libraries

Derecho:

  • ESMF (8.6)
  • PIO2 (2.6.2)
  • MPI (Cray MPICH 8.1.25)

Testing Results

Known issues by compset/compiler/resolution (CPU-only):

Supported resolution/level combinations:

  • 120km_L32
  • 60km_L32
  • 30km_L32
  • 15km_L58

  • NVHPC compilers (tested with nvfortran version 23.x): Initializing from restart fails.

    • Additional details: Any configuration (tested with QPC6, F2000Climo, Fully-Coupled) that attempts to restart from a previous run will fail in CAM subroutine dyn_init.
    • Resolutions affected: all supported resolution/level combinations.
    • Workaround: Run without restart.
  • NVHPC compiler requires patched version of aerosol_optics_cam.F90 module: Module will not build or run correctly without patched version.

    • Additional details: The nvhpc compiler works with the tag 145 software because of two patches to the aerosol_optics_cam.F90 module. The first patch addresses an internal compiler error in the nvhpc 23.x compilers related to the use of Fortran complex number accessors (see ESCOMP/CAM Issue #881). The second patch addresses a runtime error (for the time being) by removing dynamic dispatch polymorphism in the F2003 code in one place in this module (see ESCOMP/CAM Issue #945). The team is working with NVIDIA to permanently correct these issues in a future version of the compiler.
    • Resolutions affected: All supported resolution/level combinations involving the atmospheric component (CAM).
    • Workaround: Use the patched version of aerosol_optics_cam.F90 distributed in release version 2.1.
  • FullyCoupled compset (any compiler): Runs fail because of changes made to the mpas-framework directory to match the MPAS-7.x OpenACC framework.

    • Example error message: ERROR: shr_reprosum_calc ERROR: NaNs or INFs in input
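The CPU-only issues listed above can be summarised as a small lookup, which may be convenient when deciding whether a planned run is affected. This is only a sketch: the function name, argument names, and returned strings are hypothetical and are not part of EarthWorks, CESM, or CIME. It also assumes the compset includes the atmospheric component (CAM), as all compsets named in these notes do.

```python
from __future__ import annotations

# Hypothetical helper: names and strings are illustrative only and are
# not part of EarthWorks, CESM, or CIME.


def cpu_known_issues(compiler: str, compset: str, restart: bool = False) -> list[str]:
    """Return the CPU-only known issues from these notes that apply to a run.

    Assumes the compset includes CAM (true for the compsets named here).
    """
    issues = []
    if compiler.lower() in ("nvhpc", "nvfortran"):
        # NVHPC builds need the patched module distributed with release 2.1.
        issues.append("use the patched aerosol_optics_cam.F90")
        if restart:
            # Restart runs fail in CAM subroutine dyn_init under NVHPC.
            issues.append("restart fails in dyn_init; run without restart")
    if compset.lower() == "fullycoupled":
        # Affects any compiler.
        issues.append("FullyCoupled runs fail (shr_reprosum_calc: NaNs or INFs in input)")
    return issues


if __name__ == "__main__":
    for issue in cpu_known_issues("nvhpc", "F2000Climo", restart=True):
        print(issue)
```

An empty list means none of the CPU-only issues recorded in these notes apply; it does not imply the configuration has been scientifically verified.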

Known issues by compset/compiler/resolution (Hybrid CPU-GPU):

  • F2000Climo-GPU-Offload:
    • Configuration notes: Includes PUMAS and MPAS-7.x GPU offload with OpenACC directives. Does not work with the nvfortran 24.1 compiler due to a compilation issue with the CICE model.
    • Supported resolution/level combinations:
      • 120km_L32
      • 60km_L32
      • 30km_L32
    • Tested on:
      • Derecho GPU-partition
    • Compiler Issues:
      • See: Known NVHPC compilation issues above.

FHS94-GPU-Offload:

  • Configuration notes: CAM physics is not included (e.g. no PUMAS GPU offload), so only MPAS-7.x is offloaded to GPUs.
  • Supported resolution/level combinations:
    • 120km_L32
    • 60km_L32
    • 30km_L32
  • Tested on:
    • Derecho GPU-partition

FullyCoupled Compset GPU-Offload:

  • Configuration notes: Builds fail due to redefinition of routines in mpas_dmpar.F.
  • Example error message: nvlink error : Multiple definition of 'mpas_dmpar_mpas_dmpar_exch_halo_1d_real_acc_6481_gpu' in '${CASEDIR}/bld/lib/libocn.a:mpas_dmpar.o', first defined in '${CASEDIR}/bld/lib/libice.a:mpas_dmpar.o'
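When triaging a failed build, it can help to extract the duplicated symbol and the two archives involved from a message like the one above. The following is a minimal sketch that assumes the message has exactly the shape quoted in these notes; the pattern and function name are hypothetical, and other nvlink versions may format the message differently.

```python
import re

# Pattern for the nvlink duplicate-symbol message as quoted in these notes.
# Assumption: the message layout matches the quoted example exactly.
NVLINK_DUP = re.compile(
    r"nvlink error\s*:\s*Multiple definition of '(?P<symbol>[^']+)' "
    r"in '(?P<second>[^']+)', first defined in '(?P<first>[^']+)'"
)


def parse_nvlink_duplicate(line):
    """Return a dict with the symbol and both archive:object locations, or None."""
    match = NVLINK_DUP.search(line)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    example = (
        "nvlink error : Multiple definition of "
        "'mpas_dmpar_mpas_dmpar_exch_halo_1d_real_acc_6481_gpu' in "
        "'${CASEDIR}/bld/lib/libocn.a:mpas_dmpar.o', first defined in "
        "'${CASEDIR}/bld/lib/libice.a:mpas_dmpar.o'"
    )
    info = parse_nvlink_duplicate(example)
    print(info["symbol"])
    print(info["first"])
    print(info["second"])
```

For the quoted message, the two locations point at the same object file (mpas_dmpar.o) inside libice.a and libocn.a, which matches the redefinition described in the configuration notes.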