UFS-WM cpld_debug_p8 and cpld_control_p8 gnu test case hangs on hera #2263

Open
jkbk2004 opened this issue May 3, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@jkbk2004
Collaborator

jkbk2004 commented May 3, 2024

Description

To Reproduce:

Additional context

Failure message from error log for cpld_debug_p8 and cpld_control_p8 gnu.


The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release.
Workarounds are to run on a single node, or to use a system with an RDMA
capable network such as Infiniband.

Output

@jkbk2004 added the bug label May 3, 2024
@jkbk2004
Collaborator Author

jkbk2004 commented May 3, 2024

@uturuncoglu @RatkoVasic-NOAA This could be a problem with OpenMPI (especially with the old GNU version) on Hera. It is worth noting, though, that the issue became visible at the call ESMF_InfoBroadcast(info, rootPet=fcstPetList(1), rc=rc).

@junwang-noaa
Collaborator

A ticket about this issue was created with ESMF support.

@natalie-perlin
Collaborator

natalie-perlin commented May 28, 2024

An update for Hera GNU:

Spack-stack 1.5.1 and 1.6.0, with packages for ufs-weather-model and ufs-srweather-app, have been built on Hera with the GNU/13.3.0 compiler. Spack-stack v1.6.0 was built with ESMF/8.6.1 and MAPL/2.46.0.

A first check of running the RTs: some pass, some fail.

  • some runtime failures - the model run completed but the comparison with the baseline fails (the gnu9.2 compiler baseline will not give bit-for-bit matching)
  • s2swa would not compile with v1.6.0, as it includes esmf/8.6.1 and mapl/2.46.0, which are ahead of the dev branch, and the model has trouble finding esmf or recognizing that esmf has been found (maybe the GOCART code needs the versions of these libraries updated so that they are found during the build?)
  • a couple of tests fail with memory issues (spack-stack v1.5.1)

More testing is needed, perhaps on specific tests.

Locations of the spack-stacks (NB: packages for UFS-WM and UFS-SRW only!)

/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13.3/envs/ufs-wm-srw-rocky8
/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/envs/ufs-wm-srw-rocky8/

My WM tests with spack-stack-1.6.0 are in
/scratch1/NCEPDEV/nems/Natalie.Perlin/ufs-weather-model

and with spack-stack-1.5.1 (run with -w option) are in
/scratch1/NCEPDEV/nems/Natalie.Perlin/ufs-weather-model2/

A modulefile for using spack-stack-1.6.0:
/scratch1/NCEPDEV/nems/Natalie.Perlin/ufs-weather-model/modulefiles/ufs_hera.gnu.lua

help([[
loads UFS Model prerequisites for Hera/GNU
]])

prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/modulefiles")
prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/installs/openmpi/modulefiles")
prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.6.0_gnu13.3/envs/ufs-wm-srw-rocky8/install/modulefiles/Core")

stack_gnu_ver=os.getenv("stack_gnu_ver") or "13.3.0"
load(pathJoin("stack-gcc", stack_gnu_ver))

stack_openmpi_ver=os.getenv("stack_openmpi_ver") or "4.1.6"
load(pathJoin("stack-openmpi", stack_openmpi_ver))

cmake_ver=os.getenv("cmake_ver") or "3.23.1"
load(pathJoin("cmake", cmake_ver))

load("ufs_common")

nccmp_ver=os.getenv("nccmp_ver") or "1.9.0.1"
load(pathJoin("nccmp", nccmp_ver))

prepend_path("CPPFLAGS", " -I/apps/slurm_hera/23.11.3/include/slurm"," ")
prepend_path("LD_LIBRARY_PATH", "/apps/slurm_hera/23.11.3/lib")

setenv("CC", "mpicc")
setenv("CXX", "mpic++")
setenv("FC", "mpif90")
setenv("CMAKE_Platform", "hera.gnu")

whatis("Description: UFS build environment") 

The ufs_common.lua for use with spack-stack-1.6.0:

whatis("Description: UFS build environment common libraries")

help([[Load UFS Model common libraries]])

local ufs_modules = {
  {["jasper"]          = "2.0.32"},
  {["zlib"]            = "1.2.13"},
  {["libpng"]          = "1.6.37"},
  {["hdf5"]            = "1.14.0"},
  {["netcdf-c"]        = "4.9.2"},
  {["netcdf-fortran"]  = "4.6.1"},
  {["parallelio"]      = "2.5.10"},
  {["esmf"]            = "8.6.1"},
  {["fms"]             = "2023.04"},
  {["bacio"]           = "2.4.1"},
  {["crtm"]            = "2.4.0.1"},
  {["g2"]              = "3.4.5"},
  {["g2tmpl"]          = "1.10.2"},
  {["ip"]              = "4.3.0"},
  {["sp"]              = "2.5.0"},
  {["w3emc"]           = "2.10.0"},
  {["gftl-shared"]     = "1.6.1"},
  {["mapl"]            = "2.46.0-esmf-8.6.1"},
  {["scotch"]          = "7.0.4"},
}

for i = 1, #ufs_modules do
  for name, default_version in pairs(ufs_modules[i]) do
    local env_version_name = string.gsub(name, "-", "_") .. "_ver"
    load(pathJoin(name, os.getenv(env_version_name) or default_version))
  end
end
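
The loop at the end of ufs_common.lua lets an environment variable override each default version: the module name has "-" replaced by "_" and "_ver" appended, so netcdf-c is controlled by netcdf_c_ver. Below is a minimal Python sketch that mirrors that naming rule only, for checking which variable to set. The function names and the three-entry table are illustrative assumptions, not part of the modulefile, and edge cases such as empty variable values are not modeled.

```python
import os

# A few defaults copied from the ufs_modules table above (not the full list).
UFS_MODULES = {
    "netcdf-c": "4.9.2",
    "esmf": "8.6.1",
    "gftl-shared": "1.6.1",
}


def env_version_name(module_name):
    """Mirror the Lua expression string.gsub(name, "-", "_") .. "_ver"."""
    return module_name.replace("-", "_") + "_ver"


def resolve_module(module_name, default_version, environ=None):
    """Return the "name/version" string that the loop would hand to load().

    `environ` is a hypothetical parameter added so the lookup can be
    exercised without touching the real process environment.
    """
    environ = os.environ if environ is None else environ
    version = environ.get(env_version_name(module_name))
    if version is None:  # variable not set: fall back to the table default
        version = default_version
    return f"{module_name}/{version}"


if __name__ == "__main__":
    for name, default in UFS_MODULES.items():
        print(env_version_name(name), "->", resolve_module(name, default))
```

In practice this means that exporting, say, esmf_ver before loading the modulefile should select that ESMF version instead of 8.6.1, provided a matching module exists in the stack.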

A modulefile for using spack-stack-1.5.1:
/scratch1/NCEPDEV/nems/Natalie.Perlin/ufs-weather-model2/modulefiles/ufs_hera.gnu.lua

help([[
loads UFS Model prerequisites for Hera/GNU
]])

prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/installs/gnu/modulefiles")
prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/installs/openmpi/modulefiles")
prepend_path("MODULEPATH", "/scratch2/NCEPDEV/stmp1/role.epic/spack-stack/spack-stack-1.5.1/envs/ufs-wm-srw-rocky8/install/modulefiles/Core")

stack_gnu_ver=os.getenv("stack_gnu_ver") or "13.3.0"
load(pathJoin("stack-gcc", stack_gnu_ver))

stack_openmpi_ver=os.getenv("stack_openmpi_ver") or "4.1.6"
load(pathJoin("stack-openmpi", stack_openmpi_ver))

cmake_ver=os.getenv("cmake_ver") or "3.23.1"
load(pathJoin("cmake", cmake_ver))

load("ufs_common")

nccmp_ver=os.getenv("nccmp_ver") or "1.9.0.1"
load(pathJoin("nccmp", nccmp_ver))

prepend_path("CPPFLAGS", " -I/apps/slurm_hera/23.11.3/include/slurm"," ")
prepend_path("LD_LIBRARY_PATH", "/apps/slurm_hera/23.11.3/lib")
setenv("CC", "mpicc")
setenv("CXX", "mpic++")
setenv("FC", "mpif90")
setenv("CMAKE_Platform", "hera.gnu")

whatis("Description: UFS build environment")

@RatkoVasic-NOAA
Collaborator

I tested @natalie-perlin's installation, and the tests that were failing on Hera with the GNU compiler now work. There are still many other tests to be done. @jkbk2004 I suggest that the weather-model group run them, because some of the tests fail only because the results are not bit-identical (which is expected).

@natalie-perlin
Collaborator

natalie-perlin commented Jun 2, 2024

All the regression tests with the gnu/13.3.0 compiler and spack-stack/1.6.0 have passed for the weather model;
please see the full comment:
#2093 (comment)

@zach1221 changed the title from "UFS-WM control_c48/gnu test case hangs on hera" to "UFS-WM cpld_debug_p8 and cpld_control_p8 gnu test case hangs on hera" Jun 5, 2024
@zach1221 self-assigned this Jun 27, 2024
5 participants