Add GEFS regression test suite from EP5r2 configuration/case #2442

Draft
wants to merge 56 commits into
base: develop

NickSzapiro-NOAA (Collaborator) commented Sep 19, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR updates the cpld_bmark_p8 tests to a prototype GEFS test case: fully coupled s2swa with IAU and stochastic physics, with configuration and warm starts from restarts of EP5r2 ensemble member 1 for 2021-03-25 06Z.

The EP5r2 test case was kindly provided by @bingfu-NOAA via @junwang-noaa with aerosol input data and configurations from @lipan-NOAA.

A separate INPUTDATA_ROOT_BMIC is no longer needed and is removed.

This PR is in draft mode until it meets basic reproducibility/quality checks. The following have been tested on Hera:

  • control reproduces itself
  • restart reproduces control
  • changing number of tasks reproduces control
  • Intel debug version runs
  • GNU debug version runs
  • Runs on supported platforms
  • No major diffs from GEFS workflow configuration
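
The first three checks above (control vs. itself, restart vs. control, different task counts) each come down to a bit-for-bit comparison of output files between two run directories. A minimal sketch of such a comparison follows, assuming a strict byte comparison with `cmp` is what is wanted. The function name `compare_runs` and the two-directory layout are hypothetical, and this is not the regression-test framework's own comparison logic.

```shell
# Hypothetical helper: compare every regular file under a baseline directory
# with the file of the same relative path under a test directory.
# Prints one line per file; returns non-zero if any file is missing or differs.
# Assumes file names contain no newlines.
compare_runs() {
  baseline_dir=$1
  test_dir=$2
  status=0
  list=$(mktemp) || return 2
  # Collect relative paths in a stable order.
  (cd "$baseline_dir" && find . -type f | sort) > "$list"
  while IFS= read -r rel; do
    rel=${rel#./}
    if [ ! -f "$test_dir/$rel" ]; then
      echo "MISSING $rel"; status=1
    elif cmp -s "$baseline_dir/$rel" "$test_dir/$rel"; then
      echo "OK $rel"
    else
      echo "DIFF $rel"; status=1
    fi
  done < "$list"
  rm -f "$list"
  return "$status"
}
```

A call such as `compare_runs control_dir restart_dir` (directory names illustrative) would then flag any file that is not identical. Files that embed timestamps or history attributes would need a format-aware comparison instead.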

GNU debug fails with a possible system error: [../../../../../opal/mca/btl/tcp/btl_tcp_endpoint.c:730:mca_btl_tcp_endpoint_start_connect] bind on local address (removed) failed: Address already in use (98)

Intel debug fails in GOCART with "Floating point exception: floating-point divide by zero" (GOCART/Process_Library/GOCART2G_Process.F90:1575) at n_atmsteps = 16

Input data is currently in user space on Hera; the scripts need updating once the file paths are in shared space.

Commit Message:

* UFSWM - Add GEFS regression test suite from EP5r2 configuration/case

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.

Input data Changes:

  • New input data.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

NickSzapiro-NOAA and others added 30 commits May 6, 2024 06:24
Comment on lines +149 to +151
SU_BIOMASS NA N Y %y4-%m2-%d2t12:00:00 none 0.7778 SO2 ExtData/QFED_Blended_20210325_50days.nc
OC_BIOMASS NA N Y %y4-%m2-%d2t12:00:00 none 0.7778 OC ExtData/QFED_Blended_20210325_50days.nc
BC_BIOMASS NA N Y %y4-%m2-%d2t12:00:00 none 0.7778 BC ExtData/QFED_Blended_20210325_50days.nc
Collaborator Author

@aerorahul I believe these are the only custom files for the regression test, so as not to carry so much data on all platforms. ExtData is not staged yet... it may be nice to be consistent between the RT and the workflow.

@NickSzapiro-NOAA
Collaborator Author

@lipan-NOAA @junwang-noaa Intel debug reliably fails in GOCART with traceback to Floating point exception: floating-point divide by zero at GOCART/Process_Library/GOCART2G_Process.F90:1575 on n_atmsteps = 16:
tau = vs/dz

I would imagine FV3 would complain first about any vanishing layer thickness (?), from say stochastic physics

Interestingly, a 3 hour forecast completes if I change this line to:

diff --git a/Process_Library/GOCART2G_Process.F90 b/Process_Library/GOCART2G_Process.F90
index cc0b599..123260c 100644
--- a/Process_Library/GOCART2G_Process.F90
+++ b/Process_Library/GOCART2G_Process.F90
@@ -1561,7 +1561,7 @@ CONTAINS


     ! local
-    integer :: i, j, iit
+    integer :: i, j, iit, k
     integer :: nSubSteps

     real, dimension(i1:i2, j1:j2, km) :: tau
@@ -1571,8 +1571,13 @@ CONTAINS

     real :: dt, dt_cfl

-
-    tau = vs/dz
+    do k = 1,km
+      do j = j1, j2
+        do i = i1, i2
+          tau(i,j,k) = vs(i,j,k)/dz(i,j,k)
+        end do
+      end do
+    end do

Maybe there are some halos or padding in the arrays leading to dz=0? I'm not sure where to go from here.

@junwang-noaa
Collaborator

@NickSzapiro-NOAA So you are running the test in debug mode? It might be some dimension mismatch.

@weiyuan-jiang @tclune have you run into the issue?

@tclune
tclune commented Sep 20, 2024

@junwang-noaa There should not be any halo inside of GOCART, so it is quite weird/interesting that your fix had any effect.

@NickSzapiro-NOAA
Collaborator Author

Yes, this divide by 0 error is with intel debug

COMPILE | s2swa_debug | intel | -DAPP=S2SWA -DDEBUG=ON -DCCPP_SUITES=FV3_GFS_v17_coupled_p8_ugwpv1 | | fv3 |
RUN | cpld_debug_gefs                                   | - noaacloud                          | baseline |

The only change for the debug test is to shorten forecast length fhmax, since it is slower

@weiyuan-jiang
Collaborator

weiyuan-jiang commented Sep 20, 2024 via email

@NickSzapiro-NOAA
Collaborator Author

NickSzapiro-NOAA commented Sep 23, 2024

@weiyuan-jiang @tclune I added

k = count(dz .LT. .001)
write(*,*) 'Gocart dz check for vanishing dz : ', k, i1,i2,j1,j2,km

Dozens of tasks (out of 768) have small dz, with counts ranging from 1 up to 111. All tasks report the same array bounds, for example
[Task 232:] 111 1 48 1 24 127
and the count on each task does not change throughout the simulation.

The curiosities continue: using the alternate loop with tau(i,j,k) = vs(i,j,k)/dz(i,j,k) leads to different counts, ranging up to 110...
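
With 768 tasks each printing the diagnostic, the counts are easier to compare between the two code variants if they are tallied from the job log. A minimal sketch, assuming each diagnostic line contains the text "Gocart dz check" and ends with the six integers k, i1, i2, j1, j2, km in that order; the helper name `summarize_dz_check` is hypothetical.

```shell
# Hypothetical helper: read a job log on stdin and summarize the
# "Gocart dz check" lines. The count k is taken as the sixth field from
# the end, so any leading task label on the line is ignored.
summarize_dz_check() {
  awk '
    /Gocart dz check/ {
      k = $(NF - 5) + 0
      n++
      if (k > 0)   nonzero++
      if (k > max) max = k
    }
    END { printf "lines=%d nonzero=%d max=%d\n", n, nonzero, max }
  '
}
```

Usage would be along the lines of `summarize_dz_check < job.log`, where the log file name is illustrative. Note that this only tallies the reported counts of dz below 0.001; it does not distinguish small values from exact zeros.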

fltng_pnt
lossless
pos_pert_fcst
12
Contributor

@NickSzapiro-NOAA It seems to me the UPP control files "postxconfig-NT-gefs.txt" and "postxconfig-NT-gefs_FH00.txt" haven't been updated to the new format. I wonder if any grib2 files (inline post results) have been successfully generated from your new RT.

Collaborator Author

Thank you @WenMeng-NOAA. These postxconfig files are from the provided EP5r2 workflow and stopped working after the UPP update in #2326.

Do you know how to reformat these?

As a temporary fix, I'm using the gfs postxconfig files instead. This choice happens in tests/fv3_conf/cpld_control_run.IN:

#inline post
if [ $WRITE_DOPOST = .true. ]; then
  cp    ${PATHRT}/parm/post_itag_gfs itag
  cp    ${PATHRT}/parm/postxconfig-NT-gfs.txt postxconfig-NT.txt
  cp    ${PATHRT}/parm/postxconfig-NT-gfs_FH00.txt postxconfig-NT_FH00.txt
  cp    ${PATHRT}/parm/params_grib2_tbl_new params_grib2_tbl_new
  if [[ ${BMIC} == .true. ]]; then
    cp    ${PATHRT}/parm/post_itag_gefs itag
    #copied "gefs" postxconfig files not working after UFS #2326
    #cp    ${PATHRT}/parm/postxconfig-NT-gefs.txt postxconfig-NT.txt
    #cp    ${PATHRT}/parm/postxconfig-NT-gefs_FH00.txt postxconfig-NT_FH00.txt
    cp    ${PATHRT}/parm/postxconfig-NT-gfs.txt postxconfig-NT.txt
    cp    ${PATHRT}/parm/postxconfig-NT-gfs_FH00.txt postxconfig-NT_FH00.txt
    cp    ${PATHRT}/parm/params_grib2_tbl_new params_grib2_tbl_new
  else
    cp    ${PATHRT}/parm/post_itag_gfs itag
    cp    ${PATHRT}/parm/postxconfig-NT-gfs.txt postxconfig-NT.txt
    cp    ${PATHRT}/parm/postxconfig-NT-gfs_FH00.txt postxconfig-NT_FH00.txt
    cp    ${PATHRT}/parm/params_grib2_tbl_new params_grib2_tbl_new
  fi
fi
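
The branches in the snippet above differ only in which itag file is copied, since the gefs postxconfig files are temporarily replaced by the gfs ones. One way to make that explicit is to pick the file flavors once and copy once. This is a sketch only: `stage_post_files`, `itag_kind` and `xconfig_kind` are hypothetical names, and it is not the actual content of tests/fv3_conf/cpld_control_run.IN.

```shell
# Hypothetical refactor: choose the file flavors first, then copy once.
# PATHRT, WRITE_DOPOST and BMIC are the variables used in the snippet above.
stage_post_files() {
  [ "${WRITE_DOPOST:-.false.}" = .true. ] || return 0

  itag_kind=gfs
  xconfig_kind=gfs
  if [ "${BMIC:-.false.}" = .true. ]; then
    itag_kind=gefs
    # The gefs postxconfig files stopped working after UFS #2326, so the
    # gfs ones stay in use; set xconfig_kind=gefs once they are regenerated.
  fi

  cp "${PATHRT}/parm/post_itag_${itag_kind}" itag &&
  cp "${PATHRT}/parm/postxconfig-NT-${xconfig_kind}.txt" postxconfig-NT.txt &&
  cp "${PATHRT}/parm/postxconfig-NT-${xconfig_kind}_FH00.txt" postxconfig-NT_FH00.txt &&
  cp "${PATHRT}/parm/params_grib2_tbl_new" params_grib2_tbl_new
}
```

Switching back to the GEFS control files then becomes a one-line change rather than an edit to several `cp` commands.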

Contributor

@NickSzapiro-NOAA @lipan-NOAA If you provide me with the UPP control files in XML format, I can generate the text files in the new format for you.

Contributor

@NickSzapiro-NOAA I have regenerated "postxconfig-NT-gefs.txt" with the xml file "postcntrl_gefs.xml" provided by @lipan-NOAA. Please let me know whether you would like to pick it up from Hera or another machine.

Collaborator Author

@WenMeng-NOAA Can you make a PR to this branch with the file changes? If not, I'm happy to bring them in from Hera.

Contributor

@NickSzapiro-NOAA A PR was just submitted to your branch.

Collaborator Author

An example run directory using the updated postxconfig files is on Hera at:
/scratch1/NCEPDEV/nems/Nick.Szapiro/tasks/updateToEP5/uwm_gefs_upp/tests/run_dir/cpld_control_gefs_intel/
The wgrib2 -v output seems reasonable, but please let me know how we can verify that the contents are OK.

Contributor

@NickSzapiro-NOAA Your test results look good to me, except for missing aerosol fields. I will provide you changes for generating these aerosol fields from the inline post.

Collaborator Author

I re-ran and see aerosol fields (same run_dir). Please let me know if there is anything more to resolve

Contributor

@NickSzapiro-NOAA Your test results look good to me. @lipan-NOAA Can you also validate aerosol fields in grib2 files?

MODELNAME='GFS'
/
&NAMPGB
KPO=50,PO=1000.,975.,950.,925.,900.,875.,850.,825.,800.,775.,750.,725.,700.,675.,650.,625.,600.,575.,550.,525.,500.,475.,450.,425.,400.,375.,350.,325.,300.,275.,250.,225.,200.,175.,150.,125.,100.,70.,50.,40.,30.,20.,15.,10.,7.,5.,3.,2.,1.,0.4,
Contributor

@NickSzapiro-NOAA To generate aerosol fields from the inline post, please change this line from

KPO=50,PO=1000.,975.,950.,925.,900.,875.,850.,825.,800.,775.,750.,725.,700.,675.,650.,625.,600.,575.,550.,525.,500.,475.,450.,425.,400.,375.,350.,325.,300.,275.,250.,225.,200.,175.,150.,125.,100.,70.,50.,40.,30.,20.,15.,10.,7.,5.,3.,2.,1.,0.4,

to

KPO=50,PO=1000.,975.,950.,925.,900.,875.,850.,825.,800.,775.,750.,725.,700.,675.,650.,625.,600.,575.,550.,525.,500.,475.,450.,425.,400.,375.,350.,325.,300.,275.,250.,225.,200.,175.,150.,125.,100.,70.,50.,40.,30.,20.,15.,10.,7.,5.,3.,2.,1.,0.4,nasa_on=.true.,
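
The requested change appends `nasa_on=.true.,` to the end of the KPO/PO line. A minimal sketch of applying it with `sed`, assuming the line starts with `KPO=` (optionally indented) and ends with a trailing comma as shown above. The helper name `add_nasa_on` is hypothetical, and the result is written to stdout rather than edited in place.

```shell
# Hypothetical helper: print the given itag file with nasa_on=.true.,
# appended to the KPO/PO line, unless the flag is already present.
add_nasa_on() {
  sed '/^[[:space:]]*KPO=/{/nasa_on=/!s/$/nasa_on=.true.,/;}' "$1"
}
```

Because lines that already contain `nasa_on=` are skipped, running it twice leaves the file unchanged the second time.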

Collaborator

@NickSzapiro-NOAA Also copy all optics_luts_*_nasa.dat files from UPP/fix/chem to your run directory.
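
Staging those lookup tables could look like the sketch below. The helper name `stage_optics_luts` and its two arguments (the root of a UPP checkout and the run directory) are assumptions for illustration; only the `fix/chem/optics_luts_*_nasa.dat` pattern comes from the comment above.

```shell
# Hypothetical helper: copy the NASA optics lookup tables from a UPP
# checkout into a run directory, failing loudly if none are found.
stage_optics_luts() {
  upp_dir=$1
  run_dir=$2
  set -- "$upp_dir"/fix/chem/optics_luts_*_nasa.dat
  # If the glob matched nothing, $1 is still the literal pattern (or empty).
  if [ ! -e "$1" ]; then
    echo "no optics_luts_*_nasa.dat files under $upp_dir/fix/chem" >&2
    return 1
  fi
  cp "$@" "$run_dir"/
}
```

The explicit check avoids a silent no-op when the UPP path is wrong, which would otherwise only show up later as missing aerosol fields.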

@junwang-noaa
Collaborator

@yangfanglin @

@NickSzapiro-NOAA Can you check whether dz is exactly zero at any grid points? Since the error message is "Floating point exception: floating-point divide by zero", we may then need to track down where the zero-valued dz comes from. Thanks.

Labels
None yet
Projects
None yet
6 participants