Post processing errors resulting from GFS HR4 test run #3019

Open
ChristopherHill-NOAA opened this issue Oct 18, 2024 · 9 comments
@ChristopherHill-NOAA

What is wrong?

A test run of GFS HR4 performed by @RuiyuSun included execution of the post-processing package, and the following errors were reported in the associated log files:

gfsarch.log:  FATAL ERROR: Required file, directory, or glob gfs.20201030/00/products/atmos/wmo/gfs_collective1.postsnd_00 not found!
gfsawips_20km_1p0deg_f###-f###.log:  End exgfs_atmos_awips_20km_1p0deg.sh ... with error code 30
gfswaveawipsbulls.log:  FATAL ERROR: Job waveawipsbulls.95602 failed RETURN CODE 4
gfswaveawipsgridded.log: End exgfs_wave_prdgen_gridded.sh ... with error code 1

What should have happened?

The post processing scripts should all have run to completion without error or interruption.

What machines are impacted?

WCOSS2

What global-workflow hash are you using?

4ad9695

Steps to reproduce

Clone and build the workflow code from the indicated hash on WCOSS2, then execute a single cycle (2020103000) pointing to the datasets available from the HR4 test case. Please consult @RuiyuSun for further specifications.
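
A rough sketch of those steps, assuming the standard global-workflow build scripts (build_all.sh and link_workflow.sh); the HR4 experiment configuration details should come from @RuiyuSun:

git clone https://github.com/NOAA-EMC/global-workflow.git
cd global-workflow
git checkout 4ad9695
cd sorc
./build_all.sh        # build the workflow components on WCOSS2
./link_workflow.sh    # link fix files and executables into the workflow tree
# then configure and run a single 2020103000 cycle against the HR4 datasets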

Additional information

Errors from gfswaveawipsbulls and gfswaveawipsgridded appear to result from a misread of Alaska buoy station information; these workflow scripts may need to reference an updated table linking stations to the expected data bulletins. @AminIlia-NOAA is tagged here for awareness of the issue and the attached log files.
gfswaveawipsbulls.log
gfswaveawipsgridded.log

Errors resulting from gfsfbwind and gfsgempakncdcupapgif are being assessed for potential inclusion in this issue.

Do you have a proposed solution?

The GETGB2P errors generated from gfsawips_20km_1p0deg (as seen in OUTPUT70005.txt) result from the absence of the GRIB variable 5WAVH from the GFS control file and appear related to those resolved through #2652; the relevant parameter tables will be modified in a similar manner. Resolving the gfsawips_20km_1p0deg errors may in turn resolve the missing files behind the gfsarch error.
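
A quick diagnostic sketch; the parm/product/ location is an assumption about where the AWIPS parameter tables live in this checkout and should be verified:

# Check whether 5WAVH is referenced in the AWIPS GRIB2 parameter tables
grep -rl "5WAVH" parm/product/ || echo "5WAVH not referenced under parm/product/"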

Once developed, the modifications resolving all errors described here will be bundled into one or two pull requests.

@ChristopherHill-NOAA added the bug and triage labels on Oct 18, 2024
@ChristopherHill-NOAA
Author

Additional errors from the post-processing scripts:

  1. gfsfbwind
    gfsfbwind.log: ERROR config.resources must be sourced before sourcing WCOSS2.env

This error is triggered by the following logic within WCOSS2.env:

if [[ -n "${ntasks:-}" && -n "${max_tasks_per_node:-}" && -n "${tasks_per_node:-}" ]]; then
    max_threads_per_task=$((max_tasks_per_node / tasks_per_node))
    NTHREADSmax=${threads_per_task:-${max_threads_per_task}}
    NTHREADS1=${threads_per_task:-1}
    [[ ${NTHREADSmax} -gt ${max_threads_per_task} ]] && NTHREADSmax=${max_threads_per_task}
    [[ ${NTHREADS1} -gt ${max_threads_per_task} ]] && NTHREADS1=${max_threads_per_task}
    APRUN_default="${launcher} -n ${ntasks}"
else
    echo "ERROR config.resources must be sourced before sourcing WCOSS2.env"
    exit 2
fi

The variable $tasks_per_node is currently absent from the 'fbwind' case block within config.resources and will be added; a sketch of the change follows.
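
A minimal sketch of the likely config.resources change, assuming the usual case-block layout; the resource values shown are placeholders rather than the final settings:

"fbwind")
    export walltime="00:05:00"
    export ntasks=1
    export tasks_per_node=1      # currently missing; required by the WCOSS2.env check above
    export threads_per_task=1
    export memory="4GB"
    ;;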

  2. gfsgempakncdcupapgif
    gfsgempakncdcupapgif.log: End exgfs_atmos_gempak_gif_ncdc_skew_t.sh ... with error code 1

Preceding this error:

  • the ImageMagick command (convert) invoked during the execution of make_tif.sh is not recognized
  • the location of the file make_NTC_file.pl cannot be resolved during the execution of make_tif.sh
  • the subdirectory defined by $COM_OBS was found to be absent from the $ROTDIRS filespace

The file make_tif.sh relies on a static reference to an outdated version of ImageMagick and will need to be modified to reference a currently available system module. Further, make_tif.sh relies on the $UTILgfs variable to construct the path to make_NTC_file.pl. $UTILgfs is defined only within the job file $HOMEgfs/sorc/upp.fd/jobs/J_NCEPPOST, which references an antiquated 'util' directory that is no longer present within the UPP directory tree. One or both of J_NCEPPOST and make_tif.sh will be modified to reflect the current path to make_NTC_file.pl.
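
A hedged sketch of the adjustments described above; the module name and the replacement path to make_NTC_file.pl are assumptions to be confirmed against the WCOSS2 module stack and the workflow/UPP tree:

# In make_tif.sh: load a currently available ImageMagick module instead of
# pointing at a static, outdated installation (module name is assumed)
module load imagemagick
command -v convert    # 'convert' should now resolve from the loaded module

# Replace the retired ${UTILgfs} reference from J_NCEPPOST with a current
# location for make_NTC_file.pl (the path below is hypothetical)
MAKE_NTC="${HOMEgfs}/ush/make_NTC_file.pl"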

Specific to the HR4 test of the global workflow, there may be a need to ensure proper staging of the observational data directory, defined by $COM_OBS. The workflow code will be reviewed for any need of revision.

@WalterKolczynski-NOAA
Contributor

These run to completion in CI tests (C96), although we are not validating the results.

@WalterKolczynski-NOAA removed the triage label on Oct 31, 2024
@WalterKolczynski-NOAA
Contributor

@DavidHuber-NOAA Can you look at the env issues here?

@DavidHuber-NOAA self-assigned this on Oct 31, 2024
@DavidHuber-NOAA
Contributor

@WalterKolczynski-NOAA @ChristopherHill-NOAA is correct that the variable is missing from config.resources for the fbwind job. We do not run that job in the C96_atm3DVar_extended CI test; we should probably add it so it is tested.

For the gfsgempakncdcupapgif job, I see the same make_tif.sh and make_NTC_file.pl errors. However, the COM_OBS directory was found for our C96 test case. I suspect that the COM_OBS directory is not generated for a forecast-only experiment (assuming that is how this run was configured). In that case, the gfsgempakncdcupapgif job should probably not be part of the task mesh in forecast-only cases.

Going back to the original post, the C96_atm3DVar_extended test does not attempt to create the gfs_downstream.tar tarball, in which the WMO collective files would be stored. It's not immediately clear to me why not, as it should be triggered when DO_BUFRSND == YES, which it is. I will look into this.

We do not run the gfs_awips* or gfs_waveawips* jobs, so I don't have anything to compare against there. I wonder if the gfs_awips jobs should be run in the C96_atm3DVar_extended test. I know it adds a significant number of jobs.

@WalterKolczynski-NOAA
Contributor

WalterKolczynski-NOAA commented Oct 31, 2024

AWIPS jobs should be running in the extended test. That's the main point of it. I believe fbwind is gated behind the AWIPS switch as well.

@WalterKolczynski-NOAA
Contributor

WalterKolczynski-NOAA commented Oct 31, 2024

Traced back why it was off. When I added the test in #2567, there was an issue with tocgrib2 and the convective precip fields, so AWIPS was turned off. That has since been fixed (#2652), so the test should be turned on now (it should have been turned on then).

@DavidHuber-NOAA
Contributor

Looking into the gfs_downstream.tar failure, I see that WCOSS2 does not enable HPSS archiving by default. Turning this feature on and attempting to run gfs_arch in the C96_atm3DVar_extended test produces a failure identical to the one Chris reported. HPSS archiving should probably be enabled during CI testing on WCOSS2.
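
A minimal sketch of how that could be enabled for a CI experiment; the switch name (HPSSARCH) and its location in the experiment's config.base are assumptions based on the usual global-workflow configuration layout:

# Enable HPSS archiving for an existing experiment directory ($EXPDIR assumed set)
sed -i 's/^export HPSSARCH=.*/export HPSSARCH="YES"/' "${EXPDIR}/config.base"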

@DavidHuber-NOAA
Contributor

I will address the gfs_downstream.tar issue in an upcoming PR.

@DavidHuber-NOAA
Contributor

The missing postsnd files trace back to #1929. That PR reworked the way data is sent to COM_ATMOS_WMO: data is now sent to that directory only if SENDDBN == "YES". However, the postsnd collective files are also (always) sent to COM_ATMOS_BUFR, though under different names. I will update the archive yaml to point to that location instead.
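
For illustration only; the product subdirectory names below are assumptions about how COM_ATMOS_WMO and COM_ATMOS_BUFR map into the ROTDIR tree:

# With SENDDBN != "YES", the WMO copies are skipped, so the archive glob fails:
ls "${ROTDIR}/gfs.20201030/00/products/atmos/wmo/gfs_collective"*.postsnd_00
# The collectives are always written (under different names) here instead:
ls "${ROTDIR}/gfs.20201030/00/products/atmos/bufr/"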
