C96C48_hybatmDA 2021122100 gfsfcst aborts upon ufs_model.x start #2551

Closed
RussTreadon-NOAA opened this issue Apr 27, 2024 · 3 comments
Labels
bug Something isn't working triage Issues that are triage

Comments

@RussTreadon-NOAA
Contributor

What is wrong?

The failure is captured on Hera in /scratch1/NCEPDEV/stmp2/Russ.Treadon/COMROOT/prtest_gsi/logs/2021122100/gfsfcst.log:

+ exglobal_forecast.sh[153]: srun -l --export=ALL -n 40 /scratch1/NCEPDEV/stmp2/Russ.Treadon/RUNDIRS/prtest_gsi/gfsfcst.2021122100/fcst.223101/ufs_model.x
 0:
 0:
 0: * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * .
 0:      PROGRAM ufs-weather-model HAS BEGUN. COMPILED       0.00     ORG: np23
 0:      STARTING DATE-TIME  APR 27,2024  11:26:31.010  118  SAT   2460428
 0:
 0:
 0: MPI Library = Intel(R) MPI Library 2021.5 for Linux* OS
 0:
 0: MPI Version = 3.1
26: Abort(1) on node 26 (rank 26 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 26
36: Abort(1) on node 36 (rank 36 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 36
14: Abort(1) on node 14 (rank 14 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 14
23: Abort(1) on node 23 (rank 23 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 23
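
When triaging a log like the one above, it can help to list which MPI ranks called MPI_Abort and with what code. The following is a minimal sketch, not part of g-w; the function name `summarize_aborts` and the regular expression are assumptions based only on the line format shown in this log (output from `srun -l`, where each line is prefixed with the task number).

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt in this issue:
# "26: Abort(1) on node 26 (rank 26 in comm 496): application called MPI_Abort(...)"
ABORT_RE = re.compile(
    r"^\s*(?P<label>\d+):\s+Abort\((?P<code>\d+)\) on node \d+ "
    r"\(rank (?P<rank>\d+) in comm (?P<comm>\d+)\)"
)


def summarize_aborts(log_text):
    """Return (sorted list of aborting ranks, Counter of abort codes)."""
    ranks = set()
    codes = Counter()
    for line in log_text.splitlines():
        match = ABORT_RE.match(line)
        if match:
            ranks.add(int(match.group("rank")))
            codes[int(match.group("code"))] += 1
    return sorted(ranks), codes


# Sample input copied from the log excerpt above.
sample = """\
 0: MPI Version = 3.1
26: Abort(1) on node 26 (rank 26 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 26
36: Abort(1) on node 36 (rank 36 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 36
14: Abort(1) on node 14 (rank 14 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 14
23: Abort(1) on node 23 (rank 23 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 23
"""

ranks, codes = summarize_aborts(sample)
print("aborting ranks:", ranks)
print("abort codes:", dict(codes))
```

Pointing the function at the full gfsfcst.log would show whether all 40 tasks aborted or only a subset, which may help separate a model configuration problem from a node-level problem.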

What should have happened?

gfsfcst should run to completion

What machines are impacted?

Hera

Steps to reproduce

  1. Clone and install g-w develop.
  2. Set up the C96C48_hybatmDA CI test.
  3. cd to the EXPDIR for C96C48_hybatmDA and run rocotorun to start the test.
  4. Enable cron to drive the test.

The half-cycle gdas and enkfgdas forecasts run to completion. The gdas and enkfgdas forecasts for 2021122100 and 2021122106 complete successfully. The 2021122100 gfs forecast aborts.

I rewound and rebooted the failed gfs forecast. It failed again in the same manner.
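
For anyone repeating the rewind-and-reboot step, the commands could be assembled roughly as follows. This is a sketch under stated assumptions: the workflow XML and database file names (`prtest_gsi.xml`, `prtest_gsi.db`) and the Rocoto task name `gfsfcst` are guesses inferred from the EXPDIR and log names in this issue, and `rocoto_command` is a hypothetical helper. Only the `-w`, `-d`, `-c`, and `-t` options of `rocotorewind` and `rocotoboot` are used. The script prints the commands and does not execute them.

```python
import shlex


def rocoto_command(tool, workflow_xml, database, cycle, task):
    """Build the argument list for rocotorewind or rocotoboot."""
    if tool not in ("rocotorewind", "rocotoboot"):
        raise ValueError(f"unsupported tool: {tool}")
    return [tool, "-w", workflow_xml, "-d", database, "-c", cycle, "-t", task]


# EXPDIR is taken from this issue; the file names below are assumptions.
expdir = "/scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prtest_gsi"
workflow_xml = f"{expdir}/prtest_gsi.xml"  # assumed name
database = f"{expdir}/prtest_gsi.db"       # assumed name
cycle = "202112210000"                     # Rocoto cycle format YYYYMMDDHHMM
task = "gfsfcst"                           # assumed Rocoto task name

commands = [
    rocoto_command(tool, workflow_xml, database, cycle, task)
    for tool in ("rocotorewind", "rocotoboot")
]

# Print the commands for review instead of running them.
for argv in commands:
    print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be pasted into a shell on the target machine once the paths and task name have been checked against the actual EXPDIR.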

Key Hera directories for above test

  • HOMEgfs=/scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/test
  • EXPDIR=/scratch1/NCEPDEV/stmp2/Russ.Treadon/EXPDIR/prtest_gsi
  • COMROOT=/scratch1/NCEPDEV/stmp2/Russ.Treadon/COMROOT/prtest_gsi

Additional information

Repeated the above on Orion. The 2021122100 gfs forecast ran to completion.

Do you have a proposed solution?

No response

@RussTreadon-NOAA RussTreadon-NOAA added bug Something isn't working triage Issues that are triage labels Apr 27, 2024
@RussTreadon-NOAA RussTreadon-NOAA changed the title C96C48_hybatmDA 2021122100 gfsfcst aborts upon ufs_model.x start HERA C96C48_hybatmDA 2021122100 gfsfcst aborts upon ufs_model.x start Apr 27, 2024
@RussTreadon-NOAA
Contributor Author

Repeated the C96C48_hybatmDA test on Hercules. The 2021122100 gfsfcst runs without error.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added triage Issues that are triage and removed triage Issues that are triage labels Apr 27, 2024
@RussTreadon-NOAA RussTreadon-NOAA changed the title HERA C96C48_hybatmDA 2021122100 gfsfcst aborts upon ufs_model.x start C96C48_hybatmDA 2021122100 gfsfcst aborts upon ufs_model.x start Apr 29, 2024
@RussTreadon-NOAA
Contributor Author

Updated the working copy of g-w develop to 4b96c12 on Hera and Orion and reran the 2021122100 gfsfcst. ufs_model.x now aborts upon startup on both machines.

@RussTreadon-NOAA
Contributor Author

GSI- and JEDI-based CI testing on Orion and Hera using PR #2553 for HOMEgfs did not encounter failures with the 00Z gfs forecasts.

The errors reported in this issue used an older snapshot of g-w. PR #2553 is up to date with the current head of g-w develop. It's also possible that the errors reported in this issue reflect system problems rather than g-w problems.

Closing this issue for the time being.

2 participants