PET test failure with DEBUG=True #212

Open
mnlevy1981 opened this issue Mar 1, 2022 · 0 comments



@mnlevy1981
Collaborator

I set up a sandbox where the MOM DEBUG variable was always true (rather than being tied to the CESM DEBUG setting) because I wanted some additional output from a failing DIMCS test. I forgot about that change until I ran aux_mom and the PET test failed with errors like

138:MPT ERROR: shared memory sequence number error received 25354 expected 25355

So I reran the test as PET_D, which failed in extract_surface_state():

138:
138:FATAL from PE    30: There were a total of       105 locations detected with extreme surface values!
138:
138:Image              PC                Routine            Line        Source
138:cesm.exe           0000000006E567B6  Unknown               Unknown  Unknown
138:cesm.exe           00000000059C3106  mpp_mod_mp_mpp_er          68  mpp_util_mpi.inc
138:cesm.exe           000000000252B9E8  mom_error_infra_m          23  MOM_error_infra.F90
138:cesm.exe           000000000252A0CF  mom_error_handler          84  MOM_error_handler.F90
138:cesm.exe           0000000001B26637  mom_mp_extract_su        3621  MOM.F90
138:cesm.exe           00000000019E579E  mom_ocean_model_n         433  mom_ocean_model_nuopc.F90
138:cesm.exe           000000000194559F  mom_cap_mod_mp_in         655  mom_cap.F90
138:libesmf.so         00002B7B2EA4E489  _ZN5ESMCI6FTable1        2167  ESMCI_FTable.C
138:libesmf.so         00002B7B2EA49C31  ESMCI_FTableCallE         824  ESMCI_FTable.C
138:libesmf.so         00002B7B2EFCD373  _ZN5ESMCI3VMK5ent        2273  ESMCI_VMKernel.C
138:libesmf.so         00002B7B2EFED46A  _ZN5ESMCI2VM5ente        1216  ESMCI_VM.C
138:libesmf.so         00002B7B2EA4A31A  c_esmc_ftablecall         981  ESMCI_FTable.C
138:libesmf.so         00002B7B2F61A30E  esmf_compmod_mp_e        1222  ESMF_Comp.F90
138:libesmf.so         00002B7B2FD1953B  esmf_gridcompmod_        1407  ESMF_GridComp.F90
138:libesmf.so         00002B7B30B1EED0  nuopc_driver_mp_l        2565  NUOPC_Driver.F90
138:libesmf.so         00002B7B30AEB428  nuopc_driver_mp_i        1272  NUOPC_Driver.F90
138:libesmf.so         00002B7B2EA4E489  _ZN5ESMCI6FTable1        2167  ESMCI_FTable.C
138:libesmf.so         00002B7B2EA49C31  ESMCI_FTableCallE         824  ESMCI_FTable.C
138:libesmf.so         00002B7B2EFCD373  _ZN5ESMCI3VMK5ent        2273  ESMCI_VMKernel.C
138:libesmf.so         00002B7B2EFED46A  _ZN5ESMCI2VM5ente        1216  ESMCI_VM.C
138:libesmf.so         00002B7B2EA4A31A  c_esmc_ftablecall         981  ESMCI_FTable.C
138:libesmf.so         00002B7B2F61A30E  esmf_compmod_mp_e        1222  ESMF_Comp.F90
138:libesmf.so         00002B7B2FD1953B  esmf_gridcompmod_        1407  ESMF_GridComp.F90
138:libesmf.so         00002B7B30B1EED0  nuopc_driver_mp_l        2565  NUOPC_Driver.F90
138:libesmf.so         00002B7B30AEB1EB  nuopc_driver_mp_i        1268  NUOPC_Driver.F90
138:libesmf.so         00002B7B30AD0C69  nuopc_driver_mp_i         455  NUOPC_Driver.F90
138:libesmf.so         00002B7B2EA4E489  _ZN5ESMCI6FTable1        2167  ESMCI_FTable.C
138:libesmf.so         00002B7B2EA49C31  ESMCI_FTableCallE         824  ESMCI_FTable.C
138:libesmf.so         00002B7B2EFCD373  _ZN5ESMCI3VMK5ent        2273  ESMCI_VMKernel.C
138:libesmf.so         00002B7B2EFED46A  _ZN5ESMCI2VM5ente        1216  ESMCI_VM.C
138:libesmf.so         00002B7B2EA4A31A  c_esmc_ftablecall         981  ESMCI_FTable.C
138:libesmf.so         00002B7B2F61A30E  esmf_compmod_mp_e        1222  ESMF_Comp.F90
138:libesmf.so         00002B7B2FD1953B  esmf_gridcompmod_        1407  ESMF_GridComp.F90
138:cesm.exe           0000000000438071  MAIN__                    140  esmApp.F90
138:cesm.exe           0000000000418FA2  Unknown               Unknown  Unknown
138:libc-2.22.so       00002B7B359B16E5  __libc_start_main     Unknown  Unknown
138:cesm.exe           0000000000418EA9  Unknown               Unknown  Unknown
138:MPT ERROR: Rank 138(g:138) is aborting with error code 1.
138:    Process ID: 31015, Host: r7i1n8, Program: /glade/scratch/mlevy/PET_D.TL319_t061.GMOM_JRA.cheyenne_intel.20220301_141041_z6xsrj/bld/cesm.exe
138:    MPT Version: HPE MPT 2.22  03/31/20 15:59:10
138:
138:MPT: --------stack traceback-------
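For context on the FATAL message above: extract_surface_state() aborts after counting the locations it considers to have extreme surface values. The sketch below is a hypothetical Python analogue of that kind of count-then-abort check, not MOM6's Fortran code. The function names, the choice of fields (SSH, SST, SSS) and the threshold values are placeholders for illustration only, not MOM6 parameters or defaults.

```python
from typing import Sequence


def count_extreme_surface_points(
    ssh: Sequence[float],
    sst: Sequence[float],
    sss: Sequence[float],
    is_ocean: Sequence[bool],
    ssh_max: float = 20.0,   # placeholder bound, not a MOM6 default
    sst_min: float = -2.1,   # placeholder bound, not a MOM6 default
    sst_max: float = 45.0,   # placeholder bound, not a MOM6 default
    sss_max: float = 45.0,   # placeholder bound, not a MOM6 default
) -> int:
    """Count ocean points whose surface state falls outside the given bounds."""
    count = 0
    for h, t, s, wet in zip(ssh, sst, sss, is_ocean):
        if not wet:
            continue  # land points are skipped, so garbage values there are ignored
        if abs(h) >= ssh_max or t < sst_min or t > sst_max or s < 0.0 or s > sss_max:
            count += 1
    return count


def check_surface_state(ssh, sst, sss, is_ocean) -> None:
    """Hypothetical wrapper: raise if any ocean point has extreme surface values."""
    n = count_extreme_surface_points(ssh, sst, sss, is_ocean)
    if n > 0:
        raise RuntimeError(
            f"There were a total of {n} locations detected with extreme surface values!"
        )
```

With four points where one ocean point has SST 60, one land point holds garbage, and one ocean point has negative salinity, the count is 2 and `check_surface_state` raises. In the model itself the per-PE counts would also have to be summed across processors before deciding to abort; this single-process sketch leaves that out.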