Not enough error checking for adding new restart variables #2616

Open
ekluzek opened this issue Jun 24, 2024 · 1 comment
Labels
type: bug something is working incorrectly


@ekluzek
Contributor

ekluzek commented Jun 24, 2024

Brief summary of bug

I added a new restart variable and used a dim1name of "patch" instead of "pft". The resulting abort gives no clear indication that the dimension name is the problem.

General bug information

CTSM version you are using: branch_tags/dustemisdev.n05_ctsm5.1.dev166-5-gf48830977

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: When adding new variables to the restart file

Details of bug

This is an example of poor error messaging making problems in the code hard to find. I first suspected bad data in the array being written, or the interpinic_flag. I found the actual problem by running the case in the DDT debugger, which showed that the failure happened while defining the variable, not while writing it.

Important details of your setup / configuration so we can reproduce the bug

    call restartvar(ncid=ncid, flag=flag, varname='OBU', xtype=ncd_double,  &
         dim1name='patch', &
         long_name='Monin-Obukhov length', units='m', &
         interpinic_flag='skip', readvar=readvar, data=this%obu_patch)

Important output or errors that show the problem

The cesm.log does point to the error, but it is buried in so much other output that it is hard to spot.

 /glade/work/erik/ctsm_worktrees/dust_dev/share/src/shr_file_mod.F90         912 This routine is depricated - use shr_log_setLogUnit instead         -12
 /glade/work/erik/ctsm_worktrees/dust_dev/share/src/shr_file_mod.F90         912 This routine is depricated - use shr_log_setLogUnit instead         -13
 /glade/work/erik/ctsm_worktrees/dust_dev/share/src/shr_file_mod.F90         912 This routine is depricated - use shr_log_setLogUnit instead         -12
 /glade/work/erik/ctsm_worktrees/dust_dev/share/src/shr_file_mod.F90         912 This routine is depricated - use shr_log_setLogUnit instead         -13
Abort with message NetCDF: Invalid dimension ID or name in file /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/clib/pio_nc.c at line 812
Obtained 10 stack frames.
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpioc.so(print_trace+0x32) [0x14c46a7a228c]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpioc.so(piodie+0x77) [0x14c46a7a2399]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpioc.so(check_netcdf2+0x242) [0x14c46a7a272d]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpioc.so(check_netcdf+0x34) [0x14c46a7a24e9]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpioc.so(PIOc_inq_dimid+0x3a0) [0x14c46a7c3801]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpiof.so(__pio_nf_MOD_inq_dimid_id+0xb1) [0x14c46aa138cc]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpiof.so(__pio_nf_MOD_inq_dimid_desc+0x3d) [0x14c46aa13994]
/glade/derecho/scratch/erik/ERS_D_Mmpi-serial_Ld5.1x1_brazil.I2000Clm50FatesRs.derecho_gnu.clm-FatesCold.20240624_131200_uzttv8/bld/cesm.exe() [0x5af763]
/glade/derecho/scratch/erik/ERS_D_Mmpi-serial_Ld5.1x1_brazil.I2000Clm50FatesRs.derecho_gnu.clm-FatesCold.20240624_131200_uzttv8/bld/cesm.exe() [0x5af871]
/glade/derecho/scratch/erik/ERS_D_Mmpi-serial_Ld5.1x1_brazil.I2000Clm50FatesRs.derecho_gnu.clm-FatesCold.20240624_131200_uzttv8/bld/cesm.exe() [0x71c18c]

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x14c4616efd4f in ???
	at /usr/src/debug/glibc-2.31-150300.41.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#1  0x14c4616efcbb in __GI_raise
	at ../sysdeps/unix/sysv/linux/raise.c:51
#2  0x14c4616f1354 in __GI_abort
	at /usr/src/debug/glibc-2.31-150300.41.1.x86_64/stdlib/abort.c:79
#3  0x14c46a7a239d in piodie
	at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/clib/pioc_support.c:561
#4  0x14c46a7a272c in check_netcdf2
	at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/clib/pioc_support.c:683
#5  0x14c46a7a24e8 in check_netcdf
	at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/clib/pioc_support.c:632
#6  0x14c46a7c3800 in PIOc_inq_dimid
	at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/clib/pio_nc.c:812
#7  0x14c46aa138cb in __pio_nf_MOD_inq_dimid_id
	at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/flib/pio_nf.F90:519
#8  0x14c46aa13993 in __pio_nf_MOD_inq_dimid_desc
	at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/flib/pio_nf.F90:448
#9  0x5af762 in __ncdio_pio_MOD_ncd_inqdid
	at /glade/work/erik/ctsm_worktrees/dust_dev/src/main/ncdio_pio.F90.in:469
#10  0x5af870 in __ncdio_pio_MOD_ncd_defvar_bygrid
	at /glade/work/erik/ctsm_worktrees/dust_dev/src/main/ncdio_pio.F90.in:1257
#11  0x71c18b in __restutilmod_MOD_restartvar_1d_double
	at /glade/work/erik/ctsm_worktrees/dust_dev/src/utils/restUtilMod.F90.in:325
#12  0xa3cf54 in __frictionvelocitymod_MOD_restart
	at /glade/work/erik/ctsm_worktrees/dust_dev/src/biogeophys/FrictionVelocityMod.F90:443

ekluzek added the type: bug (something is working incorrectly) and tag: next (this should get some attention in the next week or two) labels Jun 24, 2024
@ekluzek
Contributor Author

ekluzek commented Jun 24, 2024

This is in the same vein as #1913 and #144.

Fixing this would just require passing the dimexist option in the ncd_inqdid calls and checking the result, so that an unknown dimension name stops the run with a clear message.
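A minimal sketch of the proposed check, written in Python for illustration rather than in CTSM's Fortran. It assumes an inquire call that reports whether a dimension exists instead of aborting (standing in for ncd_inqdid with dimexist). The helper names `inq_dimid`, `define_restart_var`, and `RestartDimError` are hypothetical, and the dimension table in the usage lines is an assumed example, not read from a real restart file.

```python
class RestartDimError(ValueError):
    """Raised when a restart variable names a dimension the file does not define."""


def inq_dimid(dims, name):
    """Hypothetical stand-in for an inquire-dimension call with a dimexist flag.

    Returns (dimid, exists). Instead of aborting on an unknown name, it
    reports exists=False and dimid=-1 so the caller can decide what to do.
    """
    if name in dims:
        return dims[name], True
    return -1, False


def define_restart_var(dims, varname, dim1name, dim2name=None):
    """Look up the dimension IDs for a new restart variable.

    Raises RestartDimError naming both the variable and the bad dimension,
    which is the information missing from the PIO abort in this issue.
    """
    dimids = []
    for dimname in (dim1name, dim2name):
        if dimname is None:
            continue  # optional second dimension not requested
        dimid, exists = inq_dimid(dims, dimname)
        if not exists:
            raise RestartDimError(
                f"restart variable '{varname}': dimension '{dimname}' is not "
                f"defined in the restart file (known: {', '.join(sorted(dims))})"
            )
        dimids.append(dimid)
    return dimids


# Usage: this dimension table is an assumed example, not read from a real file.
file_dims = {"gridcell": 0, "landunit": 1, "column": 2, "pft": 3}
print(define_restart_var(file_dims, "OBU", "pft"))  # [3]
try:
    define_restart_var(file_dims, "OBU", "patch")
except RestartDimError as err:
    print(err)
```

Presumably the Fortran fix would apply the same pattern where ncd_defvar_bygrid calls ncd_inqdid (frames #10 and #9 in the backtrace above), ending the run with a message that names both the variable and the missing dimension.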

This should be done on b4b-dev. It is also the kind of problem that simple I/O testing would catch, so the functional test framework would be a good place to test it.

ekluzek added this to the ctsm6.0.0 (code freeze) milestone Jun 24, 2024
ekluzek removed the tag: next (this should get some attention in the next week or two) label Jul 8, 2024