PIO and hdf5 failures #932

Closed
apcraig opened this issue Feb 9, 2024 · 6 comments

Comments

@apcraig
Contributor

apcraig commented Feb 9, 2024

Updates to the CICE IO capabilities revealed an interesting bug. The problem is partly documented in #928.

To summarize: with PIO installed from spack, setting restart_format to hdf5 and then reading a (restart) file that is not in hdf5 format results in the following error:

(ice_pio_check)NetCDF: Attempt to use feature that was not turned on when netCDF was built., (ice_pio_init) ERROR: Failed to open file cice_model.res.nc

To avoid this issue, #928 hardwired the file open for restart reads to 'cdf1' format, which allows restart files of any format to be read, including hdf5.

  • In all cases, there were no problems writing hdf5 files. The failure only occurs when reading a non-hdf5 file that was opened with format hdf5.
  • This problem did not exist for the PIO built on Derecho at the time, just for a spack version.
  • We believe that forcing 'cdf1' format when reading hdf5 files probably makes the read serial. That could cost some performance, although in practice the loss is probably small. Write performance matters more for overall model performance.
  • There was some speculation this was caused by a combination of spack not building with parallel-netcdf and PIO not detecting and reverting to standard netcdf in these cases (which it should do).
  • A different workaround was found, "the variable file which is shared from fortran to the c binding, wasn't getting updated. So I made a local 'lFile' within ice_pio_init, and it fixed the problem! (See 3ef71dd)." Nobody quite understands why this fixes the PIO detection of the file type / parallelization.
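
The 'cdf1' workaround from #928 amounts to a small rule: reads always open as 'cdf1', writes keep the configured restart_format. A minimal sketch of that rule follows. The function name `open_format` and the `reading` flag are hypothetical and only illustrate the logic; they are not CICE code.

```python
def open_format(restart_format: str, reading: bool) -> str:
    """Return the format string to use when opening a restart file.

    Hypothetical helper: reads are pinned to 'cdf1' (the workaround
    described above), writes keep whatever restart_format is configured.
    """
    return "cdf1" if reading else restart_format


# With restart_format = 'hdf5', only the write path actually uses hdf5.
print(open_format("hdf5", reading=True))
print(open_format("hdf5", reading=False))
```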

Some thoughts about how else we might proceed are to

  • get help understanding why PIO isn't reverting to serial netCDF reads when trying to read a file that isn't hdf5. This is probably what needs to happen, but doesn't help with older versions of PIO, so we probably still need a workaround.
  • formally use something like nf90_inq_format in CICE to check the netCDF format of a file to be read then make sure the file is opened appropriately.
  • just leave things as they are with 'cdf1' and not worry about the potential performance boost by being able to read an hdf5 file in parallel, if that's even happening.
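
The second option, checking the file format before opening, can be sketched without any netCDF library by looking at the first bytes of the file. This is a swapped-in technique, not nf90_inq_format itself: classic netCDF files start with the bytes "CDF" followed by a version byte (1, 2 or 5), and HDF5-based netCDF-4 files start with the 8-byte HDF5 signature. The function name and the returned labels are assumptions chosen to resemble the format names used in this thread.

```python
# 8-byte signature at the start of an HDF5 file (netCDF-4 files are HDF5 files).
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"

# Version byte that follows "CDF" in classic-model files.
# The labels on the right are illustrative, not an official list.
CLASSIC_VERSIONS = {1: "cdf1", 2: "cdf2", 5: "cdf5"}


def sniff_netcdf_format(path):
    """Guess the on-disk format of a netCDF file from its leading bytes.

    Only offset 0 is inspected, so an HDF5 file with a user block in
    front of the signature would be reported as 'unknown'.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head == HDF5_MAGIC:
        return "hdf5"
    if len(head) >= 4 and head[:3] == b"CDF":
        return CLASSIC_VERSIONS.get(head[3], "unknown")
    return "unknown"
```

A caller could then open the file with a matching format and fall back to 'cdf1' when the result is 'unknown'. Whether the linked PIO/netCDF build supports the detected format is a separate question that this check does not answer.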
@anton-seaice
Contributor

A different workaround was found, "the variable file which is shared from fortran to the c binding, wasn't getting updated. So I made a local 'lFile' within ice_pio_init, and it fixed the problem! (See 3ef71dd)." Nobody quite understands why this fixes the PIO detection of the file type / parallelization.

Please disregard this; I was having a Friday afternoon moment. This workaround doesn't work.

I raised this in the PIO repo: NCAR/ParallelIO#1985

@anton-seaice
Contributor

formally use something like nf90_inq_format in CICE to check the netCDF format of a file to be read then make sure the file is opened appropriately.

This might be hard because, even if the file format is determined, it may not be easy to figure out which formats are supported by the linked PIO/netCDF libraries. (Although there is a header file in the PIO clib with #defines for pnetcdf/netcdf4.)

@anton-seaice
Contributor

anton-seaice commented Feb 16, 2024

It turns out this has already been fixed upstream; see the addition of 'ierr == NC_ENOTBUILT' in:

NCAR/ParallelIO@e437a94#diff-205cd9c480611213ad871801509790fd76b6068519d6ead92e2aeb7321d82974

I updated our PIO build to 2.6.2 and the issue went away. (I had been using 2.5.10.)
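
The general shape of a fallback keyed on NC_ENOTBUILT can be sketched as below. This is not the PIO change itself, only an illustration of the idea: if an open fails because a feature was not built into netCDF, retry with the next I/O type instead of aborting. The opener callback and the iotype names are hypothetical. The numeric value of NC_ENOTBUILT is the one defined in netcdf.h as far as I know, so treat it as an assumption.

```python
NC_NOERR = 0
NC_ENOTBUILT = -128  # assumed to match netcdf.h; verify against your headers


def open_with_fallback(open_fn, path, iotypes):
    """Try each I/O type in order and return (iotype, ierr).

    open_fn(path, iotype) is a hypothetical callback returning a netCDF
    style error code. Only NC_ENOTBUILT triggers a retry; any other
    result (success or a different error) is returned immediately.
    """
    ierr = NC_ENOTBUILT
    for iotype in iotypes:
        ierr = open_fn(path, iotype)
        if ierr != NC_ENOTBUILT:
            return iotype, ierr
    # Every candidate reported that the feature was not built.
    return None, ierr
```

With a library that lacks parallel netCDF-4 support, a call such as `open_with_fallback(opener, "cice_model.res.nc", ["netcdf4p", "netcdf"])` would settle on the serial type rather than fail with the error shown at the top of this issue.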

@anton-seaice
Contributor

This issue was resolved in PIO 2.6. Should we still put in a patch for older PIO versions? @apcraig

@apcraig
Contributor Author

apcraig commented Jul 22, 2024

It makes sense to me to put a workaround in CICE for the time being. A number of folks might be using older versions of PIO. @anton-seaice, is there a clean and obvious fix that we can PR? I haven't looked closely myself.
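
One possible shape for such a workaround is to apply the 'cdf1' read override only when the linked PIO predates 2.6. The sketch below assumes the PIO version is available as a dotted string, which may not be how a build actually exposes it. The function names are hypothetical, and this is not necessarily what was eventually merged.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.5.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_read_workaround(pio_version):
    """Return True when the PIO version predates 2.6.

    Versions from 2.6 onward are reported in this thread as handling the
    failed hdf5 open on their own, so the override would be unnecessary.
    """
    return parse_version(pio_version) < (2, 6)


print(needs_read_workaround("2.5.10"))
print(needs_read_workaround("2.6.2"))
```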

@apcraig
Contributor Author

apcraig commented Aug 8, 2024

Closed by the improvements in #966.

@apcraig apcraig closed this as completed Aug 8, 2024