I have some rather large (gzipped) NIFTI files I need to read without first buffering in memory (so reading in slices).
When loading the image via nibabel.load, slicer and dataobj appear to be re-reading from the beginning each time, resulting in quadratic time complexity with the number of slices taken.
Since I'm only interested in proceeding forward through the file, the time complexity here should be linear. Indeed, linear time complexity can be obtained by adapting the code from an old question:
```python
from gzip import GzipFile

from nibabel import FileHolder, Nifti1Image

# niftipath is the path to the gzipped NIfTI file
fh = FileHolder(fileobj=GzipFile(niftipath))
img = Nifti1Image.from_file_map({'header': fh, 'image': fh})
```
In this case, since GzipFile preserves the current decompression state, proceeding strictly forward in the file works at the expected speed (much faster).
Is there a way to obtain this same slicing performance using the nibabel.load() API (such as by passing a GzipFile, etc)? This would be greatly preferable, as it abstracts away file formats.
Thanks! @effigies
Looking at the indexed-gzip docs, I was able to find the flag: keep_file_open=True.
While indexed-gzip alone does help, adding keep_file_open=True is all that is needed for this use case.
I'm still confused as to why keep_file_open is off by default, but enabling it seems to be a solution.
Unless the defaults or relevant documentation need to be revisited to make keep_file_open/indexed-gzip more visible (the slicer section currently mentions neither), I'm good to close this.
With indexed-gzip installed, you should not need to set keep_file_open to get almost identical performance.
The reason it's off by default is that, when working with many files, you can exhaust file handle quotas, and the lifetimes of file handles are difficult to reason about.
Definitely good to update the docs.
ecc521 changed the title from "Quadratic Time Complexity reading gzipped NIFTIs in slices" to "Update slicer docs for indexed-gzip & keep_file_open" on Sep 27, 2021.