
CMD (idea): compress #21

Closed
yarikoptic opened this issue Oct 11, 2019 · 5 comments
Labels
enhancement, UX

Comments

@yarikoptic
Member

I have noticed that network traffic while rclone-ing Svoboda's data is only about 10% of the local "write" IO.

That observation is confirmed by simply compressing the obtained .nwb files using tar/gz:

smaug:/mnt/btrfs/datasets/datalad/crawl-misc/svoboda-rclone/Exported NWB 2.0
$> du -scm Chen\ 2017*
35113   Chen 2017
3298    Chen 2017.tgz
38410   total

so indeed -- a ~10x factor!

Apparently hdmf/pynwb do not compress the data arrays stored in .nwb files by default. Both document the ability to pass compression parameters down (to h5py, I guess), but as far as I saw, compression is not on by default. Sure, the HDF5-side compression ratio might not reach 10x since not all data will be compressed, but I expect it will be notable.
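Not from the thread, just a stdlib sketch of why such data compresses so well: regular, slowly varying arrays are exactly what gzip/zlib exploit. The payload below is synthetic and illustrative, not the Svoboda files:

```python
import zlib

# Synthetic stand-in for a repetitive acquisition trace: a regular byte
# pattern repeated many times, the kind of redundancy gzip exploits.
raw = bytes(range(256)) * 4096          # 1 MiB of highly regular data

compressed = zlib.compress(raw, 5)      # zlib level 5, comparable to GZIP=5
ratio = len(raw) / len(compressed)
print(f"{len(raw)} -> {len(compressed)} bytes, ratio ~{ratio:.0f}x")
```

Real .nwb files are less regular than this, so the achievable ratio depends on the data, as noted above.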

As we keep running into those, it might be valuable to provide a dandi compress command which would take care of (re)compressing the provided .nwb files (in place or into a new file).
Prospective interface:

dandi compress [-i|--inplace] [-o|--output FILE] [-c|--compression METHOD (default gzip)] [-l|--level LEVEL (default 5)] [FILES]
  • --inplace to explicitly request (re)compressing each file in place (we might not do it truly "inplace" but rather write to a new file and then replace the old one -- this would provide a better workflow for git-annex'ed files, where the originals are read-only by default)
  • --output FILE - where to store the output file (a single input FILE is then expected)
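Not from the thread: a minimal argparse sketch of the proposed interface above, with option names and defaults taken from the usage line (no such command exists in dandi-cli; this is purely illustrative):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the proposed "dandi compress" usage line; hypothetical command.
    p = argparse.ArgumentParser(prog="dandi compress")
    p.add_argument("-i", "--inplace", action="store_true",
                   help="(re)compress each file in place")
    p.add_argument("-o", "--output", metavar="FILE",
                   help="output file (a single input FILE is then expected)")
    p.add_argument("-c", "--compression", default="gzip", metavar="METHOD",
                   help="compression method (default: gzip)")
    p.add_argument("-l", "--level", type=int, default=5, metavar="LEVEL",
                   help="compression level (default: 5)")
    p.add_argument("files", nargs="*", metavar="FILES")
    return p

args = build_parser().parse_args(["-i", "a.nwb", "b.nwb"])
print(args.compression, args.level, args.inplace, args.files)
```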
@yarikoptic yarikoptic added the enhancement New feature or request label Oct 11, 2019
@yarikoptic
Member Author

woohoo. Using h5repack, an external tool which is already available, with gzip level 5 compression:

for f in Chen\ 2017/*nwb; do h5repack -v -f GZIP=5 "$f" "${f/\//_comp-gzip5/}"; done

I got

$> du -scm Chen\ 2017*
35113   Chen 2017
3324    Chen 2017_comp-gzip5
3298    Chen 2017.tgz

so almost exactly the same compression factor as with external tar/gz! Testing with levels 1 and 9 now to see the spread, and then I will chime in with the pynwb folks.
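The level spread can be previewed cheaply with stdlib zlib before running full h5repack passes (illustrative only; HDF5 GZIP compresses each chunk independently, so absolute numbers will differ):

```python
import zlib

# Compressible but non-trivial payload: repeated English text as a stand-in.
data = b"spike times and voltage traces tend to be redundant " * 20000

# Higher levels trade CPU time for (usually modest) extra size reduction.
for level in (1, 5, 9):
    size = len(zlib.compress(data, level))
    print(f"level {level}: {size} bytes")
```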

@yarikoptic
Member Author

@bendichter (attn @satra) as you have recently explored compression within NWB: do you think it would be worthwhile to have dandi compress expose it, or should we defer this functionality to more specialized NWB tools, since dandi-cli deals with all kinds of data types (so in principle we could add compress functionality for zarrs and tifs too, I guess)?

@yarikoptic yarikoptic added the UX label Jul 27, 2022
@satra
Member

satra commented Jul 27, 2022

i would leave this in the nwb validator to inform the user and in the nwb conversion tools. i'm not sure this should be a functionality in dandi.

@yarikoptic
Member Author

I believe there were some ideas along these lines in NWB Inspector, right @bendichter? Overall, this probably should not be in the dandi client since it is not DANDI-specific, so I will close.

@bendichter
Member

  • Compression is now the default behavior for NeuroConv (and NWB GUIDE)
  • When you have an NWB file in memory, you can use NeuroConv to automatically apply our recommended chunking and compression to each dataset before writing to disk
  • NWB Inspector will inform the user if they have large datasets that are not compressed
  • One thing we'd like to do, but have not done yet, is to have a function to "repack" an NWB file. This function would take an uncompressed NWB file as input and produce a file where each dataset is compressed according to our recommendations. See the issue here: [Feature]: repack NWB file catalystneuro/neuroconv#892
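Not part of the thread: a hedged sketch of what such a repack helper could look like, shelling out to h5repack and replacing the original via a temp file, as suggested for the --inplace workflow earlier in this issue. The function names and the `_comp-gzip5` naming convention are invented for illustration; the real design is discussed in catalystneuro/neuroconv#892:

```python
import os
import shutil
import subprocess
from pathlib import Path

def repacked_name(path: Path, method: str = "gzip", level: int = 5) -> Path:
    # Hypothetical naming convention: foo.nwb -> foo_comp-gzip5.nwb
    return path.with_name(f"{path.stem}_comp-{method}{level}{path.suffix}")

def repack_inplace(path: Path, level: int = 5) -> None:
    # Repack into a temp file first, then replace the original, so a failed
    # repack never clobbers the source (friendlier to git-annex'ed files,
    # where originals are read-only by default).
    if shutil.which("h5repack") is None:
        raise RuntimeError("h5repack (HDF5 tools) not found on PATH")
    tmp = path.with_name(path.name + ".repack-tmp")
    try:
        subprocess.run(
            ["h5repack", "-f", f"GZIP={level}", str(path), str(tmp)],
            check=True,
        )
        os.replace(tmp, path)   # atomic on the same filesystem
    finally:
        if tmp.exists():        # clean up only if the repack failed midway
            tmp.unlink()
```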
