CMD (idea): compress #21
Comments
woohoo. Using an external tool which is already available (gzip with level 5 compression), I got:

```
$> du -scm Chen\ 2017*
35113   Chen 2017
3324    Chen 2017_comp-gzip5
3298    Chen 2017.tgz
```

so almost exactly the same compression factor as using external tar/gz! Testing with levels 1 and 9 now to see the spread, and then I will chime in with the pynwb people.
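The level 1/5/9 spread can be sketched with Python's stdlib `zlib` (a minimal illustration on synthetic, highly repetitive data -- not the actual NWB files, so the exact ratios will differ):

```python
import struct
import zlib

# Synthetic "recording-like" payload: a slowly varying periodic signal,
# packed as little-endian int16 -- repetitive, like many NWB data arrays.
signal = [int(1000 * (i % 256) / 256) for i in range(100_000)]
raw = struct.pack(f"<{len(signal)}h", *signal)

for level in (1, 5, 9):
    comp = zlib.compress(raw, level)
    print(f"level {level}: {len(raw)} -> {len(comp)} bytes "
          f"(x{len(raw) / len(comp):.1f})")
```

On real mixed HDF5 content the spread between levels is usually much narrower than on toy data like this.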
@bendichter (attn @satra) as you have recently explored compression within NWB, do you think it would be worthwhile to have such a command?
I would leave this in the nwb validator, to inform the user, and in the nwb conversion tools. I'm not sure this should be a functionality in dandi.
I believe there were some ideas along these lines in nwb inspector, right @bendichter? Overall, it probably should not be in dandi.
I have noted that network traffic while rcloning Svoboda's data is only about 10% of the local "write" IO .
That observation is confirmed by simply compressing the obtained .nwb files using tar/gz:
so indeed -- a ~10x factor!
Apparently hdmf/pynwb does not bother compressing the data arrays stored in .nwb files. Both document the ability to pass compression parameters down (to h5py, I guess), but as far as I saw, compression is not on by default. Sure, the HDF5-side compression ratio might not reach 10x, since not all data will compress equally well, but I expect it will be notable.
As we keep running into this, it might be valuable to provide a `dandi compress` command which would take care of (re)compressing provided .nwb files (in place or into a new file). Prospective interface:

- `--inplace` -- to explicitly state that each file should be (re)compressed in place (might want to not do it truly "in place", but rather into a new file which then replaces the old one -- this would provide a better workflow for git-annex'ed files, where the originals by default would be read-only)
- `--output filename` -- where to store the output file (then a single FILE is expected to be provided)
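The safe `--inplace` workflow described above (write a new file, then replace the old one) can be sketched with the stdlib. Note this is a hypothetical illustration: `compress_inplace` is not an existing dandi function, and a real `dandi compress` would recompress HDF5 datasets internally via h5py/hdmf rather than gzip the whole container:

```python
import gzip
import os
import shutil
import tempfile

def compress_inplace(path: str, level: int = 5) -> None:
    """Hypothetical sketch of the --inplace workflow: write the
    compressed copy to a temporary file in the same directory, then
    atomically replace the original, so a read-only (git-annex'ed)
    original is never modified in place and survives any failure."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fout, open(path, "rb") as fin:
            with gzip.GzipFile(fileobj=fout, mode="wb",
                               compresslevel=level) as gz:
                shutil.copyfileobj(fin, gz)
        os.replace(tmp, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)  # leave the original untouched on any failure
        raise
```

Writing to a sibling temp file and `os.replace`-ing keeps the operation crash-safe, which is the main argument for not doing a truly in-place rewrite.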