Hi there,
I am very new to kerchunk, but I am trying to create a JSON file using zarr, starting from a series of netCDF files which may have unequal time lengths (either 1 or 12 steps). I do not want to replicate the data with zarr, for both storage and backward-compatibility reasons.
Below is a very basic example; I can attach the data, but I think it is clear what is done here. This fails with:
ValueError: Found chunk size mismatch:
at prefix 2t in iteration 1 (file None)
new chunk: [12, 180, 360]
chunks so far: [1, 180, 360]
Browsing various issues in the repository (such as #430 (comment)), it seems this is due to a known limitation of Zarr, which does not allow unequal chunk sizes; this goes beyond kerchunk.
However, I am wondering whether there is a way to force the chunking when accessing the data, so that if I set chunks={"time": 1}, as done for example with xarray, I could still load the data.
Thanks a lot for any hint!
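For context, this is the kind of call I mean (a sketch only; "combined.json" is a placeholder for the kerchunk output):

```python
# Sketch of opening kerchunk references with xarray while requesting
# 1-step Dask chunks; whether this can override the stored chunking
# is exactly the question.
import xarray as xr


def open_references(json_path):
    return xr.open_dataset(
        "reference://",
        engine="zarr",
        chunks={"time": 1},  # Dask chunks requested on top of storage chunks
        backend_kwargs={
            "storage_options": {"fo": json_path},
            "consolidated": False,
        },
    )
```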
oloapinivad changed the title from "How to create json file with c" to "Create a MultiZarr json file from netcdf files of unequal time length." on Apr 9, 2024.
No, unfortunately you cannot "subchunk" data that have chunk sizes > 1. The sole exception is completely uncompressed/unencoded data, which I assume is not your situation.
Explanation:
Suppose you have a chunk of data in your original file of size 2 in time (plus some other dimensions). If we were to present this as chunk size 1 to zarr, then when accessing time=0 it would need to load the zeroth chunk, decompress it, and slice it; when loading time=1, it would have to load and decompress the very same chunk again.
This "load-and-slice" logic does not exist, and would clearly be inefficient. It would be further complicated where chunks cross boundaries (original size 7, desired size 2). So we keep the logical 1-1 mapping of chunks, and remain therefore limited by zarr's model.
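The cost being described can be illustrated with plain zlib and numpy, with no kerchunk involved (a toy model of the access pattern, not zarr's actual code path):

```python
import zlib

import numpy as np

# One stored, compressed chunk covering two time steps (2 x 4 x 4 values).
chunk = np.arange(2 * 4 * 4, dtype="f4").reshape(2, 4, 4)
stored = zlib.compress(chunk.tobytes())

decompressions = 0


def read_time_step(t):
    """Emulate presenting chunk size 1 over a stored chunk of size 2."""
    global decompressions
    decompressions += 1  # the whole chunk is decompressed on every read
    data = np.frombuffer(zlib.decompress(stored), dtype="f4").reshape(2, 4, 4)
    return data[t]


step0 = read_time_step(0)
step1 = read_time_step(1)
# Reading two adjacent steps decompressed the same stored chunk twice.
```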
Therefore I will proceed by creating two different json files, one for the 12-step chunks and one for the 1-step chunks, and then merging them when opening. In principle this could work!
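A sketch of that plan: two reference files, opened separately and concatenated along time (the file names and the sortby step are my guesses at the details):

```python
import xarray as xr


def open_refs(json_path):
    """Open one kerchunk reference file lazily via the zarr engine."""
    return xr.open_dataset(
        "reference://",
        engine="zarr",
        chunks={},
        backend_kwargs={
            "storage_options": {"fo": json_path},
            "consolidated": False,
        },
    )


def open_combined():
    """Combine the 1-step and 12-step reference sets along time."""
    ds_1 = open_refs("one_step.json")       # files with a single time step
    ds_12 = open_refs("twelve_steps.json")  # files with 12 time steps
    return xr.concat([ds_1, ds_12], dim="time").sortby("time")
```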