Saving and loading arrays with boolean eltypes #189
Comments
Can you give me example code in python-xarray that writes and reads such a Boolean array?
Sure, here's an example:
>>> import xarray as xr
>>> import numpy as np
>>> x = np.random.normal(size=(4, 100)) > 0
>>> ds = xr.Dataset(
... data_vars=dict(x=(['chain', 'draw'], x)),
... coords=dict(chain=range(4), draw=range(100)),
... )
>>> ds.x.dtype
dtype('bool')
>>> ds.to_netcdf("foo.nc")
>>> ds2 = xr.open_dataset("foo.nc")
>>> ds2.x.dtype
dtype('bool')
>>> np.array_equal(ds.x, ds2.x)
True
When we load the same file with NCDatasets, the variable is reported as Int8:
julia> using NCDatasets
julia> ds = NCDataset("foo.nc");
julia> ds["x"]
x (100 × 4)
Datatype: Int8
Dimensions: draw × chain
Attributes:
dtype = bool
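To make the mismatch concrete, here is a small stand-alone Python model of what the two outputs above suggest: the booleans are stored as signed 8-bit integers, and an attribute dtype = "bool" is the only hint about the original type. This is a sketch using only the standard library, not xarray's or NCDatasets' actual code; the function names encode_bool_variable, read_raw and read_decoded are made up for illustration.

```python
from array import array


def encode_bool_variable(values):
    """Pack booleans as signed 8-bit integers plus a marker attribute (illustrative)."""
    # Type code "b" is a signed char, i.e. the Int8 seen in the Julia output.
    data = array("b", (1 if v else 0 for v in values))
    attrs = {"dtype": "bool"}
    return data, attrs


def read_raw(data, attrs):
    """A reader that ignores the marker attribute and returns plain integers."""
    return list(data)


def read_decoded(data, attrs):
    """A reader that honours the marker attribute and restores booleans."""
    if attrs.get("dtype") == "bool":
        return [bool(v) for v in data]
    return list(data)


original = [True, False, True, True]
data, attrs = encode_bool_variable(original)
raw = read_raw(data, attrs)          # integers, like the Int8 variable above
decoded = read_decoded(data, attrs)  # booleans, like the xarray round trip
```

A reader that does not know about the attribute sees integers, which matches the Int8 datatype reported above.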
Thanks for the example! When reading the variable with the python-netCDF4 package, it seems that the variable is also returned as an integer. I am not aware of any other package (Matlab, Octave or R) that treats the dtype attribute this way.
It is also not quite clear to me how to handle the _FillValue, valid_min, valid_max and valid_range properties when the dtype attribute modifies the element type of an array. Unfortunately, h5py implements boolean types in a way that is incompatible with xarray (using enums).
So I don't think that we should import this xarray-specific extension into NCDatasets. Maybe we can give the user an API for implementing specific encoding/decoding functions, like
function transformation(v::NCDatasets.Variable)
    if get(v.attrib, "dtype", "") == "bool"
        # (encode, decode) pair: Bool -> Int8 on write, Int8 -> Bool on read
        return (x -> Int8(x), x -> Bool(x))
    else
        return (identity, identity)
    end
end
Would this be worth the effort? The true fix would be to add a native boolean type to NetCDF/HDF5. Is there any feature request about this?
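For comparison with the xarray side of this thread, the per-variable hook proposed above can be sketched in Python as well. This is only an illustration of the idea: transformation, decode_values and encode_values are hypothetical names, attrs is assumed to be a plain dict of variable attributes, and the _FillValue and valid_range questions raised above are deliberately left out.

```python
def transformation(attrs):
    """Return an (encode, decode) pair chosen from a variable's attributes."""
    if attrs.get("dtype", "") == "bool":
        # Booleans are written as integers and converted back on read.
        return (lambda x: int(x)), (lambda x: bool(x))
    # No marker attribute: leave values untouched in both directions.
    return (lambda x: x), (lambda x: x)


def decode_values(raw, attrs):
    """Apply the decode half of the pair to values read from disk."""
    _, decode = transformation(attrs)
    return [decode(v) for v in raw]


def encode_values(values, attrs):
    """Apply the encode half of the pair to values about to be written."""
    encode, _ = transformation(attrs)
    return [encode(v) for v in values]


flags = decode_values([1, 0, 1], {"dtype": "bool"})
untouched = decode_values([1, 0, 1], {})
stored = encode_values([True, False], {"dtype": "bool"})
```

Keeping the decision in a user-supplied function means the library itself never has to interpret the xarray-specific attribute.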
As far as I can tell, netCDF does not support a boolean eltype, so boolean variables need to be written as integers. I'm working with some netCDF files saved using the netcdf library via xarray, which seems to handle this by saving boolean data as integers with the attribute dtype="bool". Is there an option to tell NCDatasets during load time to set the eltype based on an attribute like this?