I believe filters should be an optional list of dictionaries, at least in the case of netCDF4 files, which kerchunk reads via the h5py library. Further, the zarr spec indicates that filters should be a list of JSON objects.
Without this datatype change, I get pydantic type errors, which I first reported in #60.
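To make the expected shape concrete, here is a minimal sketch of zarr v2 array metadata where "filters" is either null or a list of JSON objects (each with an "id" plus codec parameters). The specific metadata values below are illustrative, not taken from either test file:

```python
import json

# Illustrative zarr v2 .zarray metadata: per the v2 spec, "filters" is
# either null or a list of JSON objects, never a plain string.
zarray_meta = {
    "chunks": [100, 100],
    "compressor": None,
    "dtype": "<f4",
    "fill_value": None,
    "filters": [{"id": "zlib", "level": 4}],  # list of dicts
    "order": "C",
    "shape": [100, 100],
    "zarr_format": 2,
}

# Round-tripping through JSON confirms the filters survive as objects.
meta = json.loads(json.dumps(zarray_meta))
print(meta["filters"])
```

A pydantic field typed as `Optional[list[dict]]` (rather than, say, `str`) would accept both this list form and the null case.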
Reproducible example
In this example, I created an artificial dataset with filters, and also used the air dataset from the Usage docs since I knew that worked. Interestingly, the netCDF4 library appears to read filters from both files, while the h5py library only reads filters from the artificially generated dataset. I have not yet tracked down why.
from netCDF4 import Dataset
import numpy as np
import h5py
import xarray as xr
from virtualizarr import open_virtual_dataset

# Create some artificial data
data = np.random.rand(100, 100)  # 100x100 array of random numbers

# Create a new NetCDF file
nc_filename = 'artificial_with_filter.nc'
nc_file = Dataset(nc_filename, 'w', format='NETCDF4')

# Define the dimensions of the data
nc_file.createDimension('x', data.shape[0])
nc_file.createDimension('y', data.shape[1])

# Create a variable with zlib compression
data_var = nc_file.createVariable('data', np.float32, ('x', 'y'), zlib=True)

# Assign the data to the variable and close the file
data_var[:] = data
nc_file.close()
print(f"NetCDF file '{nc_filename}' created successfully with zlib compression.")

# Create an example netCDF4 file from an xarray tutorial dataset
ds = xr.tutorial.open_dataset('air_temperature')
ds.to_netcdf('air.nc')

files = ['air.nc', 'artificial_with_filter.nc']
var_keys = ['air', 'data']
for file in files:
    h5file = h5py.File(file, 'r')
    nc_file = Dataset(file, 'r')
    for group_name in h5file.keys():
        if group_name in var_keys:
            group = h5file[group_name]
            h5filters = group._filters
            print(f"Filters found with hdf5 for {group_name}: {h5filters}")
            var = nc_file.variables[group_name]
            ncfilters = var.filters()
            print(f"Filters found with netcdf for '{group_name}': {ncfilters}")
    open_virtual_dataset(file)
    h5file.close()
    nc_file.close()
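For reference, one way the datatype change could be used downstream: h5py's private `_filters` attribute reports filters as a name-to-options mapping (e.g. `{'deflate': 4}` for zlib level 4), which would need converting into the list-of-JSON-objects form. This is a hypothetical sketch, not virtualizarr's actual code, and the name-to-id mapping covers only the two filters relevant here:

```python
def normalize_filters(h5_filters):
    """Convert an h5py-style {name: options} mapping into a list of
    JSON-object filters, returning None when no filters are present.

    Assumes h5py reports zlib as 'deflate' with an integer level and
    shuffle as 'shuffle'; other filter names pass through by name only.
    """
    out = []
    for name, opts in (h5_filters or {}).items():
        if name == "deflate":
            out.append({"id": "zlib", "level": opts})
        elif name == "shuffle":
            out.append({"id": "shuffle"})
        else:
            out.append({"id": name})
    return out or None

print(normalize_filters({"deflate": 4}))
```

With `filters` typed as an optional list of dicts, both the `None` case (the air dataset, where h5py reports no filters) and the populated case (the artificial dataset) validate cleanly.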