Filters should be a list of dictionaries #65

Closed
abarciauskas-bgse opened this issue Mar 29, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@abarciauskas-bgse
Collaborator

I believe filters should be an optional list of dictionaries, at least for netCDF4 files, which kerchunk reads with the h5py library. Further, the Zarr spec indicates that filters should be a list of JSON objects.

Without this datatype change, I get pydantic type errors which I first reported in #60.
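To make the expected shape concrete, here is a minimal sketch using only the standard library. The helper name `validate_filters` and the example codec values are hypothetical and are not part of virtualizarr or kerchunk. It only assumes what the Zarr v2 spec states: `filters` is either null or a list of JSON objects, each with an `id` key.

```python
from typing import Any, Dict, List, Optional

# Hypothetical type alias and helper for illustration only.
Filters = Optional[List[Dict[str, Any]]]


def validate_filters(filters: Filters) -> Filters:
    """Accept None or a list of dicts that each carry an "id" key."""
    if filters is None:
        # No filters applied to this array.
        return None
    if not isinstance(filters, list):
        raise TypeError(f"filters must be a list or None, got {type(filters).__name__}")
    for codec in filters:
        if not isinstance(codec, dict):
            raise TypeError(f"each filter must be a dict, got {type(codec).__name__}")
        if "id" not in codec:
            raise ValueError("each filter dict must contain an 'id' key")
    return filters


# Illustrative codec configurations (values are assumptions, not read from a file).
example = [{"id": "shuffle", "elementsize": 4}, {"id": "zlib", "level": 4}]
validated = validate_filters(example)
```

A single dictionary such as `{"id": "zlib"}` would be rejected by this check, which mirrors the kind of type mismatch described above.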

Reproducible example

In this example, I created an artificial dataset with filters and also used the air dataset from the Usage docs, since I knew that worked. Interestingly, the netCDF4 library appears to read filters from both files, while the h5py library only reads filters from the artificially generated dataset. I have not yet tracked down why.

import h5py
import numpy as np
import xarray as xr
from netCDF4 import Dataset
from virtualizarr import open_virtual_dataset

# Create some artificial data
data = np.random.rand(100, 100)  # 100x100 array of random numbers

# Create a new NetCDF file
nc_filename = 'artificial_with_filter.nc'
nc_file = Dataset(nc_filename, 'w', format='NETCDF4')

# Define the dimensions of the data
nc_file.createDimension('x', data.shape[0])
nc_file.createDimension('y', data.shape[1])

# Create a variable with zlib compression
data_var = nc_file.createVariable('data', np.float32, ('x', 'y'), zlib=True)

# Assign the data to the variable
data_var[:] = data

# Close the file
nc_file.close()

print(f"NetCDF file '{nc_filename}' created successfully with zlib compression.")

# create an example netCDF4 file from xarray dataset
ds = xr.tutorial.open_dataset('air_temperature')
ds.to_netcdf('air.nc')

files = ['air.nc', 'artificial_with_filter.nc']
var_keys = ['air', 'data']
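for file in files:
    # Open the same file with both libraries; both handles are closed on exit
    with h5py.File(file, 'r') as h5file, Dataset(file, 'r') as nc_file:
        for var_name in h5file.keys():
            if var_name in var_keys:
                # Filters as reported by h5py (private attribute)
                h5filters = h5file[var_name]._filters
                print(f"Filters found with hdf5 for '{var_name}': {h5filters}")

                # Filters as reported by netCDF4
                ncfilters = nc_file.variables[var_name].filters()
                print(f"Filters found for netcdf for '{var_name}': {ncfilters}")

    # Opening the file with virtualizarr is where I see the pydantic type errors
    open_virtual_dataset(file)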

TomNicholas added the bug label on Mar 29, 2024
@abarciauskas-bgse
Collaborator Author

Closed via #66.
