Encoding chunks do not match inferred chunks #207
Hi @caiostringari, I think you haven't heard from anyone because we may need some more info to start helping you debug.
@abkfenris
Bug description
Call stack
Background
Version
Datasets I tested with:
OSN dataset (KeyError)
OSN dataset (works as expected!)
S3 dataset (ValueError)
Thanks @xaviernogueira, that let me dig into it some. I didn't end up trying with requester pays (or the OSN dataset that works, for that matter), but I was able to reproduce the
It looks like an encoding may be set on
If I yoink the
{
"metadata": {
...
"time/.zarray": {
"chunks": [
46008
],
"compressor": {
"id": "zstd",
"level": 9
},
"dtype": "<i8",
"fill_value": null,
"filters": null,
"order": "C",
"shape": [
368064
],
"zarr_format": 2
},
"time/.zattrs": {
"_ARRAY_DIMENSIONS": [
"time"
],
"calendar": "proleptic_gregorian",
"standard_name": "time",
"units": "hours since 1979-10-01 00:00:00"
},
...
}
}
This is probably over my head as far as Zarr specifics go, so I'm not sure whether we should go with the encoded or the inferred chunks in this case, but maybe @jhamman has some thoughts.
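For anyone following along, here is a stdlib-only sketch of the kind of comparison that seems to raise this error. It is a hypothetical re-creation based on the error message, not the Xpublish source: it assumes the check compares `encoding["chunks"]` with the leading chunk along each dimension of a Dask-backed array, or with the full shape for in-memory data. `chunks_match` is not an Xpublish function. The numbers are the shape and chunks from the `time/.zarray` above.

```python
def chunks_match(encoding_chunks, shape, dask_chunks=None):
    """Hypothetical re-creation of the chunk check (an assumption, not the
    actual xpublish source): compare the encoded chunk shape with the chunk
    shape inferred from the data."""
    if dask_chunks is None:
        # In-memory (non-Dask) data is treated as one chunk of the full shape.
        inferred = tuple(shape)
    else:
        # Dask stores chunks as one tuple per dimension, e.g. ((46008, 46008, ...),);
        # only the leading chunk along each dimension is compared.
        inferred = tuple(dim_chunks[0] for dim_chunks in dask_chunks)
    return inferred == tuple(encoding_chunks)


shape = (368064,)    # "shape" from time/.zarray
encoded = [46008]    # "chunks" from time/.zarray

# Dask chunks that follow the encoding: 8 chunks of 46008 (8 * 46008 == 368064).
aligned = ((46008,) * 8,)
# Dask chunks after something like ds.chunk({"time": 100000}) on a data variable.
rechunked = ((100000, 100000, 100000, 68064),)

print(chunks_match(encoded, shape, aligned))    # True
print(chunks_match(encoded, shape, rechunked))  # False: 100000 != 46008
print(chunks_match(encoded, shape))             # False: (368064,) != (46008,)
```

The third case is worth keeping in mind: under this assumption, an array that is not Dask-backed but still carries an encoded chunk size would also fail the check.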
So it occurred to me that
How are you launching Xpublish for the datasets (`ds.rest.serve()` vs `xpublish.Rest()`...)?
ds.rest(
app_kws=dict(
title="Some title here",
description="Some description here.",
openapi_url="/dataset.json",
),
cache_kws=dict(available_bytes=1e9), # this is 1 GB worth of cache.
)
ds.rest.serve()
What version of Xpublish, supporting libraries, and any plugins are you using (the output of `/plugins` and `/versions` would be fantastic)?
{
"dataset_info": {
"path": "xpublish.plugins.included.dataset_info.DatasetInfoPlugin",
"version": "0.3.0"
},
"module_version": {
"path": "xpublish.plugins.included.module_version.ModuleVersionPlugin",
"version": "0.3.0"
},
"plugin_info": {
"path": "xpublish.plugins.included.plugin_info.PluginInfoPlugin",
"version": "0.3.0"
},
"zarr": {
"path": "xpublish.plugins.included.zarr.ZarrPlugin",
"version": "0.3.0"
}
}
{
"python": "3.11.3 | packaged by conda-forge | (main, Apr 6 2023, 08:57:19) [GCC 11.3.0]",
"python-bits": 64,
"OS": "Linux",
"OS-release": "5.15.90.1-microsoft-standard-WSL2",
"Version": "#1 SMP Fri Jan 27 02:56:13 UTC 2023",
"machine": "x86_64",
"processor": "x86_64",
"byteorder": "little",
"LC_ALL": "None",
"LANG": "C.UTF-8",
"LOCALE": "en_US.UTF-8",
"libhdf5": "1.12.2",
"libnetcdf": null,
"xarray": "2023.4.2",
"zarr": "2.14.2",
"numcodecs": "0.11.0",
"fastapi": "0.95.1",
"starlette": "0.26.1",
"pandas": "2.0.1",
"numpy": "1.24.3",
"dask": "2023.4.1",
"uvicorn": "0.22.0"
}
Does this occur with other datasets (say the Xarray tutorial datasets, or others that we can try without credentials)?
**Have you tried without `overwrite_encoded_chunks` and `ds.chunk` (were those in the docs for the dataset)?**
Is the server throwing the `ValueError` on request, or is the client (and how is your client configured to connect to the server)?
Traceback (most recent call last):
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
await super().__call__(scope, receive, send)
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/fastapi/routing.py", line 237, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/fastapi/routing.py", line 165, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/xpublish/plugins/included/zarr.py", line 39, in get_zarr_metadata
zmetadata = get_zmetadata(dataset, cache, zvariables)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/xpublish/dependencies.py", line 97, in get_zmetadata
zmeta = create_zmetadata(dataset)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/xpublish/utils/zarr.py", line 126, in create_zmetadata
zmeta['metadata'][f'{key}/{array_meta_key}'] = _extract_zarray(
^^^^^^^^^^^^^^^^
File "/home/cstringari/mambaforge/envs/nwm-api/lib/python3.11/site-packages/xpublish/utils/zarr.py", line 94, in _extract_zarray
raise ValueError('Encoding chunks do not match inferred chunks')
ValueError: Encoding chunks do not match inferred chunks
Since my original post, I rebuilt my Zarr file from scratch but got the same errors. The data is hosted on Azure and
@caiostringari, sorry it's taken a few days for me to take another look. Thanks for all that info, but I think we might need a bit more info about the dataset, its chunk geometry, and any encoding. Can you try running this against your dataset? It's largely the guts of
from xpublish.utils import zarr

# `ds` is the xarray Dataset you are serving with Xpublish.
for key, dvar in ds.variables.items():
    da = ds[key]
    encoded_da = zarr.encode_zarr_variable(dvar, name=key)
    encoding = zarr.extract_zarr_variable_encoding(dvar)
    zattrs = zarr._extract_dataarray_zattrs(encoded_da)
    zattrs = zarr._extract_dataarray_coords(da, zattrs)
    try:
        extracted_zarray = zarr._extract_zarray(
            encoded_da, encoding, encoded_da.dtype
        )
    except ValueError:
        # This variable's encoded chunks disagree with its inferred chunks;
        # print everything that feeds into that comparison.
        print(f"{key=}, {dvar=}")
        print(f"{da=}")
        print(f"{encoded_da=}")
        print(f"{encoding=}")
        print(f"{da.encoding=}")
        print(f"{zattrs=}")
The top level
What helped in my case: After setting chunks with
@jhamman mentioned that setting the encoding after specifying chunks is problematic in Xarray anyway and is something they are trying to move away from, and to try
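Related to that point about encoding and chunks, one common workaround is to drop the chunk-related encoding after re-chunking, so no stale encoded chunk size is left to disagree with the actual chunks. This is a minimal sketch, not necessarily the exact fix that worked here; `drop_chunk_encoding` is a hypothetical helper, and it assumes only that each variable exposes a mutable `encoding` dict, as xarray variables do.

```python
from types import SimpleNamespace  # only needed for the stand-in demo objects


def drop_chunk_encoding(variables, keys=("chunks", "preferred_chunks")):
    """Hypothetical helper: delete chunk-related keys from each variable's
    `encoding` dict in place, so the chunk layout has to be inferred from the
    data itself. Returns how many keys were removed.

    `variables` can be any iterable of objects with a mutable `.encoding`
    dict, e.g. `ds.variables.values()` for an xarray Dataset.
    """
    removed = 0
    for var in variables:
        for key in keys:
            if key in var.encoding:
                del var.encoding[key]
                removed += 1
    return removed


# Stand-ins for xarray variables; with a real Dataset you would call
# drop_chunk_encoding(ds.variables.values()) right after ds.chunk(...).
time_var = SimpleNamespace(
    encoding={"chunks": (46008,), "preferred_chunks": {"time": 46008}, "dtype": "int64"}
)
other_var = SimpleNamespace(encoding={"compressor": None})

print(drop_chunk_encoding([time_var, other_var]))  # 2
print(time_var.encoding)                           # {'dtype': 'int64'}
```

Other encoding keys (dtype, compressor, and so on) are left untouched, so only the chunk layout is re-derived.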
Sorry for the delay. @wachsylon's solution works for my dataset! =) @abkfenris, here are the outputs:
Great! @caiostringari, were you able to see if
As I kind of expected, it looks like your actual chunk size doesn't match your encoded one in all cases. At least from my glance through there,
The encoded has chunks and preferred chunks defined.
I'm guessing you don't need to explicitly set your time chunks after you open, unless you need to rechunk.
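If you do need to rechunk, the Dask chunks and the encoded chunks have to change together. Here is a stdlib-only sketch of that bookkeeping, using a hypothetical `plan_rechunk` helper that assumes positive target chunk sizes. It builds Dask-style regular chunks (equal chunks plus a shorter remainder) and the matching `encoding["chunks"]`. With the `time` numbers from earlier in the thread, keeping 46008 gives 8 equal chunks. That is presumably the layout you get if you open the store with its encoded chunks and do not rechunk.

```python
def plan_rechunk(shape, target_chunks):
    """Hypothetical helper: return (dask_style_chunks, encoding_chunks) for an
    array of `shape` rechunked to `target_chunks`, so the two always agree.

    dask_style_chunks mimics Dask's tuple-of-tuples layout: equal chunks plus
    a shorter remainder. encoding_chunks is the value you would put in
    var.encoding["chunks"] (assumed to be compared with the leading chunks).
    """
    dask_chunks = []
    for size, chunk in zip(shape, target_chunks):
        if size == 0:
            dask_chunks.append((0,))  # zero-length dimension: a single empty chunk
            continue
        chunk = min(chunk, size)  # a chunk cannot be longer than its dimension
        full, rest = divmod(size, chunk)
        dask_chunks.append((chunk,) * full + ((rest,) if rest else ()))
    encoding_chunks = tuple(dim[0] for dim in dask_chunks)
    return tuple(dask_chunks), encoding_chunks


# Keeping the encoded size from time/.zarray: 8 equal chunks, nothing to fix.
chunks, enc = plan_rechunk((368064,), (46008,))
print(len(chunks[0]), enc)  # 8 (46008,)

# Rechunking to 100000: the encoding must change to (100000,) as well.
chunks, enc = plan_rechunk((368064,), (100000,))
print(chunks[0], enc)  # (100000, 100000, 100000, 68064) (100000,)
```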
I have this same error and
Details:
Though cf-xarray uses coordinates saved in
I would just copy the
Ah, I sometimes see "coordinates" as an attribute rather than in encoding, though I started moving it to encoding because I thought it sometimes wasn't recognized in attributes. Should it work equally well with cf-xarray in either location?
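If you want your own code to cope with either location, a defensive lookup is short. This is a hypothetical helper, not how cf-xarray resolves coordinates. It relies only on the CF convention that `coordinates` is a blank-separated list of variable names.

```python
def find_coordinates(attrs, encoding):
    """Hypothetical helper (not cf-xarray's logic): return the CF
    `coordinates` names for a variable, looking in `attrs` first and then in
    `encoding`. Returns an empty list if neither has the key."""
    raw = attrs.get("coordinates") or encoding.get("coordinates") or ""
    # CF stores the value as a blank-separated string of variable names.
    return raw.split()


print(find_coordinates({}, {"coordinates": "latitude longitude"}))  # ['latitude', 'longitude']
print(find_coordinates({"coordinates": "x y"}, {}))                  # ['x', 'y']
print(find_coordinates({}, {}))                                      # []
```

With xarray you would pass `ds[name].attrs` and `ds[name].encoding`; giving `attrs` priority is an arbitrary choice here.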
Inputting
Hi,
I am having problems with the `/zarr/get_zarr_metadata` endpoint. I can start the server and I see my dataset, but when I try to read data from the client side, I get `ValueError: Encoding chunks do not match inferred chunks`. I tried to explicitly change the chunks/encoding, but it did not seem to work.
My code looks something like this:
Any ideas?
Thank you very much