open_virtual_dataset with dmr++ #113

ayushnag · 2024-05-14T15:16:59Z

- Closes Reading from dmrcp index files? #85
- Tests added
- Changes are documented in docs/releases.rst
- New functions/methods are listed in api.rst
- Open dataset via group= param
- Use reader options for dmr file opening pattern

agoodm · 2024-05-15T02:19:32Z

Thanks for taking a look and giving my suggested changes to the chunk key parsing a try, @ayushnag!

Continuing the discussion on performance: I think the remaining bottlenecks (aside, perhaps, from your point about I/O in the cloud) now lie primarily outside the scope of this work, and I don't expect that changing XML readers would make a significant improvement.

TomNicholas · 2024-05-30T16:13:25Z

virtualizarr/xarray.py

+    group : str, default None
+        Group path within the dataset to open. For example netcdf4 and hdf5 groups


It would be nice to separate out the addition of this kwarg into a separate pull request, and implement it for the existing HDF5 reader. Then this PR wouldn't need to change the API of open_virtual_dataset.
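To make the `group` kwarg discussed here concrete, the following is a minimal sketch of resolving a group path against nested group tags. The XML sample is a simplified, namespace-free stand-in for a DMR++ document, and `find_group` is a hypothetical helper, not the function used in this PR.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-in for a DMR++ document with nested groups.
DMR_SAMPLE = """
<Dataset name="data.nc">
  <Float32 name="lat"/>
  <Group name="science">
    <Float32 name="sst"/>
    <Group name="quality">
      <Int8 name="flag"/>
    </Group>
  </Group>
</Dataset>
"""


def find_group(root: ET.Element, group: str | None) -> ET.Element:
    """Return the element for an "a/b" style group path; None or "/" means the root."""
    node = root
    if group is None:
        return node
    # Walk one path component at a time, ignoring empty components from extra slashes.
    for name in filter(None, group.split("/")):
        match = next((g for g in node.findall("Group") if g.get("name") == name), None)
        if match is None:
            raise KeyError(f"group {group!r} not found (missing {name!r})")
        node = match
    return node


root = ET.fromstring(DMR_SAMPLE)
quality = find_group(root, "science/quality")
# Variables directly inside the selected group (nested groups excluded).
variable_names = [child.get("name") for child in quality if child.tag != "Group"]
```

Raising `KeyError` for a missing group is one possible choice; a reader could equally raise a `ValueError` that lists the available groups.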

virtualizarr/readers/dmrpp.py

ayushnag · 2024-06-27T22:49:32Z

virtualizarr/manifests/manifest.py

The int32 to int64 change had to be made since I ran into some large byte offsets with the Atlas ICE-SAT dataset. Here is an example error: OverflowError: Python integer 6751178683 out of bounds for int32

TomNicholas · 2024-08-09

Well that justifies #177!
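The overflow is easy to reproduce without NumPy. This is a small sketch using only the standard library's `struct` module, and the helper name `fits_signed` is made up for illustration. A signed 32-bit integer tops out at 2147483647, so any file larger than 2 GiB can contain byte offsets that do not fit.

```python
import struct

INT32_MAX = 2**31 - 1  # 2147483647
INT64_MAX = 2**63 - 1

offset = 6751178683  # byte offset quoted in the OverflowError in this thread


def fits_signed(value: int, fmt: str) -> bool:
    """Return True if ``value`` can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
    except struct.error:
        return False
    return True


fits_int32 = fits_signed(offset, "<i")  # 4-byte signed integer
fits_int64 = fits_signed(offset, "<q")  # 8-byte signed integer
```

Here `fits_int32` is false and `fits_int64` is true, which is the reason for widening the manifest's offset and length dtype.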

ayushnag · 2024-06-27T23:35:46Z

Some questions about writing unit tests:

How should I load test dmrpp files?
- These files are available over https but need a netrc login (NASA Earthdata authentication)
- I will check how earthaccess gets credentials and does testing
What should I compare my result to?
- My understanding is that the dataset parsed from the dmr should match the dataset made by vz.open_dataset("data.nc")
- Should everything match, or are there some main portions to check? (dims, attrs, variables)
- Related to Get xarray.testing.assert_identical to work on datasets containing ManifestArrays #161
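On the credentials question, one option is to read the login from a netrc file with Python's standard `netrc` module. The sketch below writes a throwaway file with placeholder credentials so it is self-contained. The host name is the Earthdata Login host as I understand it, and the file location and credential values are assumptions for illustration only. How earthaccess handles this internally is not shown here.

```python
import netrc
import os
import tempfile

EARTHDATA_HOST = "urs.earthdata.nasa.gov"  # assumed Earthdata Login host

# Write a throwaway netrc file with placeholder credentials (not real ones).
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as fh:
    fh.write(f"machine {EARTHDATA_HOST} login example_user password example_pass\n")
    netrc_path = fh.name

try:
    # authenticators() returns a (login, account, password) tuple,
    # or None if the host has no entry in the file.
    auth = netrc.netrc(netrc_path).authenticators(EARTHDATA_HOST)
finally:
    os.remove(netrc_path)

username, password = auth[0], auth[2]
```

In a test suite, a lookup like this would typically sit behind a skip condition so that tests needing network access and credentials do not fail on machines without them.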

ayushnag · 2024-08-04T08:13:23Z

@TomNicholas this PR should be ready to go now. You can take a look at the code again if you wish and I can add updates. I have also added docs and release notes.

TomNicholas

This looks good, but I'm still unclear if you prefer to merge into EarthAccess or here.

TomNicholas · 2024-08-20T19:08:52Z

virtualizarr/readers/dmrpp.py

+            attrs.update(self._parse_attribute(attr_tag))
+        return xr.Dataset(
+            data_vars=data_vars,
+            coords=xr.Coordinates(coords=coord_vars, indexes={}),


Should the indexes variable here technically be passed through from open_virtual_dataset? Right now it doesn't matter, but it would in order to support #18.

ayushnag · 2024-08-22

Yes, you're right. I will fix that.

TomNicholas · 2024-08-23

This and #113 (comment) are the only comments that I think actually need to be addressed before merging, because they affect public API. Everything else can be left until later if you prefer.

ayushnag · 2024-08-23

I was trying to get this fixed, but I think there is an issue: the indexes cannot be created, since we don't have the path to the actual data file. Technically the dmrpp file does have a data path, but it is only the file name and not the full path, e.g. "data.nc" instead of "s3://.../data.nc". So if I add support for the indexes param, it will fail when indexes=None, since in virtualizarr indexes=None usually indicates that indexes should be created automatically.

What I could do instead is treat indexes=None and indexes={} as the same, but raise a warning if indexes=None that the behavior is not as expected. I can still accept manually created indexes (e.g. {"time": Index}).

TomNicholas · 2024-08-23

You can just raise a NotImplementedError on indexes={}. But at least do something that isn't silently behaving inconsistently with the other readers.

ayushnag · 2024-08-23

I thought indexes={} would be acceptable since that is how most people indicate that indexes should not be created?

TomNicholas · 2024-08-23

I meant indexes=None, sorry.
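The behaviour this thread converges on can be sketched as a small validation step. `resolve_indexes` is a hypothetical helper name, not the code merged in this PR. It only illustrates the rule "fail loudly on indexes=None, accept an empty or explicit mapping".

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def resolve_indexes(indexes: Mapping[str, Any] | None) -> dict[str, Any]:
    """Validate ``indexes`` for a reader that cannot build indexes automatically."""
    if indexes is None:
        # None conventionally means "create indexes automatically", which needs
        # access to the data file. Raise instead of silently acting like {}.
        raise NotImplementedError(
            "automatic index creation is not supported by this reader; "
            "pass indexes={} or an explicit mapping"
        )
    # An empty mapping means "no indexes"; anything else is passed through.
    return dict(indexes)
```

With this rule, `resolve_indexes({})` returns an empty dict, an explicit mapping such as `{"time": some_index}` is returned unchanged, and `resolve_indexes(None)` raises, so the reader never silently diverges from the other readers.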

TomNicholas · 2024-08-20T19:11:23Z

virtualizarr/readers/dmrpp.py

+        attr[attr_tag.attrib["name"]] = values[0] if len(values) == 1 else values
+        return attr
+
+    def _parse_filters(


A lot of these methods don't technically need to be methods, because they don't use self. It could all be functional instead. But this isn't very important, just a note.

ayushnag · 2024-08-22

Yes, you're right, these can be standalone. Even many of the functions that use self are just using the class constants. I will update after the initial merge.
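As an illustration of the functional style suggested here, an attribute parser can be written as a plain function that takes the tag and returns a dict. This is a sketch under assumptions: the XML layout (an `Attribute` tag with `Value` children, no namespaces) is simplified, and values are kept as strings rather than cast to Python types.

```python
import xml.etree.ElementTree as ET


def parse_attribute(attr_tag: ET.Element) -> dict:
    """Standalone attribute parser: it needs no ``self`` or class state."""
    values = [value_tag.text for value_tag in attr_tag.findall("Value")]
    attr = {}
    # Mirror the diff in this thread: unwrap a single value, keep several as a list.
    attr[attr_tag.attrib["name"]] = values[0] if len(values) == 1 else values
    return attr


single = ET.fromstring(
    '<Attribute name="units" type="String"><Value>degrees_north</Value></Attribute>'
)
multiple = ET.fromstring(
    '<Attribute name="valid_range" type="Int32"><Value>0</Value><Value>100</Value></Attribute>'
)

units_attr = parse_attribute(single)
range_attr = parse_attribute(multiple)
```

A function like this can be unit-tested with a single XML string, without constructing a parser object first.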

TomNicholas · 2024-08-20T19:12:47Z

virtualizarr/readers/dmrpp.py

+                "offset": int(chunk_tag.attrib["offset"]),
+                "length": int(chunk_tag.attrib["nBytes"]),
+            }
+        return ChunkManifest(entries=chunkmanifest)


Do you think you could rewrite this to use ChunkManifest.from_arrays()? It would potentially be a lot more performant. (But this could also be left to a later PR.)

ayushnag · 2024-08-22

Oh nice, I didn't know about that function. I will give it a try.
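One way to prepare for an array-based constructor is to collect the chunk information as parallel columns instead of one dict per chunk. The sketch below uses only the standard library. The XML sample is a simplified, namespace-free stand-in for DMR++ chunk tags, and the `chunkPositionInArray` attribute, the chunk shape, and the S3 path are assumptions for illustration. It does not call `ChunkManifest.from_arrays()` itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-in for the chunk tags of one variable.
CHUNKS_SAMPLE = """
<chunks>
  <chunk offset="4016" nBytes="100" chunkPositionInArray="[0,0]"/>
  <chunk offset="4116" nBytes="120" chunkPositionInArray="[0,50]"/>
  <chunk offset="6751178683" nBytes="80" chunkPositionInArray="[10,0]"/>
</chunks>
"""


def chunk_columns(chunks_tag, chunk_shape, path):
    """Collect chunk keys, paths, offsets and lengths as parallel lists."""
    keys, paths, offsets, lengths = [], [], [], []
    for chunk_tag in chunks_tag.iter("chunk"):
        # "[10,0]" -> [10, 0]: element position of the chunk's first value.
        raw = chunk_tag.attrib["chunkPositionInArray"].strip("[]")
        position = [int(p) for p in raw.split(",")]
        # Divide by the chunk shape to get a Zarr-style key such as "1.0".
        keys.append(".".join(str(p // c) for p, c in zip(position, chunk_shape)))
        paths.append(path)
        offsets.append(int(chunk_tag.attrib["offset"]))
        lengths.append(int(chunk_tag.attrib["nBytes"]))
    return keys, paths, offsets, lengths


keys, paths, offsets, lengths = chunk_columns(
    ET.fromstring(CHUNKS_SAMPLE), chunk_shape=(10, 50), path="s3://bucket/data.nc"
)

# Same per-chunk dict layout as in the diff in this thread, built from the columns.
entries = {
    key: {"path": p, "offset": o, "length": n}
    for key, p, o, n in zip(keys, paths, offsets, lengths)
}
```

The columns could then be converted to arrays in one step each, which avoids building and re-reading a Python dict per chunk.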

docs/api.rst

virtualizarr/readers/dmrpp.py

ayushnag · 2024-08-26T14:42:51Z

@TomNicholas do you know why the test is failing? I have added dmrpp as a valid file type, so it should be recognized in the unit test. It also passes when I run the network tests locally.

TomNicholas · 2024-08-26T14:47:05Z

I'm just looking at that. I suspect it's just a bad merge from main. Have you tried pulling the latest changes to this branch and running that locally?

virtualizarr/xarray.py

TomNicholas · 2024-08-26T15:53:45Z

Amazing piece of work here @ayushnag !!

ayushnag and others added 2 commits May 13, 2024 11:51

basic dmr parsing functionality

18b53bd

[pre-commit.ci] auto fixes from pre-commit.com hooks

47d8901

for more information, see https://pre-commit.ci

TomNicholas added the "references generation" (Reading byte ranges from archival files) and "enhancement" (New feature or request) labels May 14, 2024

ayushnag changed the title ~~basic dmr parsing functionality~~ open_dataset with dmr++ May 14, 2024

TomNicholas changed the title ~~open_dataset with dmr++~~ open_virtual_dataset with dmr++ May 14, 2024

TomNicholas reviewed May 14, 2024

virtualizarr/xarray.py (outdated, resolved)

TomNicholas reviewed May 14, 2024

virtualizarr/xarray.py (outdated, resolved)

TomNicholas reviewed May 14, 2024

virtualizarr/dmrpp.py (outdated, resolved)

TomNicholas reviewed May 14, 2024

virtualizarr/dmrpp.py (outdated, resolved)

ayushnag and others added 4 commits May 14, 2024 12:07

Merge branch 'TomNicholas:main' into dmr-adapter

f3bfa82

Speedup DMR chunk key parsing

aaf6af2

Merge pull request #1 from agoodm/dmr-adapter

fc8b0d8

chunk key parsing speedup

[pre-commit.ci] auto fixes from pre-commit.com hooks

7b81eeb

for more information, see https://pre-commit.ci

added groups, docs, and bug fixes

8334d0a

TomNicholas reviewed May 30, 2024

TomNicholas mentioned this pull request May 30, 2024

Reading from dmrcp index files? #85

Closed

Merge branch 'TomNicholas:main' into dmr-adapter

64d59b1

ayushnag mentioned this pull request Jun 18, 2024

Opening virtual datasets with NASA dmrpp files nsidc/earthaccess#605

Open

TomNicholas mentioned this pull request Jun 19, 2024

Opening virtual datasets (dmr-adapter) nsidc/earthaccess#606

Draft

8 tasks

ayushnag added 2 commits June 21, 2024 15:51

Merge branch 'zarr-developers:main' into dmr-adapter

1a3b787

rework hdf5 parser and group logic

7580fdc

ayushnag commented Jun 27, 2024

TomNicholas mentioned this pull request Jun 29, 2024

GDAL Virtual Rasters #166

Open

Merge remote-tracking branch 'upstream/main' into dmr-adapter

52ceba0

ayushnag mentioned this pull request Jul 8, 2024

Missing zlib (deflate) compression level from dmrpp files OPENDAP/bes#954

Open

ayushnag added 2 commits July 10, 2024 13:41

update attrs cast to python dtype

b1f9aee

parser passing tests

ae29176

ayushnag temporarily deployed to test-release August 3, 2024 06:12 — with GitHub Actions Inactive

ayushnag added 3 commits August 4, 2024 12:50

add dmrpp api docs

7638092

resolve conflict

83cb586

resolve releases conflict

cb6feff

ayushnag temporarily deployed to test-release August 4, 2024 07:42 — with GitHub Actions Inactive

TomNicholas mentioned this pull request Aug 9, 2024

Replace this package with a VirtualiZarr reader? MITgcm/xmitgcm#337

Open

TomNicholas reviewed Aug 20, 2024

mdsumner mentioned this pull request Aug 20, 2024

Support HDF4? #216

Open

TomNicholas reviewed Aug 23, 2024

docs/api.rst (outdated, resolved)

ayushnag and others added 2 commits August 25, 2024 23:27

indexes and docs fix

888ce32

Merge branch 'main' into dmr-adapter

3e15e8e

TomNicholas temporarily deployed to test-release August 26, 2024 14:28 — with GitHub Actions Inactive

TomNicholas reviewed Aug 26, 2024

virtualizarr/readers/dmrpp.py (outdated, resolved)

Fix type hint for shape

ee23ec0

TomNicholas approved these changes Aug 26, 2024

TomNicholas temporarily deployed to test-release August 26, 2024 14:31 — with GitHub Actions Inactive

TomNicholas reviewed Aug 26, 2024

virtualizarr/xarray.py (outdated, resolved)

change how FileType is used

d9337ff

TomNicholas temporarily deployed to test-release August 26, 2024 14:54 — with GitHub Actions Inactive

TomNicholas reviewed Aug 26, 2024

virtualizarr/xarray.py (outdated, resolved)

Change FileType check again

6bb9218

TomNicholas temporarily deployed to test-release August 26, 2024 15:04 — with GitHub Actions Inactive

fix storage_options bug

d1948d4

TomNicholas temporarily deployed to test-release August 26, 2024 15:29 — with GitHub Actions Inactive

TomNicholas merged commit d7f0c57 into zarr-developers:main Aug 26, 2024
8 checks passed

TomNicholas mentioned this pull request Aug 26, 2024

Improvements to the DMR++ parser #230

Open
