open_virtual_dataset with dmr++ #113

Merged 28 commits on Aug 26, 2024
Commits
18b53bd
basic dmr parsing functionality
ayushnag May 13, 2024
47d8901
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 14, 2024
f3bfa82
Merge branch 'TomNicholas:main' into dmr-adapter
ayushnag May 14, 2024
aaf6af2
Speedup DMR chunk key parsing
agoodm May 14, 2024
fc8b0d8
Merge pull request #1 from agoodm/dmr-adapter
ayushnag May 14, 2024
7b81eeb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 14, 2024
8334d0a
added groups, docs, and bug fixes
ayushnag May 16, 2024
64d59b1
Merge branch 'TomNicholas:main' into dmr-adapter
ayushnag Jun 3, 2024
1a3b787
Merge branch 'zarr-developers:main' into dmr-adapter
ayushnag Jun 21, 2024
7580fdc
rework hdf5 parser and group logic
ayushnag Jun 27, 2024
52ceba0
Merge remote-tracking branch 'upstream/main' into dmr-adapter
ayushnag Jul 3, 2024
b1f9aee
update attrs cast to python dtype
ayushnag Jul 10, 2024
ae29176
parser passing tests
ayushnag Jul 14, 2024
6e763f9
match main manifest dtypes
ayushnag Jul 14, 2024
0824ed2
Merge branch 'zarr-developers:main' into dmr-adapter
ayushnag Jul 15, 2024
659ab65
Merge branch 'zarr-developers:main' into dmr-adapter
ayushnag Jul 15, 2024
b8531c8
Merge branch 'zarr-developers:main' into dmr-adapter
ayushnag Jul 19, 2024
0125d71
Merge branch 'zarr-developers:main' into dmr-adapter
ayushnag Aug 2, 2024
ef8aa9c
modularize dmrpp.py
ayushnag Aug 3, 2024
7638092
add dmrpp api docs
ayushnag Aug 4, 2024
83cb586
resolve conflict
ayushnag Aug 4, 2024
cb6feff
resolve releases conflict
ayushnag Aug 4, 2024
888ce32
indexes and docs fix
ayushnag Aug 25, 2024
3e15e8e
Merge branch 'main' into dmr-adapter
TomNicholas Aug 26, 2024
ee23ec0
Fix type hint for shape
TomNicholas Aug 26, 2024
d9337ff
change how FileType is used
TomNicholas Aug 26, 2024
6bb9218
Change FileType check again
TomNicholas Aug 26, 2024
d1948d4
fix storage_options bug
TomNicholas Aug 26, 2024
16 changes: 8 additions & 8 deletions virtualizarr/manifests/manifest.py
Contributor Author

@ayushnag ayushnag Jun 27, 2024

The int32 to int64 change was necessary because I ran into byte offsets too large for int32 (maximum 2147483647) with the Atlas ICE-SAT dataset. Example error: OverflowError: Python integer 6751178683 out of bounds for int32
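To make the overflow concrete, here is a minimal sketch that checks the offset quoted in the error message against the integer limits reported by NumPy. It is only an illustration and not code from this PR; the variable names are made up for the example.

```python
import numpy as np

# Byte offset taken from the OverflowError quoted in the comment above.
offset = 6751178683

# Largest values representable by each dtype.
int32_max = int(np.iinfo(np.int32).max)
int64_max = int(np.iinfo(np.int64).max)

# The offset is beyond the int32 range but well inside the int64 range.
fits_int32 = offset <= int32_max
fits_int64 = offset <= int64_max

# Storing the offset in an int64 array, as the manifest does after this change.
offsets = np.empty(shape=(1,), dtype=np.dtype("int64"))
offsets[0] = offset
```

Any chunk that starts more than about 2 GiB into a file has an offset that int32 cannot hold, so large files like the one mentioned above hit this limit.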

Collaborator

Well that justifies #177!

@@ -71,8 +71,8 @@ class ChunkManifest:
     """

     _paths: np.ndarray[Any, np.dtypes.StringDType]  # type: ignore[name-defined]
-    _offsets: np.ndarray[Any, np.dtype[np.int32]]
-    _lengths: np.ndarray[Any, np.dtype[np.int32]]
+    _offsets: np.ndarray[Any, np.dtype[np.int64]]
+    _lengths: np.ndarray[Any, np.dtype[np.int64]]

     def __init__(self, entries: dict) -> None:
         """
@@ -100,8 +100,8 @@ def __init__(self, entries: dict) -> None:

         # Initializing to empty implies that entries with path='' are treated as missing chunks
         paths = np.empty(shape=shape, dtype=np.dtypes.StringDType())  # type: ignore[attr-defined]
-        offsets = np.empty(shape=shape, dtype=np.dtype("int32"))
-        lengths = np.empty(shape=shape, dtype=np.dtype("int32"))
+        offsets = np.empty(shape=shape, dtype=np.dtype("int64"))
+        lengths = np.empty(shape=shape, dtype=np.dtype("int64"))

         # populate the arrays
         for key, entry in entries.items():
@@ -128,8 +128,8 @@ def __init__(self, entries: dict) -> None:
     def from_arrays(
         cls,
         paths: np.ndarray[Any, np.dtype[np.dtypes.StringDType]],  # type: ignore[name-defined]
-        offsets: np.ndarray[Any, np.dtype[np.int32]],
-        lengths: np.ndarray[Any, np.dtype[np.int32]],
+        offsets: np.ndarray[Any, np.dtype[np.int64]],
+        lengths: np.ndarray[Any, np.dtype[np.int64]],
     ) -> "ChunkManifest":
         """
         Create manifest directly from numpy arrays containing the path and byte range information.
@@ -161,11 +161,11 @@ def from_arrays(
             raise ValueError(
                 f"paths array must have a numpy variable-length string dtype, but got dtype {paths.dtype}"
             )
-        if offsets.dtype != np.dtype("int32"):
+        if offsets.dtype != np.dtype("int64"):
             raise ValueError(
                 f"offsets array must have 32-bit integer dtype, but got dtype {offsets.dtype}"
             )
-        if lengths.dtype != np.dtype("int32"):
+        if lengths.dtype != np.dtype("int64"):
             raise ValueError(
                 f"lengths array must have 32-bit integer dtype, but got dtype {lengths.dtype}"
             )
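Note that in the last hunk the checks now compare against int64, while the unchanged error messages still say "32-bit integer dtype". As a rough sketch of how the check and its message could be kept in agreement, the standalone helper below derives both from one constant. The function name `validate_byte_range_dtypes` and the constant `EXPECTED_DTYPE` are hypothetical and are not part of this PR or of VirtualiZarr.

```python
import numpy as np

# Hypothetical single source of truth for the required dtype.
EXPECTED_DTYPE = np.dtype("int64")


def validate_byte_range_dtypes(offsets: np.ndarray, lengths: np.ndarray) -> None:
    """Raise ValueError unless both arrays have the expected integer dtype."""
    bits = EXPECTED_DTYPE.itemsize * 8  # 64 for int64
    for name, arr in (("offsets", offsets), ("lengths", lengths)):
        if arr.dtype != EXPECTED_DTYPE:
            raise ValueError(
                f"{name} array must have {bits}-bit integer dtype, "
                f"but got dtype {arr.dtype}"
            )


# Usage: int64 arrays pass silently, int32 arrays are rejected.
ok = np.array([6751178683], dtype=np.int64)
validate_byte_range_dtypes(ok, ok)

try:
    validate_byte_range_dtypes(np.zeros(1, dtype=np.int32), ok)
except ValueError as err:
    message = str(err)
```

Because the bit width in the message is computed from the dtype, a later dtype change would not leave a stale message behind.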