Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement pydantic models as dataclasses #210

Merged

Conversation

TomAugspurger
Copy link
Contributor

This removes our pydantic dependency by reimplementing them with dataclasses. There are a few breaking changes:

  1. The classes won't automatically cast the argumetns to the declared type. IMO, that's the preferable behavior. Some backwards compatability shims have been added for np.dtype, but perhpas we want to remove that too.
  2. The models won't have any of the methods they previously inherited from pydantic.BaseModel. This is probably good for user-facing objects, we now have full control over the public API.
  3. We had to reorder some of the fields on ZArray, since dataclasses is stricter about positional arguments. I've aligned the order with zarr.create.

The tests did need to be updated in a few places (we compare some strings, and the order changed, and we don't automatically cast lists to tuples in the init for shape and size).

This removes our pydantic dependency by reimplementing them with
dataclasses. There are a few breaking changes:

1. The classes won't automatically cast the argumetns to the declared
   type. IMO, that's the preferable behavior. Some backwards
   compatability shims have been added for np.dtype, but perhpas we
   want to remove that too.
2. The models won't have any of the methods they previously inherited
   from pydantic.BaseModel. This is probably good for user-facing
   objects, we now have full control over the public API.
3. We had to reorder some of the fields on ZArray, since dataclasses
   is stricter about positional arguments. I've aligned the order
   with `zarr.create`.

The tests did need to be updated in a few places (we compare some
strings, and the order changed, and we don't automatically cast lists to
tuples in the init for shape and size).
virtualizarr/manifests/manifest.py Outdated Show resolved Hide resolved
order: Optional[Literal["C"] | Literal["F"]] = None,
shape: Optional[tuple[int, ...]] = None,
zarr_format: Optional[Literal[2] | Literal[3]] = None,
**kwargs: Any,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the list of fields that can be replaced here in the signature?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As opposed to letting users call dataclasses.replace? I think ZArray shouldn't have fields that aren't replaceable, are there any fields like that?

Copy link
Member

@TomNicholas TomNicholas Aug 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if dataclasses.replace exists already then explicitly writing the list of fields isn't a good enough reason to override the default method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no wait dataclasses.replace is a module-level function, not a method on the dataclass type. In that case I change my mind - we should list out the arguments and call dataclasses.replace from within the ZArray.replace method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ZArray shouldn't have fields that aren't replaceable, are there any fields like that?

Just to be clear, the behavior here (and on main) is to return a new instance of ZArray with the same values as the old instance, but new values specified in the keyword arguments.

we should list out the arguments

Sounds good. I think we'll be able to do that (we'll need to remember to keep the two signatures in step though).

@TomAugspurger
Copy link
Contributor Author

This will cause a conflict with #206, so waiting on that to go first.

docs/releases.rst Outdated Show resolved Hide resolved
@@ -94,7 +94,7 @@ def test_accessor_to_kerchunk_dict(self):
"refs": {
".zgroup": '{"zarr_format":2}',
".zattrs": "{}",
"a/.zarray": '{"chunks":[2,3],"compressor":null,"dtype":"<i8","fill_value":null,"filters":null,"order":"C","shape":[2,3],"zarr_format":2}',
"a/.zarray": '{"shape":[2,3],"chunks":[2,3],"dtype":"<i8","fill_value":null,"order":"C","compressor":null,"filters":null,"zarr_format":2}',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So order of fields in the kerchunk references matters now? Did it matter before? This should be fine but it's also the sort of thing that kerchunk might not be consistent about...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it still didn't matter before and still doesn't, as far as using Kerhcunk goes :) This reason this matters for the test is because we're doing string comparison on the "a/.zarray" field. We could also parse that JSON into a dict before doing the comparision, and I wouldn't have needed to update the expected.

Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great @TomAugspurger !

@TomNicholas TomNicholas added the dependencies Updates a dependency label Aug 5, 2024
@TomAugspurger TomAugspurger mentioned this pull request Aug 6, 2024
3 tasks
@TomAugspurger TomAugspurger marked this pull request as ready for review August 7, 2024 17:54
@TomAugspurger
Copy link
Contributor Author

Planning to merge this tonight.

@TomAugspurger TomAugspurger merged commit fdab54c into zarr-developers:main Aug 8, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Updates a dependency
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nit: __post_init__ vs pydantic @model_validator(mode="after")
3 participants