Nit: `__post_init__` vs pydantic `@model_validator(mode="after")` #204

TomNicholas · 2024-07-30T00:21:55Z

In the ZArray class then after #193 we now have multiple types of validation checks, which together look like this:

from pydantic import (
    BaseModel,
    field_validator,
    model_validator,
)

class ZArray(BaseModel):
    ...

    @field_validator("dtype")
    @classmethod
    def validate_dtype(cls, dtype) -> np.dtype:
        # Your custom validation logic here
        # Convert numpy.dtype to a format suitable for Pydantic
        return np.dtype(dtype)

    def __post_init__(self) -> None:
        if len(self.shape) != len(self.chunks):
            raise ValueError(
                "Dimension mismatch between array shape and chunk shape. "
                f"Array shape {self.shape} has ndim={self.shape} but chunk shape {self.chunks} has ndim={len(self.chunks)}"
            )

    @model_validator(mode="after")
    def _check_fill_value(self) -> Self:
        if self.fill_value is None:
            self.fill_value = ZARR_DEFAULT_FILL_VALUE.get(self.dtype, 0.0)
        return self

It seems to me like having both __post_init__ and @model_validator(mode="after") is unnecessarily complicated. I think _check_fill_value should just be rewritten using @field_validator like validate_dtype is.

The check inside __post_init__ can't be done using field_validator as it involves comparing two correctly-instatiated fields against one another. It is not clear to me whether best practice with pydantic is to use __post_init__ or @model_validator(mode="after") - both seem supported, and to do similar things.

@ghidalgo3 is there some subtlety I've missed here or can I clean this up a bit? (I know eventually this should all be replaced by ZarrV3Metadata in #175.)

The text was updated successfully, but these errors were encountered:

ghidalgo3 · 2024-07-30T00:51:07Z

_check_fill_value depends on both self.fill_value and self.dtype so it needs to be a @model_validator and not just a @field_validator. As for using __post_init__, I think it can be replaced by another @model_validator function with a better name like _chunk_dimensions_match_data_check.

It seems to me like having both post_init and @model_validator(mode="after") is unnecessarily complicated.

I agree. My interpretation (and I am far from a Python expert) is that __post_init__ is a stepping stone from stdlib.dataclass to a pydantic model. If you're already starting with pydantic, there's no need to have a __post_init__ and instead you should write @model_validators.

However, one very valid argument for __post_init__ + dataclasses is that @model_validator is a pydantic>=2.0 feature and __post_init__ still works in pydantic>=1.10 if you use it dataclasses. A surprising number of libraries are still incompatible with pydantic>=2, and I even bit my own tongue after adding the call to @model_validator because it then forced a dependency on pydantic2 that I wasn't expecting!

If you do consolidate everything around model_validator, it might be a good idea to also put a version constraint in the pyproject.toml for pydantic to let users know they need pydantic>=2.0 (or whenever @model_validator was introduced).

TomNicholas · 2024-07-30T02:10:24Z

_check_fill_value depends on both self.fill_value and self.dtype so it needs to be a @model_validator and not just a @field_validator.

👍 fair point.

Interesting point about dependencies. I would have thought though, that we actually should be trying to move away from having any pydantic dependency at all... Zarr upstream specifically chose not to have that dependency, and the main place we use it is ZArray, which we're hoping to replace with zarr.V3MetaData. (The other places we used it were in the ChunkManifest, which we stopped needing after switching to numpy arrays as an intermediate format, and in the ChunkEntry, which honestly doesn't do that much as a class.)

As for pydantic v1 vs v2 I think the fact we're planning to get rid of it makes it less of an issue. We also do require relatively recently versions of packages anyway due to our numpy>=2.0 dependency.

Overall 🤷‍♂️ Maybe let's just use __post_init__ on the basis that it's closer to ArrayV3Metadata, which is a dataclass?

ghidalgo3 · 2024-07-30T14:30:22Z

Sounds good to me! Then you'll need to make ZArray a @dataclass and move all the field validation logic into __post_init__.

TomAugspurger · 2024-07-30T14:43:28Z

+1 to moving away from pydantic and just using dataclasses. I can take that on at some point this week.

TomAugspurger mentioned this issue Aug 5, 2024

Implement pydantic models as dataclasses #210

Merged

5 tasks

TomAugspurger closed this as completed in #210 Aug 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nit: `__post_init__` vs pydantic `@model_validator(mode="after")` #204

Nit: `__post_init__` vs pydantic `@model_validator(mode="after")` #204

TomNicholas commented Jul 30, 2024

ghidalgo3 commented Jul 30, 2024

TomNicholas commented Jul 30, 2024

ghidalgo3 commented Jul 30, 2024

TomAugspurger commented Jul 30, 2024

Nit: __post_init__ vs pydantic @model_validator(mode="after") #204

Nit: __post_init__ vs pydantic @model_validator(mode="after") #204

Comments

TomNicholas commented Jul 30, 2024

ghidalgo3 commented Jul 30, 2024

TomNicholas commented Jul 30, 2024

ghidalgo3 commented Jul 30, 2024

TomAugspurger commented Jul 30, 2024

Nit: `__post_init__` vs pydantic `@model_validator(mode="after")` #204

Nit: `__post_init__` vs pydantic `@model_validator(mode="after")` #204