Bugfix/1446: Ensure Pydantic Models Can Be Created with typing.pyspark.DataFrame or typing.pyspark_sql.DataFrame Generic #1447

Open: @brayan07 wants to merge 8 commits into main

@brayan07 brayan07 commented Dec 15, 2023

This PR resolves the issue reported in #1446, where any Pydantic model with a field typed as pandera.typing.pyspark.DataFrame or pandera.typing.pyspark_sql.DataFrame always raises a confusing ValidationError.

For clarity, we want to make sure the following leads to the expected behavior:

import pyspark.sql.types as T

from pandera.pyspark import DataFrameModel, Field
from pandera.typing.pyspark_sql import DataFrame
from pydantic import BaseModel
from pyspark.sql import SparkSession


class SampleSchema(DataFrameModel):
    """
    Sample schema model with data checks.
    """

    product: T.StringType() = Field()
    price: T.IntegerType() = Field()


class PydanticContainer(BaseModel):
    """
    Pydantic container with a DataFrameModel as a field.
    """

    data: DataFrame[SampleSchema]

    class Config:
        arbitrary_types_allowed = True

We do this by creating a _PydanticIntegrationMixIn that can be used by both pandera.typing.pyspark_sql.DataFrame and pandera.typing.pyspark.DataFrame.

The content of the mixin is a variation of the methods used in pandera.typing.pandas.DataFrame.
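
To illustrate the shape of such a mixin, here is a minimal sketch that uses only the standard library. It is not the PR's actual _PydanticIntegrationMixIn: the names FakeSchema, PydanticIntegrationSketch, SqlFrame and PandasOnSparkFrame are hypothetical, a plain dict stands in for a Spark dataframe, and only the Pydantic v1 style `__get_validators__` hook is shown.

```python
from typing import Any, Callable, Iterator


class FakeSchema:
    """Hypothetical stand-in for a pandera DataFrameModel."""

    required_columns = ("product", "price")

    @classmethod
    def validate(cls, data: dict) -> dict:
        # Reject "dataframes" (plain dicts here) that lack a required column.
        missing = [c for c in cls.required_columns if c not in data]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        return data


class PydanticIntegrationSketch:
    """Illustrative mixin; an assumption, not pandera's real implementation."""

    schema_model: Any = None  # each subclass sets this in the sketch

    @classmethod
    def __get_validators__(cls) -> Iterator[Callable[[Any], Any]]:
        # Pydantic v1 calls this hook to collect validators for a custom type.
        yield cls.pydantic_validate

    @classmethod
    def pydantic_validate(cls, obj: Any) -> Any:
        # Run validation immediately so a failure surfaces while the
        # containing model is being built.
        try:
            return cls.schema_model.validate(obj)
        except Exception as exc:
            raise ValueError(f"DataFrame validation failed: {exc}") from exc


# Two hypothetical dataframe types sharing the same integration code.
class SqlFrame(PydanticIntegrationSketch):
    schema_model = FakeSchema


class PandasOnSparkFrame(PydanticIntegrationSketch):
    schema_model = FakeSchema
```

Both stand-in types inherit the hook without duplicating it, which is the point of putting the logic in a mixin.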

Note:
We assume that any pyspark dataframe used in a pydantic model will be validated eagerly for both pyspark.pandas and pyspark_sql. The default behavior for pyspark_sql dataframes is normally lazy validation, but this makes less sense to me when using a Pydantic model.

* Disable irrelevant pylint warnings

Signed-off-by: Brayan Jaramillo <[email protected]>
@cosmicBboy
Collaborator

Thanks for the PR @brayan07! Looks like there are some lint and unit test errors. Be sure to run the tests and set up pre-commit in your dev env to make sure those are passing.

@brayan07
Author

brayan07 commented Dec 19, 2023

Still running into local test failures that are unrelated to the new code. Will try to resolve them before pushing again. Thanks!

@brayan07
Author

I'm getting the same failed tests locally on the main branch as on this branch when running make nox-conda. I don't think the cause is what I added; it looks like something in the dev setup. Would it be alright if we ran the CI workflow one more time to help me debug?

@cosmicBboy cosmicBboy closed this Jan 25, 2024
@cosmicBboy cosmicBboy reopened this Jan 25, 2024
@cosmicBboy
Collaborator

Hi @brayan07 sorry for the delayed review on this!

I believe the test errors are coming from the line `from pydantic import GetCoreSchemaHandler`. That import will need to move into the `PYDANTIC_V2` conditional.
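
A minimal sketch of that kind of guarded import follows. The way the flag is computed here (the hypothetical helper detect_pydantic_v2) is an assumption for illustration; pandera defines its own PYDANTIC_V2 flag, and the PR should reuse that one rather than this helper.

```python
from importlib import metadata


def detect_pydantic_v2() -> bool:
    """Return True only if pydantic is installed with major version >= 2."""
    try:
        version = metadata.version("pydantic")
    except metadata.PackageNotFoundError:
        return False
    return int(version.split(".")[0]) >= 2


# Hypothetical local flag; stands in for the project's own PYDANTIC_V2.
PYDANTIC_V2 = detect_pydantic_v2()

if PYDANTIC_V2:
    # GetCoreSchemaHandler only exists in pydantic v2, so import it here.
    from pydantic import GetCoreSchemaHandler
else:
    # Placeholder so the name is always defined under pydantic v1 or when
    # pydantic is not installed.
    GetCoreSchemaHandler = None
```

With the import inside the conditional, the module can still be imported under Pydantic v1, where a top-level import of GetCoreSchemaHandler raises ImportError.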
