fix: SQLModel table model not validated #1696

Open
wants to merge 5 commits into
base: main

AlpAribal
Contributor

When a SQLModel class with table=True is used as the dtype, the data is not validated against the schema. This is because such SQLModel classes do not validate data at init time (see here). This PR fixes that by explicitly calling model_validate (pydantic v2) or parse_obj (pydantic v1) instead of instantiating the class.

Minimal example (python=3.8 sqlmodel=0.0.19 pandera=0.19.3):

# example.py
import pandas as pd
import pandera as pa
from pandera.engines.pandas_engine import PydanticModel
from sqlmodel import SQLModel, Field

class Record(SQLModel, table=True):
    name: str = Field(primary_key=True)

class PydanticSchema(pa.DataFrameModel):
    class Config:
        dtype = PydanticModel(Record)

df = pd.DataFrame({"name": [3]})
PydanticSchema.validate(df)

Without the fix, validation succeeds while only emitting a warning from model_dump():

$ python example.py 
UserWarning: Pydantic serializer warnings:
  Expected `str` but got `int` - serialized value may not be as expected

With the fix, a SchemaError is raised:

$ python example.py 
...
pandera.errors.SchemaError: Error while coercing 'PydanticSchema' to type <class '__main__.Record'>: Could not coerce <class 'pandas.core.frame.DataFrame'> data_container into type <class '__main__.Record'>
   index failure_case
0      0  {'name': 1}

@AlpAribal AlpAribal force-pushed the bugfix/sqlmodel-table-not-validated branch from ce0553f to 9935ffe Compare June 19, 2024 22:10
@cosmicBboy
Collaborator

thanks @AlpAribal, do you mind rebasing these changes onto the main branch? failing tests should go away after that


codecov bot commented Jun 22, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.55%. Comparing base (812b2a8) to head (59e7f30).
Report is 119 commits behind head on main.

Files Patch % Lines
pandera/engines/pandas_engine.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1696      +/-   ##
==========================================
- Coverage   94.28%   93.55%   -0.73%     
==========================================
  Files          91      117      +26     
  Lines        7013     8843    +1830     
==========================================
+ Hits         6612     8273    +1661     
- Misses        401      570     +169     


@AlpAribal AlpAribal force-pushed the bugfix/sqlmodel-table-not-validated branch from 9935ffe to 39451bc Compare June 22, 2024 20:40
@cosmicBboy
Collaborator

linter is making some valid complaints:

************* Module pandera.engines.pandas_engine
pandera/engines/pandas_engine.py:1320:26: E1101: Instance of 'Field' has no 'model_validate' member (no-member)
pandera/engines/pandas_engine.py:1322:26: E1101: Instance of 'Field' has no 'parse_obj' member (no-member)

You might have to do something like:

            try:
                _type = typing.cast(Type[BaseModel], self.type)
                # pylint: disable=not-callable
                if PYDANTIC_V2:
                    row = self.type.model_validate(row).model_dump()
                else:
                    row = self.type.parse_obj(row).dict()

@AlpAribal AlpAribal force-pushed the bugfix/sqlmodel-table-not-validated branch from 6e15ee3 to a61a828 Compare June 24, 2024 20:40
@AlpAribal
Contributor Author

You might have to do something like:

            try:
                _type = typing.cast(Type[BaseModel], self.type)
                # pylint: disable=not-callable
                if PYDANTIC_V2:
                    row = self.type.model_validate(row).model_dump()
                else:
                    row = self.type.parse_obj(row).dict()

Unfortunately, casting did not work, pylint still complains. This SO answer hints that pylint discards the type hint and uses the actual value (Field in this case). I was able to pass pylint by moving the declaration of self.type into __init__, but I am not sure if there can be any unwanted side effects of this. Another alternative is to ignore this particular pylint warning.

@cosmicBboy
Collaborator

@AlpAribal I'm okay with ignoring the pylint warning
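
For reference, ignoring the warning could look roughly like the sketch below, using pylint's inline `# pylint: disable=no-member` comment (no-member is the symbolic name for E1101). The validate_row helper and the V1Model / V2Model classes are hypothetical stand-ins so the snippet runs on its own; the actual change in pandas_engine.py is not shown in this thread and may differ.

```python
from typing import Any, Dict


class V2Model:
    """Hypothetical stand-in exposing the pydantic v2 method names."""

    def __init__(self, **data: Any) -> None:
        self.data = data

    @classmethod
    def model_validate(cls, obj: Dict[str, Any]) -> "V2Model":
        return cls(**obj)

    def model_dump(self) -> Dict[str, Any]:
        return dict(self.data)


class V1Model:
    """Hypothetical stand-in exposing the pydantic v1 method names."""

    def __init__(self, **data: Any) -> None:
        self.data = data

    @classmethod
    def parse_obj(cls, obj: Dict[str, Any]) -> "V1Model":
        return cls(**obj)

    def dict(self) -> Dict[str, Any]:
        return dict(self.data)


def validate_row(model_type: Any, row: Dict[str, Any], pydantic_v2: bool) -> Dict[str, Any]:
    """Validate one row and return it as a plain dict."""
    if pydantic_v2:
        # Silence E1101 only on the line that triggers it.
        return model_type.model_validate(row).model_dump()  # pylint: disable=no-member
    return model_type.parse_obj(row).dict()  # pylint: disable=no-member
```

Keeping the disable comment on the two offending lines, rather than at module level, leaves the no-member check active for the rest of the file.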

@AlpAribal AlpAribal force-pushed the bugfix/sqlmodel-table-not-validated branch from 13198fc to d5b7608 Compare July 6, 2024 08:13
@AlpAribal AlpAribal force-pushed the bugfix/sqlmodel-table-not-validated branch from d5b7608 to 59e7f30 Compare July 6, 2024 08:25

2 participants