Validate bytes based on ser_json_bytes #1308

Open. josh-newman wants to merge 10 commits into main from d8e-ai:validate-base64.

josh-newman
Contributor

@josh-newman josh-newman commented May 31, 2024

Change Summary

ser_json_bytes transforms values (with base64 encoding) during serialization. But validation doesn't do a complementary base64 decode, so a serialization round-trip into the same model type yields an unequal object.
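
A stdlib-only sketch of the asymmetry described above. It is not pydantic-core's implementation: the function names are hypothetical stand-ins, and it assumes serialization uses the URL-safe base64 alphabet.

```python
import base64
import json


def serialize_bytes(value: bytes) -> str:
    """Hypothetical stand-in for JSON serialization with base64 bytes encoding."""
    return json.dumps(base64.urlsafe_b64encode(value).decode("ascii"))


def validate_bytes_utf8(payload: str) -> bytes:
    """Stand-in for validation that takes the JSON string's UTF-8 bytes as-is."""
    return json.loads(payload).encode("utf-8")


def validate_bytes_base64(payload: str) -> bytes:
    """Stand-in for validation that performs the complementary base64 decode."""
    return base64.urlsafe_b64decode(json.loads(payload))


original = b"\x00\xffhello"
payload = serialize_bytes(original)

# Without the complementary decode, the round trip returns the encoded text.
round_trip_plain = validate_bytes_utf8(payload)
# With the decode, the round trip returns the original bytes.
round_trip_decoded = validate_bytes_base64(payload)
```

Here `round_trip_plain` holds the base64 text as bytes rather than the original value, which is the inequality this PR is about.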

Related issue number

None for this directly. Other users have mentioned base64 decoding, though: pydantic/pydantic#7000 (comment)

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review

Selected Reviewer: @davidhewitt

@josh-newman
Contributor Author

please review

@josh-newman
Contributor Author

CC @jcharum

codecov bot commented May 31, 2024

Codecov Report

Attention: Patch coverage is 85.13514% with 11 lines in your changes missing coverage. Please review.

| Files | Patch % | Lines |
|---|---|---|
| src/input/input_string.rs | 0.00% | 9 Missing ⚠️ |
| src/input/input_json.rs | 93.75% | 1 Missing ⚠️ |
| src/input/input_python.rs | 87.50% | 1 Missing ⚠️ |

📢 Thoughts on this report? Let us know!

@josh-newman
Contributor Author

I'm also interested in accepting both standard and URL-safe base64 encoding. If this PR is acceptable, I plan to send a separate one for that: d8e-ai/pydantic-core@validate-base64...d8e-ai:pydantic-core:validate-base64-any
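
One way to accept both alphabets, sketched in Python with the standard library only. This is an illustration of the idea and not the code in the linked branch; `decode_base64_any` is a hypothetical name.

```python
import base64


def decode_base64_any(text: str) -> bytes:
    """Decode base64 text written in either the standard or URL-safe alphabet.

    URL-safe characters ('-' and '_') are mapped onto their standard
    counterparts ('+' and '/') before a strict decode.
    """
    normalized = text.translate(str.maketrans("-_", "+/"))
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(normalized, validate=True)


data = b"\xfb\xff"
from_standard = decode_base64_any(base64.b64encode(data).decode("ascii"))
from_url_safe = decode_base64_any(base64.urlsafe_b64encode(data).decode("ascii"))
```

Normalizing first keeps a single decode path, at the cost of also accepting inputs that mix the two alphabets.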

codspeed-hq bot commented May 31, 2024

CodSpeed Performance Report

Merging #1308 will not alter performance

Comparing d8e-ai:validate-base64 (c5b4f7f) with main (0e6b377)

Summary

✅ 155 untouched benchmarks

@davidhewitt
Contributor

Thanks for the PR! I fully support the use case and think this makes sense.

That said, I worry about silently breaking user code by changing the meaning of the existing option in this way. How about adding a new flag, e.g. json_bytes_encoding='base64' and preferring that over the existing flag?

Or we could add val_json_bytes so validation and serialization are controlled separately.

@josh-newman
Contributor Author

Either option works for me. I don't currently know of a use case where someone would want base64 encoding only and not decoding, so I'd lean towards the new bidirectional encoding flag. But maybe that use case exists, or maybe there's a pattern in other Pydantic config to follow (that I'm not familiar with)?

I'm happy to make changes for whichever you recommend! (And please feel free to make changes yourself, too, if you prefer.)

@davidhewitt
Contributor

Having discussed with @sydney-runkle and @samuelcolvin, I think we would prefer to go with a new val_json_bytes option so that users who do have a use-case where they don't want to change validation can continue to use ser_json_bytes.

I agree that the bidirectional encoding flag is probably what most people actually need, so if you strongly want that I'd probably accept it; with the individual flags it should be an easy layer on top which just sets both of them (and we can error if it is set as well as either individual flag).
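
A rough sketch of how such a layer could resolve the three keys. Nothing here is an existing pydantic API: `resolve_bytes_modes` is a hypothetical helper, `json_bytes_encoding` is the flag name floated earlier in this thread, and the `'utf8'` default is an assumption.

```python
from typing import Dict, Tuple


def resolve_bytes_modes(config: Dict[str, str]) -> Tuple[str, str]:
    """Return (serialization mode, validation mode) for bytes fields.

    The combined flag sets both directions and may not be mixed with
    either individual flag.
    """
    combined = config.get("json_bytes_encoding")
    if combined is not None:
        if "ser_json_bytes" in config or "val_json_bytes" in config:
            raise ValueError(
                "json_bytes_encoding cannot be combined with "
                "ser_json_bytes or val_json_bytes"
            )
        return combined, combined
    # Assumed default of 'utf8' for whichever individual flag is unset.
    return config.get("ser_json_bytes", "utf8"), config.get("val_json_bytes", "utf8")


both = resolve_bytes_modes({"json_bytes_encoding": "base64"})
ser_only = resolve_bytes_modes({"ser_json_bytes": "base64"})
```

With only `ser_json_bytes` set, validation keeps its default, which preserves the existing behaviour for users who do not want validation to change.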

It's a shame to have the complexity of both forms, but hey, it's where we've ended up. It's possible we could consider deprecating the individual flags in V3.

@josh-newman
Contributor Author

Sounds good, I've switched to a new val_json_bytes. This should be ok for us; we're setting ser_json_bytes in a common place for many of our models so now we'll just set both.

I'm guessing there's some more work to do with the new config key (docs, Pydantic's ConfigDict). If this looks good I can open a PR on the other repo for that (maybe after the next release?). Or let me know if you normally handle these things differently.

Contributor

@davidhewitt davidhewitt left a comment

Thanks, looks good! I'd like to see a couple of changes for consistency with the rest of the library...

As for updating the Python code in pydantic, yes, that's a good idea. I'd suggest opening a PR already; you can test it all locally and then add pytest.xfail markers in the PR, which we can remove once this support is released in pydantic-core.

Comment on lines 389 to 390
    with pytest.raises(ValueError):
        v.validate_json('"wrong!"')
Contributor

I think we probably need to expect a ValidationError here, where the input was bytes but the wrong format. It might mean adding a new error type e.g. bytes_format.

Contributor Author

Done. (I think; let me know if I did the right thing with the error type.)

        Ok(bytes) => Ok(EitherBytes::from(bytes)),
        Err(err) => Err(PyValueError::new_err(format!("Base64 decode error: {err}"))),
    },
    BytesMode::Hex => Err(PyValueError::new_err("Hex deserialization is not supported")),
Contributor

I think it would be very desirable to add hex support at the same time as adding this flag. The hex crate is a common standard in the Rust ecosystem and we could use it trivially.

Contributor Author

Done.
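
For readers following the thread, a Python sketch of the per-mode decoding discussed in these review comments. It mirrors the idea only, not the Rust code in this PR: `decode_json_bytes` and `BytesFormatError` are hypothetical names, and the URL-safe base64 alphabet is an assumption.

```python
import base64


class BytesFormatError(ValueError):
    """Hypothetical error for input that does not match the configured encoding."""


def decode_json_bytes(text: str, mode: str) -> bytes:
    """Decode a JSON string into bytes according to the configured mode."""
    if mode == "utf8":
        return text.encode("utf-8")
    if mode == "base64":
        try:
            # Strict decode with '-' and '_' as the alternative characters.
            return base64.b64decode(text, altchars=b"-_", validate=True)
        except ValueError as err:  # binascii.Error is a ValueError subclass
            raise BytesFormatError(f"Base64 decode error: {err}") from err
    if mode == "hex":
        try:
            return bytes.fromhex(text)
        except ValueError as err:
            raise BytesFormatError(f"Hex decode error: {err}") from err
    raise ValueError(f"Unknown bytes mode: {mode!r}")


from_hex = decode_json_bytes("68656c6c6f", "hex")
from_base64 = decode_json_bytes("aGVsbG8=", "base64")
```

Raising one dedicated error for malformed input in every mode is what lets the caller report it as a validation failure rather than a generic exception.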

josh-newman added a commit to d8e-ai/pydantic that referenced this pull request Jun 26, 2024
@josh-newman
Contributor Author

I've created pydantic/pydantic#9770. I'm guessing it needs to wait until after the pydantic-core upgrade, otherwise a user may see this new option and be confused about why it doesn't work?

@josh-newman
Contributor Author

I've created pydantic/pydantic#9772 to address the test-pydantic-integration failure.

@josh-newman
Contributor Author

Well, I think I'm stuck 😅. pydantic/pydantic#9772 has test failures saying the new documented error type isn't in pydantic-core, and this PR has a CI failure saying the new error type isn't documented there.

Contributor Author

@josh-newman josh-newman left a comment

(I forgot to publish these comments earlier.)



2 participants