Validate bytes based on ser_json_bytes #1308

josh-newman · 2024-05-31T23:47:45Z

Change Summary

ser_json_bytes transforms values (with base64 encoding) during serialization. But validation doesn't do a complementary base64 decode, so a serialization round-trip into the same model type yields an unequal object.
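The round-trip mismatch described above can be illustrated without pydantic at all. The sketch below is a standard-library stand-in, not the library's code: the helper names are hypothetical, and the URL-safe alphabet is an assumption made for the example.

```python
import base64
import json


def serialize(payload: bytes) -> str:
    # Stand-in for base64 serialization to JSON (URL-safe alphabet assumed).
    encoded = base64.urlsafe_b64encode(payload).decode("ascii")
    return json.dumps({"data": encoded})


def validate_without_decode(raw: str) -> bytes:
    # Treats the JSON string as the bytes themselves: no base64 decode.
    return json.loads(raw)["data"].encode("utf-8")


def validate_with_decode(raw: str) -> bytes:
    # Complementary decode, so the round trip restores the original value.
    return base64.urlsafe_b64decode(json.loads(raw)["data"])


original = b"\x00\xffhello"
wire = serialize(original)
round_trip_plain = validate_without_decode(wire)
round_trip_decoded = validate_with_decode(wire)
```

Only the second validator returns a value equal to `original`; the first returns the base64 text as bytes.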

Related issue number

None for this directly. Other users have mentioned base64 decoding, though: pydantic/pydantic#7000 (comment)

Checklist

Unit tests for the changes exist
Documentation reflects the changes where applicable
Pydantic tests pass with this pydantic-core (except for expected changes)
My PR is ready to review

Selected Reviewer: @davidhewitt

josh-newman · 2024-05-31T23:47:53Z

please review

josh-newman · 2024-05-31T23:48:00Z

CC @jcharum

codecov · 2024-05-31T23:52:41Z

Codecov Report

Attention: Patch coverage is 85.13514% with 11 lines in your changes missing coverage. Please review.

Files	Patch %	Lines
src/input/input_string.rs	0.00%	9 Missing ⚠️
src/input/input_json.rs	93.75%	1 Missing ⚠️
src/input/input_python.rs	87.50%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

josh-newman · 2024-05-31T23:55:20Z

I'm also interested in accepting both standard and URL-safe base64 encoding. If this PR is acceptable, I plan to send a separate one for that: d8e-ai/pydantic-core@validate-base64...d8e-ai:pydantic-core:validate-base64-any
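One way to accept both alphabets, sketched with the Python standard library only. This is an illustration of the idea, not the code in the linked branch; `decode_base64_any` is a hypothetical name.

```python
import base64

# Map the URL-safe characters onto the standard alphabet before decoding.
_URLSAFE_TO_STANDARD = str.maketrans("-_", "+/")


def decode_base64_any(text: str) -> bytes:
    """Decode standard or URL-safe base64, rejecting any other characters."""
    normalized = text.translate(_URLSAFE_TO_STANDARD)
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(normalized, validate=True)
    except ValueError as exc:  # binascii.Error is a ValueError subclass
        raise ValueError(f"Base64 decode error: {exc}") from exc
```

A side effect of normalizing first is that a string mixing both alphabets is also accepted, which may or may not be desirable.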

codspeed-hq · 2024-05-31T23:59:08Z

CodSpeed Performance Report

Merging #1308 will not alter performance

Comparing d8e-ai:validate-base64 (c5b4f7f) with main (0e6b377)

Summary

✅ 155 untouched benchmarks

davidhewitt · 2024-06-03T07:11:30Z

Thanks for the PR! I fully support the use case and think this makes sense.

That said, I worry about silently breaking user code by changing the meaning of the existing option in this way. How about adding a new flag, e.g. json_bytes_encoding='base64' and preferring that over the existing flag?

Or we could add val_json_bytes so validation and serialization are controlled separately.

josh-newman · 2024-06-03T18:39:37Z

Either option works for me. I don't currently know of a use case where someone would want base64 encoding only and not decoding, so I'd lean towards the new bidirectional encoding flag. But maybe that use case exists, or maybe there's a pattern in other Pydantic config to follow (that I'm not familiar with)?

I'm happy to make changes for whichever you recommend! (And please feel free to make changes yourself, too, if you prefer.)

davidhewitt · 2024-06-10T12:53:15Z

Having discussed with @sydney-runkle and @samuelcolvin, I think we would prefer to go with a new val_json_bytes option so that users who do have a use-case where they don't want to change validation can continue to use ser_json_bytes.

I agree that the bidirectional encoding flag is probably what most people actually need, so if you strongly want that I'd probably accept it; with the individual flags it should be an easy layer on top which just sets both of them (and we can error if it is set as well as either individual flag).

It's a shame to have the complexity of both forms, but hey, it's where we've ended up. It's possible we could consider deprecating the individual flags in V3.
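A rough sketch of how a combined flag could sit on top of the two individual ones, as described above. Everything here is hypothetical: the `json_bytes_encoding` key is only the name floated earlier in this thread, `resolve_bytes_modes` is an invented helper, and the `utf8` default is an assumption.

```python
from typing import Mapping, Optional, Tuple

VALID_MODES = {"utf8", "base64", "hex"}


def resolve_bytes_modes(config: Mapping[str, str]) -> Tuple[str, str]:
    """Return (serialization mode, validation mode) for a config mapping."""
    combined: Optional[str] = config.get("json_bytes_encoding")  # hypothetical key
    ser: Optional[str] = config.get("ser_json_bytes")
    val: Optional[str] = config.get("val_json_bytes")

    if combined is not None:
        # Error if the combined flag is set alongside either individual flag.
        if ser is not None or val is not None:
            raise ValueError(
                "json_bytes_encoding cannot be combined with "
                "ser_json_bytes or val_json_bytes"
            )
        ser = val = combined

    ser = ser or "utf8"  # assumed default
    val = val or "utf8"  # assumed default
    for mode in (ser, val):
        if mode not in VALID_MODES:
            raise ValueError(f"unknown bytes mode: {mode!r}")
    return ser, val
```

Setting only `ser_json_bytes` keeps validation unchanged, which is the use case the separate flags are meant to preserve.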


josh-newman · 2024-06-11T22:26:23Z

Sounds good, I've switched to a new val_json_bytes. This should be ok for us; we're setting ser_json_bytes in a common place for many of our models so now we'll just set both.

I'm guessing there's some more work to do with the new config key (docs, Pydantic's ConfigDict). If this looks good I can open a PR on the other repo for that (maybe after the next release?). Or let me know if you normally handle these things differently.

davidhewitt

Thanks, looks good! I'd like to see a couple of changes for consistency with the rest of the library...

As for updating the Python code in pydantic, yes that's a good idea. I'd suggest opening a PR already, you can test it all locally and then add pytest.xfail markers in the PR which we can then remove when this support gets released in pydantic-core.

davidhewitt · 2024-06-12T09:33:56Z

tests/test_json.py

+    with pytest.raises(ValueError):
+        v.validate_json('"wrong!"')


I think we probably need to expect a ValidationError here, where the input was bytes but the wrong format. It might mean adding a new error type e.g. bytes_format.

josh-newman · 2024-06-26T23:09:21Z

Done. (I think; let me know if I did the right thing with the error type.)

davidhewitt · 2024-06-12T09:35:55Z

src/validators/config.rs

+                Ok(bytes) => Ok(EitherBytes::from(bytes)),
+                Err(err) => Err(PyValueError::new_err(format!("Base64 decode error: {err}"))),
+            },
+            BytesMode::Hex => Err(PyValueError::new_err("Hex deserialization is not supported")),


I think it would be very desirable to add hex support at the same time as adding this flag. The hex crate is a common standard in the Rust ecosystem and we could use it trivially.
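For illustration, here is the same dispatch written in Python rather than Rust, covering all three modes and raising one dedicated error for badly encoded input. It is a sketch of the behaviour under discussion, not the pydantic-core implementation; `BytesEncodingError` and `decode_json_bytes` are hypothetical names, and the URL-safe alphabet for base64 is an assumption.

```python
import base64


class BytesEncodingError(ValueError):
    """Hypothetical stand-in for a dedicated validation error type."""


def decode_json_bytes(text: str, mode: str) -> bytes:
    """Turn a JSON string into bytes according to the configured mode."""
    if mode == "utf8":
        return text.encode("utf-8")
    if mode not in ("base64", "hex"):
        raise ValueError(f"unknown bytes mode: {mode!r}")
    try:
        if mode == "base64":
            # URL-safe alphabet assumed; validate=True rejects stray characters.
            return base64.b64decode(text, altchars=b"-_", validate=True)
        return bytes.fromhex(text)
    except ValueError as exc:
        raise BytesEncodingError(f"invalid {mode} input: {exc}") from exc
```

Raising one error type for both encodings keeps the caller's handling uniform regardless of which mode is configured.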

josh-newman · 2024-06-26T23:12:22Z

I've created pydantic/pydantic#9770. I'm guessing it needs to wait until after the pydantic-core upgrade, otherwise a user may see this new option and be confused about why it doesn't work?

josh-newman · 2024-06-26T23:53:00Z

I've created pydantic/pydantic#9772 to address the test-pydantic-integration failure.

josh-newman · 2024-06-26T23:58:43Z

Well, I think I'm stuck 😅. pydantic/pydantic#9772 has test failures saying the new documented error type isn't in pydantic-core, and this PR has a CI failure saying the new error type isn't documented there.


josh-newman

(I forgot to publish these comments earlier.)

josh-newman · 2024-06-26T23:09:21Z

tests/test_json.py

+    with pytest.raises(ValueError):
+        v.validate_json('"wrong!"')


Done. (I think; let me know if I did the right thing with the error type.)

josh-newman · 2024-06-26T23:09:38Z

src/validators/config.rs

+                Ok(bytes) => Ok(EitherBytes::from(bytes)),
+                Err(err) => Err(PyValueError::new_err(format!("Base64 decode error: {err}"))),
+            },
+            BytesMode::Hex => Err(PyValueError::new_err("Hex deserialization is not supported")),


pydantic-hooky bot added the ready for review label May 31, 2024

pydantic-hooky bot assigned davidhewitt May 31, 2024

Implement validation based on ser_json_bytes to support round trip

f8addc1

josh-newman force-pushed the validate-base64 branch from 4c44225 to f8addc1 Compare May 31, 2024 23:56

Merge remote-tracking branch 'refs/remotes/upstream/main' into valida…

c1c84e3

…te-base64

josh-newman force-pushed the validate-base64 branch from 6e7fc01 to f7ca32f Compare June 11, 2024 22:13

Switch to val_json_bytes config key

1e71ab7

josh-newman force-pushed the validate-base64 branch from f7ca32f to 1e71ab7 Compare June 11, 2024 22:24

davidhewitt reviewed Jun 12, 2024


josh-newman added 4 commits June 26, 2024 09:54

Merge remote-tracking branch 'upstream/main' into validate-base64

c415a20

Raise new ValidationError type for encoding

171a45b

Implement hex

c5930cb

Switch to hex crate

409a077

josh-newman added a commit to d8e-ai/pydantic that referenced this pull request Jun 26, 2024

Add tests anticipating pydantic/pydantic-core#1308

259b0b8

josh-newman mentioned this pull request Jun 26, 2024

Add Config.val_json_bytes pydantic/pydantic#9770

Open

5 tasks

josh-newman added 2 commits June 26, 2024 16:39

Update pyi

8018a84

Avoid nested f-string quotes, for older Pythons

5870e0b

josh-newman mentioned this pull request Jun 26, 2024

Document new ValidationError type bytes_invalid_encoding pydantic/pydantic#9772

Closed

5 tasks

Merge remote-tracking branch 'refs/remotes/upstream/main' into valida…

c5b4f7f

…te-base64

josh-newman commented Jul 6, 2024
