Introduce new ObjectPart type #644
Conversation
Cool refactor. I kinda thought that we'd want to use |
I took a pass over it and the change looks good overall. I left a few comments too; the most important one is whether we want to keep `ChecksummedBytes` around and use it as an internal detail of `ObjectPart`.
To get this merged, it needs to be broken down a bit so we can review properly that everything is correct. Perhaps we can split it either into multiple commits in one PR which will be squashed, or just multiple PRs. Assuming we want to merge as is...
- Add `ObjectId`, which is fairly simple and can be reviewed very quickly. Maybe some of the lines that moved around can be included in that.
- Introduce the new `ObjectPart`, and replace usages of `Part` only first? (the big change)
- Follow up with replacing usages of `ChecksummedBytes` in things like the data cache, and removing `ChecksummedBytes`.
Rebased on |
Provided some comments. This change is still too large. How else can we break this down?
- Refactor `ChecksummedBytes` to adopt storing bytes and range, rather than a slice of `orig_bytes`.
- Simple stuff like the renaming of `ChecksummedBytes::new` to `ChecksummedBytes::new_from_inner_data`, so the next change becomes adding `offset` only. Maybe also some of the test changes, where we just move things like constructing a `Bytes` to its own line. Simple to review, but noise in a risky PR like this.
- Rename current `Part` to `ObjectPart`, so a lot of the uses are again a simple rename PR to review.
- Then move the part logic into this module, moving the guard from `part` into this type.
When it's broken down like that (or similar), can we break down those commits at all?
Ultimately, we want to make this as easy as possible for the reviewer to look at and approve considering the code we're actually changing. Try and make the PRs as simple as possible so anyone can review and get these in, so we can complete this refactor in good time.
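To make the first step in the list above concrete, here is a minimal sketch of a `ChecksummedBytes` that stores the whole buffer plus a range instead of a pre-sliced view. It is not the code in this PR: `Arc<[u8]>` stands in for `Bytes`, `toy_checksum` is a hypothetical placeholder for the real checksum, and `IntegrityError` and `slice` are names assumed for illustration.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for a real checksum such as CRC32C.
fn toy_checksum(data: &[u8]) -> u32 {
    data.iter().fold(0u32, |acc, &b| acc.rotate_left(5) ^ u32::from(b))
}

/// Assumed error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct IntegrityError;

/// Sketch: keeps the full buffer and a range over it.
#[derive(Clone, Debug)]
pub struct ChecksummedBytes {
    /// Full buffer the checksum was computed over.
    buffer: Arc<[u8]>,
    /// Range over `buffer` exposed by this value.
    range: Range<usize>,
    /// Checksum of the whole `buffer`, not only of `range`.
    buffer_checksum: u32,
}

impl ChecksummedBytes {
    pub fn new(data: &[u8]) -> Self {
        Self {
            buffer: Arc::from(data),
            range: 0..data.len(),
            buffer_checksum: toy_checksum(data),
        }
    }

    pub fn len(&self) -> usize {
        self.range.len()
    }

    /// Narrows the view without copying data or recomputing the checksum.
    /// `sub` is relative to the current view.
    pub fn slice(&self, sub: Range<usize>) -> Option<Self> {
        if sub.start > sub.end || sub.end > self.len() {
            return None;
        }
        let start = self.range.start + sub.start;
        let end = self.range.start + sub.end;
        Some(Self {
            buffer: Arc::clone(&self.buffer),
            range: start..end,
            buffer_checksum: self.buffer_checksum,
        })
    }

    /// Validates the whole buffer, then copies out the bytes in `range`.
    pub fn into_bytes(self) -> Result<Vec<u8>, IntegrityError> {
        if toy_checksum(&self.buffer) != self.buffer_checksum {
            return Err(IntegrityError);
        }
        Ok(self.buffer[self.range.clone()].to_vec())
    }
}
```

With this layout, slicing only adjusts `range`, so a later change that adds `offset` would not need to touch how the checksum is stored.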
mountpoint-s3/src/checksums.rs
Outdated
    /// Range over `buffer`
    range: Range<usize>,
    /// Checksum for this part metadata.
    /// Computed over `part_id`, `offset`, `buffer_checksum`, and `range` (but not `buffer`).
need to fix comment
-    /// Computed over `part_id`, `offset`, `buffer_checksum`, and `range` (but not `buffer`).
+    /// Computed over `object_id`, `offset`, `buffer_checksum`, and `range` (but not `buffer`).
Maybe we just remove this sentence, and instead point to the method that computes the checksum. Otherwise, we need to maintain this comment on top of that other method.
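A rough sketch of what that could look like: the field comment only links to one function, and that function is the single place listing what goes into the metadata checksum. Everything here is assumed for illustration: `PartMetadata`, `compute_checksum`, the `String` object ID, and `DefaultHasher` as a stand-in for whatever checksum the real code uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::ops::Range;

/// Hypothetical metadata holder, not the type from this PR.
pub struct PartMetadata {
    object_id: String,
    offset: u64,
    buffer_checksum: u32,
    range: Range<usize>,
    /// Checksum for this part metadata.
    /// See [`PartMetadata::compute_checksum`] for the fields it covers.
    metadata_checksum: u64,
}

impl PartMetadata {
    pub fn new(object_id: &str, offset: u64, buffer_checksum: u32, range: Range<usize>) -> Self {
        let metadata_checksum =
            Self::compute_checksum(object_id, offset, buffer_checksum, &range);
        Self {
            object_id: object_id.to_owned(),
            offset,
            buffer_checksum,
            range,
            metadata_checksum,
        }
    }

    /// The only place that decides which fields are covered.
    /// `DefaultHasher` is an assumption standing in for the real checksum.
    fn compute_checksum(
        object_id: &str,
        offset: u64,
        buffer_checksum: u32,
        range: &Range<usize>,
    ) -> u64 {
        let mut hasher = DefaultHasher::new();
        object_id.hash(&mut hasher);
        offset.hash(&mut hasher);
        buffer_checksum.hash(&mut hasher);
        range.start.hash(&mut hasher);
        range.end.hash(&mut hasher);
        hasher.finish()
    }

    pub fn checksum(&self) -> u64 {
        self.metadata_checksum
    }

    /// Recomputes the checksum from the current fields and compares.
    pub fn is_valid(&self) -> bool {
        Self::compute_checksum(&self.object_id, self.offset, self.buffer_checksum, &self.range)
            == self.metadata_checksum
    }
}
```

If the set of covered fields changes later, only `compute_checksum` needs updating and the field comment stays correct.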
mountpoint-s3/src/checksums.rs
Outdated
    /// Returns the bytes in this part, if its integrity can be validated.
    pub fn into_bytes(self) -> Result<Bytes, PartValidationError> {
        self.validate()?;
        Ok(self.part_slice())
    }
This should still be behind the guard requiring object ID and offset to be provided?
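For illustration, a guard along those lines might look like the sketch below: the caller has to state which object and offset it expects before the bytes are released. This is only a sketch under assumptions: the `String` object ID, `Vec<u8>` instead of `Bytes`, and the error variants are hypothetical, and the integrity check (`validate()` in the snippet above) is left out to keep it short.

```rust
/// Hypothetical error variants for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum PartValidationError {
    WrongObject { actual: String, requested: String },
    WrongOffset { actual: u64, requested: u64 },
}

/// Simplified stand-in for the part type discussed in this PR.
pub struct ObjectPart {
    object_id: String,
    offset: u64,
    data: Vec<u8>,
}

impl ObjectPart {
    pub fn new(object_id: &str, offset: u64, data: Vec<u8>) -> Self {
        Self {
            object_id: object_id.to_owned(),
            offset,
            data,
        }
    }

    /// Returns the bytes only if the caller asked for this object at this offset.
    /// A real implementation would also validate checksums before returning.
    pub fn into_bytes(self, object_id: &str, offset: u64) -> Result<Vec<u8>, PartValidationError> {
        if self.object_id != object_id {
            return Err(PartValidationError::WrongObject {
                actual: self.object_id,
                requested: object_id.to_owned(),
            });
        }
        if self.offset != offset {
            return Err(PartValidationError::WrongOffset {
                actual: self.offset,
                requested: offset,
            });
        }
        Ok(self.data)
    }
}
```

Because `into_bytes` takes the expected ID and offset as arguments, a caller cannot obtain the data without going through the check.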
Signed-off-by: Alessandro Passaro <[email protected]>
The `ObjectPart` type represents a slice of the content of an S3 object. It contains the information required to identify the object it belongs to and its offset in it. It also maintains checksums to validate its integrity. The new type is used in the prefetcher and replaces both `ChecksummedBytes` and the original `prefetcher::ObjectPart`. Signed-off-by: Alessandro Passaro <[email protected]>
Description of change
The `ObjectPart` type represents a slice of the content of an S3 object. It contains the information required to identify the object it belongs to and its offset in it. It also maintains checksums to validate its integrity. The new type is used in the prefetcher and replaces both `ChecksummedBytes` and `Part`.
Does this change impact existing behavior?
No.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).