Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new ObjectPart type #644

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft

Introduce new ObjectPart type #644

wants to merge 5 commits into from

Conversation

passaro
Copy link
Contributor

@passaro passaro commented Nov 28, 2023

Description of change

The ObjectPart type represents a slice of the content of an S3 object. It contains the information required to identify the object it belongs to and its offset in it. It also maintains checksums to validate its integrity. The new type is used in the prefetcher and replaces both ChecksummedBytes and Part.

Does this change impact existing behavior?

No.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@passaro passaro temporarily deployed to PR integration tests November 28, 2023 17:27 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests November 28, 2023 17:27 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests November 28, 2023 17:27 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests November 28, 2023 17:27 — with GitHub Actions Inactive
@jamesbornholt
Copy link
Member

Cool refactor. I kinda thought that we'd want to use ChecksummedBytes to protect other stuff like SSE keys, but maybe that's more trouble than it's worth?

Copy link
Contributor

@dannycjones dannycjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have took a pass over it and the change looks good overall. I put a few comments too, particularly important would be the one around if we want to keep ChecksummedBytes around and use it as an internal detail of ObjectPart.

To get this merged, it needs to be broken down a bit so we can review properly that everything is correct. Perhaps we can split it either into multiple commits in one PR which will be squashed, or just multiple PRs. Assuming we want to merge as is...

  • Add ObjectId which is fairly simple and can be reviewed very quickly. Maybe some of the lines that moved around can be included in that.
  • Introduce the new ObjectPart, and replace usages of Part only first? (the big change)
  • Follow up with replacing usages of ChecksummedBytes in things like the data cache, and removing ChecksummedBytes

mountpoint-s3/src/prefetch/part_stream.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/part.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/part.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/checksums.rs Outdated Show resolved Hide resolved
@passaro passaro temporarily deployed to PR integration tests December 8, 2023 10:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests December 8, 2023 10:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests December 8, 2023 10:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests December 8, 2023 10:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests December 8, 2023 10:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests December 12, 2023 11:15 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests December 12, 2023 11:16 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests December 12, 2023 11:16 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests December 12, 2023 11:16 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests December 12, 2023 11:16 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests December 12, 2023 11:16 — with GitHub Actions Inactive
@passaro
Copy link
Contributor Author

passaro commented Dec 12, 2023

Rebased on main which now includes the ObjectId change. I've temporarily left ObjectPart in the checksums module to make it easier to review the changes compared to ChecksummedBytes.
I'll move it to the object module once reviewed.

Copy link
Contributor

@dannycjones dannycjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provided some comments. This change is still too large. How else can we break this down?

  • Refactor ChecksummedBytes to adopt storing bytes and range, rather than a slice of orig_bytes
  • Simple stuff like the renaming of ChecksummedBytes::new to ChecksummedBytes::new_from_inner_data, so the next change becomes adding offset only. Maybe also some of the test changes, where we just move things like constructing a Bytes to its own line. Simple to review, but noise in a risky PR like this.
  • Rename current Part to ObjectPart, so a lot of the uses are again a simple rename PR to review.
  • Then move the part logic into this module, moving the guard from part into this type.

When it's broken down like that (or similar), can we break down those commits at all?

Ultimately, we want to make this as easy as possible for the reviewer to look at and approve considering the code we're actually changing. Try and make the PRs as simple as possible so anyone can review and get these in, so we can complete this refactor in good time.

/// Range over `buffer`
range: Range<usize>,
/// Checksum for this part metadata.
/// Computed over `part_id`, `offset`, `buffer_checksum`, and `range` (but not `buffer`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to fix comment

Suggested change
/// Computed over `part_id`, `offset`, `buffer_checksum`, and `range` (but not `buffer`).
/// Computed over `object_id`, `offset`, `buffer_checksum`, and `range` (but not `buffer`).

Maybe we just remove this sentence, and instead point to the method that computes the checksum. Otherwise, we need to maintain this comment on top of that other method.

mountpoint-s3/src/checksums.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/checksums.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/checksums.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/checksums.rs Outdated Show resolved Hide resolved
mountpoint-s3/src/checksums.rs Outdated Show resolved Hide resolved
Comment on lines 57 to 61
/// Returns the bytes in this part, if its integrity can be validated.
pub fn into_bytes(self) -> Result<Bytes, PartValidationError> {
self.validate()?;
Ok(self.part_slice())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should still be behind the guard requiring object ID and offset to be provided?

mountpoint-s3/src/checksums.rs Outdated Show resolved Hide resolved
@passaro passaro temporarily deployed to PR integration tests February 15, 2024 13:52 — with GitHub Actions Inactive
@passaro passaro had a problem deploying to PR integration tests March 11, 2024 10:50 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests March 11, 2024 10:50 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests March 11, 2024 10:50 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests March 11, 2024 10:50 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests March 11, 2024 10:50 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests March 11, 2024 10:50 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests March 11, 2024 10:50 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests May 3, 2024 15:07 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests May 3, 2024 15:07 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests May 3, 2024 15:07 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests May 3, 2024 15:07 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests May 3, 2024 15:07 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests May 3, 2024 15:07 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests May 3, 2024 15:07 — with GitHub Actions Failure
Signed-off-by: Alessandro Passaro <[email protected]>
Signed-off-by: Alessandro Passaro <[email protected]>
The `ObjectPart` type represents a slice of the content of an S3 object. It contains the information required to identify the object it belongs to and its offset in it. It also maintains checksums to validate its integrity. The new type is used in the prefetcher and replaces both `ChecksummedBytes` and the original `prefetcher::ObjectPart`.

Signed-off-by: Alessandro Passaro <[email protected]>
@passaro passaro temporarily deployed to PR integration tests May 3, 2024 19:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests May 3, 2024 19:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests May 3, 2024 19:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests May 3, 2024 19:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests May 3, 2024 19:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests May 3, 2024 19:56 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests May 3, 2024 19:56 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants