
Only support compressed reads if the compression setting is present #8238

Merged 2 commits into main on Jul 3, 2024

Conversation

@arpad-m (Member) commented Jul 2, 2024:

PR #8106 was created under the assumption that no blob is larger than 256 MiB. Since #7852 we check writes of blobs larger than that limit, but we didn't check reads of such large blobs: in theory, we could be reading such blobs every day while simply never happening to write any new ones.

Therefore, we now add a warning for reads of such large blobs as well.

To make deploying compression less dangerous, we only assume a blob is compressed if the compression setting is present in the config. This also means that we can't back out of compression once we've enabled it.

Part of #5431
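
A hedged sketch of the gating described above; the bit values follow the header discussion later in this thread, and the function name and signature are illustrative assumptions, not actual pageserver code:

const BYTE_ZSTD: u8 = 0x90; // 4-byte length header, zstd-compressed payload

fn treat_as_compressed(first_len_byte: u8, compression_configured: bool) -> bool {
    // Without the config setting, never interpret the high header bits
    // as a compression marker: a pre-compression blob of 256 MiB or more
    // would otherwise be misread as compressed data.
    compression_configured && (first_len_byte & 0xf0) == BYTE_ZSTD
}
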

@arpad-m requested a review from a team as a code owner July 2, 2024 17:06
@arpad-m requested a review from problame July 2, 2024 17:06
@arpad-m (Member Author) commented Jul 2, 2024:

I'd like most of the changes in this PR to be temporary: that is, eventually the compression setting should disappear and we will assume any blob larger than 256 MiB must be a compressed blob. However, that change can happen months in the future.

github-actions bot commented Jul 2, 2024:

3000 tests run: 2885 passed, 0 failed, 115 skipped (full report)


Code coverage* (full report)

  • functions: 32.7% (6934 of 21215 functions)
  • lines: 50.0% (54354 of 108614 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
4220e2f at 2024-07-02T17:57:34.054Z :recycle:

@arpad-m requested a review from koivunej July 3, 2024 09:10
Comment on lines +89 to +91
if compression_bits > BYTE_UNCOMPRESSED {
    warn!("reading key above future limit ({len} bytes)");
}
@koivunej (Contributor) commented:

Doesn't this mean that we are reading a previously compressed blob as uncompressed?

Shouldn't we also or alternatively check if the read bytes start with the zstd fourcc/magic?

@koivunej (Contributor) commented Jul 3, 2024:

Anyway, I am having a hard time parsing this compression_bits. I get that here it can be the u32 after ANDing it with 0x0f or 0x7f, which means -- aha, it is never masked.

Related note: On line L332R341 the LEN_COMPRESSION_BIT_MASK is used as literal 0xf0:

-assert_eq!(len_buf[0] & 0xf0, 0);
+assert_eq!(len_buf[0] & LEN_COMPRESSION_BIT_MASK, 0);

Ok... So perhaps I now understand. Possible compression_bits values are:

match compression_bits >> 4 {
  0 => /* image layer written before the compression support or small value? */,
  1..8 => /* reserved */,
  8 => /* uncompressed blob */,
  9 => /* zstd */,
  10..=15 => /* undefined or written before compression support and too large, which we warn here? */
  _ => unreachable!("u4"),
}

If this is correctly understood, then okay, maybe... The compression_bits > BYTE_UNCOMPRESSED just looks so off; in my mind, a bitfield doesn't support ordered comparison. It'd be nice to have enums and matches for these. Err, nope, that cannot be correct.

Did you test that this warning is produced with some hand-crafted image file?

EDIT: rust snippet had 1..8 and 10..=15 wrong way around, possibly.

@arpad-m (Member Author) commented:

> Doesn't this mean that we are reading a previously compressed blob as uncompressed?

If the compression setting is disabled, yes. This has the consequence that we can't turn off compression easily any more, but I think it's okay to have it for a while, after which point we'll (mostly) revert this PR.

> It'd be nice to have enums and matches for these

match is not good for that, as FOOBAR => is equivalent to a variable capture.
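
For reference, a small self-contained illustration of that pitfall, with hypothetical names: in a match pattern, a bare identifier is a fresh binding that matches anything; only a const (or enum variant) actually compares.

const BYTE_ZSTD: u8 = 0x90;

fn classify(bits: u8) -> &'static str {
    match bits {
        // Compares only because BYTE_ZSTD is a const; a non-const name
        // in this position would silently capture every value.
        BYTE_ZSTD => "zstd",
        other => {
            // `other` is a capture: it binds all remaining values.
            let _ = other;
            "not zstd"
        }
    }
}
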

@koivunej (Contributor) commented:

Is my match correct? Did you test if this is hit? Will that be used as a success criteria for the compression support? If so, what is the plan to read all image layers?

Other questions remain: magic/fourcc instead of reserving more bits?

@arpad-m (Member Author) commented:

> Is my match correct?

Almost correct; the 1..8 range doesn't have the highest bit set, so it's an indicator for small uncompressed values.

match compression_bits >> 4 {
  0..8 => /* small, uncompressed value below 128 bytes */,
  8 => /* uncompressed blob */,
  9 => /* zstd */,
  10..=15 => /* reserved or written before compression support and too large, which we warn here */
  _ => unreachable!("u4"),
}

> Will that be used as a success criteria for the compression support?

What do you mean, can you expand?

> If so, what is the plan to read all image layers?

??

> magic/fourcc instead of reserving more bits?

What do you mean by that? I do not think we should autodetect zstd magic here.
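
Putting the thread's description together, a minimal decoding sketch; the constant names follow the diff quoted above, their exact values are assumptions, and this is not the actual reader code:

const LEN_COMPRESSION_BIT_MASK: u8 = 0xf0;
const BYTE_UNCOMPRESSED: u8 = 0x80;
const BYTE_ZSTD: u8 = 0x90;

fn describe_first_header_byte(b: u8) -> &'static str {
    if b & 0x80 == 0 {
        // Highest bit clear: 1-byte header, small value below 128 bytes.
        return "small uncompressed value";
    }
    match b & LEN_COMPRESSION_BIT_MASK {
        BYTE_UNCOMPRESSED => "uncompressed blob with a 4-byte length header",
        BYTE_ZSTD => "zstd-compressed blob",
        // Reserved, or a pre-compression blob that exceeded 256 MiB;
        // this is the case the new warning covers.
        _ => "reserved / overlong pre-compression blob",
    }
}
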

@arpad-m (Member Author) commented:

Christian wants this to be merged as-is, doing that now.

@koivunej (Contributor) commented:

> success criteria
> plan to read all image layers

Christian mentioned these in #8238 (comment). I was wondering similarly what the next steps are.

> magic/fourcc

> What do you mean by that? I do not think we should autodetect zstd magic here.

zstd always starts the compressed bytes with the same 4 magic bytes. I was wondering whether we should use that knowledge instead of awkwardly reserving bits, as I had no idea what the plan for the next step was... But yeah, it seems there is a plan after all.

@problame (Contributor) commented:

For future reference, the slack thread where I proposed to merge asap is https://neondb.slack.com/archives/C074JQU7NER/p1720021276202079

@problame (Contributor) left a comment:

Side question: why is PageServerConf::image_compression an Option? Like with EvictionPolicy, a simple ImageCompressionAlgorithm::Disabled could have been designed as the #[default].


> To make deploying compression less dangerous, we only assume a blob is compressed if the compression setting is present in the config.
> This also means that we can't back out of compression once we've enabled it.

I'm confused by this.

IMO once we have written the first compressed blob to S3, we'll have to support decompressing layers with compressed blobs into perpetuity. The PageServerConf::image_compression config flag should only control whether we write new blobs with compression or not, but not affect reading of existing blobs.

So, I guess I'm rejecting the fundamental idea behind this PR. Or I'm missing something.

@arpad-m (Member Author) commented Jul 3, 2024:

> why is PageServerConf::image_compression an Option?

That's what the Option type is for, to indicate None. But of course it can also be inlined.

> we'll have to support decompressing layers with compressed blobs into perpetuity.

Yes, this PR is only meant for a temporary period until compression is rolled out everywhere. At that point, I'd like to revert most of it. Of course, for the transition period, and maybe also beyond, it might make sense to have the ability to easily revert the config setting.

The risk is that we might be reading 256 MiB or larger blobs in production right now, and on deploying current main we would yield corrupted data. This PR is just to avoid that situation.

Maybe both of your suggestions could be combined, and one could change ImageCompressionAlgorithm to:

enum ImageCompressionAlgorithm {
    DisabledNoDecompress,
    Disabled,
    Zstd { level: Option<i8> },
}

ImageCompressionAlgorithm::DisabledNoDecompress would interpret any blobs larger than 256 MiB as uncompressed, while ImageCompressionAlgorithm::Disabled would still decompress them. DisabledNoDecompress would be the default at the start, for a couple of weeks until compression is rolled out. Then, we can switch to Zstd or Disabled as the default, and remove the DisabledNoDecompress variant.

How does that sound?
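
For illustration, the read path could then branch on the proposed variants roughly like this; a sketch of the suggestion above using the enum as defined in this comment, not merged code:

fn may_decompress(algo: &ImageCompressionAlgorithm) -> bool {
    match algo {
        // Pre-rollout default: treat high length bits as plain length.
        ImageCompressionAlgorithm::DisabledNoDecompress => false,
        // Write uncompressed, but still honor compression bits on read.
        ImageCompressionAlgorithm::Disabled => true,
        ImageCompressionAlgorithm::Zstd { .. } => true,
    }
}
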

@problame (Contributor) commented Jul 3, 2024:

Ok, thanks for clarifying.

I think it has to go like so:

  1. roll out software that guarantees we don't write new blobs >= 256 MiB
  2. then do a full scrubber run, ensuring that all blobs are < 256MiB
  3. invariant now: all length fields have the LEN_COMPRESSION_BIT_MASK bits zeroed, so, we can use them for compression
  4. start shipping/enabling code that interprets the LEN_COMPRESSION_BIT_MASK

But let's take that discussion into Slack.


Regarding this PR, I now understand its importance and we should definitely merge it before the code from #8106 hits prod, because that would mean doing (4) before invariant (3) is established.

@problame (Contributor) left a comment:

Approving this; the decompression code needs to be dead at runtime until we've established invariant (3), see my previous post.

@arpad-m merged commit e0891ec into main on Jul 3, 2024 (64 checks passed)
@arpad-m deleted the arpad/compression_2 branch on July 3, 2024 16:02
arpad-m added a commit that referenced this pull request Jul 4, 2024
As per @koivunej's request in #8238 (comment), use a runtime param instead of monomorphizing the function based on the value.

Part of #5431
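
The change that commit message describes can be pictured as follows; the names are hypothetical stand-ins, not the actual blob writer functions:

// Before: one copy of the function is compiled per const value.
fn write_mono<const COMPRESSED: bool>(data: &[u8]) -> usize {
    if COMPRESSED { compress_len(data) } else { data.len() }
}

// After: a single copy that branches on a runtime parameter.
fn write(data: &[u8], compressed: bool) -> usize {
    if compressed { compress_len(data) } else { data.len() }
}

fn compress_len(data: &[u8]) -> usize {
    // Stand-in for real compression; illustration only.
    data.len() / 2
}
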
arpad-m added a commit that referenced this pull request Jul 4, 2024
This flattens the compression algorithm setting, removing the
`Option<_>` wrapping layer and making handling of the setting easier.

It also adds a specific setting for *disabled* compression with the
continued ability to read compressed data, giving us the option to
more easily back out of a compression rollout, should the need arise,
which was one of the limitations of #8238.

Implements my suggestion from #8238 (comment), inspired by Christian's review in #8238 (review).

Part of #5431