chore: Reduce memory consumption when migrating huge values #4119
Conversation
Before this PR: we serialized a `RESTORE` command for each entry into a string, and then pushed that string to the wire. This means that, when serializing an entry of size X, we consumed 2X memory during the migration.

This PR: instead of serializing into a string, we serialize into the wire directly. Luckily, only a small modification is needed in the way we interact with `crc64`, which works correctly even when fed in chunks.

Fixes #4100
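As a rough illustration of the change (all names below are hypothetical stand-ins, not Dragonfly's actual serializer or sink types), the before/after write patterns look like this; in the real code the chunks come from the serializer's flush callback rather than from a pre-built buffer:

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <string_view>

// Hypothetical wire abstraction, standing in for the real sink.
struct Wire {
  void Write(std::string_view chunk) { std::cout.write(chunk.data(), chunk.size()); }
};

// Before (sketch): the full RESTORE payload for an entry is materialized
// as a string before being written, so an entry of size X briefly costs
// ~2X memory during migration.
void SendBuffered(Wire& wire, const std::string& full_payload) {
  wire.Write(full_payload);  // full_payload duplicates the entry's bytes
}

// After (sketch): fixed-size chunks are written straight to the wire as
// they are produced, so only one chunk of extra memory is alive at a time.
void SendStreamed(Wire& wire, std::string_view payload, size_t chunk_size = 4096) {
  for (size_t off = 0; off < payload.size(); off += chunk_size)
    wire.Write(payload.substr(off, chunk_size));
}
```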
Do you think it would be worth adding a small test?
RdbSerializer serializer(compression_mode, [&](size_t size, SerializerBase::FlushState) {
  // The flush callback pushes the serialized bytes to the wire; the DCHECK
  // verifies that exactly `size` bytes went out.
  auto size_before = total_size;
  flush(&serializer);
  DCHECK_EQ(size, total_size - size_before);
});
Do we want to flush unconditionally on large values? (Even when big value serialization is off?)
Nice catch! I was under the (false) assumption that this is called only above the `serialization_max_chunk_size` threshold. I'll modify this area.
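A sketch of the gating being discussed, with hypothetical names and the assumption that a zero `serialization_max_chunk_size` means big-value serialization is off:

```cpp
#include <cstddef>

// Hypothetical stand-in for the serializer's buffered state.
struct SerializerSketch {
  size_t pending_bytes = 0;  // bytes buffered since the last flush
  void Flush() { pending_bytes = 0; }
};

// Flush only when big-value serialization is enabled and the buffer has
// crossed the configured threshold, instead of flushing unconditionally.
void FlushIfNeeded(SerializerSketch& s, size_t serialization_max_chunk_size) {
  if (serialization_max_chunk_size == 0)
    return;  // big-value serialization is off: keep buffering as before
  if (s.pending_bytes >= serialization_max_chunk_size)
    s.Flush();
}
```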
total_size += sv.size();
// crc64() accumulates: passing the previous crc continues the checksum over
// the next chunk, so the full serialized buffer never needs to exist.
crc = crc64(crc, reinterpret_cast<const uint8_t*>(sv.data()), sv.size());
Nice, you can accumulate the crc based on the previous value plus the newly serialized chunk.
🤓
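To make that property concrete, here is a self-contained demo that a CRC-64 accumulated chunk by chunk matches the one-shot digest. The bitwise `crc64_sketch` below is an illustrative ECMA-polynomial implementation, not Dragonfly's actual `crc64`; only the call shape (previous crc in, updated crc out) is the same:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Illustrative bitwise CRC-64. The entire state is the crc register, which
// is why the returned value can simply feed the next call.
uint64_t crc64_sketch(uint64_t crc, const uint8_t* data, size_t len) {
  for (size_t i = 0; i < len; ++i) {
    crc ^= static_cast<uint64_t>(data[i]) << 56;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & (1ULL << 63)) ? (crc << 1) ^ 0x42F0E1EBA9EA3693ULL : (crc << 1);
  }
  return crc;
}

int main() {
  std::string_view payload = "serialized RESTORE payload for one huge entry";
  const auto* bytes = reinterpret_cast<const uint8_t*>(payload.data());

  // One-shot digest over the whole buffer.
  uint64_t whole = crc64_sketch(0, bytes, payload.size());

  // The same digest, accumulated in small chunks as they hit the wire.
  uint64_t chunked = 0;
  for (size_t off = 0; off < payload.size(); off += 7) {
    size_t n = std::min<size_t>(7, payload.size() - off);
    chunked = crc64_sketch(chunked, bytes + off, n);
  }

  assert(whole == chunked);  // chunked accumulation matches the one-shot CRC
  return 0;
}
```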
io::StringSink s;  // stages the additional data so its CRC can be computed before flushing
Do we want to flush to a string first and then to the sink? Is there a way to do it directly?
The reason I do it this way is to be able to compute the CRC value for the additional data; after that we can flush to the sink directly (otherwise we can't compute the crc).
> (otherwise we can't compute the crc)
Yep that answers it 😄
Anddd welcome back 🥳
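A sketch of the staging pattern from this thread, using hypothetical minimal sinks in place of `io::StringSink` and the real wire sink: the additional data is buffered in memory, folded into the running CRC, and only then forwarded:

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <string_view>

// Illustrative accumulating CRC-64 (same call shape as the earlier sketch).
uint64_t Crc64(uint64_t crc, std::string_view s) {
  for (unsigned char c : s) {
    crc ^= static_cast<uint64_t>(c) << 56;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & (1ULL << 63)) ? (crc << 1) ^ 0x42F0E1EBA9EA3693ULL : (crc << 1);
  }
  return crc;
}

// Hypothetical in-memory sink playing the role of io::StringSink.
struct StagingSink {
  std::string buf;
  void Write(std::string_view s) { buf.append(s); }
};

// Hypothetical wire sink.
struct WireSink {
  void Write(std::string_view s) { std::cout.write(s.data(), s.size()); }
};

// Stage the additional bytes so they can be folded into the running CRC,
// then push the very same bytes to the wire. Writing straight to the wire
// would leave those bytes out of the checksum.
uint64_t ForwardWithCrc(uint64_t crc, const StagingSink& staged, WireSink& wire) {
  crc = Crc64(crc, staged.buf);
  wire.Write(staged.buf);
  return crc;
}
```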
@chakaz I think we need the same for JournalStreamer/RestoreStreamer.