chore: test metrics for huge value serialization #4262
Conversation
kostasrim commented on Dec 5, 2024 (edited)
- assert preemptions in replication test
Signed-off-by: kostas <[email protected]>
@@ -425,10 +425,6 @@ def create(self, existing_port=None, path=None, version=100, **kwargs) -> DflyIn
        args.setdefault("list_experimental_v2")
        args.setdefault("log_dir", self.params.log_dir)

        if version >= 1.21 and "serialization_max_chunk_size" not in args:
            # Add 1 byte limit for big values
            args.setdefault("serialization_max_chunk_size", 1)
Oh, but I do want to have serialization_max_chunk_size for all tests, not just test_replication_all.
Just use a default of 1024 instead of 1, for example.
I was about to write that we do, but then I realized I defaulted it to 32_MB 🤣 so no, we don't 🤦 Will fix it!
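For reference, a minimal sketch of what the suggestion would look like inside create() (the 1024 value is only the reviewer's suggestion at this point in the thread; the diff further below ends up using 16384, and args/version are the variables from the quoted factory code):

    if version >= 1.21 and "serialization_max_chunk_size" not in args:
        # Default big-value serialization to a 1024-byte chunk limit for all tests,
        # unless a test passes its own value explicitly.
        args.setdefault("serialization_max_chunk_size", 1024)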
huge_value_csize=1,
huge_value_size=512,
# 2 huge entries per container/key as default
huge_value_csize=2,
does this configuration mean that 1% of the keys have 2 huge entries in the container?
yes!
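A rough back-of-the-envelope sketch of what that configuration implies (parameter names come from the seeder config quoted above; key_target and huge_value_percentage=1 are assumptions used purely for illustration, matching the "1% of keys" statement):

    key_target = 100_000          # assumed total number of keys the seeder creates
    huge_value_percentage = 1     # ~1% of keys are marked huge (assumption from the discussion)
    huge_value_csize = 2          # huge entries per huge container/key
    huge_value_size = 512         # bytes per huge entry

    expected_huge_keys = key_target * huge_value_percentage // 100   # ~1,000 huge keys
    bytes_per_huge_key = huge_value_csize * huge_value_size          # ~1 KiB per huge key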
tests/dragonfly/cluster_test.py (Outdated)
@@ -1433,7 +1433,7 @@ async def test_migration_with_key_ttl(df_factory):
    assert await nodes[1].client.execute_command("stick k_sticky") == 0


@dfly_args({"proactor_threads": 4, "cluster_mode": "yes", "serialization_max_chunk_size": 0})
I don't really understand how this is passing now. You made this change, setting serialization_max_chunk_size=0, because these tests failed when you created the big value serialization PR with compression disabled.
For some reason I thought they got fixed, but they failed:
https://github.com/dragonflydb/dragonfly/actions/runs/12182325655/job/33981070627#step:12:1737 here, and the failures also cascaded (I introduced f9f93b1 to mitigate this effect).
I will revert this.
key_target = seeder_config["key_target"]
# Roughly 1% of the key target will be a big value with 2 elements of 512 bytes each
estimated_preemptions = key_target * (0.01)
assert preemptions > estimated_preemptions
Can you also add a check that we don't preempt for all keys?
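A sketch of what the requested two-sided check could look like, slotting into the quoted test where preemptions and seeder_config are already defined (the exact upper bound is an assumption; the point is only that not every key should preempt):

    key_target = seeder_config["key_target"]
    # Roughly 1% of the keys are big values with 2 elements of 512 bytes each,
    # so we expect at least that many preemptions...
    estimated_preemptions = key_target * 0.01
    assert preemptions > estimated_preemptions
    # ...but far fewer than one preemption per key; otherwise the big-value
    # serialization path would be firing for ordinary buckets as well.
    assert preemptions < key_target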
@@ -426,8 +426,7 @@ def create(self, existing_port=None, path=None, version=100, **kwargs) -> DflyIn
        args.setdefault("log_dir", self.params.log_dir)

        if version >= 1.21 and "serialization_max_chunk_size" not in args:
            # Add 1 byte limit for big values
            args.setdefault("serialization_max_chunk_size", 1)
            args.setdefault("serialization_max_chunk_size", 16384)
If the huge value in the seeder is 512, how come serialization_max_chunk_size=16384 gets to exercise the big value serialization flow? I am missing something in the math.
Let's say we have a bucket that contains a key with big value elements; the number of big value entries is 2, so this big value key's serialized size is ~1K. How does setting serialization_max_chunk_size=16384 make us preempt during this container's serialization?
It's simple. We flush every 32k (see PushSerializedToChannel in IterateBucketsFb). A bucket has 12 slots of 512 bytes each (and a big value is 1k), so:
First bucket -> 6k serialized (no big values) and no preemptions
Second bucket -> 12k serialized (no big values) and no preemptions
Third bucket -> slot 1, slot 2... slot 7 = 15.5k serialized -> the next slot overshoots and gets flushed
Fourth bucket -> 0k serialized, start again
So we hit it every third bucket. IMO finding a good value for this is hard; the important part is, as you said, to hit both flows. Furthermore, if you increase the load and add some really big values we will fail the test, since the GH runner has only 8 GB and we have a test case with 1 million elements, which roughly gives us 10k big values.
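A small sketch of the accumulation arithmetic described above (the 512-byte slots and 12-slot buckets come from the comment, the 16384 threshold is the serialization_max_chunk_size from the diff; this only models when the serialized buffer crosses the limit, not the actual serializer code):

    SLOT_BYTES = 512          # per-slot value size used in the seeder
    SLOTS_PER_BUCKET = 12     # slots per bucket, per the comment above
    THRESHOLD = 16384         # serialization_max_chunk_size from the diff

    serialized = 0
    flushes = 0
    for bucket in range(4):
        for slot in range(SLOTS_PER_BUCKET):
            serialized += SLOT_BYTES
            if serialized > THRESHOLD:
                # The slot that overshoots the limit triggers a flush/preemption,
                # even though no single value is huge on its own.
                flushes += 1
                serialized = 0

    print(flushes)  # 1 flush over these 4 buckets, i.e. roughly every third bucket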
I understand. So in this case the huge values that you insert do not actually trigger this preemption; instead it's just the fact that we flush only if the buffer is >32k, right?
So the preemption stats are not really related to the number of big values inserted by the seeder (because they are not really big), but rather to the average bucket serialization size.
So if the bucket serialized size is 6k you will preempt every 3rd bucket.
Which actually means that while the max chunk size is lower than 32K, the big value serialization logic will be triggered a lot, and not only when we have big values in the bucket.
> Which actually means that while the max chunk size is lower than 32K, the big value serialization logic will be triggered a lot, and not only when we have big values in the bucket.

Yep, it's what I mentioned to you yesterday: big value serialization vs big bucket serialization. This is inherent to the logic we have now, but it's easy to decouple should we wish to.
> I understand. So in this case the huge values that you insert do not actually trigger this preemption; instead it's

It might or might not. It depends on which value "breaks" the threshold.
In our server the value of serialization_max_chunk_size should be bigger than 32K, the constant minimum blob size.
For tests, just to make sure our logic works well, it's OK to make serialization_max_chunk_size smaller, but I actually don't see the point of the seeder's big values in this flow, as they are not really big enough to trigger it.
> but I actually don't see the point of the seeder's big values in this flow, as they are not really big enough to trigger it

Yep, and for that we can add a specific test case which can also be fast to run (with relatively low load but huge values). That way we get both :)
P.S. For now at least we test both preemption and non-preemption regardless of whether the value is really huge. (I also have a test that checks the memory is not doubled -- see the memory test in the regression tests.)
So if you agree with me on that, let's make the defaults huge_value_percentage = 0 and huge_value_size = 10000,
and we can just add a specific test case where huge_value_percentage = 2.
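A small sketch of the proposed split between the defaults and the dedicated test case (parameter names and values come from this discussion; how they are wired into the existing seeder and test utilities is left out):

    SERIALIZATION_MAX_CHUNK_SIZE = 16384   # default taken from the diff above

    # Proposed defaults: regular tests run without huge values.
    DEFAULT_SEEDER_ARGS = dict(huge_value_percentage=0, huge_value_size=10_000)

    # Dedicated test case: a small fraction of keys carries values that are
    # genuinely bigger than the chunk size, so preemption is driven by the
    # values themselves rather than by accumulated bucket size.
    HUGE_VALUE_TEST_ARGS = dict(
        huge_value_percentage=2,
        huge_value_size=10_000,
        huge_value_csize=2,
    )

    per_huge_key = HUGE_VALUE_TEST_ARGS["huge_value_size"] * HUGE_VALUE_TEST_ARGS["huge_value_csize"]
    assert per_huge_key > SERIALIZATION_MAX_CHUNK_SIZE   # 20_000 > 16_384: one such key alone crosses the limit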
Signed-off-by: kostas <[email protected]>
redis.apcall('LPUSH', key, unpack(randstr_sequence(is_huge)))
--- TODO -- investigate why second case of replication_test_all fails
--- we somehow create a quicklist that is circular and we deadlock
redis.apcall('LPUSH', key, unpack(randstr_sequence(false)))
@adiholden that's a bug somehow in our list code
local action = math.random(1, 4)
if action == 1 then
    redis.apcall('RPOP', key)
elseif action == 2 then
    redis.apcall('LPOP', key)
elseif action == 3 then
    redis.apcall('LPUSH', key, randstr(is_huge))
    redis.apcall('LPUSH', key, randstr(false))
We turn off big values for all modification ops because it can easily get out of hand and create containers with a lot of huge entries. As this is uncontrolled, I modified this.
@@ -114,7 +115,7 @@ function LG_funcs.add_hash(key, keys)
    local blobs
    local is_huge = keys[key]
    if is_huge then
        blobs = dragonfly.randstr(LG_funcs.huge_value_size, LG_funcs.csize / 2)
        blobs = dragonfly.randstr(LG_funcs.huge_value_size, LG_funcs.huge_value_csize)
Can you explain how this works if line 129 does
for i = 2, LG_funcs.csize, 2 do
Good catch! It's a bug. It should depend on is_huge. Will fix it in 2 minutes.