
chore: test metrics for huge value serialization #4262

Merged
merged 8 commits into from
Dec 12, 2024

Conversation

@kostasrim (Contributor) commented Dec 5, 2024

  • assert preemptions in replication test

@kostasrim kostasrim self-assigned this Dec 5, 2024
@@ -425,10 +425,6 @@ def create(self, existing_port=None, path=None, version=100, **kwargs) -> DflyIn
args.setdefault("list_experimental_v2")
args.setdefault("log_dir", self.params.log_dir)

if version >= 1.21 and "serialization_max_chunk_size" not in args:
# Add 1 byte limit for big values
args.setdefault("serialization_max_chunk_size", 1)
adiholden (Collaborator):
Oh, but I do want to have serialization_max_chunk_size for all tests, not just test_replication_all. Just use a default of 1024 instead of 1, for example.

kostasrim (Author):
I was about to write that we do, but then I realized I defaulted it to 32_MB 🤣 so no, we don't 🤦 Will fix it!

huge_value_csize=1,
huge_value_size=512,
# 2 huge entries per container/key as default
huge_value_csize=2,
adiholden (Collaborator):
Does this configuration mean that 1% of keys have 2 huge entries in the container?

kostasrim (Author):
yes!

@kostasrim kostasrim requested a review from adiholden December 9, 2024 07:46
@@ -1433,7 +1433,7 @@ async def test_migration_with_key_ttl(df_factory):
assert await nodes[1].client.execute_command("stick k_sticky") == 0


@dfly_args({"proactor_threads": 4, "cluster_mode": "yes", "serialization_max_chunk_size": 0})
adiholden (Collaborator):
I don't really understand how this is passing now. You made this change (setting serialization_max_chunk_size=0) because these tests failed when you created the big value serialization PR, which disabled compression.

kostasrim (Author):
For some reason I thought they got fixed, but they failed:
https://github.com/dragonflydb/dragonfly/actions/runs/12182325655/job/33981070627#step:12:1737 here, and it also cascaded (I introduced f9f93b1 to mitigate this effect).

I will revert this.

key_target = seeder_config["key_target"]
# Roughly 1% of the key target will be a big value with 2 elements of 512 bytes each
estimated_preemptions = key_target * (0.01)
assert preemptions > estimated_preemptions
adiholden (Collaborator):
Can you also add a check that we don't preempt for all keys?
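The two-sided check being requested might look like this hypothetical sketch (`check_preemptions` is an illustrative name, not from the PR; `preemptions` and `key_target` are the variables from the test snippet above):

```python
def check_preemptions(preemptions: int, key_target: int) -> bool:
    """Hypothetical two-sided sanity check: roughly 1% of keys are seeded
    as big values, so we expect at least that many preemptions, but we
    must not be preempting on every single key."""
    estimated = key_target * 0.01  # lower bound from the 1% seeding ratio
    return estimated < preemptions < key_target
```

For example, with key_target of 100,000 the check would accept 1,500 preemptions but reject both 500 (too few to have hit the big value flow) and 100,000 (preempting on every key).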

@kostasrim kostasrim requested a review from adiholden December 9, 2024 13:52
@kostasrim kostasrim changed the title chore: default serialization_max_chunk_size and test metrics chore: test metrics for huge value serialization Dec 9, 2024
@@ -426,8 +426,7 @@ def create(self, existing_port=None, path=None, version=100, **kwargs) -> DflyIn
args.setdefault("log_dir", self.params.log_dir)

if version >= 1.21 and "serialization_max_chunk_size" not in args:
# Add 1 byte limit for big values
args.setdefault("serialization_max_chunk_size", 1)
args.setdefault("serialization_max_chunk_size", 16384)
adiholden (Collaborator):
If the huge value in the seeder is 512, how does serialization_max_chunk_size=16384 exercise the big value serialization flow? I am missing something in the math.
Say we have a bucket that contains a key with big value elements; the number of big value entries is 2, so this big value key's serialized size is ~1K. How does setting serialization_max_chunk_size=16384 make us preempt in this container serialization?

kostasrim (Author):
It's simple. We flush every 32k (see PushSerializedToChannel in IterateBucketsFb). A bucket has 12 slots of 512 bytes each (and a big value is 1k), so:

First bucket -> 6k serialized (no big values), no preemptions
Second bucket -> 12k serialized (no big values), no preemptions
Third bucket -> slot 1, slot 2, ... slot 7 = 15.5k serialized -> the next slot overshoots the 16k chunk size and gets flushed
Fourth bucket -> 0k serialized, start again

So we hit it every third bucket. IMO finding a good value for this is hard; the important part, as you said, is to hit both flows. Furthermore, if we increase the load and add some really big values we will fail the test, since the GH runner has only 8 GB and we have a test case with 1 million elements, which roughly gives us 10k big values.
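The bucket-counting arithmetic above can be sketched as a small simulation (a sketch only: the constants come from this comment, not from the Dragonfly source, and the 16384 threshold is the serialization_max_chunk_size set in the diff):

```python
# Sketch: simulate the per-bucket serialization accounting described above.
CHUNK_THRESHOLD = 16384   # serialization_max_chunk_size from the diff
SLOTS_PER_BUCKET = 12     # slots per bucket, per the comment
SLOT_BYTES = 512          # each bucket therefore serializes ~6k

def first_flushing_bucket(max_buckets: int = 100):
    """Return the 1-based index of the bucket whose serialization first
    reaches the chunk threshold (where a flush/preemption would occur)."""
    serialized = 0
    for bucket in range(1, max_buckets + 1):
        for _slot in range(SLOTS_PER_BUCKET):
            serialized += SLOT_BYTES
            if serialized >= CHUNK_THRESHOLD:
                return bucket  # overshoot: flush here, counter resets
    return None
```

With these numbers the function returns 3, matching the "every third bucket" observation: 6k after the first bucket, 12k after the second, and the threshold is crossed partway through the third.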

adiholden (Collaborator):
I understand. So in this case the huge values that you insert don't actually trigger this preemption; it's just the fact that we flush only when the buffer is >32k, right?
So the preemption stats are not really related to the number of big values inserted by the seeder (because they are not really big), but rather to the average bucket serialize size.
So if the bucket serialized size is 6k, you will preempt every 3rd bucket.
Which actually means that while the max chunk size is lower than 32K, the big value serialization logic will be triggered a lot, and not only when we have big values in the bucket.

kostasrim (Author):
which actually means that while the max chunk size is lower than 32K the big value serialization logic will be triggered a lot, and not only if we have big values in this bucket.

Yep, it's what I mentioned to you yesterday: big value serialization vs. big bucket serialization. This is inherent to the logic we have now, but it's easy to decouple should we wish to.

kostasrim (Author):
I understand. so in this case the huge values that you insert actually does not trigger this preemption but instead its

It might or might not. It depends on which value "breaks" the threshold.

adiholden (Collaborator):
In our server, the value of serialization_max_chunk_size should be bigger than 32K, the constant minimum blob size.
For tests, just to make sure our logic works well, it's OK to make serialization_max_chunk_size smaller, but I actually don't see the point of the seeder with big values in this flow, as they are not really big enough to trigger it.

kostasrim (Author) commented Dec 10, 2024:
but I actually dont see the point of the seeder with big values in this flow, as they are not realy big to trigger this flow

Yep, and for that we can add a specific test case which can also be fast to run (with relatively low load but huge values). That way we get both :)

P.S. For now at least we test both preemption and non-preemption, regardless of whether the value is really huge. (I also have a test that checks the memory is not doubled -- see the memory test in reg tests.)

adiholden (Collaborator):
So if you agree with me on that, let's make the default huge_value_percentage = 0 and huge_value_size = 10000, and just add a specific test case where huge_value_percentage = 2.

redis.apcall('LPUSH', key, unpack(randstr_sequence(is_huge)))
--- TODO -- investigate why second case of replication_test_all fails
--- we somehow create a quicklist that is circular and we deadlock
redis.apcall('LPUSH', key, unpack(randstr_sequence(false)))
kostasrim (Author):
@adiholden that's a bug somewhere in our list code

local action = math.random(1, 4)
if action == 1 then
redis.apcall('RPOP', key)
elseif action == 2 then
redis.apcall('LPOP', key)
elseif action == 3 then
redis.apcall('LPUSH', key, randstr(is_huge))
redis.apcall('LPUSH', key, randstr(false))
kostasrim (Author):
We turn off big values for all modification ops because it can easily get out of hand and create containers with a lot of huge entries. As this is uncontrolled, I modified this.

@@ -114,7 +115,7 @@ function LG_funcs.add_hash(key, keys)
local blobs
local is_huge = keys[key]
if is_huge then
blobs = dragonfly.randstr(LG_funcs.huge_value_size, LG_funcs.csize / 2)
blobs = dragonfly.randstr(LG_funcs.huge_value_size, LG_funcs.huge_value_csize)
adiholden (Collaborator):
Can you explain how this works, given that line 129 does:
for i = 2, LG_funcs.csize, 2 do

kostasrim (Author):
Good catch! It's a bug. It should depend on is_huge. Will fix it in 2 minutes

@kostasrim kostasrim merged commit b37287b into main Dec 12, 2024
9 checks passed
@kostasrim kostasrim deleted the kpr1 branch December 12, 2024 12:19