chore: test metrics for huge value serialization #4262
Conversation
kostasrim commented on Dec 5, 2024 (edited)
- assert preemptions in replication test
Signed-off-by: kostas <[email protected]>
@@ -425,10 +425,6 @@ def create(self, existing_port=None, path=None, version=100, **kwargs) -> DflyIn
        args.setdefault("list_experimental_v2")
        args.setdefault("log_dir", self.params.log_dir)

        if version >= 1.21 and "serialization_max_chunk_size" not in args:
            # Add 1 byte limit for big values
            args.setdefault("serialization_max_chunk_size", 1)
Oh, but I do want to have serialization_max_chunk_size for all tests, not just test_replication_all.
Just use a default of 1024 instead of 1, for example.
I was about to write that we do, but then I realized I defaulted it to 32_MB 🤣 so no, we don't 🤦 Will fix it!
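For reference, a minimal sketch of what the suggestion would look like inside create() (the 1024 value is only the reviewer's suggestion at this point in the thread; the diff further below ends up using 16384, and args/version are the variables from the quoted factory code):

    if version >= 1.21 and "serialization_max_chunk_size" not in args:
        # Default big-value serialization to a 1024-byte chunk limit for all tests,
        # unless a test passes its own value explicitly.
        args.setdefault("serialization_max_chunk_size", 1024)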
huge_value_csize=1,
huge_value_size=512,
# 2 huge entries per container/key as default
huge_value_csize=2,
does this configuration mean that 1% of the keys have 2 huge entries in the container?
yes!
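A rough back-of-the-envelope sketch of what that configuration implies (parameter names come from the seeder config quoted above; key_target and huge_value_percentage=1 are assumptions used purely for illustration, matching the "1% of keys" statement):

    key_target = 100_000          # assumed total number of keys the seeder creates
    huge_value_percentage = 1     # ~1% of keys are marked huge (assumption from the discussion)
    huge_value_csize = 2          # huge entries per huge container/key
    huge_value_size = 512         # bytes per huge entry

    expected_huge_keys = key_target * huge_value_percentage // 100   # ~1,000 huge keys
    bytes_per_huge_key = huge_value_csize * huge_value_size          # ~1 KiB per huge key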
tests/dragonfly/cluster_test.py (Outdated)
@@ -1433,7 +1433,7 @@ async def test_migration_with_key_ttl(df_factory):
    assert await nodes[1].client.execute_command("stick k_sticky") == 0


@dfly_args({"proactor_threads": 4, "cluster_mode": "yes", "serialization_max_chunk_size": 0})
I don't really understand how this is passing now. You made this change, setting serialization_max_chunk_size=0, because these tests failed when you created the big value serialization PR with compression disabled.
For some reason I thought they got fixed, but they failed:
https://github.com/dragonflydb/dragonfly/actions/runs/12182325655/job/33981070627#step:12:1737 here, and the failures also cascaded (I introduced f9f93b1 to mitigate this effect).
I will revert this.
key_target = seeder_config["key_target"]
# Roughly 1% of the key target will be a big value with 2 elements of 512 bytes each
estimated_preemptions = key_target * (0.01)
assert preemptions > estimated_preemptions
Can you also add a check that we don't preempt for all keys?
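A sketch of what the requested two-sided check could look like, slotting into the quoted test where preemptions and seeder_config are already defined (the exact upper bound is an assumption; the point is only that not every key should preempt):

    key_target = seeder_config["key_target"]
    # Roughly 1% of the keys are big values with 2 elements of 512 bytes each,
    # so we expect at least that many preemptions...
    estimated_preemptions = key_target * 0.01
    assert preemptions > estimated_preemptions
    # ...but far fewer than one preemption per key; otherwise the big-value
    # serialization path would be firing for ordinary buckets as well.
    assert preemptions < key_target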
@@ -426,8 +426,7 @@ def create(self, existing_port=None, path=None, version=100, **kwargs) -> DflyIn
        args.setdefault("log_dir", self.params.log_dir)

        if version >= 1.21 and "serialization_max_chunk_size" not in args:
            # Add 1 byte limit for big values
            args.setdefault("serialization_max_chunk_size", 1)
            args.setdefault("serialization_max_chunk_size", 16384)
If the huge value in the seeder is 512, how come serialization_max_chunk_size=16384 gets to exercise the big value serialization flow? I am missing something in the math.
Let's say we have a bucket that contains a key with big value elements; the number of big value entries is 2, so this big value key's serialized size is ~1K. How does setting serialization_max_chunk_size=16384 make us preempt during this container's serialization?
It's simple. We flush every 32k (see PushSerializedToChannel in IterateBucketsFb). A bucket has 12 slots of 512 bytes each (and a big value is 1k), so:
First bucket -> 6k serialized (no big values) and no preemptions
Second bucket -> 12k serialized (no big values) and no preemptions
Third bucket -> slot 1, slot 2... slot 7 = 15.5k serialized -> the next slot overshoots and gets flushed
Fourth bucket -> 0k serialized, start again
So we hit it every third bucket. IMO finding a good value for this is hard; the important part is, as you said, to hit both flows. Furthermore, if you increase the load and add some really big values we will fail the test, since the GH runner has only 8 GB and we have a test case with 1 million elements, which roughly gives us 10k big values.
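A small sketch of the accumulation arithmetic described above (the 512-byte slots and 12-slot buckets come from the comment, the 16384 threshold is the serialization_max_chunk_size from the diff; this only models when the serialized buffer crosses the limit, not the actual serializer code):

    SLOT_BYTES = 512          # per-slot value size used in the seeder
    SLOTS_PER_BUCKET = 12     # slots per bucket, per the comment above
    THRESHOLD = 16384         # serialization_max_chunk_size from the diff

    serialized = 0
    flushes = 0
    for bucket in range(4):
        for slot in range(SLOTS_PER_BUCKET):
            serialized += SLOT_BYTES
            if serialized > THRESHOLD:
                # The slot that overshoots the limit triggers a flush/preemption,
                # even though no single value is huge on its own.
                flushes += 1
                serialized = 0

    print(flushes)  # 1 flush over these 4 buckets, i.e. roughly every third bucket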
I understand. So in this case the huge values that you insert do not actually trigger this preemption; instead it's just the fact that we flush only if the buffer is >32k, right?
So the preemption stats are not really related to the number of big values inserted by the seeder (because they are not really big), but rather to the average bucket serialization size.
So if the bucket serialized size is 6k you will preempt every 3rd bucket.
Which actually means that while the max chunk size is lower than 32K, the big value serialization logic will be triggered a lot, and not only when we have big values in the bucket.
> Which actually means that while the max chunk size is lower than 32K, the big value serialization logic will be triggered a lot, and not only when we have big values in the bucket.

Yep, it's what I mentioned to you yesterday: big value serialization vs big bucket serialization. This is inherent to the logic we have now, but it's easy to decouple should we wish to.
> I understand. So in this case the huge values that you insert do not actually trigger this preemption; instead it's

It might or might not. It depends on which value "breaks" the threshold.
In our server the value of serialization_max_chunk_size should be bigger than 32K, the constant minimum blob size.
For tests, just to make sure our logic works well, it's OK to make serialization_max_chunk_size smaller, but I actually don't see the point of the seeder's big values in this flow, as they are not really big enough to trigger it.
> but I actually don't see the point of the seeder's big values in this flow, as they are not really big enough to trigger it

Yep, and for that we can add a specific test case which can also be fast to run (with relatively low load but huge values). That way we get both :)
P.S. For now at least we test both preemption and non-preemption regardless of whether the value is really huge. (I also have a test that checks the memory is not doubled -- see the memory test in the regression tests.)
So if you agree with me on that, let's make the defaults huge_value_percentage = 0 and huge_value_size = 10000,
and we can just add a specific test case where huge_value_percentage = 2.
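A small sketch of the proposed split between the defaults and the dedicated test case (parameter names and values come from this discussion; how they are wired into the existing seeder and test utilities is left out):

    SERIALIZATION_MAX_CHUNK_SIZE = 16384   # default taken from the diff above

    # Proposed defaults: regular tests run without huge values.
    DEFAULT_SEEDER_ARGS = dict(huge_value_percentage=0, huge_value_size=10_000)

    # Dedicated test case: a small fraction of keys carries values that are
    # genuinely bigger than the chunk size, so preemption is driven by the
    # values themselves rather than by accumulated bucket size.
    HUGE_VALUE_TEST_ARGS = dict(
        huge_value_percentage=2,
        huge_value_size=10_000,
        huge_value_csize=2,
    )

    per_huge_key = HUGE_VALUE_TEST_ARGS["huge_value_size"] * HUGE_VALUE_TEST_ARGS["huge_value_csize"]
    assert per_huge_key > SERIALIZATION_MAX_CHUNK_SIZE   # 20_000 > 16_384: one such key alone crosses the limit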
Signed-off-by: kostas <[email protected]>
redis.apcall('LPUSH', key, unpack(randstr_sequence(is_huge)))
--- TODO -- investigate why second case of replication_test_all fails
--- we somehow create a quicklist that is circular and we deadlock
redis.apcall('LPUSH', key, unpack(randstr_sequence(false)))
@adiholden that's a bug somehow in our list code
local action = math.random(1, 4)
if action == 1 then
    redis.apcall('RPOP', key)
elseif action == 2 then
    redis.apcall('LPOP', key)
elseif action == 3 then
    redis.apcall('LPUSH', key, randstr(is_huge))
    redis.apcall('LPUSH', key, randstr(false))
We turn off big values for all modification ops because it can easily get out of hand and create containers with a lot of huge entries. As this is uncontrolled, I modified this.
@@ -114,7 +115,7 @@ function LG_funcs.add_hash(key, keys)
    local blobs
    local is_huge = keys[key]
    if is_huge then
        blobs = dragonfly.randstr(LG_funcs.huge_value_size, LG_funcs.csize / 2)
        blobs = dragonfly.randstr(LG_funcs.huge_value_size, LG_funcs.huge_value_csize)
Can you explain how this works if line 129 does
for i = 2, LG_funcs.csize, 2 do
Good catch! It's a bug. It should depend on is_huge. Will fix it in 2 minutes.