feat: add support for big values in SeederV2 #4222
Conversation
kostasrim commented on Nov 28, 2024 (edited)
- add support for big values in SeederV2
Signed-off-by: kostas <[email protected]>
tests/dragonfly/seeder/__init__.py
Outdated
@@ -137,6 +138,8 @@ def __init__(
         data_size=100,
         collection_size=None,
         types: typing.Optional[typing.List[str]] = None,
+        huge_value_percentage=0,
For now I will keep a flat probability for each key in the key target to contain a huge value
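For reference, a minimal sketch of such a flat-probability roll, assuming math.random() is available inside the seeder script and using the LG_funcs fields shown in the hunks below:

-- every candidate entry rolls the same odds, independent of its key
local function huge_entry()
    local ratio = LG_funcs.huge_value_percentage / 100
    return math.random() < ratio
end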
-    local elements = dragonfly.randstr(LG_funcs.esize, LG_funcs.csize)
+    local elements
+    if huge_entry() then
+        -- Hard coded 10 here, meaning up to 10 huge entries per set
//TODO so I don't forget to fix it: replace 10 with LG_funcs.csize()
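A sketch of that intended fix, with the call site approximated from the hunk above (the exact surrounding code may differ):

-- cap huge entries by the configured container size, not a literal 10
local elements = dragonfly.randstr(LG_funcs.huge_value_size, LG_funcs.csize)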
please fix :)
    LG_funcs.huge_value_size = large_val_sz
end

local function huge_entry()
I would like to expose this as a metric, such that once the seeder finishes it can report how many big values it created. However, since this code is a script, I don't see a "smart way". Maybe the seeder can create a key in dragonfly (SET big_values number_of_big_values) which we can then poll?
@chakaz any ideas/thoughts?
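A rough sketch of that idea, assuming the script can issue commands via redis.call (the key name big_values is hypothetical):

-- bump a counter key that the test harness can later poll with GET big_values
redis.call('INCRBY', 'big_values', huge_entries)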
We can simply iterate over all db keys in this lua script. That shouldn't be too hard, nor slow.
(we can use SCAN, TYPE and MEMORY USAGE in the script to get all the info we seek)
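Something like the following sketch, assuming redis.call is usable from the script (HUGE_THRESHOLD is a made-up name for illustration):

-- walk the keyspace and count entries whose footprint crosses a threshold
local big, cursor = 0, "0"
repeat
    local res = redis.call("SCAN", cursor)
    cursor = res[1]
    for _, key in ipairs(res[2]) do
        -- TYPE could additionally filter to container types here
        local bytes = redis.call("MEMORY", "USAGE", key)
        if bytes and bytes > HUGE_THRESHOLD then
            big = big + 1
        end
    end
until cursor == "0"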
I thought about this, and we don't really need SCAN. In fact, I baked this metric into the lua script's return value -- works perfectly.
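i.e. roughly the following, though the actual return shape in the PR may differ:

-- the script already tracks huge_entries, so it can simply hand the
-- count back to the Python side as part of its return value
return huge_entries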
Generally LG. I wish we could reduce the repetitiveness around the lua script. Perhaps we could add some get_size() function that returns esize or large_val_sz depending on huge_entry()'s value, and then use that value instead of all the if-else?
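A sketch of that suggested helper (the name get_size comes from the comment above; field names are from this PR):

-- pick the element size once, so call sites lose their if-else branches
local function get_size()
    if huge_entry() then
        return LG_funcs.huge_value_size
    end
    return LG_funcs.esize
end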
tests/dragonfly/replication_test.py
Outdated
from .seeder import StaticSeeder
from .seeder import SeederBase
Suggested change:
-from .seeder import StaticSeeder
-from .seeder import SeederBase
+from .seeder import StaticSeeder, SeederBase
tests/dragonfly/replication_test.py
Outdated
@@ -132,6 +161,12 @@ async def check():
     # Check data after stable state stream
     await check()

+    if big_value:
+        info = await c_master.info()
+        preemptions = info["big_value_preemptions"]
Where is this computed? I couldn't find it.

It's a new metric I introduced in my other PR. I will remove this for now, and we will add it back after that PR is merged.
end

local function huge_entry()
    local perc = LG_funcs.huge_value_percentage / 100
nit: this now isn't percent, right? fraction or ratio would be more accurate
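i.e. the nit is purely about naming, something like:

-- after dividing by 100 this is a fraction/ratio, not a percentage
local ratio = LG_funcs.huge_value_percentage / 100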
Signed-off-by: kostas <[email protected]>
local huge_entries = 0

local function huge_entry()
Don't you want huge_entry() to depend on the key? Such that some keys are huge, while others aren't, based on (say) their hash?
The reason I say this is because the seeder uses many operations to generate the values, like many lpush, hset, etc. If we do 100 operations per key (just throwing numbers here), doing 5% huge will make them all roughly of the same size...
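A sketch of that key-dependent variant, with a stand-in string hash (not code from the PR):

-- derive hugeness deterministically from the key, so every operation
-- touching the same key agrees on whether it is huge
local function huge_entry(key)
    local h = 0
    for i = 1, #key do
        h = (h * 31 + key:byte(i)) % 100
    end
    return h < LG_funcs.huge_value_percentage
end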
tests/dragonfly/replication_test.py
Outdated
# (4, [4, 4], dict(key_target=10_000), 1_000, False),
# pytest.param(6, [6, 6, 6], dict(key_target=100_000), 20_000, False, marks=M_OPT),
# # Skewed tests with different thread ratio
# pytest.param(8, 6 * [1], dict(key_target=5_000), 2_000, False, marks=M_SLOW),
# pytest.param(2, [8, 8], dict(key_target=10_000), 2_000, False, marks=M_SLOW),
# # Test with big value size
# pytest.param(2, [2], dict(key_target=1_000, data_size=10_000), 100, False, marks=M_SLOW),
# # Test with big value and big value serialization
# pytest.param(2, [2], dict(key_target=1_000, data_size=10_000), 100, True, marks=M_SLOW),
# # Stress test
# pytest.param(
#     8, [8, 8], dict(key_target=1_000_000, units=16), 50_000, False, marks=M_STRESS
# ),
Is this intentionally commented out?
Signed-off-by: kostas <[email protected]>
huge_value_percentage=0,
huge_value_size=0,
why is this needed?
Because we get really big containers, which causes memory to grow really fast. That's why I'd rather have two specific parameters: one for the size of each element in the container, and one for the total number of elements per container.
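Under that design the fill of a huge container would be driven by two independent knobs; a sketch, assuming a hypothetical huge_value_csize (elements per container) alongside huge_value_size (bytes per element):

-- bytes-per-element and elements-per-container controlled independently
local elements = dragonfly.randstr(LG_funcs.huge_value_size, LG_funcs.huge_value_csize)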
if op_type ~= "string" and op_type ~= "json" then
    is_huge = huge_entry()
end
Please add a comment explaining that only string and json are handled here, because other types are handled below (and where).
We don't handle json or string here; we just roll the dice to decide whether the value should be huge or not. There are no huge values for strings or json, so that's why we skip the roll for them.
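In code, that explanation amounts to something like the following sketch of the snippet above, with the clarifying comment added:

-- strings and json never take the huge path (their size comes straight
-- from data_size), so the dice roll only applies to container types
local is_huge = false
if op_type ~= "string" and op_type ~= "json" then
    is_huge = huge_entry()
end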
ok, my bad, can you please add that as a comment?
of course!