feat: add support for big values in SeederV2 #4222
Conversation
kostasrim commented on Nov 28, 2024 (edited)
- add support for big values in SeederV2
Signed-off-by: kostas <[email protected]>
tests/dragonfly/seeder/__init__.py
Outdated
@@ -137,6 +138,8 @@ def __init__(
         data_size=100,
         collection_size=None,
         types: typing.Optional[typing.List[str]] = None,
+        huge_value_percentage=0,
For now I will keep a flat probability for each key in the key target to contain a huge value
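For reference, a minimal sketch of such a flat-probability roll, assuming math.random() is available inside the seeder script and using the LG_funcs fields shown in the hunks below:

-- every candidate entry rolls the same odds, independent of its key
local function huge_entry()
    local ratio = LG_funcs.huge_value_percentage / 100
    return math.random() < ratio
end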
-    local elements = dragonfly.randstr(LG_funcs.esize, LG_funcs.csize)
+    local elements
+    if huge_entry() then
+        -- Hard coded 10 here, meaning up to 10 huge entries per set
//TODO so I don't forget to fix it: replace 10 with LG_funcs.csize()
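A sketch of that intended fix, with the call site approximated from the hunk above (the exact surrounding code may differ):

-- cap huge entries by the configured container size, not a literal 10
local elements = dragonfly.randstr(LG_funcs.huge_value_size, LG_funcs.csize)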
please fix :)
    LG_funcs.huge_value_size = large_val_sz
end

local function huge_entry()
I would like to expose this as a metric, such that once the seeder finishes it can report how many big values it created. However, since this code is a script, I don't see a "smart way". Maybe the seeder can create a key in dragonfly (SET big_values number_of_big_values) which we can then poll?
@chakaz any ideas/thoughts?
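A rough sketch of that idea, assuming the script can issue commands via redis.call (the key name big_values is hypothetical):

-- bump a counter key that the test harness can later poll with GET big_values
redis.call('INCRBY', 'big_values', huge_entries)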
We can simply iterate over all db keys in this lua script. That shouldn't be too hard, nor slow.
(we can use SCAN, TYPE and MEMORY USAGE in the script to get all the info we seek)
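Something like the following sketch, assuming redis.call is usable from the script (HUGE_THRESHOLD is a made-up name for illustration):

-- walk the keyspace and count entries whose footprint crosses a threshold
local big, cursor = 0, "0"
repeat
    local res = redis.call("SCAN", cursor)
    cursor = res[1]
    for _, key in ipairs(res[2]) do
        -- TYPE could additionally filter to container types here
        local bytes = redis.call("MEMORY", "USAGE", key)
        if bytes and bytes > HUGE_THRESHOLD then
            big = big + 1
        end
    end
until cursor == "0"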
I thought about this, and we don't really need SCAN. In fact, I baked this metric into the lua script's return value -- works perfectly.
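i.e. roughly the following, though the actual return shape in the PR may differ:

-- the script already tracks huge_entries, so it can simply hand the
-- count back to the Python side as part of its return value
return huge_entries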
Generally LG. I wish we could reduce the repetitiveness around the lua script. Perhaps we could add some get_size() function that returns esize or large_val_sz depending on huge_entry()'s value, and then use that value instead of all the if-else?
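A sketch of that suggested helper (the name get_size comes from the comment above; field names are from this PR):

-- pick the element size once, so call sites lose their if-else branches
local function get_size()
    if huge_entry() then
        return LG_funcs.huge_value_size
    end
    return LG_funcs.esize
end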
tests/dragonfly/replication_test.py
Outdated
from .seeder import StaticSeeder
from .seeder import SeederBase
Suggested change:
-from .seeder import StaticSeeder
-from .seeder import SeederBase
+from .seeder import StaticSeeder, SeederBase
tests/dragonfly/replication_test.py
Outdated
@@ -132,6 +161,12 @@ async def check():
     # Check data after stable state stream
     await check()

+    if big_value:
+        info = await c_master.info()
+        preemptions = info["big_value_preemptions"]
Where is this computed? I couldn't find it.

It's a new metric I introduced in my other PR. I will remove this for now, and we will add it back after that PR is merged.
end

local function huge_entry()
    local perc = LG_funcs.huge_value_percentage / 100
nit: this now isn't percent, right? fraction or ratio would be more accurate
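i.e. the nit is purely about naming, something like:

-- after dividing by 100 this is a fraction/ratio, not a percentage
local ratio = LG_funcs.huge_value_percentage / 100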
Signed-off-by: kostas <[email protected]>
local huge_entries = 0

local function huge_entry()
Don't you want huge_entry() to depend on the key? Such that some keys are huge, while others aren't, based on (say) their hash?
The reason I say this is because the seeder uses many operations to generate the values, like many lpush, hset, etc. If we do 100 operations per key (just throwing numbers here), doing 5% huge will make them all roughly of the same size...
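A sketch of that key-dependent variant, with a stand-in string hash (not code from the PR):

-- derive hugeness deterministically from the key, so every operation
-- touching the same key agrees on whether it is huge
local function huge_entry(key)
    local h = 0
    for i = 1, #key do
        h = (h * 31 + key:byte(i)) % 100
    end
    return h < LG_funcs.huge_value_percentage
end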
tests/dragonfly/replication_test.py
Outdated
# (4, [4, 4], dict(key_target=10_000), 1_000, False),
# pytest.param(6, [6, 6, 6], dict(key_target=100_000), 20_000, False, marks=M_OPT),
# # Skewed tests with different thread ratio
# pytest.param(8, 6 * [1], dict(key_target=5_000), 2_000, False, marks=M_SLOW),
# pytest.param(2, [8, 8], dict(key_target=10_000), 2_000, False, marks=M_SLOW),
# # Test with big value size
# pytest.param(2, [2], dict(key_target=1_000, data_size=10_000), 100, False, marks=M_SLOW),
# # Test with big value and big value serialization
# pytest.param(2, [2], dict(key_target=1_000, data_size=10_000), 100, True, marks=M_SLOW),
# # Stress test
# pytest.param(
#     8, [8, 8], dict(key_target=1_000_000, units=16), 50_000, False, marks=M_STRESS
# ),
Is this intentionally commented out?
Signed-off-by: kostas <[email protected]>
huge_value_percentage=0,
huge_value_size=0,
why is this needed?
Because we get really big containers, which causes memory to grow really fast. That's why I'd rather have two specific parameters: one for the size of each element in the container, and one for the total number of elements per container.
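Under that design the fill of a huge container would be driven by two independent knobs; a sketch, assuming a hypothetical huge_value_csize (elements per container) alongside huge_value_size (bytes per element):

-- bytes-per-element and elements-per-container controlled independently
local elements = dragonfly.randstr(LG_funcs.huge_value_size, LG_funcs.huge_value_csize)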
if op_type ~= "string" and op_type ~= "json" then
    is_huge = huge_entry()
end
Please add a comment explaining that only string and json are handled here, because other types are handled below (and where).
We don't handle json or string here; we just roll the dice to decide whether the value should be huge or not. There are no huge values for strings or json, so that's why we skip the roll for them.
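In code, that explanation amounts to something like the following sketch of the snippet above, with the clarifying comment added:

-- strings and json never take the huge path (their size comes straight
-- from data_size), so the dice roll only applies to container types
local is_huge = false
if op_type ~= "string" and op_type ~= "json" then
    is_huge = huge_entry()
end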
ok, my bad, can you please add that as a comment?
of course!