feat: add support for big values in SeederV2 #4222

Merged: 15 commits from kpr1 into main on Dec 5, 2024
Conversation

@kostasrim (Contributor) commented Nov 28, 2024

  • add support for big values in SeederV2

@kostasrim kostasrim self-assigned this Nov 28, 2024
@@ -137,6 +138,8 @@ def __init__(
data_size=100,
collection_size=None,
types: typing.Optional[typing.List[str]] = None,
huge_value_percentage=0,
kostasrim (Contributor Author):
For now I will keep a flat probability that each key in the key target contains a huge value.
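For illustration, a minimal Lua sketch of such a flat-probability check (the shape is assumed, not the exact PR code; the real helper appears in the diff excerpts below):

local function huge_entry()
    -- same odds for every entry, independent of the key
    return math.random() < LG_funcs.huge_value_percentage / 100
end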

- local elements = dragonfly.randstr(LG_funcs.esize, LG_funcs.csize)
+ local elements
+ if huge_entry() then
+     -- Hard coded 10 here, meaning up to 10 huge entries per set
kostasrim (Contributor Author) commented Nov 28, 2024:

TODO (so I don't forget to fix it): replace the hard-coded 10 with LG_funcs.csize()

Collaborator:
please fix :)

@adiholden adiholden requested a review from chakaz December 2, 2024 08:25
LG_funcs.huge_value_size = large_val_sz
end

local function huge_entry()
kostasrim (Contributor Author):
I would like to expose this as a metric so that, once the seeder finishes, it can report how many big values it created. However, since this code is a script, I don't see a "smart way" to do it.

Maybe the seeder can create a key in dragonfly (SET big_values number_of_big_values) which we can then poll?

@chakaz any ideas/thoughts?

Collaborator:

We can simply iterate over all db keys in this lua script. That shouldn't be too hard, nor slow.

Collaborator:
(we can use SCAN, TYPE and MEMORY USAGE in the script to get all the info we seek)
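For illustration, a sketch of that suggestion, assuming the script environment supports redis.call with these standard commands (the function name and the threshold parameter are made up for the example, and this is untested against Dragonfly's script API):

local function count_huge_keys(threshold_bytes)
    local huge = 0
    local cursor = "0"
    repeat
        local res = redis.call("SCAN", cursor, "COUNT", 100)
        cursor = res[1]
        for _, key in ipairs(res[2]) do
            -- TYPE returns a status reply, surfaced in Lua as a table with an .ok field
            local t = redis.call("TYPE", key).ok
            -- strings never hold huge entries in this seeder, so skip them
            if t ~= "string" then
                local bytes = redis.call("MEMORY", "USAGE", key)
                if bytes and bytes > threshold_bytes then
                    huge = huge + 1
                end
            end
        end
    until cursor == "0"
    return huge
end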

kostasrim (Contributor Author):

I thought about this, and we don't really need SCAN. In fact, I baked this metric into the Lua script's return value -- works perfectly.
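Roughly, a baked-in counter can look like this (a sketch of the shape, not the merged script; note_huge_entry() is a made-up helper):

local huge_entries = 0

local function note_huge_entry()
    huge_entries = huge_entries + 1
end

-- the generation loop calls note_huge_entry() each time huge_entry() fires,
-- and the script's return value exposes the counter to the Python side
return huge_entries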

@chakaz (Collaborator) left a comment:

Generally LGTM.
I wish we could reduce the repetitiveness around the Lua script. Perhaps we could add a get_size() function that returns esize or large_val_sz depending on huge_entry()'s value, and then use that value instead of all the if-else branches?
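One possible shape of that helper (a sketch of the reviewer's suggestion, not the merged code):

local function get_size()
    if huge_entry() then
        return LG_funcs.huge_value_size
    end
    return LG_funcs.esize
end

-- call sites could then collapse to, e.g.:
-- local elements = dragonfly.randstr(get_size(), LG_funcs.csize)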

Comment on lines 20 to 21
from .seeder import StaticSeeder
from .seeder import SeederBase
Collaborator:

Suggested change:
- from .seeder import StaticSeeder
- from .seeder import SeederBase
+ from .seeder import StaticSeeder, SeederBase

@@ -132,6 +161,12 @@ async def check():
# Check data after stable state stream
await check()

if big_value:
    info = await c_master.info()
    preemptions = info["big_value_preemptions"]
Collaborator:

Where is this computed? I couldn't find it.

kostasrim (Contributor Author):

> Where is this computed? I couldn't find it.

It's a new metric I introduced in my other PR. I will remove this for now and we will add it back after that PR is merged.

end

local function huge_entry()
    local perc = LG_funcs.huge_value_percentage / 100
Collaborator:

nit: this now isn't percent, right? fraction or ratio would be more accurate


Signed-off-by: kostas <[email protected]>
@kostasrim changed the title from "[wip -- do not review] feat: add support for big values in SeederV2" to "feat: add support for big values in SeederV2" on Dec 2, 2024

local huge_entries = 0

local function huge_entry()
Collaborator:

Don't you want huge_entry() to depend on the key, such that some keys are huge while others aren't, based on (say) their hash?
The reason I say this is that the seeder uses many operations to generate the values (many lpush, hset, etc.). If we do 100 operations per key (just throwing numbers here), making 5% of them huge will leave all keys roughly the same size...
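A hypothetical sketch of such a key-dependent decision (the hash function and threshold here are illustrative, not from the PR):

local function huge_entry_for(key)
    local h = 0
    for i = 1, #key do
        h = (h * 31 + string.byte(key, i)) % 1000
    end
    -- stable per key: the same key always rolls the same way
    return h < LG_funcs.huge_value_percentage * 10
end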

@kostasrim kostasrim requested a review from chakaz December 3, 2024 16:33
Comment on lines 51 to 63
# (4, [4, 4], dict(key_target=10_000), 1_000, False),
# pytest.param(6, [6, 6, 6], dict(key_target=100_000), 20_000, False, marks=M_OPT),
# # Skewed tests with different thread ratio
# pytest.param(8, 6 * [1], dict(key_target=5_000), 2_000, False, marks=M_SLOW),
# pytest.param(2, [8, 8], dict(key_target=10_000), 2_000, False, marks=M_SLOW),
# # Test with big value size
# pytest.param(2, [2], dict(key_target=1_000, data_size=10_000), 100, False, marks=M_SLOW),
# # Test with big value and big value serialization
# pytest.param(2, [2], dict(key_target=1_000, data_size=10_000), 100, True, marks=M_SLOW),
# # Stress test
# pytest.param(
# 8, [8, 8], dict(key_target=1_000_000, units=16), 50_000, False, marks=M_STRESS
# ),
Collaborator:

Is this intentionally commented out?

Signed-off-by: kostas <[email protected]>
@kostasrim kostasrim requested a review from chakaz December 4, 2024 07:43
Comment on lines +38 to +39
huge_value_percentage=0,
huge_value_size=0,
Collaborator:

why is this needed?

kostasrim (Contributor Author):

Because we get really big containers, which causes memory to grow really fast. That's why I'd rather have two specific parameters: one for the size of each element in the container and one for the total number of elements per container.

Comment on lines +49 to +51
if op_type ~= "string" and op_type ~= "json" then
    is_huge = huge_entry()
end
Collaborator:

Please add a comment explaining that only string and json are handled here, because other types are handled below (and where)

kostasrim (Contributor Author):

We don't handle json or string here; we just roll a dice to decide whether the value should be huge or not. There are no huge values for strings or json, so that's why we skip the roll.

Collaborator:

ok, my bad, can you please add that as a comment?

kostasrim (Contributor Author):

of course!

@kostasrim kostasrim requested a review from chakaz December 5, 2024 08:14
@kostasrim kostasrim enabled auto-merge (squash) December 5, 2024 08:19
@kostasrim kostasrim merged commit 7ccad66 into main Dec 5, 2024
9 checks passed
@kostasrim kostasrim deleted the kpr1 branch December 5, 2024 08:47