Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: expirations on huge values #4297

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

fix: expirations on huge values #4297

wants to merge 1 commit into from

Conversation

kostasrim
Copy link
Contributor

@kostasrim kostasrim commented Dec 12, 2024

I managed to reproduce locally the https://github.com/dragonflydb/dragonfly/actions/runs/12290843825

The problem is that there are flows that hold no locks to the keys (it's not inline transactions because I fixed that while ago) so when we preempt when we call ExpireIfNeeded we first write to the journal, context switch to another fiber and then we finally delete the entry. If the fiber that we switched modifies the same key then we get the data inconsistency because after we resume back from the write to journal we will delete the key.

I am not yet sure which flow does not lock the key and I am currently investigating that.

The test no longer fails locally with this small change.

P.s. now all of our ExpireIfNeeded calls require to the journal write to not preempt. So we are kinda back on ExpireIfNeeded always requiring to not preempt.

I would rather not merge this yet until we understand exactly where we don't lock the keys.

@kostasrim kostasrim self-assigned this Dec 12, 2024
@kostasrim kostasrim marked this pull request as draft December 12, 2024 11:37
@@ -453,7 +453,7 @@ OpResult<DbSlice::PrimeItAndExp> DbSlice::FindInternal(const Context& cntx, std:
}

if (res.it->second.HasExpire()) { // check expiry state
res = ExpireIfNeeded(cntx, res.it, true);
res = ExpireIfNeeded(cntx, res.it, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we call FindInternal we always hold the lock to the key, or at least we should, if not its probably a bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes and I think I am wrong here as well. There is something else that is happening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants