fix: Fix test_network_disconnect_during_migration
test
#4224
There are actually a few failures fixed in this PR, only one of which is a test bug. Fixes #4207
LGTM
@@ -79,7 +81,9 @@ void JournalStreamer::Cancel() {
   VLOG(1) << "JournalStreamer::Cancel";
   waker_.notifyAll();
   journal_->UnregisterOnChange(journal_cb_id_);
-  WaitForInflightToComplete();
+  if (!cntx_->IsCancelled()) {
+    WaitForInflightToComplete();
I was about to write that maybe we should move cntx_->IsCancelled() within WaitForInflightToComplete, but then I realized it's only called in this place, so it's not really needed I guess
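The guard in this hunk can be sketched in isolation as follows. This is only an illustration under stated assumptions: `FakeContext` and `StreamerSketch` are hypothetical stand-ins, not the real `Context` or `JournalStreamer`, and the stand-in wait merely records that it was called, whereas the real method would block the fiber.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real Context; only IsCancelled() is modelled.
struct FakeContext {
  bool cancelled = false;
  bool IsCancelled() const { return cancelled; }
};

// Hypothetical stand-in for JournalStreamer; only the control flow is modelled.
class StreamerSketch {
 public:
  explicit StreamerSketch(FakeContext* cntx) : cntx_(cntx) {}

  ~StreamerSketch() {
    // In-flight bytes are only expected to be drained on a clean shutdown.
    if (!cntx_->IsCancelled()) {
      assert(in_flight_bytes_ == 0u);
    }
  }

  void Write(std::size_t n) { in_flight_bytes_ += n; }
  void OnWriteDone(std::size_t n) { in_flight_bytes_ -= n; }

  void Cancel() {
    // After cancellation the pending bytes may never be acknowledged,
    // so waiting for them could block forever.
    if (!cntx_->IsCancelled()) {
      WaitForInflightToComplete();
    }
  }

  bool waited() const { return waited_; }
  std::size_t in_flight_bytes() const { return in_flight_bytes_; }

 private:
  // Stand-in: only records the call so that the sketch itself cannot hang.
  void WaitForInflightToComplete() { waited_ = true; }

  FakeContext* cntx_;
  std::size_t in_flight_bytes_ = 0;
  bool waited_ = false;
};
```

With a cancelled context, `Cancel()` returns without waiting even though bytes are still in flight, and the destructor skips its check for the same reason.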
@@ -41,7 +41,9 @@ JournalStreamer::JournalStreamer(journal::Journal* journal, Context* cntx)
 }

 JournalStreamer::~JournalStreamer() {
-  DCHECK_EQ(in_flight_bytes_, 0u);
+  if (!cntx_->IsCancelled()) {
+    DCHECK_EQ(in_flight_bytes_, 0u);
How did we not trigger this before? Or did we just deadlock because WaitForInflightToComplete() would never progress?
idk why we didn't trigger this before, but indeed this deadlocks
src/server/multi_command_squasher.cc
Outdated
@@ -112,6 +112,8 @@ MultiCommandSquasher::SquashResult MultiCommandSquasher::TrySquash(StoredCmd* cm

   cmd->Fill(&tmp_keylist_);
   auto args = absl::MakeSpan(tmp_keylist_);
+  if (args.size() == 0)
Curious, didn't DetermineKeys below handle this case?
Also general small nits (I do not care if you apply this or not 😄):
- span contains empty()
- We can also use NumArgs() and avoid the two calls above.
Switched to use empty(), thanks!
Re/ NumArgs(), that's a feature of the command, not the args
@@ -215,6 +219,9 @@ void RestoreStreamer::Run() {
     return;

   cursor = db_slice_->Traverse(pt, cursor, [&](PrimeTable::bucket_iterator it) {
+    if (fiber_cancelled_)  // Could be cancelled any time as Traverse may preempt
how can Traverse preempt if we don't have the big value serialization merged yet?
in the callback
which callback can preempt?
by the way, I think we can use snapshot_version_ instead of fiber_cancelled_ because we always process them together
> which callback can preempt?
db_slice_->FlushChangeToEarlierCallbacks(0 /*db_id always 0 for cluster*/,
                                         DbSlice::Iterator::FromPrime(it), snapshot_version_);
also WriteBucket(it); can yield
so I think the fiber_cancelled_ check should also come after the call to FlushChangeToEarlierCallbacks
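The point about re-checking after a call that can yield can be sketched like this. It is a simplified model under assumptions: `TraversalSketch` and `Visit` are hypothetical names, and the `flush` argument stands in for a call such as FlushChangeToEarlierCallbacks that may let another fiber run and cancel the operation.

```cpp
#include <functional>
#include <vector>

// Hypothetical model of a traversal callback; not the real RestoreStreamer.
struct TraversalSketch {
  bool cancelled = false;
  std::vector<int> written;

  // Visits one bucket. `flush` models a call that may preempt, during which
  // other code can run and set `cancelled`.
  bool Visit(int bucket, const std::function<void()>& flush) {
    if (cancelled) return false;  // may have been cancelled before this callback ran
    flush();                      // possible preemption point
    if (cancelled) return false;  // re-check: state may have changed during flush
    written.push_back(bucket);    // only write while still active
    return true;
  }
};
```

Without the second check, a bucket would still be written after a cancellation that happened inside `flush`.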
I suggest listing all the bugs that were fixed in the description
@chakaz I think we also need to update the following methods: ~OutgoingMigration::SliceSlotMigration::SliceSlotMigration() {
Test now passes, and I think I responded to / applied all comments. PTAL :)
@chakaz Please run the tests a few more times, because sometimes they pass even with the bug
There are actually a few failures fixed in this PR, only one of which is a test bug:
- db_slice_->Traverse() can yield, causing fiber_cancelled_'s value to change
- WaitForInflightToComplete() could wait forever because there are in_flight_bytes_ that will never reach their destination due to the cancellation
- IterateMap() with numeric key/values overrode the key's buffer with the value's buffer

Fixes #4207
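The IterateMap() item describes a buffer-aliasing bug. The actual IterateMap() code is not shown in this thread, so the following is only a sketch of that class of bug under assumed names (`FormatInt`, `JoinShared`, `JoinSeparate` are all hypothetical): a numeric key and a numeric value are formatted into the same scratch buffer, so the view of the key ends up pointing at the value's digits.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: formats `v` into `buf` and returns a view into `buf`.
inline std::string_view FormatInt(std::int64_t v, char* buf, std::size_t len) {
  auto res = std::to_chars(buf, buf + len, v);
  return std::string_view(buf, static_cast<std::size_t>(res.ptr - buf));
}

// Buggy shape: key and value share one scratch buffer, so formatting the
// value overwrites the bytes that the key view refers to.
inline std::string JoinShared(std::int64_t key, std::int64_t value) {
  char scratch[32];
  std::string_view k = FormatInt(key, scratch, sizeof(scratch));
  std::string_view v = FormatInt(value, scratch, sizeof(scratch));  // clobbers k
  return std::string(k) + "=" + std::string(v);
}

// Fixed shape: each view gets its own backing buffer.
inline std::string JoinSeparate(std::int64_t key, std::int64_t value) {
  char key_buf[32];
  char val_buf[32];
  std::string_view k = FormatInt(key, key_buf, sizeof(key_buf));
  std::string_view v = FormatInt(value, val_buf, sizeof(val_buf));
  return std::string(k) + "=" + std::string(v);
}
```

For example, `JoinShared(12, 34)` yields "34=34", while `JoinSeparate(12, 34)` yields "12=34".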