fix(migration): Use transactions! #3266
base: main
Branch updated from 9e48936 to 8a2e624.
Signed-off-by: Vladislav Oleshko <[email protected]>
Branch updated from 8a2e624 to 35ad7b5.
}
LOG(DFATAL) << "Could not find " << id << " to unregister";
void DbSlice::CallOnChange(DbIndex id, const ChangeReq& cr) const {
FiberAtomicGuard fg; // Callbacks don't preemept
Typo: "preemept" should be "preempt".
But soon they will preempt. Is there any problem with this design if the callbacks preempt?
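To make the question concrete: a guard like `FiberAtomicGuard` declares that a section of code must not yield. The sketch below is a minimal model under that assumption; `AtomicSectionGuard`, `YieldPoint` and `atomic_section_depth` are hypothetical names, not the actual `FiberAtomicGuard` implementation. If a callback started preempting while such a guard is active, the check fires instead of silently letting another fiber run in the middle of the callback.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical per-thread count of active "no preemption" sections.
// A cooperative scheduler would consult it before switching fibers.
inline thread_local int atomic_section_depth = 0;

// RAII guard in the spirit of FiberAtomicGuard: while it is alive,
// the current fiber promises not to yield.
class AtomicSectionGuard {
 public:
  AtomicSectionGuard() { ++atomic_section_depth; }
  ~AtomicSectionGuard() { --atomic_section_depth; }
  AtomicSectionGuard(const AtomicSectionGuard&) = delete;
  AtomicSectionGuard& operator=(const AtomicSectionGuard&) = delete;
};

// Hypothetical yield point: a real fiber library would switch context here.
// Yielding inside a guarded section is a logic error, so report it.
inline void YieldPoint() {
  if (atomic_section_depth > 0)
    throw std::logic_error("preemption inside an atomic section");
  // ... a context switch would happen here ...
}

// Simulates a change callback that (wrongly) tries to yield.
inline bool CallbackTriesToYield() {
  AtomicSectionGuard guard;  // callbacks are expected not to preempt
  try {
    YieldPoint();
    return false;
  } catch (const std::logic_error&) {
    return true;
  }
}
```

Once callbacks are allowed to preempt, a guard like this would have to move out of `CallOnChange`, and any state the callbacks read (for example the callback list itself) would then need to stay valid across a yield.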
//! Registers the callback to be called for each change.
//! Returns the registration id which is also the unique version of the dbslice
//! at a time of the call.
// Called before every access to an entry with a FindMutable call. Returns version
Could you add a comment that this function should be called only within a global transaction?
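One way to read the doc comment above: each registration takes the next value of a counter, so the returned id also orders registrations like a version, and the precondition the reviewer asks for can be both documented and checked in debug builds. This is a hypothetical sketch; `ChangeRegistry`, `SetInGlobalTx` and the `in_global_tx_` flag are illustrative stand-ins, not `DbSlice` members.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the change request passed to callbacks.
struct ChangeReq {
  int key;
};

class ChangeRegistry {
 public:
  using ChangeCallback = std::function<void(const ChangeReq&)>;

  // Must be called only inside a global transaction, so that no command can
  // run while the callback list is being modified.
  // Returns an id that doubles as the registry's version at registration time.
  uint64_t RegisterOnChange(ChangeCallback cb) {
    assert(in_global_tx_ && "RegisterOnChange requires a global transaction");
    uint64_t id = next_version_++;
    callbacks_.emplace_back(id, std::move(cb));
    return id;
  }

  // Called before a mutable access; notifies every registered callback.
  void CallOnChange(const ChangeReq& cr) {
    for (auto& entry : callbacks_) entry.second(cr);
  }

  // Hypothetical hook marking whether a global transaction is active.
  void SetInGlobalTx(bool active) { in_global_tx_ = active; }

 private:
  bool in_global_tx_ = false;
  uint64_t next_version_ = 1;
  std::vector<std::pair<uint64_t, ChangeCallback>> callbacks_;
};
```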
uint32_t id = next_cb_id_++;
change_cb_arr_.emplace_back(id, std::move(cb));
return id;
lock_guard lk(cb_mu_);
Why do you need it? I thought that with a global transaction we don't need it anymore.
True, but we still need the mutex for deletes, because we unregister at arbitrary points.
But for db_slice we don't need it?
Yes, because there is no code path from those callbacks through which a callback can unregister itself.
Somewhat fragile, I agree
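The trade-off in this thread can be sketched with a callback list whose registration happens inside a global transaction but whose removal can come from any fiber at any time. This is a hypothetical model (`JournalCallbacks`, `Register`, `Unregister`, `Notify`), not the journal's real interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical journal-style registry: Register runs inside a global
// transaction, but Unregister may be called from anywhere (e.g. when a
// migration is cancelled), so the list is protected by a mutex.
class JournalCallbacks {
 public:
  using Callback = std::function<void(int)>;

  uint32_t Register(Callback cb) {
    std::lock_guard lk(mu_);
    uint32_t id = next_id_++;
    cbs_.emplace_back(id, std::move(cb));
    return id;
  }

  // Returns false if the id is unknown (the case that logs "Could not find").
  bool Unregister(uint32_t id) {
    std::lock_guard lk(mu_);
    auto it = std::find_if(cbs_.begin(), cbs_.end(),
                           [id](const auto& p) { return p.first == id; });
    if (it == cbs_.end()) return false;
    cbs_.erase(it);
    return true;
  }

  // mu_ is held while callbacks run, so a callback must never call
  // Unregister itself: re-locking a std::mutex from the same thread is
  // undefined behaviour (in practice usually a deadlock).
  void Notify(int change) {
    std::lock_guard lk(mu_);
    for (auto& entry : cbs_) entry.second(change);
  }

 private:
  std::mutex mu_;
  uint32_t next_id_ = 1;
  std::vector<std::pair<uint32_t, Callback>> cbs_;
};
```

The `DbSlice` callback list can skip such a mutex only as long as no callback can reach its own unregistration, which is the invariant called fragile above.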
UPD: we have to use transactions for finalization as well.
@@ -14,8 +14,8 @@ SegmentAllocator::SegmentAllocator(mi_heap_t* heap) : heap_(heap) {
constexpr size_t limit = 1ULL << 35;
static_assert((1ULL << (kSegmentIdBits + kSegmentShift)) == limit);
// mimalloc uses 32MiB segments and we might need to change this code if it changes.
static_assert(kSegmentShift == MI_SEGMENT_SHIFT);
static_assert((~kSegmentAlignMask) == (MI_SEGMENT_MASK));
// static_assert(kSegmentShift == MI_SEGMENT_SHIFT);
We pulled a new version of mimalloc.
If this assert fails, you should clean up your mimalloc build.
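The assertions in this hunk pin the segment layout to mimalloc's, so a mismatched mimalloc build fails at compile time rather than mis-computing segment ids at runtime. A standalone sketch of the same idea follows; `kMiSegmentShift` and `kMiSegmentMask` are hypothetical stand-ins for `MI_SEGMENT_SHIFT` and `MI_SEGMENT_MASK`, assuming 32MiB (2^25 byte) segments as the code comment states, so 10 id bits cover the 2^35 limit. `SegmentId` and `SegmentOffset` are illustrative helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the allocator's constants (32MiB segments).
constexpr unsigned kMiSegmentShift = 25;  // 2^25 bytes = 32MiB
constexpr uintptr_t kMiSegmentMask = (uintptr_t{1} << kMiSegmentShift) - 1;

// Our own layout: 10 id bits + 25 offset bits = 2^35 addressable bytes.
constexpr unsigned kSegmentShift = 25;
constexpr unsigned kSegmentIdBits = 10;
constexpr uintptr_t kSegmentAlignMask = ~((uintptr_t{1} << kSegmentShift) - 1);

// If the allocator's segment size changes, the build fails here.
static_assert((1ULL << (kSegmentIdBits + kSegmentShift)) == (1ULL << 35));
static_assert(kSegmentShift == kMiSegmentShift);
static_assert((~kSegmentAlignMask) == kMiSegmentMask);

// Segment index and offset of an address within the managed range.
constexpr uintptr_t SegmentId(uintptr_t addr) { return addr >> kSegmentShift; }
constexpr uintptr_t SegmentOffset(uintptr_t addr) { return addr & ~kSegmentAlignMask; }
```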
@@ -240,6 +248,9 @@ void OutgoingMigration::SyncFb() {
}

bool OutgoingMigration::FinalizeMigration(long attempt) {
OnAllShards([this](auto& migration) { server_family_->CancelBlockingOnThread(); });
Why do we cancel all blocking commands if we migrate only specific slots? We should cancel only the blocking commands on these slots.
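The reviewer's suggestion can be sketched as filtering blocked clients by slot. `BlockedClient` and `CancelBlockingForSlots` are hypothetical, and the sketch assumes the cluster slot of each blocked client's key is already known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of a client blocked on a key (e.g. BLPOP),
// with the key's cluster slot precomputed.
struct BlockedClient {
  std::string name;
  uint16_t slot;
  bool cancelled = false;
};

// Cancels only clients whose key lives in a migrating slot, instead of
// every blocked client on the thread. Returns how many were cancelled.
inline size_t CancelBlockingForSlots(std::vector<BlockedClient>& clients,
                                     const std::unordered_set<uint16_t>& migrating) {
  size_t n = 0;
  for (auto& c : clients) {
    if (!c.cancelled && migrating.count(c.slot) > 0) {
      c.cancelled = true;
      ++n;
    }
  }
  return n;
}
```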
@@ -240,6 +248,9 @@ void OutgoingMigration::SyncFb() {
}

bool OutgoingMigration::FinalizeMigration(long attempt) {
OnAllShards([this](auto& migration) { server_family_->CancelBlockingOnThread(); });
Transaction::Guard tg{tx_.get()};
Why did you change this to use Transaction::Guard? This change means that we will not be able to run, e.g., the config update command until we finalize the migration.
We can't finalize a migration while a command is running; we have to use a global transaction.
> means that we will not be able to run f.e the config update command until we finalize the migration
It should just be enqueued in the transaction queue, so it will run immediately after.
Actually, client pause here is pretty safe from a transactional point of view. I will revert back to it.
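The enqueuing argument can be illustrated with a reader-writer lock, used here only as a simplified stand-in for the transaction queue: ordinary commands hold it in shared mode and a global transaction holds it exclusively. `Datastore`, `RunCommand` and `RunGlobal` are hypothetical. A command that arrives during finalization is delayed rather than rejected, and it never observes a half-finalized state.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical model: commands take the datastore lock in shared mode,
// a global transaction takes it exclusively.
class Datastore {
 public:
  template <typename F>
  void RunCommand(F&& f) {
    std::shared_lock lk(mu_);
    f();
  }
  template <typename F>
  void RunGlobal(F&& f) {
    std::unique_lock lk(mu_);
    f();
  }

 private:
  std::shared_mutex mu_;
};

// A config update arrives while finalization holds the global lock.
inline std::vector<std::string> FinalizeWhileConfigUpdateArrives() {
  Datastore ds;
  std::vector<std::string> log;
  bool finalizing = false;
  std::thread cmd;
  ds.RunGlobal([&] {
    finalizing = true;
    // The command blocks on the shared lock until finalization ends.
    cmd = std::thread([&] {
      ds.RunCommand([&] {
        log.push_back(finalizing ? "config_update saw partial state" : "config_update");
      });
    });
    log.push_back("finalize_migration");
    finalizing = false;
  });
  cmd.join();
  return log;
}
```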
Fixes #3229, but the bigger issue is that migrations didn't use transactions at all the whole time.
Now a migration starts with a global transaction cut: both the journal and db_slice callbacks are registered while the datastore is under a global lock and no commands are running.
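Why the cut matters can be shown with a simplified, single-threaded model: if the snapshot of existing data and the registration of the write callback happen in one step with no command in between, every write reaches the target exactly once, either through the snapshot or through the stream. `Store` and `StartMigration` are hypothetical, and copying a whole map is a simplification; a real implementation would serialize data incrementally.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using WriteCallback = std::function<void(const std::string&, const std::string&)>;

// Hypothetical store whose writes are streamed to journal-style callbacks.
class Store {
 public:
  void Set(const std::string& k, const std::string& v) {
    data_[k] = v;
    for (auto& cb : journal_) cb(k, v);  // every later write is streamed
  }

  // The "cut": snapshot and callback registration happen in one step,
  // so no write can fall between them or be counted twice.
  std::map<std::string, std::string> StartMigration(WriteCallback on_write) {
    auto snapshot = data_;
    journal_.push_back(std::move(on_write));
    return snapshot;
  }

 private:
  std::map<std::string, std::string> data_;
  std::vector<WriteCallback> journal_;
};
```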