Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(migration): Use transactions! #3266

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dranikpg
Copy link
Contributor

@dranikpg dranikpg commented Jul 4, 2024

Fixes #3229, but the bigger issue is that migrations didn't use transactions at all the whole time 馃く

Now we have a global transaction cut to start a migration - registering both the journal and db_slice callbacks while the datastore is under a global lock and no commands are running

Signed-off-by: Vladislav Oleshko <[email protected]>
@dranikpg dranikpg marked this pull request as ready for review July 4, 2024 20:07
}
LOG(DFATAL) << "Could not find " << id << " to unregister";
void DbSlice::CallOnChange(DbIndex id, const ChangeReq& cr) const {
FiberAtomicGuard fg; // Callbacks don't preemept
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preempt

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but soon they will preempt. is there any problem with this design have the callbacks preempting?

//! Registers the callback to be called for each change.
//! Returns the registration id which is also the unique version of the dbslice
//! at a time of the call.
// Called before every access to an entry with a FindMutable call. Returns version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a comment that this function should be called in the global transaction only

uint32_t id = next_cb_id_++;
change_cb_arr_.emplace_back(id, std::move(cb));
return id;
lock_guard lk(cb_mu_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need it. I thought with global transaction we don't need it anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but we still need the mutex for deletes, becase we unregister arbitrarily

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for db_slice we don't need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because there is no code path form those callbacks that can unregister itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat fragile, I agree

@dranikpg
Copy link
Contributor Author

dranikpg commented Jul 5, 2024

UPD: we have to use transactions for finalization as well

@@ -14,8 +14,8 @@ SegmentAllocator::SegmentAllocator(mi_heap_t* heap) : heap_(heap) {
constexpr size_t limit = 1ULL << 35;
static_assert((1ULL << (kSegmentIdBits + kSegmentShift)) == limit);
// mimalloc uses 32MiB segments and we might need change this code if it changes.
static_assert(kSegmentShift == MI_SEGMENT_SHIFT);
static_assert((~kSegmentAlignMask) == (MI_SEGMENT_MASK));
// static_assert(kSegmentShift == MI_SEGMENT_SHIFT);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we pulled new version of mimalloc
if this assert fails you should clean up your mimalloc build

@@ -240,6 +248,9 @@ void OutgoingMigration::SyncFb() {
}

bool OutgoingMigration::FinalizeMigration(long attempt) {
OnAllShards([this](auto& migration) { server_family_->CancelBlockingOnThread(); });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we cancel all blocking commands if we migrate only specific slots? we should cancel only the blocking commands only on this slots

@@ -240,6 +248,9 @@ void OutgoingMigration::SyncFb() {
}

bool OutgoingMigration::FinalizeMigration(long attempt) {
OnAllShards([this](auto& migration) { server_family_->CancelBlockingOnThread(); });
Transaction::Guard tg{tx_.get()};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change this to use Transaction::Guard? this change means that we will not be able to run f.e the config update command until we finalize the migration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't finalize a migration in-between a command running, we have to use a global transaction

means that we will not be able to run f.e the config update command until we finalize the migration

It should be just enqueued in the transaction queue, so it will run immediately after

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, client pause here is pretty safe from a tranasctional point 馃 will revert back to it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

db_slice and journal RegisterOnChange method refactoring
3 participants