storage controller: add node deletion API #8226
base: main
Conversation
3012 tests run: 2897 passed, 0 failed, 115 skipped (full report)

Code coverage* (full report)

* collected from Rust tests only

The comment gets automatically updated with the latest test results: ee09248 at 2024-07-02T09:29:10.926Z :recycle:
Force-pushed from c8e6a4c to ee09248
Have you considered implementing this as a background operation where the caller has to poll for the absence of the node? It would look a lot like the drain code, and I think it would be easier on the operator (i.e. us 😄).
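For concreteness, here is a minimal sketch of that shape, assuming a tokio runtime. `DeletionTracker`, `start_delete`, and `is_deleting` are invented names (not the storage controller's real types), and the actual rescheduling work is stubbed with a sleep:

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::time::Duration;

// Illustrative stand-in for a real node identifier type.
type NodeId = u64;

#[derive(Default)]
struct DeletionTracker {
    in_progress: Mutex<HashSet<NodeId>>,
}

impl DeletionTracker {
    // DELETE handler: record intent, spawn the background work, and
    // return to the caller immediately (e.g. 202 Accepted).
    fn start_delete(self: Arc<Self>, node_id: NodeId) {
        self.in_progress.lock().unwrap().insert(node_id);
        tokio::spawn(async move {
            // Real work would reschedule shards and await reconciles;
            // stubbed with a sleep here.
            tokio::time::sleep(Duration::from_secs(1)).await;
            self.in_progress.lock().unwrap().remove(&node_id);
        });
    }

    // GET handler: the operator polls this until the node is gone
    // (surfaced as a 404 in the HTTP layer).
    fn is_deleting(&self, node_id: NodeId) -> bool {
        self.in_progress.lock().unwrap().contains(&node_id)
    }
}
```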
self.maybe_reconcile_shard(shard, nodes);
What's the rationale behind not waiting for the reconciles to complete before deleting the node? An overly eager operator may call into this API on a "very loaded node" ™️ and immediately proceed to nuke it, leading to a period of unavailability for all computes that haven't been informed.
If you take the suggestion above, it would also be nice to limit reconcile concurrency.
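A minimal sketch of what bounded reconcile concurrency could look like, using a tokio `Semaphore`; `ShardId` and `reconcile_shard` are stand-ins, not the storage controller's real names:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Stand-ins for the real shard type and reconcile call.
type ShardId = u32;
async fn reconcile_shard(_shard: ShardId) { /* stand-in */ }

// Run all reconciles with at most `max_in_flight` in flight at once,
// and only return once every one of them has finished.
async fn reconcile_all(shards: Vec<ShardId>, max_in_flight: usize) {
    let sem = Arc::new(Semaphore::new(max_in_flight));
    let mut tasks = Vec::new();
    for shard in shards {
        // Blocks here once max_in_flight reconciles are running.
        let permit = Arc::clone(&sem).acquire_owned().await.unwrap();
        tasks.push(tokio::spawn(async move {
            reconcile_shard(shard).await;
            drop(permit); // free a concurrency slot
        }));
    }
    for task in tasks {
        let _ = task.await;
    }
}
```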
# 1. Mark pageserver scheduling=pause
# 2. Mark pageserver availability=offline to trigger migrations away from it
Isn't this step racy? The node will still reply to heartbeats and be considered active again.
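A toy illustration of the race being described, with invented types; the mitigation mentioned in the trailing comment is an assumption, not something this PR implements:

```rust
// If the heartbeat handler unconditionally flips a responsive node
// back to Active, an operator-set Offline marking can be undone
// before the migrations it was meant to trigger have happened.
#[derive(Debug, PartialEq)]
enum Availability { Active, Offline }

#[derive(Debug, PartialEq)]
enum Scheduling { Active, Pause }

struct Node {
    availability: Availability,
    scheduling: Scheduling,
}

fn on_heartbeat_ok(node: &mut Node) {
    // Racy version: ignores any operator intent.
    node.availability = Availability::Active;

    // One possible mitigation (an assumption, not the PR's behavior):
    // only resurrect nodes whose scheduling policy is still Active,
    //   if node.scheduling == Scheduling::Active { ... }
}
```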
Problem
In anticipation of later adding a really nice drain+delete API, I initially only added an intentionally basic /drop API that is just about usable for deleting nodes in a pinch, but requires some ugly storage controller restarts to persuade it to restart secondaries.

Summary of changes
I started making a few tiny fixes, and ended up writing the delete API...
- Handle generation_pageserver columns that point to nonexistent node IDs. I started out thinking of this as a general resilience thing, but when implementing the delete API I realized it was actually a legitimate end state after the delete API is called (as that API doesn't wait for all reconciles to succeed).
- Add a DELETE API for nodes, which does not gracefully drain, but does reschedule everything (sketched below). This becomes safe to use when the system is in any state, but will incur availability gaps for any tenants that weren't already live-migrated away. If tenants have already been drained, this becomes a totally clean + safe way to decommission a node.

FIXME: the node deletion function suffers the same awkwardness as other functions that iterate through shards and call schedule(): it doesn't have a proper ScheduleContext for them all. That doesn't break anything, it just means that some shards may later get migrated again in the background.
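A condensed sketch of the delete flow described above, using made-up miniature types rather than the PR's actual code:

```rust
use std::collections::HashMap;

// Made-up miniature of the controller state; the real code is richer.
type NodeId = u64;
struct Shard { attached: Option<NodeId> }
struct ControllerState {
    nodes: HashMap<NodeId, ()>, // node metadata elided
    shards: HashMap<u32, Shard>,
}

fn delete_node(state: &mut ControllerState, node_id: NodeId) {
    // Drop the node first so the scheduler can no longer pick it.
    state.nodes.remove(&node_id);
    for shard in state.shards.values_mut() {
        if shard.attached == Some(node_id) {
            // Reschedule onto some surviving node. Per the FIXME, the
            // real loop lacks a shared ScheduleContext, so placement
            // may be revisited later in the background.
            shard.attached = state.nodes.keys().next().copied();
            // The PR then fires maybe_reconcile_shard() without
            // awaiting it, rather than blocking the API call.
        }
    }
}
```

Because the reconciles are not awaited, the API returns quickly but cannot guarantee that computes have been told about their new locations; that is the availability-gap caveat above.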
Checklist before requesting a review
Checklist before merging