
storage controller: add node deletion API #8226

Open · wants to merge 7 commits into main
Conversation

@jcsp (Contributor) commented on Jul 1, 2024

Problem

In anticipation of later adding a proper drain+delete API, I initially added only an intentionally basic /drop API. It is just about usable for deleting nodes in a pinch, but requires some ugly storage controller restarts to persuade it to restart secondaries.

Summary of changes

I started making a few tiny fixes, and ended up writing the delete API...

  • Quality of life nit: ordering of node + tenant listings in storcon_cli
  • Papercut: Fix the attach_hook using the wrong operation type for reporting slow locks
  • Make Service::spawn tolerate generation_pageserver columns that point to nonexistent node IDs. I started out thinking of this as a general resilience thing, but when implementing the delete API I realized it was actually a legitimate end state after the delete API is called (as that API doesn't wait for all reconciles to succeed).
  • Add a DELETE API for nodes, which does not gracefully drain, but does reschedule everything. It is safe to call with the system in any state, but will incur availability gaps for any tenants that weren't already live-migrated away. If tenants have already been drained, it is a clean and safe way to decommission a node (a rough sketch of the flow follows this list).
  • Add a test and a storcon_cli wrapper for it
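
To make the shape of this concrete, here is a minimal sketch of the delete flow described above. All types and field names are invented stand-ins (the controller's real `Service`, shard, and scheduler types are richer, and the real code also persists the deletion and spawns reconcilers):

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical stand-ins for the controller's real types.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct NodeId(u64);
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct TenantShardId(u64);

struct TenantShard {
    attached: Option<NodeId>,
    secondaries: Vec<NodeId>,
}

struct Service {
    nodes: HashMap<NodeId, String>, // node id -> listen address
    shards: BTreeMap<TenantShardId, TenantShard>,
}

impl Service {
    /// Delete a node without draining it first: drop it from scheduling,
    /// then reschedule every shard that referenced it. Reconciles are
    /// kicked off but not awaited, matching the PR's note that the API
    /// doesn't wait for all reconciles to succeed.
    fn node_delete(&mut self, node_id: NodeId) {
        self.nodes.remove(&node_id);

        for (shard_id, shard) in self.shards.iter_mut() {
            if shard.attached == Some(node_id) {
                shard.attached = None; // re-placed by the scheduler below
            }
            shard.secondaries.retain(|n| *n != node_id);

            // In the real controller this is schedule() followed by
            // maybe_reconcile_shard(); here we only record the intent.
            if shard.attached.is_none() {
                println!("rescheduling {shard_id:?}");
            }
        }
    }
}

fn main() {
    let mut svc = Service {
        nodes: HashMap::from([(NodeId(1), "ps1:6400".into()), (NodeId(2), "ps2:6400".into())]),
        shards: BTreeMap::from([(
            TenantShardId(100),
            TenantShard { attached: Some(NodeId(1)), secondaries: vec![NodeId(2)] },
        )]),
    };
    svc.node_delete(NodeId(1)); // prints: rescheduling TenantShardId(100)
}
```

Because the call returns before reconciles land, generation_pageserver may still reference the deleted node afterwards, which is why Service::spawn has to tolerate that state (third bullet above).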

FIXME: the node deletion function suffers the same awkwardness as other functions that iterate through shards and call schedule(): it doesn't have a proper ScheduleContext for them all. That doesn't break anything; it just means that some shards may later get migrated again in the background.
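For illustration, the fix the FIXME points at is to thread one context through the whole pass, so each schedule() call sees the placements already made for earlier shards. ScheduleContext is the name used in the FIXME; everything else in this toy sketch (the fields, the node-picking logic) is invented:

```rust
use std::collections::HashMap;

// Toy stand-in for a shared scheduling context: per-node counts of
// placements made so far during this pass.
#[derive(Default)]
struct ScheduleContext {
    placed: HashMap<u64, usize>,
}

struct Shard {
    id: u64,
}

impl Shard {
    fn schedule(&mut self, ctx: &mut ScheduleContext) {
        // Pick the node with the fewest placements so far in this pass,
        // then record our own placement in the shared context.
        let node = (1u64..=3)
            .min_by_key(|n| ctx.placed.get(n).copied().unwrap_or(0))
            .unwrap();
        *ctx.placed.entry(node).or_default() += 1;
        println!("shard {} -> node {}", self.id, node);
    }
}

fn main() {
    // One context for the whole pass; a fresh (empty) context per shard
    // would send every shard to the same "least loaded" node.
    let mut ctx = ScheduleContext::default();
    for mut shard in (0..6).map(|id| Shard { id }) {
        shard.schedule(&mut ctx);
    }
}
```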

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat the commit message so it does not include the above checklist

@jcsp added the t/feature (Issue type: feature, for new features or requests) and c/storage/controller (Component: Storage Controller) labels on Jul 1, 2024
@jcsp requested a review from VladLazar on July 1, 2024 20:41

github-actions bot commented Jul 1, 2024

3012 tests run: 2897 passed, 0 failed, 115 skipped (full report)


Flaky tests (1): Postgres 16

Code coverage* (full report)

  • functions: 32.7% (6916 of 21165 functions)
  • lines: 50.0% (54209 of 108401 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results. Last updated: ee09248 at 2024-07-02T09:29:10.926Z.

@jcsp marked this pull request as ready for review on July 2, 2024 13:56
@jcsp requested a review from a team as a code owner on July 2, 2024 13:56
@VladLazar (Contributor) left a comment

Have you considered implementing this as a background operation where the caller has to poll for the absence of the node? It would look a lot like the drain code, and I think it would be easier on the operator (i.e. us 😄).
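
For illustration, a hypothetical operator-side flow under that design might look like the following; the `/control/v1/node` paths and the JSON shape are assumptions for the sketch, not the controller's confirmed API:

```rust
use std::time::Duration;

// Hypothetical polling flow: issue the DELETE, then poll the node
// listing until the node disappears from it.
async fn delete_and_wait(
    client: &reqwest::Client,
    base: &str,
    node_id: u64,
) -> anyhow::Result<()> {
    client
        .delete(format!("{base}/control/v1/node/{node_id}"))
        .send()
        .await?
        .error_for_status()?;

    loop {
        let nodes: Vec<serde_json::Value> = client
            .get(format!("{base}/control/v1/node"))
            .send()
            .await?
            .json()
            .await?;
        if !nodes.iter().any(|n| n["id"] == node_id) {
            return Ok(()); // node no longer listed: deletion completed
        }
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
```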

```rust
)
}

self.maybe_reconcile_shard(shard, nodes);
```

What's the rationale behind not waiting for the reconciles to complete before deleting the node? An overly eager operator may call into this API on a "very loaded node" ™️ and immediately proceed to nuke it, leading to a period of unavailability for all computes that haven't been informed.


If you take the suggestion above, it would also be nice to limit reconcile concurrency.
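
For reference, one standard way to bound that concurrency is a semaphore; this is a generic tokio pattern, not code from this PR, and `reconcile_shard` is a placeholder:

```rust
use std::{sync::Arc, time::Duration};
use tokio::sync::Semaphore;

// Bound concurrent reconciles: each spawned task holds a permit for its
// lifetime, so at most MAX_CONCURRENT reconciles run at once.
const MAX_CONCURRENT: usize = 8;

async fn reconcile_shard(shard_id: u64) {
    // Placeholder for a real reconcile; a sleep stands in for the work.
    tokio::time::sleep(Duration::from_millis(100)).await;
    println!("reconciled shard {shard_id}");
}

async fn reconcile_all(shard_ids: Vec<u64>) {
    let semaphore = Arc::new(Semaphore::new(MAX_CONCURRENT));
    let mut tasks = Vec::new();
    for shard_id in shard_ids {
        let permit = semaphore.clone().acquire_owned().await.unwrap();
        tasks.push(tokio::spawn(async move {
            let _permit = permit; // released when the reconcile finishes
            reconcile_shard(shard_id).await;
        }));
    }
    // Waiting here also gives the "don't return until the reconciles are
    // done" behaviour discussed above.
    for task in tasks {
        let _ = task.await;
    }
}

#[tokio::main]
async fn main() {
    reconcile_all((0..32).collect()).await;
}
```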

Comment on lines +1700 to +1701:

```python
# 1. Mark pageserver scheduling=pause
# 2. Mark pageserver availability=offline to trigger migrations away from it
```

Isn't this step racy? The node will still reply to heartbeats and be considered active again.
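
For illustration, one way to close that race would be for heartbeat handling to respect operator-set scheduling state; the types here are invented stand-ins, not the controller's real ones:

```rust
// Hypothetical stand-ins for the controller's node state; the point is
// only the guard in on_heartbeat_ok().
#[allow(dead_code)]
#[derive(Debug)]
enum NodeSchedulingPolicy { Active, Pause, Draining }
#[allow(dead_code)]
#[derive(Debug)]
enum NodeAvailability { Active, Offline }

struct Node {
    scheduling: NodeSchedulingPolicy,
    availability: NodeAvailability,
}

/// A node an operator has paused or is draining must not be flipped back
/// to Active just because it answered a heartbeat.
fn on_heartbeat_ok(node: &mut Node) {
    match node.scheduling {
        NodeSchedulingPolicy::Pause | NodeSchedulingPolicy::Draining => {
            // Leave availability as the operator set it.
        }
        NodeSchedulingPolicy::Active => node.availability = NodeAvailability::Active,
    }
}

fn main() {
    let mut node = Node {
        scheduling: NodeSchedulingPolicy::Pause,
        availability: NodeAvailability::Offline,
    };
    on_heartbeat_ok(&mut node);
    println!("{:?}", node.availability); // still Offline
}
```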
