Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PKI: Track last time auto tidy was run across restarts #28488

Merged
merged 9 commits into from
Sep 26, 2024

Conversation

stevendpclark
Copy link
Contributor

@stevendpclark stevendpclark commented Sep 24, 2024

Description

Start persisting the last time we ran auto-tidy in order to not lose the information across restarts. If an auto-tidy schedule was set to run every 7 days, but there was a scheduled Vault restart every 6 days, auto-tidy would never run as the counter would be reset on restart.

We add in two new auto-tidy fields to control a possible thundering herd problem of having too many auto-tidy processes starting up at once. min_startup_backoff_duration and max_startup_backoff_duration fields with defaults of 5m and 15m respectively.

The UI for auto-tidy has been updated, adding the two new fields to the auto-tidy configuration screen, the new fields are hidden if not enabled, as well as being displayed within the auto-tidy configuration view. They do not appear on the manual tidy screens.

Screenshot 2024-09-25 at 6 06 10 PM Screenshot 2024-09-25 at 6 06 28 PM

TODO only if you're a HashiCorp employee

  • Backport Labels: If this PR is in the ENT repo and needs to be backported, backport
    to N, N-1, and N-2, using the backport/ent/x.x.x+ent labels. If this PR is in the CE repo, you should only backport to N, using the backport/x.x.x label, not the enterprise labels.
    • If this fixes a critical security vulnerability or severity 1 bug, it will also need to be backported to the current LTS versions of Vault. To ensure this, use all available enterprise labels.
  • ENT Breakage: If this PR either 1) removes a public function OR 2) changes the signature
    of a public function, even if that change is in a CE file, double check that
    applying the patch for this PR to the ENT repo and running tests doesn't
    break any tests. Sometimes ENT only tests rely on public functions in CE
    files.
  • Jira: If this change has an associated Jira, it's referenced either
    in the PR description, commit message, or branch name.
  • RFC: If this change has an associated RFC, please link it in the description.
  • ENT PR: If this change has an associated ENT PR, please link it in the
    description. Also, make sure the changelog is in this PR, not in your ENT PR.

builtin/logical/pki/path_tidy.go Dismissed Show dismissed Hide dismissed
Copy link

github-actions bot commented Sep 24, 2024

Build Results:
All builds succeeded! ✅

Copy link

github-actions bot commented Sep 24, 2024

CI Results:
All Go tests succeeded! ✅

Copy link
Contributor

@victorr victorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very impressive!

@@ -561,6 +579,108 @@ func pathTidyStatus(b *backend) *framework.Path {
}

func pathConfigAutoTidy(b *backend) *framework.Path {
autoTidyResponseFields := map[string]*framework.FieldSchema{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 Thank you for removing duplicate code!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on this - thank you very much

@@ -571,6 +691,16 @@ func pathConfigAutoTidy(b *backend) *framework.Path {
Type: framework.TypeBool,
Description: `Set to true to enable automatic tidy operations.`,
},
"min_startup_backoff_duration": {
Type: framework.TypeDurationSecond,
Description: `The minimum amount of time auto-tidy will be delayed after startup in seconds.`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this sounds slightly better?

The minimum amount of time in seconds auto-tidy will be delayed after startup.

Copy link
Contributor

@kitography kitography left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, only one nit, and this is cleaner :)

@@ -561,6 +579,108 @@ func pathTidyStatus(b *backend) *framework.Path {
}

func pathConfigAutoTidy(b *backend) *framework.Path {
autoTidyResponseFields := map[string]*framework.FieldSchema{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto on this - thank you very much

builtin/logical/pki/path_tidy.go Show resolved Hide resolved
Copy link
Contributor

@hellobontempo hellobontempo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for tackling the UI side of this. We really appreciate it 🏅 I left some comments/suggestions that are mostly UX related. Happy to pair or push up any of these changes if that's helpful.

The only concern is why the form test is missing an assertion. Since we want to make sure those attributes are being submitted properly.

ui/app/models/pki/tidy.js Outdated Show resolved Hide resolved
ui/app/models/pki/tidy.js Outdated Show resolved Hide resolved
// fields that are specific to auto-tidy, which should only show up when enabled.
get autoTidyEnabledFields() {
const enabledFields = [
{ 'Auto Tidy Startup Backoff': ['minStartupBackoffDuration', 'maxStartupBackoffDuration'] },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need these to be a separate group section in the form since it's just these two fields. Normally this would be where we'd consult design, but they're out this month... 😅

Do you feel like the header provides a UX benefit for the user? If not, then I think we can remove this since the form already has quite a few sections. I'm certainly not a PKI expert 😆 , but to me the separation doesn't seem necessary since these fields seem to relate to the interval_duration input, since they are all auto tidy settings. What do you think?

If this section grows, then in the future we could do some rearranging and label it more generally like "Auto tidy settings" or "Auto tidy configuration"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather it be part of the same, but was trying to follow the trend :) I'll try to remove this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah - I completely understand why you included it in a separate section! I'm honestly impressed you implemented the frontend piece so seamlessly 😄

return this._expandGroups(groups);
}

// fields that are specific to auto-tidy, which should only show up when enabled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for this comment! 🤩 If auto tidy is enabled, then disabled, will these settings be preserved? If so, then I might suggest renaming the getter to autoTidyConfigFields since the enabled toggle is more for display only purposes in the UI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update to the proposed name, yes the values are preserved if auto-tidy is enabled/disabled.

const enabledFields = [
{ 'Auto Tidy Startup Backoff': ['minStartupBackoffDuration', 'maxStartupBackoffDuration'] },
];
return this._expandGroups(enabledFields);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how you feel about the section header, there are a couple of options here. Either way, we already call this._expandGroups on all of the fields above and that function does a lot of iterating so we want to avoid calling it again, if possible. We can update this to just return the relevant fields, which also simplifies the template logic (in the .hbs file)!

I'd probably refactor this to be:

get autoTidyConfigFields() {
   return ['minStartupBackoffDuration', 'maxStartupBackoffDuration']
}

and then to avoid having to update these in multiple places in the future, reuse this in the allGroups getter:

  get allGroups() {
    const groups = [
      { autoTidy: ['enabled', 'intervalDuration', ...this.autoTidyConfigFields },
      ...this.sharedFields,
    ];
    return this._expandGroups(groups);
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup much better, implemented within 642e5fc

ui/lib/pki/addon/components/pki-tidy-form.hbs Outdated Show resolved Hide resolved
Comment on lines 27 to 28
minStartupBackoffDuration: string;
maxStartupBackoffDuration: string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually don't need these here. The naming is confusing, but this is for the typescript function below handleTtl which is a single input that manages two params, interval_duration and enabled

We'd only need to add these if we wanted to do something similar and have these durations attached to a toggle (but I think how you've implemented, without the toggle, it is better)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops yeah these were me playing around and forgetting to clean these up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 642e5fc

@@ -1399,6 +1399,16 @@ const pki = {
'Interval at which to run an auto-tidy operation. This is the time between tidy invocations (after one finishes to the start of the next). Running a manual tidy will reset this duration.',
fieldGroup: 'default',
},
minStartupBackoffDuration: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 👏 👏 thank you for updating these!

@@ -175,13 +183,15 @@ module('Integration | Component | pki tidy form', function (hooks) {
});

test('it should change the attributes on the model', async function (assert) {
assert.expect(12);
assert.expect(11);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. This is incorrect and should be 12 🤔 This likely means submitting the form is erroring and we're not hitting the post request below which is the twelfth assertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, good catch! Updated back to 12 and added the missing fields within the const fillInValues seems to have fixed it.

Implemented within 642e5fc

this.server.post('/pki-auto-tidy/config/auto-tidy', (schema, req) => {
assert.propEqual(
JSON.parse(req.requestBody),
{
acme_account_safety_buffer: '72h',
enabled: true,
min_startup_backoff_duration: '5m',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these to exist in the response, we need to input them in the form in the test. To do this, the const fillInValues on line 247 should include these (in seconds) so that the tests fills them in (line 260)

    const fillInValues = {
      issuerSafetyBuffer: 20,
      pauseDuration: 30,
      revocationQueueSafetyBuffer: 40,
      safetyBuffer: 50,
      minStartupBackoffDuration: 300,
      maxStartupBackoffDuration: 900,
    };

I wonder if this is why the submit is failing (and therefore not hitting the post request)

 - If the interval time for auto-tidy is longer then say a regularly
   scheduled restart of Vault, auto-tidy is never run. This is due to
   the time of the last run of tidy is only kept in memory and
   initialized on startup to the current time
 - Store the last run of any tidy, to maintain previous behavior, to
   a cluster local file, which is read in/initialized upon a mount
   initialization.
@stevendpclark stevendpclark force-pushed the stevendpclark/vault-30322-auto-tidy-across-nodes branch from 9531e50 to e2c2074 Compare September 25, 2024 20:25
@stevendpclark stevendpclark added this to the 1.19.0-rc milestone Sep 25, 2024
…ple enabling auto tidy from duration, move params to auto settings section
@attr('boolean', {
label: 'Automatic tidy enabled',
defaultValue: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed defaults so these values are retrieved from backend, this way the frontend won't accidentally override values if they change


@attr({
label: 'Automatic tidy enabled',
labelDisabled: 'Automatic tidy disabled',
mapToBoolean: 'enabled',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decoupled this input from the enabled toggle so now it renders as its own TTL

@@ -76,7 +96,7 @@ export default class PkiTidyModel extends Model {
})
revocationQueueSafetyBuffer; // enterprise only

@attr('string', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed 'string' types for ttls so we can pull default values seamlessly from API

{{/if}}
{{#each @tidy.formFieldGroups as |fieldGroup|}}
{{#each-in fieldGroup as |group fields|}}
{{#if (or (eq @tidyType "manual") @tidy.enabled)}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this outside the {{#each}} because it makes more sense there

@@ -231,28 +274,23 @@ module('Integration | Component | pki tidy form', function (hooks) {
assert.false(this.autoTidy.tidyAcme, 'tidyAcme is false on model');

await click(PKI_TIDY_FORM.toggleInput('acmeAccountSafetyBuffer'));
await fillIn(PKI_TIDY_FORM.acmeAccountSafetyBuffer, 3); // units are days based on defaultValue
await fillIn(PKI_TIDY_FORM.acmeAccountSafetyBuffer, 2); // units are days based on defaultValue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed because 3 days was too similar to the default (72h vs 720h) and made it seem like there was a ttl miscalculation

Copy link
Contributor

@hellobontempo hellobontempo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work tackling the UI changes! Thank you again! ❤️ Fingers crossed the UI changes backport cleanly, but I can help with any merge conflicts if not

My last commit should address the UI failures on the previous CI run, hopefully the next is good to go ✅

@stevendpclark stevendpclark removed the backport/ent/1.16.x+ent Changes are backported to 1.16.x+ent label Sep 26, 2024
@stevendpclark
Copy link
Contributor Author

Thanks everyone for the review feedback and a special thanks to @hellobontempo for the review feedback and helping out with the UI!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.17.x backport/1.18.x hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed secret/pki ui
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants