Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use the allowed upgrades list when validating version upgrades #240

Merged

Conversation

dcrosta
Copy link
Contributor

@dcrosta dcrosta commented Sep 25, 2024

Beginning with v24.2, some releases are innovation releases which can be skipped during upgrading. The ListClusterMajorVersions API returns information on allowed upgrade paths, which we now use to validate whether an upgrade is possible in apply. Additional validation (cluster state, etc) is performed server-side within the API, so this validation is mostly to catch the most common type of error with a faster, more helpful error message, and not to be exhaustive.

Commit checklist

  • Changelog
  • Doc gen (make generate)
  • Integration test(s)
  • Acceptance test(s)
  • Example(s)

@dcrosta dcrosta force-pushed the support-skippable-innovation-releases branch 2 times, most recently from f7cd25d to 713fe76 Compare September 25, 2024 20:20
@dcrosta dcrosta force-pushed the support-skippable-innovation-releases branch 2 times, most recently from 85fb7c5 to 1ffb909 Compare October 3, 2024 20:20
@dcrosta dcrosta marked this pull request as ready for review October 3, 2024 20:56
@dcrosta dcrosta requested a review from fantapop October 3, 2024 21:03
Makefile Show resolved Hide resolved
CHANGELOG.md Show resolved Hide resolved
Comment on lines 752 to 756
resp.Diagnostics.AddError("Invalid CockroachDB version",
fmt.Sprintf(
"Cannot roll back to to '%s'.",
planVersion))
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask if we can list the valid rollback version here but... do we actually not know it at this point? If we let this pass through, it seems like we can give a better error. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't include the rollback version in the CC API clusters response. We could, there's probably no harm in doing so. Do you think that's valuable to add?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels more like noise to me. I'd rather not add it to the api response unless we have a more compelling use case. I was suggesting here that we could let the api call happen here which would let the ccapi validate the version and return the better error message. But we could also merge this as is. It's a slight improvement on an unlikely case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without all this error handling, we get:

cockroach_cluster.example: Modifying... [id=2dc9a589-1f20-412c-952b-522d5184a3f3]
cockroach_cluster.example: Still modifying... [id=2dc9a589-1f20-412c-952b-522d5184a3f3, 10s elapsed]
cockroach_cluster.example: Still modifying... [id=2dc9a589-1f20-412c-952b-522d5184a3f3, 20s elapsed]
╷
│ Error: Error updating cluster
│
│   with cockroach_cluster.example,
│   on main.tf line 49, in resource "cockroach_cluster" "example":
│   49: resource "cockroach_cluster" "example" {
│
│ cluster can only be rolled back to the previous version, expected: v24.1

I don't know why this took 20+ seconds to get this? But it seems like a reasonable error, so I'll remove the checks in the tf provider

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the request: https://us5.datadoghq.com/apm/trace/8538d437c1868d2218b1a60d611750c1?graphType=flamegraph&shouldShowLegend=true&sort=time&spanID=1655416649021547938&timeHint=1728402657004.241

350ms for the actual api request. I think it probably took 20s because of all the other requests that happen during a tf run. It looks like there are about 20 requests to the same cluster endpoint. API caching via terraform isn't a good story currently.

Screenshot 2024-10-09 at 8 41 53 AM

internal/provider/cluster_resource.go Show resolved Hide resolved
Copy link
Collaborator

@fantapop fantapop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thank you

Beginning with v24.2, some releases are innovation releases which can be
skipped during upgrading. The ListClusterMajorVersions API returns
information on allowed upgrade paths, which we now use to validate
whether an upgrade is possible in `apply`. Additional validation
(cluster state, etc) is performed server-side within the API, so this
validation is mostly to catch the most common type of error with a
faster, more helpful error message, and not to be exhaustive.
@dcrosta dcrosta force-pushed the support-skippable-innovation-releases branch from afb2c01 to 0e694e0 Compare October 9, 2024 14:36
@dcrosta dcrosta merged commit 4a4444b into cockroachdb:main Oct 9, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants