Cluster management and operations cover not only the creation and deletion of clusters but also their upgrades.
Upgrades can be a major source of stress for any platform engineering team, so we should make them as easy and automated as possible while giving the team that performs them as much insight as possible.
While fully automated upgrades require the least interaction and seem to be the easiest, they do not fit the operational procedures of enterprise customers, who want to trigger upgrades of production clusters in a controlled way.
Major deliverables
Ability to upgrade clusters
Who it benefits
Customer Business: Plannability and controlled cluster upgrades that fit the needs of enterprise k8s cluster management
Platform: Stress-free upgrades that do not require a massive amount of work
Mirantis: Great customer experience and happy customers
Acceptance criteria
Upgrading a cluster involves three steps:
Update the Helm chart with the changes and push it to an OCI registry under a new chart version
Create a new Template object with a new name that references the pushed version of the Helm chart
Upgrade/migrate the Deployment object to point to the new Template name, which then actually triggers the upgrade of the cluster
The Deployment Object shows similar status information as CAPI itself provides
The expectation is to have three statuses: Upgrade in Progress, Upgrade successful, Upgrade failed
Failed upgrades are clearly marked in the Deployment object
Changes to the template variables and to the template name of the Deployment object are treated the same way, as either can trigger cluster changes (for example, changing the instance type in AWS requires replacing all k8s nodes, just as a template name change that upgrades the k0s version does)
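The change-detection and status criteria above can be sketched roughly as follows. This is only an illustration in Python, not HMC code: `DeploymentSpec`, `triggers_upgrade`, and `upgrade_status` are hypothetical names, the template names in the usage note are made up, and the condition dictionaries are a simplified stand-in for the `type`/`status`/`severity` fields of CAPI conditions. Which CAPI conditions should actually drive each status is still an open design question.

```python
from dataclasses import dataclass, field
from enum import Enum


class UpgradeStatus(Enum):
    # The three statuses named in the acceptance criteria.
    IN_PROGRESS = "Upgrade in Progress"
    SUCCESSFUL = "Upgrade successful"
    FAILED = "Upgrade failed"


@dataclass
class DeploymentSpec:
    # Hypothetical, simplified view of the Deployment object's spec.
    template: str
    config: dict = field(default_factory=dict)


def triggers_upgrade(old: DeploymentSpec, new: DeploymentSpec) -> bool:
    """Treat a template name change and a template variable change alike."""
    return old.template != new.template or old.config != new.config


def upgrade_status(conditions: list) -> UpgradeStatus:
    """Derive one of the three statuses from CAPI-like conditions.

    Assumption: each condition is a dict with "type", "status" and an
    optional "severity" key, and only the "Ready" condition is inspected.
    """
    ready = next((c for c in conditions if c.get("type") == "Ready"), None)
    if ready is None:
        # Nothing reported yet: the upgrade has not finished.
        return UpgradeStatus.IN_PROGRESS
    if ready.get("status") == "True":
        return UpgradeStatus.SUCCESSFUL
    if ready.get("severity") == "Error":
        return UpgradeStatus.FAILED
    return UpgradeStatus.IN_PROGRESS
```

For example, under these assumptions `triggers_upgrade(DeploymentSpec("tmpl-v1", {"instanceType": "t3.small"}), DeploymentSpec("tmpl-v1", {"instanceType": "t3.large"}))` is true even though the template name is unchanged, which matches the criterion that both kinds of change are handled the same way.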
Assumptions
CAPI performs the actual upgrades in an enterprise-grade way
Telemetry & Success Criteria
Each upgrade triggers a telemetry event with the following information after the upgrade is completed:
cluster_id
target_infrastructure
New template name
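As a minimal sketch of what such an event could look like, assuming a JSON payload: the field name `new_template_name`, the `UpgradeTelemetryEvent` class, and the `build_upgrade_event` helper are illustrative assumptions in Python, not an existing HMC API. The event is only built once the upgrade has completed, as required above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class UpgradeTelemetryEvent:
    # The three fields listed above; "new_template_name" is an assumed key.
    cluster_id: str
    target_infrastructure: str
    new_template_name: str


def build_upgrade_event(
    cluster_id: str,
    target_infrastructure: str,
    new_template_name: str,
    upgrade_completed: bool,
) -> Optional[str]:
    """Return the event as a JSON string, or None while the upgrade is running."""
    if not upgrade_completed:
        # The event is sent only after the upgrade is completed.
        return None
    event = UpgradeTelemetryEvent(cluster_id, target_infrastructure, new_template_name)
    return json.dumps(asdict(event), sort_keys=True)
```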
Out of scope
The actual upgrade of the cluster is handled by CAPI, and we should not write any code in the HMC repo that upgrades clusters. HMC code should only observe the actual upgrade and surface as much information as needed from CAPI in the Deployment object. Any bugs we find that prevent upgrades should be fixed in CAPI or in the affected CAPI providers.
CAPI is sometimes a bit finicky about which objects can be upgraded in place and which need a rolling change (new ones added, then the old ones removed). In this epic we don't want to worry about this yet and assume that the templates themselves don't modify, in place, parts of CAPI objects that can't actually be modified in place.
Multi-cluster upgrades will be implemented later
Automatic cluster upgrades will be implemented later
Upgrading the Mirantis templates and the mgmt control plane itself is not part of this epic
related issues: