design-proposal: improved handling of the VM FW ID #347
Thanks @dasionov for stepping up to create a design proposal. It would be wonderful if you can get this old issue sorted.
However, I think that you should discuss possible alternative solutions before settling on one implementation (persisting the UUID in status).
Thanks a lot @dasionov for picking this issue up! It's indeed important to solve and keeps coming back!
I very much agree with most of what @dankenigsberg commented above, especially regarding the need to provide a few alternative solutions. This way we can discuss which would be the best way moving forward.
Let me try to help you with that. Here are a few alternatives that I have in mind:
1. Keep the UUID in the VM's spec even if the user did not request it
   - Partially implemented here: kubevirt/kubevirt#13158.
   - Main idea: The UUID would be generated, as it is now, for every VM once it boots. After generation, the VM's spec would be updated with the generated UUID.
   - Pros: Easy to implement, avoids adding new API fields, fully backward compatible.
   - Cons: Abusing the spec. The spec should only contain the user's desired state. In most situations, the VM owner does not care what the UUID is, hence it is not a part of the desired state from the user's POV. This will complicate the VM's definition and make it longer, harder to read, and harder to reason about.
2. Generate a default random UUID via a webhook and set it to the spec
   - Partially implemented here: kubevirt/kubevirt#12085.
   - Main idea: If the user did not provide a desired UUID, a webhook will generate a random UUID and set it in the VM's spec.
   - Pros: Same as the previous suggestion.
   - Cons: Same argument as above regarding abusing the spec. In addition, using webhooks for supplying defaults is a bad practice for many reasons; for example, it might hurt performance and scalability if a lot of VMs are started at the same time, since virt-api might then become a bottleneck.
3. Create a new firmware UUID field under status
   - Main idea: Just as today, the UUID would be generated during the VM's first boot. After it is generated, it would be saved to a dedicated firmware UUID status field.
   - Pros: No need to abuse the spec, and we can keep it clean.
   - Cons: Adding a new API field which is generally not valuable outside the scope of this issue. New API fields need to be added with extreme caution, as they are very hard to remove, might increase load and hurt performance, and make the API less readable and compact. (We can consider minimizing this con by setting the UUID in a VM condition as suggested here).
4. Introduce a breaking change
   - Main idea: Just as today, the UUID would be generated during the VM's first boot, but it would be generated from both name and namespace, which would be a breaking change. Before doing so, we would warn the users for a few versions (via logs/alerts/etc) and possibly provide tooling to add the current firmware UUID to the VM's spec in order to protect the workload.
   - Pros: No need to abuse the spec, and we can keep it clean; very easy to implement.
   - Cons: Demands user intervention in order to avoid breaking current workloads.
Obviously, feel free to add more alternatives that you can think of, or express your opinion on the above alternatives.
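To make the trade-off in the last alternative concrete, here is a minimal sketch in Python (not KubeVirt's Go code). It assumes the current firmware UUID is a name-based SHA-1 UUID computed from the VM name alone; `LEGACY_NS` is an arbitrary namespace UUID picked for illustration, and both function names are hypothetical.

```python
import uuid

# Arbitrary namespace UUID, for illustration only (assumption, not KubeVirt's constant).
LEGACY_NS = uuid.UUID("00000000-0000-0000-0000-000000000001")


def legacy_firmware_uuid(namespace: str, name: str) -> uuid.UUID:
    """Name-only derivation: the Kubernetes namespace is deliberately ignored."""
    return uuid.uuid5(LEGACY_NS, name)


def namespaced_firmware_uuid(namespace: str, name: str) -> uuid.UUID:
    """Breaking-change alternative: hash the namespace together with the name."""
    return uuid.uuid5(LEGACY_NS, f"{namespace}/{name}")


# Two VMs named "db" in different namespaces collide under the name-only scheme.
collides = legacy_firmware_uuid("team-a", "db") == legacy_firmware_uuid("team-b", "db")

# Including the namespace separates them within one cluster.
separated = namespaced_firmware_uuid("team-a", "db") != namespaced_firmware_uuid("team-b", "db")

# An existing VM gets a different UUID under the new scheme: this is the breaking change.
changed = namespaced_firmware_uuid("team-a", "db") != legacy_firmware_uuid("team-a", "db")
```

The last comparison is the reason the alternative needs a warning period: every VM that never had its UUID written down would present a new identity to its guest after the switch.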
Thanks! This looks like a solid foundation. One concern I have is around VM snapshots—I think it's worth mentioning that even if the firmware UUID of an existing VM is set to persist, it won’t be retained during a restore. This could potentially cause functionality issues.
- Keeps the spec clean.
- Focuses on VM status for generated information.

**Cons:**
A drawback here in my view is that an object's status is often dynamic and therefore somewhat fragile. This makes sense, as controllers manipulate the status field to represent the object's current state, while they rarely—if ever—modify the object's spec.
> A drawback here in my view is that an object's status is often dynamic and therefore somewhat fragile. This makes sense, as controllers manipulate the status field to represent the object's current state

It is a matter of how we implement the field and the controllers that would interact with it.
I mean, as I see it, this status field will only ever be assigned once and then never modified again by any controller. In this scenario I don't think it would be highly dynamic and fragile.

> while they rarely—if ever—modify the object's spec.

And for a good reason :)
Ideally, only a human should ever modify a VM's spec.
Multiple k8s objects rely on their status fields to function. It's not that fragile and is not editable for users.
With this option, IMO, the flow simplifies: on VM start, the VM controller stores either the user-specified spec.template.spec.firmware.uuid or vm.metadata.uid.
No webhooks are required.
We can also trigger events for running VMs without a firmware UUID in the status to restart to get a stable UUID across shutdowns or just raise the restart required condition.
> With this option, IMO, the flow simplifies: on VM start, the VM controller stores either the user-specified spec.template.spec.firmware.uuid or vm.metadata.uid.
> No webhooks are required.
> We can also trigger events for running VMs without a firmware UUID in the status to restart to get a stable UUID across shutdowns or just raise the restart required condition.

Just to ensure that I understood correctly. In this MO, existing VMs would not persist the firmware UUID specified in their VMI counterparts but rather set the status firmware UUID to (for example) vm.metadata.uid and raise the restart required condition? This acknowledges that disruption (i.e. firmware UUID changes post-restart to affect anything that relies on it) is imminent, correct?
> This acknowledges that disruption (i.e. firmware UUID changes post-restart to affect anything that relies on it) is imminent, correct?

This disruption has to be prevented. Otherwise it will be a breaking change.
In order to prevent it I believe we'd have to apply a multi-phased approach:
- Continue with the current firmware UUID calculation, but store the UUID in status.
- As @vladikr suggested, use alerts / VM conditions in order to notify the users to restart VMs that do not have firmware UUID in their status.
- After a while, let the community know (i.e. via a mail to mailing list) that in the next version firmware UUID must reside in status in order to prevent breaking workloads.
- Switch to using `vm.metadata.uid` as the new firmware UUID (unless a UUID is already provided in the VM's status, therefore keeping backward compatibility).
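The phases above boil down to a lookup order, which can be sketched as follows. This is an illustrative Python sketch, not the proposed Go implementation: `resolve_firmware_uuid` and its parameters are hypothetical names, `LEGACY_NS` is an arbitrary constant, the legacy calculation is assumed to be a name-based SHA-1 UUID, and giving an explicit spec value precedence over status is an assumption taken from the flow described earlier in this thread.

```python
import uuid
from typing import Optional

# Arbitrary namespace UUID, for illustration only (assumption).
LEGACY_NS = uuid.UUID("00000000-0000-0000-0000-000000000001")


def resolve_firmware_uuid(
    name: str,
    metadata_uid: str,
    spec_uuid: Optional[str],
    status_uuid: Optional[str],
    use_metadata_uid: bool,
) -> str:
    """Pick the firmware UUID for the next boot.

    use_metadata_uid=False models the early phases (legacy calculation is kept);
    use_metadata_uid=True models the final phase (switch to vm.metadata.uid).
    """
    if spec_uuid:
        return spec_uuid  # an explicit user request always wins
    if status_uuid:
        return status_uuid  # a previously stored value keeps the VM stable
    if use_metadata_uid:
        return metadata_uid  # final phase: new VMs get the object UID
    return str(uuid.uuid5(LEGACY_NS, name))  # early phases: legacy name hash


# Early phase: the legacy value is computed and would be stored in status.
stored = resolve_firmware_uuid("db", "uid-1", None, None, use_metadata_uid=False)
# Final phase: the stored value survives the switch, so the guest sees no change.
after_switch = resolve_firmware_uuid("db", "uid-1", None, stored, use_metadata_uid=True)
```

A VM that never went through the early phase has nothing in status, so in the final phase it would fall through to `metadata_uid`; that is the case the alerts and the mailing-list notice are meant to catch.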
> As @vladikr suggested, use alerts / VM conditions in order to notify the users to restart VMs that do not have firmware UUID in their status.

So at least a restart is required to derive the firmware UUID from the VMI (including shutdown VMs) so that it would persist in status, correct?

> Switch to using vm.metadata.uid as the new firmware UUID (unless a UUID is already provided in the VM's status, therefore keeping backward compatibility).

Assuming a scenario where n existing VMs in cluster(s) suffer from the issue of overlapping firmware UUIDs, what is the recommended approach for users to rectify the issue post-upgrade? Recreate the VMs? Change the `spec.firmware.uuid` to what appears in `vm.metadata.uid` and restart? Is user action mandatory to resolve the issue?
> So at least a restart is required to derive the firmware UUID from the VMI (including shutdown VMs) so that it would persist in status, correct?

Now that I'm re-thinking about it, a restart might not be necessary, as virt-controller could simply copy `spec.firmware.uuid` to the status field if it exists, and if not, re-calculate the hash and assign it to status.

> Assuming a scenario where n existing VMs in cluster(s) suffer from the issue of overlapping firmware UUIDs, what is the recommended approach for users to rectify the issue post-upgrade? Recreate the VMs? Change the spec.firmware.uuid to what appears in vm.metadata.uid and restart? Is user action mandatory to resolve the issue?

TBH I'm not sure that we can redeem this situation. If two VMs are already running with the same firmware UUID, I think one has to be re-created. In any case, I think it shouldn't be in the scope of this proposal which only aims to prevent this situation from happening in the future.
> TBH I'm not sure that we can redeem this situation. If two VMs are already running with the same firmware UUID, I think one has to be re-created. In any case, I think it shouldn't be in the scope of this proposal which only aims to prevent this situation from happening in the future.

SGTM, thanks.
> @Acedus would you mind elaborating on this one? why won't it be retained during a restore?

I'm referring to VM backups taken before the introduction of the changes described in this DP. Correct me if I'm wrong, but the VM snapshot content won't be updated as part of this change (e.g., whether through spec or status), and if it isn't, once the new UUID generation mechanism is introduced and a restore is attempted the UUID will change.
firmwareUUID: "123e4567-e89b-12d3-a456-426614174000"

or persist via status condition
I'd argue against a condition. Conditions are generally needed to manage the lifecycle or to be consumed by other controllers.
conditions are not for persistence.
**Pros:**
- The upgrade-specific code can be safely removed after two or three releases.
- Simple logic post-upgrade, with a straightforward UUID assignment.
- Limited disturbance to GitOps workflows, affecting only pre-existing VMs with the current method.
This is only true if gitops is setting metadata.uid, correct?
@fabiand @iholder101 What if the VM controller dynamically added this annotation during VM creation if `vm.spec.template.spec.Firmware.UUID` is not defined?
For example, when a VM is created (via GitOps or using oc/kubectl), the controller would set the following annotation:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  annotations:
    kubevirt.io/host-uuid: "<host-uuid>"
```

Does that align with your understanding? Thank you!
@jangel97 what is the benefit of storing this in a new annotation, instead of an already-existing API?
I believe leveraging existing solutions would be the most effective approach. However, we should also consider scenarios where a system has a host UUID, and this breaking change could potentially alter it. In that case, if I understand correctly, wouldn't it be necessary to retain the "old" host UUID somewhere?
- May potentially bottleneck `virt-api`.

3. **Add a Firmware UUID Field to VM Status**
   **Description:** Store the generated UUID within the VM’s status.
Status is only live; it must not be used to store information.
In other words, any information in the status is only relevant and "persisted" on the runtime platform.
Objects must not be created with a populated status (an empty status is all we want: `status: {}`).
> Objects must not be created with a populated status (an empty status is all we want: `status: {}`).

A VM object won't be created with a populated status, but a VM that was created, booted, then shut off would have a non-empty status.
Am I missing anything?
ping @fabiand :)
just want to understand if I get you correctly.
type: Ready
created: true
runStrategy: Once
firmwareUUID: "123e4567-e89b-12d3-a456-426614174000"
.status is only for reporting, not for persistence.
Can you elaborate?
How is spec more persistent than status? Aren't they both persisted in etcd?
> .status is only for reporting, not for persistence.

@fabiand I don't know where this is coming from. Numerous objects across k8s use the status exactly for that purpose, including objects in KubeVirt.
The VM object would have a status until it is deleted. The status simply stores the current state of the VM, while the spec holds the desired state.
Conceptually, it is not what status is meant to show.
Status is meant to represent the current state. However, the FW UUID cannot be "read" from the current state of the VM, because it is stopped.
There is a name for persisting such information on the status: caching.
Status is not about caching, it is about representing the actual state. But here the suggestion is to store the actual state so that it becomes the input for a future actual state (which flows in the opposite direction).
We actually had this implemented in the network interfaces status and it caused a mess: To calculate the new status on a reconcile cycle, the previous data was considered, potentially dragging incorrect state over cycles indefinitely. We have since dropped this behavior, calculating the interfaces status without depending on the previous state.
But this is still tricky business even today, because multiple components attempt to touch the same list.
Bottom line, while this could be a good solution, caching the data on the status can potentially bite back. There is logic that uses this status information to set the UUID on the next VM instantiation and an opposite logic that reads the VM instantiation spec (or status?) to reflect that actual back to the same status field. Sounds like a loop to me.
Even if the VM is stopped, the status shows the current state, as this UUID is stored on the VM boot disk.
As long as the VM exists this is its current state, regardless of its phase.

**Cons:**
- Requires additional code (likely in `virt-operator`) to handle the upgrade-specific persistence.
- Patching the spec of a running VM may trigger a "restart required" condition.
Can this be easily avoided in code, if we find that vm's Firmware.UUID has NOT differed from the vmi's?
Generally, patching object specs after admission is not a good practice, as it hides the user intent.
> Generally, patching object specs after admission is not a good practice, as it hides the user intent.

Ack. And I'd say that patching during admission is just as bad in this respect.
However in this case, the user never intended us to have a buggy UUID, but must have it persisted. So I don't see any issue here, besides the (limited) disturbance to git-ops.

> Can this be easily avoided in code, if we find that vm's Firmware.UUID has NOT differed from the vmi's?

@dasionov would you refer to this? I believe that the Con should say
- Code should be added to the computation of the "restart required" condition so that it is not raised if vm.Template.Firmware.UUID equals vmi.Firmware.UUID.
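The check suggested in that Con can be sketched as a single predicate. This is a hypothetical Python illustration only: the function name is made up, it considers the firmware UUID in isolation (the real condition depends on other fields too), and comparing case-insensitively is an assumption based on UUID strings being hexadecimal.

```python
from typing import Optional


def firmware_uuid_requires_restart(
    vm_template_uuid: Optional[str], vmi_uuid: Optional[str]
) -> bool:
    """Return True only if the UUID in the VM template differs from the running VMI's."""
    if vm_template_uuid is None:
        # Nothing was written to the template, so there is nothing to compare.
        return False
    if vmi_uuid is None:
        # The template requests a UUID that the running instance does not report.
        return True
    return vm_template_uuid.lower() != vmi_uuid.lower()


# Persisting the value the VMI already runs with must not flag the VM.
same = firmware_uuid_requires_restart(
    "123E4567-E89B-12D3-A456-426614174000",
    "123e4567-e89b-12d3-a456-426614174000",
)
```

With such a guard, the upgrade step that copies the running UUID into the spec would be invisible to the "restart required" logic, while a genuine user edit of the UUID would still be flagged.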
sry for the delay, fixed

### 4. Introduce a Breaking Change to Ensure Universal Uniqueness

**Description:** Modify UUID generation to use a combination of VM name and namespace, ensuring unique UUIDs within the cluster.
what about universal uniqueness? How would it be ensured?
> what about universal uniqueness? How would it be ensured?

Using both name + namespace as a hash will ensure uniqueness AFAICT
not universal uniqueness. a different vm on a different cluster would obtain the same uuid if it has the same name and namespace.
Ah, got you.
We can always add something "random" to the hash, like creation timestamp. WDYT?
Would it be possible to clarify the impact of VMs in different clusters having the same UUID if they share the same name and namespace?
Are there specific cases where cross-cluster uniqueness is essential for the UUID to function correctly?
I guess that @dankenigsberg is referring to migrating a VM from a different cluster, e.g. via importing/exporting disks
> Are there specific cases where cross-cluster uniqueness is essential for the UUID to function correctly?

By definition, UUID has to be universally-unique. A lot of services would break if two different VMs present matching identifiers.
Just use the pod metadata.UID?
https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids
you mean the vm.metadata.uid?
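The cross-cluster objection in this thread is easy to demonstrate. The sketch below is Python for illustration only; `hashed_uuid` and `NS` are hypothetical, the hash is assumed to be a name-based SHA-1 UUID, and `uuid.uuid4()` is used merely as a stand-in for an identifier that the API server generates independently per object.

```python
import uuid

# Arbitrary namespace UUID, for illustration only (assumption).
NS = uuid.UUID("00000000-0000-0000-0000-000000000001")


def hashed_uuid(namespace: str, name: str) -> uuid.UUID:
    """Deterministic UUID from namespace and name; no cluster-specific input."""
    return uuid.uuid5(NS, f"{namespace}/{name}")


# The same namespace/name pair on two different clusters hashes identically.
on_cluster_a = hashed_uuid("default", "web")
on_cluster_b = hashed_uuid("default", "web")
cross_cluster_collision = on_cluster_a == on_cluster_b

# Stand-in for per-object UIDs: generated independently, so they differ.
uid_a, uid_b = uuid.uuid4(), uuid.uuid4()
```

Any purely deterministic function of name and namespace has this property, which is why the discussion moves from hashing towards an identifier that is generated once per object.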
- May necessitate tooling to facilitate UUID preservation for compatibility with existing workloads.

### 5. Upgrade-Specific Persistence of Firmware UUID
I don’t necessarily object to this approach—in fact, it’s likely unavoidable to prevent disruption to existing workloads—but the process for implementing it should be more thoroughly detailed IMO.
Correct me if I'm wrong, but if VMs are stopped during the upgrade, they cannot rely on VMI objects to derive and persist the firmware UUID, as those objects won’t exist. As a result, existing VMs without a defined `spec.firmware.uuid` field would need to populate the persistent UUID field (whatever that may be) using the current method (based on the name). The new method should then be applied exclusively to new VMs.
While this doesn’t cover all use cases, it’s reasonable to assume it addresses the majority—except for scenarios involving (for example) backup and restore, which present a separate set of challenges.
I don't understand the reservation that you are raising. Option 5 does not distinguish between running and non-running VMs. During upgrade, they are all changed to persist the buggy uuid. After upgrade, all VMs receive a fresh uuid.
The problem of restoring old VMs that did not persist their buggy uuid sounds important, but not specific to this option. @dasionov can you address it in the design? I don't have any good idea...
IIRC the original idea was to persist the UUID in existing VMs by obtaining it from the VMIs, therefore my comment. Perhaps I mixed up the proposal with implementation, my bad, you can disregard the comment in that case.
### 5. Upgrade-Specific Persistence of Firmware UUID

**Description:** Before upgrading KubeVirt, persist the `Firmware.UUID` of existing VMs in `vm.spec`.
After the upgrade, any VM without `Firmware.UUID` is considered new.
For these new VMs, use `vm.metadata.uid` as the firmware UUID if `vm.spec.template.spec.Firmware.UUID` is not defined.
Another problem with this approach is upgrades that skip versions.
IOW, a user can upgrade Kubevirt v1.0 straight to v1.3, skipping v1.1 and v1.2. With this approach, such users will break their VMs.
Do we support skipping intermediate minor releases? I don't think that we test that.
Anyway, the proposal says below

> - Upgrade-specific code can be removed after two or three releases.

so a 1.4->1.7 would still be safe in this regard.
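To illustrate why skipping the release that carries the upgrade step matters for option 5, here is a rough Python sketch operating on plain dictionaries shaped like a VM manifest. Everything in it is hypothetical: the two function names, the `LEGACY_NS` constant, and the assumption that the legacy UUID is a name-based SHA-1 UUID.

```python
import uuid

# Arbitrary namespace UUID, for illustration only (assumption).
LEGACY_NS = uuid.UUID("00000000-0000-0000-0000-000000000001")


def persist_legacy_uuid(vm: dict) -> dict:
    """Upgrade step: write the name-derived UUID into the spec unless one is set."""
    firmware = vm["spec"]["template"]["spec"].setdefault("firmware", {})
    firmware.setdefault("uuid", str(uuid.uuid5(LEGACY_NS, vm["metadata"]["name"])))
    return vm


def firmware_uuid_after_upgrade(vm: dict) -> str:
    """Post-upgrade rule: a VM without a persisted UUID is treated as new."""
    firmware = vm["spec"]["template"]["spec"].get("firmware", {})
    return firmware.get("uuid") or vm["metadata"]["uid"]


def make_vm(name: str, uid: str) -> dict:
    return {"metadata": {"name": name, "uid": uid}, "spec": {"template": {"spec": {}}}}


legacy = str(uuid.uuid5(LEGACY_NS, "db"))

# VM that went through the upgrade step keeps the UUID its guest already knows.
migrated = firmware_uuid_after_upgrade(persist_legacy_uuid(make_vm("db", "uid-1")))

# VM whose cluster skipped the step is indistinguishable from a new VM.
skipped = firmware_uuid_after_upgrade(make_vm("db", "uid-1"))
```

Here `migrated` equals the legacy value while `skipped` becomes `uid-1`, which is the breakage described above for clusters that jump past every release containing the upgrade-specific code.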
@dasionov Hi, I would suggest simplifying the PR subject and calling it "improved handling of the VM firmware UUID", since I see that you want to cover more than only the UUID persistence subject.
/cc @EdDev
**Pros:**
- Straightforward implementation.
- Avoids the need to introduce new API fields.
- Fully backward compatible.
Wouldn't this break existing VMs? After a shutdown, these will get a new UUID.
> After a shutdown, these will get a new UUID.

IIUC the idea here is a multi-phased approach, in which UUID will first be saved to the spec and then (after a few releases) the way of computing the UUID will change. IOW this approach is similar to the status field approach but with a spec field.
**Cons:**
- Introduces a new API field, which could increase API surface area and requires long-term support.
- New fields may impact system performance and resource load.
- Adds to API complexity and may reduce readability.
I don't understand this con, can you elaborate?

**Cons:**
- Introduces a new API field, which could increase API surface area and requires long-term support.
- New fields may impact system performance and resource load.
how?
**Cons:**
- Potential misuse of the spec field, which ideally should only reflect the user’s intended configuration.
- The majority of users may not consider the UUID part of the desired VM state, making this field irrelevant from their perspective.
- Can lead to longer and more complex VM definitions, impacting readability and management.
How come? Can we elaborate?
### 1. Persist UUID in the VM's Spec Field

**Description:** The UUID is generated when the VMI is being created and is then saved to the VM's spec field.

**Pros:**
- Straightforward implementation.
- Avoids the need to introduce new API fields.
- Fully backward compatible.

**Cons:**
- Potential misuse of the spec field, which ideally should only reflect the user’s intended configuration.
- The majority of users may not consider the UUID part of the desired VM state, making this field irrelevant from their perspective.
- Can lead to longer and more complex VM definitions, impacting readability and management.
How do you ensure backward compatibility here?
Please take a look at #347 (comment).
@dasionov perhaps it's better to add this to the document (under several of the approaches)
### 2. Generate a Default UUID via Webhook and Set it to the Spec Field

**Description:** A webhook generates a UUID if the user does not provide one. This UUID is then set in the VM's spec field.

**Pros:**
- Retains backward compatibility and avoids introducing new API fields.

**Cons:**
- Same concerns as above about the spec field.
- Using webhooks to assign defaults can degrade performance, especially with high volumes of VMs, as `virt-api` may become a bottleneck.
- May compromise scalability, as generating a UUID via webhook adds overhead when multiple VMs start simultaneously.

---
You should describe why it is backward compatible (the controller uses the same function to "generate" the UUID for an old VM, where an old VM is one that does not have the UUID set).

**Cons:**
- Same concerns as above about the spec field.
- Using webhooks to assign defaults can degrade performance, especially with high volumes of VMs, as `virt-api` may become a bottleneck.
How does this degrade performance if the webhook already exists? How is this a bottleneck if the API can be scaled?
I guess not everyone would want to scale the entire virt-api component just to get more webhook capacity, IMHO.
### 3. Create a New Firmware UUID Field in VM Status | ||
|
||
**Description:** A new field is added under VM status to store the generated firmware UUID upon first VM boot. | ||
|
||
**Pros:** | ||
- Keeps the spec clean, avoiding unnecessary fields in the user-defined configuration. | ||
- Provides a clear separation between user configuration (spec) and system-generated information (status). | ||
|
||
**Cons:** | ||
- Introduces a new API field, which could increase API surface area and requires long-term support. | ||
- New fields may impact system performance and resource load. | ||
- Adds to API complexity and may reduce readability. | ||
|
||
*Note:* To minimize this con, consider setting the UUID as a condition under VM status instead of introducing a separate field. | ||
|
Again how do we ensure the existing VMs retain stable UUID?
I think the flow of option 3 is not well explained, at least compared to what I had in mind.
I see this working as follows:
VM controller:
- On start:
  - If the user explicitly set `vm.spec.template.spec.Firmware.UUID`, propagate it to `vmi.spec.Firmware.UUID`.
  - Otherwise (the user didn't set the firmware UUID explicitly):
    - If `vm.Status.FirmwareUUID` is empty (because it's a newly created VM and no VMI existed before), set `vmi.spec.Firmware.UUID` to `vm.metadata.uid`.
    - If `vm.status.FirmwareUUID` is set, propagate it to `vmi.spec.Firmware.UUID`.
- On sync:
  - If `vm.status.FirmwareUUID` is empty, set it using `vmi.spec.Firmware.UUID`.
This approach preserves backward compatibility and is still GitOps friendly.
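The flow described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the `VM` and `VMI` structs are simplified, hypothetical stand-ins for the real KubeVirt types, and `startVMI` and `syncStatus` are made-up helper names, not existing controller functions.

```go
package main

import "fmt"

// VM is a simplified, hypothetical stand-in for a KubeVirt VirtualMachine.
// Only the fields needed to illustrate the flow are modeled.
type VM struct {
	UID                string // vm.metadata.uid
	SpecFirmwareUUID   string // user-provided firmware UUID in the VM template (may be empty)
	StatusFirmwareUUID string // proposed status field holding the persisted UUID (may be empty)
}

// VMI is a simplified, hypothetical stand-in for a VirtualMachineInstance.
type VMI struct {
	FirmwareUUID string // vmi.spec.firmware.uuid
}

// startVMI follows the "On start" steps: an explicit spec value wins, then the
// value persisted in status, and only a brand-new VM falls back to its UID.
func startVMI(vm *VM) *VMI {
	switch {
	case vm.SpecFirmwareUUID != "":
		return &VMI{FirmwareUUID: vm.SpecFirmwareUUID}
	case vm.StatusFirmwareUUID != "":
		return &VMI{FirmwareUUID: vm.StatusFirmwareUUID}
	default:
		return &VMI{FirmwareUUID: vm.UID}
	}
}

// syncStatus follows the "On sync" step: record the VMI's UUID in the VM
// status once, and never overwrite a value that is already there.
func syncStatus(vm *VM, vmi *VMI) {
	if vm.StatusFirmwareUUID == "" {
		vm.StatusFirmwareUUID = vmi.FirmwareUUID
	}
}

func main() {
	vm := &VM{UID: "uid-1"}
	vmi := startVMI(vm)
	syncStatus(vm, vmi)
	fmt.Println(vmi.FirmwareUUID, vm.StatusFirmwareUUID)
}
```

In this sketch, once the status field is populated, later starts reuse it, so the UUID stays stable across restarts without touching the spec.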
As I imagine it, using the same approach as the vTPM backend storage would make it easy to manage VM cloning, snapshot, and restore: we would not clone the UUID PVC, since it should be unique, just like the vTPM.
|
||
--- | ||
|
||
### 4. Introduce a Breaking Change to Ensure Universal Uniqueness |
-1
- May necessitate tooling to facilitate UUID preservation for compatibility with existing workloads. | ||
|
||
|
||
### 5. Upgrade-Specific Persistence of Firmware UUID |
This would require having 2 releases to fix this issue or some kind of coordination of the upgrade that is not available today.
should not affect updates / rollbacks. | ||
|
||
## Functional Testing Approach | ||
Verify that a newly created VMI has a unique firmware UUID assigned and that this UUID persists across VMI restarts. |
Don't we have this test already?
What we miss is upgrade compatibility.
can we even test that?
You could state that some parts are tested as e2e tests and others as unit tests.
You could detail the scenarios that need coverage.
bc9b3ff to 58738d1
Thank you, I like this proposal.
While I think there is room to improve some parts, the base is good and triggers conversation and ideas.
Thank you!
|
||
|
||
## Definition of Users | ||
End Users: Individuals or organizations running VMs and VMIs on KubeVirt who require consistent firmware UUIDs for their applications. |
We usually define VM owners (e.g. namespace/project admins) and/or cluster admins.
I guess you mean the former here.
End Users: Individuals or organizations running VMs and VMIs on KubeVirt who require consistent firmware UUIDs for their applications. | ||
|
||
## User Stories | ||
As an end-user, I expect my VMI to maintain its identity across restarts. |
This does not look like a "story" to me.
And there are several scenarios, from creating and starting a new VM, starting an old VM (that was not running), restoring an old VM, restoring a new VM, etc.
Please list here all the cases that you intend to support and mention the ones you do not intend to support for this design.
@@ -0,0 +1,191 @@ | |||
# Overview | |||
This proposal introduces a mechanism to persist the firmware UUID of a Virtual Machine Instance (VMI) in KubeVirt. |
I was under the impression that we persist VM (not VMI) FW ID.
**Description:** A webhook generates a UUID if the user does not provide one. This UUID is then set in the VM's spec field. | ||
This approach ensures backward compatibility because the controller will use the same function to generate a UUID for VMs that do not already have the UUID set. |
The added value of a webhook compared to similar logic in the VM controller is not clear to me.
All changes to the VM resource pass through the VM controller, so please elaborate on how a webhook can do more. If it is just an option, with pros identical to those of one in the controller, then please state that explicitly.
I think this got dropped in the last update, so the comment is irrelevant now.
## Repos | ||
Kubevirt/kubevirt | ||
|
||
# Design |
There is a need to choose one solution and then list the others as alternatives.
|
||
--- | ||
|
||
## What Happens to the Firmware UUID During a Restore? |
There are potentially two issues in this domain; one extends an existing general problem and the other may be specific to restoration (and cloning?):
- There is a potential collision with an existing identical UUID, but this is rare enough that one could intentionally ignore it.
- If the same VM can be restored twice, then it has the potential to collide with a VM defined in the system.
Can a VM be restored twice, resulting in two clone VMs?
In the same context, cloning needs to be addressed.
It can be cloned, which will remove the VM specific data, but not restored twice.
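To illustrate what "removing the VM specific data" could mean for the firmware UUID, here is a minimal sketch. It assumes the UUID lives in the VM spec and that an empty value causes a fresh one to be assigned later; `VMSpec` and `cloneSpec` are hypothetical names for illustration, not the actual clone controller code.

```go
package main

import "fmt"

// VMSpec is a simplified, hypothetical stand-in for the parts of a VM
// definition that matter for cloning in this sketch.
type VMSpec struct {
	Name         string
	FirmwareUUID string
}

// cloneSpec copies the source spec under a new name and clears the firmware
// UUID, so the clone does not share an identity with its source.
func cloneSpec(src VMSpec, newName string) VMSpec {
	dst := src
	dst.Name = newName
	dst.FirmwareUUID = "" // assumption: an empty value is filled in later with a fresh UUID
	return dst
}

func main() {
	src := VMSpec{Name: "vm-a", FirmwareUUID: "123e4567-e89b-12d3-a456-426614174000"}
	clone := cloneSpec(src, "vm-a-clone")
	fmt.Printf("source=%q clone=%q\n", src.FirmwareUUID, clone.FirmwareUUID)
}
```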
type: Ready | ||
created: true | ||
runStrategy: Once | ||
firmwareUUID: "123e4567-e89b-12d3-a456-426614174000" |
Conceptually, it is not what status is meant to show.
Status is meant to represent the current state. However, the FW UUID cannot be "read" from the current state of the VM, because it is stopped.
There is a name for persisting such information on the status: caching.
Status is not about caching; it is about representing the actual state. But here the suggestion is to store the actual state so that it becomes the input for a future actual state (which flows in the opposite direction).
We actually had this implemented in the network interfaces status and it caused a mess: To calculate the new status on a reconcile cycle, the previous data was considered, potentially dragging incorrect state over cycles indefinitely. We have since dropped this behavior, calculating the interfaces status without depending on the previous state.
But this is still tricky business even today, because multiple components attempt to touch the same list.
Bottom line, while this could be a good solution, caching the data on the status can potentially bite back. There is logic that uses this status information to set the UUID on the next VM instantiation and an opposite logic that reads the VM instantiation spec (or status?) to reflect that actual back to the same status field. Sounds like a loop to me.
The proposed changes have no anticipated impact on scalability capabilities of the KubeVirt framework | ||
|
||
## Update/Rollback Compatibility | ||
should not affect updates / rollbacks. |
This is where the content of "## What Happens to the Firmware UUID During a Restore?" should land.
Or you should remove this part.
- Based on the selected design, either introduce a new field or utilize an existing one | ||
(such as in spec, status) to store the firmware UUID. | ||
- Update controller logic to check for and persist the UUID, ensuring it is generated only once per VM. | ||
- Testing: add unit and functional tests |
I do not think these are valid phases.
You cannot add any logic without unit tests and possibly basic e2e tests.
And you cannot add the API change without utilizing it, not even as a separate PR.
Usually phasing means that you provide a basic functionality that works, then extend it in the next phases. It can also include feature lifecycle stages, i.e. alpha, beta and GA planning.
BTW, you should mention explicitly that there is no plan to protect this new logic with a FG (the fact that it was not mentioned implied to me that this is the intention).
58738d1 to 2d28927
2d28927 to df1dc1f
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1cb58b3 to 4082df8
What about using the same way as with vTPM in order to persist the UUID?
For the record, what I don't like about the currently proposed approach is that a spec field becomes mandatory and is auto-set by a controller. In my opinion this is spec abuse, as the user cannot express "I don't care about this, do whatever you want"; instead, the desired state misleadingly expresses "I care about this, and want this specific firmware UUID".
Saying that, I'm fine with this approach as a temporary step since:
- If we chose the status field approach, we would introduce a new field that AFAIU no one cares about and would serve only our controllers.
- We already have spec fields that behave this way.
In my opinion, in the far future, we should:
- Drop the logic to patch existing VMs.
- Make the spec field optional again.
- Similar to how it's working now, implicitly set the firmware UUID if not specified in spec.
Thank you very much @dasionov for driving this and for your patience!
Looks good to me in general.
1. **New VMs**: | ||
- If the firmware UUID is not explicitly defined in `vm.spec.template.spec.firmware.uuid`, the mutator webhook will automatically set the firmware UUID to the value of `vm.metadata.uid`. | ||
|
||
2. **Old VMs**: |
By "old VMs" I think you mean "existing VMs", right?
This proposal introduces a mechanism to persist the firmware UUID of a Virtual Machine in KubeVirt. | ||
By storing the firmware UUID, we ensure that it remains consistent across VM restarts, | ||
which is crucial for applications and services that rely on the UUID for identification or licensing purposes. |
While this is correct, I don't think that's the most important part of this proposal; rather, it is that the firmware UUID becomes, well, a "real" UUID (universally unique ID). Can you please rephrase?
* Improve the uniqueness of the VMI firmware UUID to become independent of the VM name and namespace. | ||
* Maintain the UUID persistence over the VM lifecycle. | ||
* Maintain backward compatibility | ||
* Maintain UUID persistence with VM backup/restore. |
Correct me if I'm wrong, but I think we won't be able to achieve this.
Perhaps this can be moved to the non-goals section.
## Non Goals | ||
|
Change UUID of currently existing VMs
|
||
|
||
## Definition of Users | ||
VM owners: who require consistent firmware UUIDs for their applications. |
Also cluster-admin: to ensure VMs have universally unique firmware IDs
## User Stories | ||
### Supported cases: | ||
* Creating and Starting a New VM | ||
* Starting an old VM that was not running |
Do we want to support that?
How would the controller distinguish between VMs that have started and VMs that have not?
We would use the legacy UUID calculation for both running and stopped VMs.
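For readers unfamiliar with what a name-derived "legacy" calculation looks like, here is a minimal sketch. It assumes the legacy UUID is a deterministic, SHA-1 based (version 5 style) hash of the VM name, which matches the proposal's goal of becoming independent of the VM name. The namespace bytes and the function name `legacyStyleUUID` are hypothetical and are not claimed to match KubeVirt's actual values.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// hypotheticalNamespace is an arbitrary 16-byte namespace chosen for this
// sketch (the RFC 4122 DNS namespace); it is NOT claimed to be KubeVirt's.
var hypotheticalNamespace = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// legacyStyleUUID derives a version-5 style UUID from a VM name: the same
// name always yields the same UUID, regardless of which cluster it runs in.
func legacyStyleUUID(vmName string) string {
	h := sha1.New()
	h.Write(hypotheticalNamespace[:])
	h.Write([]byte(vmName))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // set the version nibble to 5
	u[8] = (u[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits

	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	// Two VMs that share a name get the same UUID, which is the uniqueness
	// problem the proposal sets out to fix.
	fmt.Println(legacyStyleUUID("my-vm") == legacyStyleUUID("my-vm"))
	fmt.Println(legacyStyleUUID("my-vm"))
}
```

Because the result depends only on the name, applying this calculation to existing VMs keeps their UUIDs unchanged, whether they are running or stopped.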
## Update/Rollback Compatibility | ||
Backups created before implementing the persistent firmware UUID mechanism will not include the firmware UUID in the VM's spec. | ||
As a result, restoring such backups will generate a new UUID for the VM. | ||
This change may lead to compatibility issues for workloads or systems that rely on consistent UUIDs, such as licensing servers or configuration management systems. | ||
Users are advised to take this into consideration and plan backup and restore operations accordingly. |
Perhaps we can tweak the restore controller's code such that if a backup does not contain a firmware UUID in the spec, the controller sets the UUID the old way. This would ensure that old backups remain compatible.
FYI @akalenyu @ShellyKa13
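The suggested restore behaviour could look roughly like this. It is a sketch only: `firmwareUUIDForRestore` is a hypothetical helper, and the legacy calculation is passed in as a function because its exact form is outside the scope of this comment.

```go
package main

import "fmt"

// firmwareUUIDForRestore is a hypothetical helper. If the backed-up spec
// already carries a firmware UUID, it is kept; otherwise the backup predates
// UUID persistence and the legacy calculation is used so the guest sees the
// same UUID it had before.
func firmwareUUIDForRestore(backupUUID string, legacyUUID func() string) string {
	if backupUUID != "" {
		return backupUUID
	}
	return legacyUUID()
}

func main() {
	// Placeholder for the legacy calculation; the returned value is made up.
	legacy := func() string { return "legacy-uuid-for-my-vm" }

	fmt.Println(firmwareUUIDForRestore("", legacy))
	fmt.Println(firmwareUUIDForRestore("123e4567-e89b-12d3-a456-426614174000", legacy))
}
```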
This proposal introduces a mechanism to make the firmware UUID of a Virtual Machine in KubeVirt universally unique. It ensures the UUID remains consistent across VM restarts, preserving stability and reliability. Signed-off-by: Daniel Sionov <[email protected]>
4082df8 to edb94f1
What this PR does / why we need it:
This proposal introduces a mechanism to persist the FW ID of a Virtual Machine (VM) in KubeVirt. By storing the FW ID, we ensure that it remains consistent across VMI restarts, which is crucial for applications and services that rely on the UUID for identification or licensing purposes.
relates-to: kubevirt/kubevirt#13156, kubevirt/kubevirt#13158
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: