[rescheduling] Add mutex #188

Draft
BenjaminLudwigSAP wants to merge 1 commit into base: stable/yoga-m3

Conversation

BenjaminLudwigSAP (Collaborator)

No description provided.

octavia_f5/api/drivers/f5_driver/arbiter.py (two resolved review threads)
Comment on lines +186 to +188
# If any of the load balancers is currently being rescheduled, we want to
# hold back the whole declaration, because we don't know whether the load
# balancer is supposed to exist on this device or not.
pass  # TODO

Collaborator

Any reason not to use the provisioning_status?

Collaborator (Author)

So far, my reasoning for not using provisioning_status was that it is bound to specific values. There are in fact possible values that aren't used for load balancers yet (e.g. ALLOCATED). However, we can't use that field at all, for the following reason:
No matter which locking mechanism we use, the controller_worker has to use it as well when updating a load balancer. It calls status_manager.update_status after syncing the LBs, and that function determines the status to set by looking at the current value of provisioning_status. If we let the controller worker change that field, the lock value it was set to before would be lost. The controller worker could remember the value of that field for every load balancer, but that would not be crash-safe.
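To make the conflict concrete, here is a toy sketch of the update path described above (the function body is illustrative, not the actual octavia_f5 code):

```python
# Illustrative only: a simplified stand-in for status_manager.update_status.
# The key point: the new status is *derived from* the current value of
# provisioning_status, so a lock marker stored in that field would first be
# misread as a status and then overwritten.

def update_status(lb):
    # Derive the post-sync status from the current provisioning_status.
    if lb.provisioning_status == 'PENDING_DELETE':
        lb.provisioning_status = 'DELETED'
    else:
        lb.provisioning_status = 'ACTIVE'
    # Any lock value (e.g. ALLOCATED) previously written here is now gone;
    # caching it in the worker instead would not survive a crash.
```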

It turns out there is a simple solution, however: I'll use the amphora status field (which is bound to the same range of values as provisioning_status) and set it to ALLOCATED (a value so far only used for amphora entries representing devices) while the associated load balancer is locked by either the rescheduling arbiter or the controller worker. update_status would then reset that field, unlocking the LB. The scheme can be made crash-resistant by having workers, at startup, reset the field to the value of the associated LB's provisioning_status for every amphora whose compute_flavor matches the host the worker is assigned to.
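A minimal sketch of that scheme, assuming hypothetical model and helper names (LoadBalancer, Amphora, lock, unlock, recover_locks are illustrative, not the real octavia_f5 API):

```python
from dataclasses import dataclass

# Hypothetical models; the field names mirror the discussion, not the schema.
@dataclass
class LoadBalancer:
    provisioning_status: str

@dataclass
class Amphora:
    status: str
    compute_flavor: str
    load_balancer: LoadBalancer

LOCKED = 'ALLOCATED'  # so far only used for amphora entries representing devices

def lock(amphora: Amphora) -> None:
    # Rescheduling arbiter or controller worker takes the lock on the LB.
    amphora.status = LOCKED

def unlock(amphora: Amphora) -> None:
    # update_status releases the lock by resetting the field.
    amphora.status = amphora.load_balancer.provisioning_status

def recover_locks(amphorae: list[Amphora], host: str) -> None:
    # Crash safety: at startup, a worker clears stale locks for its own host.
    for amphora in amphorae:
        if amphora.compute_flavor == host and amphora.status == LOCKED:
            unlock(amphora)
```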

Collaborator

This seems pretty over-engineered to me. So you want to introduce locks to the status_manager as well as the controller_worker? The status is considered a user-facing property, nothing to act on.

Collaborator (Author)

It's not over-engineered; it's basically exactly what you proposed, just with all the necessary considerations laid out.

Collaborator (Author)

Unless I misunderstood your initial proposal (the one with provisioning_status).

BenjaminLudwigSAP self-assigned this Apr 12, 2023
BenjaminLudwigSAP added the rescheduling label Apr 12, 2023
Labels
rescheduling Relevant for rescheduling semantics in some way
2 participants