[rescheduling] Add mutex #188
base: stable/yoga-m3
Conversation
```python
# If any of the load balancers is currently being rescheduled, we want to
# hold back the whole declaration, because we don't know whether the load
# balancer is supposed to exist on this device or not.
pass  # TODO
```
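For context, a minimal sketch of what that TODO could turn into, with `is_being_rescheduled`, `declare_device`, and `apply_declaration` as hypothetical placeholder names (the actual check is exactly what this PR is still deciding on):

```python
# Hypothetical sketch only; helper names are placeholders, not this PR's code.
def is_being_rescheduled(lb):
    # Placeholder: the real check would consult whatever lock is agreed on
    # below (e.g. the amphora status field set to ALLOCATED).
    return getattr(lb, "being_rescheduled", False)


def apply_declaration(load_balancers, device):
    # Placeholder for pushing the full declaration to the device.
    print(f"declaring {len(load_balancers)} LBs on {device}")


def declare_device(load_balancers, device):
    if any(is_being_rescheduled(lb) for lb in load_balancers):
        # A rescheduling is in flight, so we can't tell whether each LB is
        # supposed to exist on this device. Hold back the whole declaration
        # and let the next sync cycle retry.
        return
    apply_declaration(load_balancers, device)
```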
Any reason not to use the `provisioning_status`?
So far my reasoning for not using `provisioning_status` was that it is bound to specific values. There are in fact possible values that aren't used for load balancers so far (e.g. `ALLOCATED`). However, we can't use that field at all, for the following reason:

No matter what locking mechanism we use, it has to be used by the `controller_worker` as well when updating a load balancer. It calls `status_manager.update_status` after syncing the LBs, and that function determines the status to set by looking at the current value of `provisioning_status`. If we let the controller worker change that field, the value it was set to before would be lost. The controller worker could remember the value of that field for every load balancer, but that would not be crash-safe.

It turns out there is a simple solution, however: I'll use the amphora `status` field (which is bound to the same range of values as `provisioning_status`) and set it to `ALLOCATED` (a value so far only used for amphora entries representing devices) while the associated load balancer is locked by either the rescheduling arbiter or the controller worker. `update_status` would then reset that field, unlocking the LB. This can be made crash-resistant by having workers, at startup, reset the field to the value of the associated LB's `provisioning_status` wherever `compute_flavor` matches the host the worker is assigned to.
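A minimal sketch of that proposal, assuming SQLAlchemy models and constants shaped like Octavia's (`models.Amphora`, `models.LoadBalancer`, `constants.AMPHORA_ALLOCATED`); the function names and call sites are illustrative, not the code in this PR:

```python
# Hypothetical sketch of the amphora-status mutex described above.
from octavia.common import constants
from octavia.db import models


def try_lock_lb(session, lb_id):
    """Acquire the per-LB lock by atomically flipping the amphora status.

    The filtered UPDATE only succeeds when the row is not already
    ALLOCATED, so two workers cannot both acquire the lock.
    """
    updated = session.query(models.Amphora).filter(
        models.Amphora.load_balancer_id == lb_id,
        models.Amphora.status != constants.AMPHORA_ALLOCATED,
    ).update({models.Amphora.status: constants.AMPHORA_ALLOCATED},
             synchronize_session=False)
    session.commit()
    return updated > 0


def unlock_lb(session, lb_id, new_status):
    """Release the lock; in the proposal this happens in update_status."""
    session.query(models.Amphora).filter(
        models.Amphora.load_balancer_id == lb_id,
    ).update({models.Amphora.status: new_status},
             synchronize_session=False)
    session.commit()


def recover_stale_locks(session, my_host):
    """Startup crash recovery: copy each LB's provisioning_status back into
    the amphora status for every amphora assigned to this worker's host."""
    stale = session.query(models.Amphora).filter(
        models.Amphora.compute_flavor == my_host,
        models.Amphora.status == constants.AMPHORA_ALLOCATED,
    ).all()
    for amp in stale:
        lb = session.query(models.LoadBalancer).filter_by(
            id=amp.load_balancer_id).one()
        amp.status = lb.provisioning_status
    session.commit()
```

The key property is that the acquire is a single conditional UPDATE, so it stays correct under concurrency, and a crashed worker's stale locks are cleaned up by the startup recovery pass.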
This seems pretty over-engineered to me: you want to introduce locks in the `status_manager` as well as the `controller_worker`? The status is considered a user-facing property, not something to act on.
It's not over-engineered; it's basically exactly what you proposed, just with all the necessary considerations laid out.
Unless I misunderstood your initial proposal (the one with `provisioning_status`).