[rescheduling] Add mutex #188

Draft
BenjaminLudwigSAP wants to merge 1 commit into base: stable/yoga-m3

Conversation

BenjaminLudwigSAP (Collaborator)

No description provided.

octavia_f5/api/drivers/f5_driver/arbiter.py (two resolved review threads)
Comment on lines +186 to +188
# If any of the load balancers is currently being rescheduled, we want to
# hold back the whole declaration, because we don't know whether the load
# balancer is supposed to exist on this device or not.
pass  # TODO

Collaborator

Any reason not to use the provisioning_status?

Collaborator (Author)

So far, my reasoning for not using provisioning_status was that it is bound to specific values. There are in fact possible values that aren't used for load balancers yet (e.g. ALLOCATED). However, we can't use that field at all, for the following reason:
No matter which locking mechanism we use, the controller_worker has to use it as well when updating a load balancer. It calls status_manager.update_status after syncing the LBs, and that function determines the status to set by looking at the current value of provisioning_status. If we let the controller worker change that field, the lock value it was set to before would be lost. The controller worker could remember the value of that field for every load balancer, but that would not be crash-safe.
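To make the conflict concrete, here is a toy sketch of the update path described above (the function body is illustrative, not the actual octavia_f5 code):

```python
# Illustrative only: a simplified stand-in for status_manager.update_status.
# The key point: the new status is *derived from* the current value of
# provisioning_status, so a lock marker stored in that field would first be
# misread as a status and then overwritten.

def update_status(lb):
    # Derive the post-sync status from the current provisioning_status.
    if lb.provisioning_status == 'PENDING_DELETE':
        lb.provisioning_status = 'DELETED'
    else:
        lb.provisioning_status = 'ACTIVE'
    # Any lock value (e.g. ALLOCATED) previously written here is now gone;
    # caching it in the worker instead would not survive a crash.
```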

It turns out there is a simple solution, however: I'll use the amphora status field (which is bound to the same range of values as provisioning_status) and set it to ALLOCATED (a value so far only used for amphora entries representing devices) while the associated load balancer is locked by either the rescheduling arbiter or the controller worker. update_status would then reset that field, unlocking the LB. The scheme can be made crash-resistant by having workers, at startup, reset the field to the value of the associated LB's provisioning_status for every amphora whose compute_flavor matches the host the worker is assigned to.
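A minimal sketch of that scheme, assuming hypothetical model and helper names (LoadBalancer, Amphora, lock, unlock, recover_locks are illustrative, not the real octavia_f5 API):

```python
from dataclasses import dataclass

# Hypothetical models; the field names mirror the discussion, not the schema.
@dataclass
class LoadBalancer:
    provisioning_status: str

@dataclass
class Amphora:
    status: str
    compute_flavor: str
    load_balancer: LoadBalancer

LOCKED = 'ALLOCATED'  # so far only used for amphora entries representing devices

def lock(amphora: Amphora) -> None:
    # Rescheduling arbiter or controller worker takes the lock on the LB.
    amphora.status = LOCKED

def unlock(amphora: Amphora) -> None:
    # update_status releases the lock by resetting the field.
    amphora.status = amphora.load_balancer.provisioning_status

def recover_locks(amphorae: list[Amphora], host: str) -> None:
    # Crash safety: at startup, a worker clears stale locks for its own host.
    for amphora in amphorae:
        if amphora.compute_flavor == host and amphora.status == LOCKED:
            unlock(amphora)
```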

Collaborator

This seems pretty over-engineered to me. So you want to introduce locks to the status_manager as well as the controller_worker? The status is considered a user-facing property, nothing to act on.

Collaborator (Author)

It's not over-engineered; it's basically exactly what you proposed, just with all the necessary considerations laid out.

Collaborator (Author)

Unless I misunderstood your initial proposal (the one with provisioning_status).

BenjaminLudwigSAP self-assigned this Apr 12, 2023
BenjaminLudwigSAP added the rescheduling label Apr 12, 2023
Labels
rescheduling Relevant for rescheduling semantics in some way
2 participants