This repository has been archived by the owner on Apr 15, 2018. It is now read-only.
Both lock and seed-node entries get a TTL after which they expire at the backend. Because of this, the codebase never attempts to delete them, even when deleting them would be safe. The cases where deletion is allowed would be:
Lock entries can be deleted once a single cluster node has successfully joined the cluster and added itself to the backend.
Seed node entries can be deleted when a node is gracefully leaving the cluster.
While not deleting these entries does not cause a catastrophic failure (since they will eventually expire), it does have some annoying side effects:
I have seen a huge number of occurrences of the log message Couldn't acquire lock, going to GettingNodes, caused by a lock that could have been released but was instead left to expire (this becomes more frequent in the context of Join timing out in handover scenario #168).
Newcomers to the cluster have to wait for a whole TTL before they can join successfully if the previously existing nodes have just exited the cluster (assuming Join timing out in handover scenario #168 is addressed).
My suggestion would be to add two extra methods on the Coordination trait:
unlock
removeSelf
Both would have default implementations that do nothing (to keep backwards compatibility); backends that override them would attempt to delete the lock and seed-node entries respectively, on a best-effort basis. This means that the ConstructrMachine would fire and forget these commands, so the FSM code does not become any more complex. In the best case, firing these commands achieves the desired result; in the worst case, nothing changes compared to how the FSM works at the moment.
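To make the proposal more concrete, here is a minimal sketch of what I have in mind. It is not the actual Coordination trait: addresses are plain strings, the existing methods are reduced to lock and addSelf, and InMemoryCoordination and BestEffort.fireAndForget are hypothetical names used only for illustration. The real signatures would follow whatever the trait uses today.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Simplified stand-in for the Coordination trait (assumption: the real
// trait uses Akka types; String addresses keep this sketch self-contained).
trait Coordination {
  def lock(self: String, ttl: FiniteDuration): Future[Boolean]
  def addSelf(self: String, ttl: FiniteDuration): Future[Unit]

  // Proposed additions. The no-op defaults keep existing backends compiling.
  def unlock(self: String): Future[Unit] = Future.successful(())
  def removeSelf(self: String): Future[Unit] = Future.successful(())
}

// Hypothetical in-memory backend that opts in to the new methods.
// TTLs are ignored here; a real backend would let entries expire.
final class InMemoryCoordination extends Coordination {
  private val lockHolder = TrieMap.empty[String, String]
  val nodes: TrieMap[String, Unit] = TrieMap.empty[String, Unit]

  def lock(self: String, ttl: FiniteDuration): Future[Boolean] =
    // Succeeds if the lock was free or is already held by `self`.
    Future.successful(lockHolder.putIfAbsent("lock", self).forall(_ == self))

  def addSelf(self: String, ttl: FiniteDuration): Future[Unit] = {
    nodes.put(self, ())
    Future.successful(())
  }

  override def unlock(self: String): Future[Unit] = {
    lockHolder.remove("lock", self) // only removes the lock if `self` holds it
    Future.successful(())
  }

  override def removeSelf(self: String): Future[Unit] = {
    nodes.remove(self)
    Future.successful(())
  }
}

object BestEffort {
  // Fire and forget: the caller never waits for the result, and a failure
  // is only logged, so the state machine's transitions are unaffected.
  def fireAndForget(op: Future[Unit])(implicit ec: ExecutionContext): Unit =
    op.failed.foreach(e => println(s"Best-effort cleanup failed: ${e.getMessage}"))
}
```

In this sketch the state machine would call something like BestEffort.fireAndForget(coordination.unlock(self)) right after adding itself to the backend, and BestEffort.fireAndForget(coordination.removeSelf(self)) when leaving gracefully. If the call fails, the entry simply expires via its TTL as it does today.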
As mentioned in #168, this is one of the improvements my team would need in the context of our work, and we would of course be more than happy to contribute it via a PR. Your thoughts?