OldestAutoDowning behavior with multiple nodes going unreachable at the same time #22

Open
kkolman opened this issue Apr 11, 2018 · 3 comments · Fixed by sisioh/akka-cluster-custom-downing#3

@kkolman

kkolman commented Apr 11, 2018

Hi !

I was experimenting with the OldestAutoDowning strategy and ran into a problem when multiple nodes become unreachable at the "same" time, that is, within a single "stable-after" window.

In that case OldestAutoDowning downs only one node. The events for the other nodes are never processed, and the cluster stays stuck in the "Leader can currently not perform its duties" state forever.

2018-04-11 15:23:04 [INFO ] [a.c.Cluster(akka://somecluster)]  - Cluster Node [akka.tcp://[email protected]:2552] - Leader can currently not perform its duties,
 reachability status: [
  akka.tcp://[email protected]:2552 -> akka.tcp://[email protected]:60202: Unreachable [Unreachable] (9),
  akka.tcp://[email protected]:2552 -> akka.tcp://[email protected]:60265: Unreachable [Unreachable] (8)
  ], member status: [
    akka.tcp://[email protected]:2552 Up seen=true,
    akka.tcp://[email protected]:60202 Up seen=false,
    akka.tcp://[email protected]:60265 Up seen=false
    ]

These are the settings:

akka.cluster.downing-provider-class = "tanukki.akka.cluster.autodown.OldestAutoDowning"

custom-downing {
  stable-after = 20s

  oldest-auto-downing {
    oldest-member-role = "master"
    down-if-alone = false
  }
}

It seems to me that in this case CustomAutoDownBase#downPendingUnreachableMembers is never called, because it is only called from OldestAutoDownBase#onMemberRemoved, and for some reason that never happens.
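A minimal, hypothetical sketch of the suspected interaction follows. It is a plain Scala model with no Akka dependency and is not the library's actual code; every name in it (`DowningSketch`, `State`, `removalCanHappen`, and so on) is illustrative. It assumes, as a simplification, that a downed member can only be removed once no other unreachable member is left un-downed. This matches how Akka Cluster's leader needs convergence before it moves a Down member to Removed, and it would explain why `onMemberRemoved` never fires here.

```scala
// Hypothetical, simplified model of the downing bookkeeping discussed above.
// It does not use Akka and is NOT the library's implementation.
object DowningSketch {
  final case class Member(address: String)

  // downed:  members the oldest node has asked to down
  // pending: unreachable members deferred until a later event
  final case class State(downed: Vector[Member], pending: Vector[Member])

  // Reported behavior: only the first stable-unreachable member is downed;
  // the rest are deferred until a MemberRemoved event arrives.
  def onStableUnreachableOneAtATime(unreachable: Vector[Member]): State = {
    val (first, rest) = unreachable.splitAt(1)
    State(first, rest)
  }

  // Deferred members are only flushed here, i.e. on MemberRemoved.
  def onMemberRemoved(s: State): State =
    State(s.downed ++ s.pending, Vector.empty)

  // Alternative: down every member that stayed unreachable for stable-after.
  def onStableUnreachableAll(unreachable: Vector[Member]): State =
    State(unreachable, Vector.empty)

  // Simplifying assumption: removal of a downed member needs convergence,
  // which is blocked while any unreachable member is still un-downed.
  def removalCanHappen(s: State): Boolean = s.pending.isEmpty

  def main(args: Array[String]): Unit = {
    // Hypothetical member names standing in for the two killed nodes.
    val unreachable = Vector(Member("worker-1"), Member("worker-2"))

    val oneAtATime = onStableUnreachableOneAtATime(unreachable)
    println(s"one-at-a-time: downed=${oneAtATime.downed.size} " +
      s"pending=${oneAtATime.pending.size} removal=${removalCanHappen(oneAtATime)}")

    val allAtOnce = onStableUnreachableAll(unreachable)
    println(s"all-at-once: downed=${allAtOnce.downed.size} " +
      s"pending=${allAtOnce.pending.size} removal=${removalCanHappen(allAtOnce)}")
  }
}
```

Running `main` prints `one-at-a-time: downed=1 pending=1 removal=false` and `all-at-once: downed=2 pending=0 removal=true`. In the one-at-a-time model, the pending member waits for a removal that cannot happen, so `onMemberRemoved` is never reached. Downing every member whose unreachability has been stable for `stable-after` avoids that circular wait.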

How to reproduce:

  • start node with "master" role
  • start two other nodes not having "master" role
  • kill the two nodes
  • => one gets downed, the other does not, and the cluster stays in the "Leader can currently not perform its duties" state forever

Is this a bug or a feature?

Thx!

@kkolman kkolman changed the title OldestAutoDowning OldestAutoDowning behavior with multiple nodes going unreachable at the same time Apr 11, 2018
@TanUkkii007
Owner

I think this is a bug. In this case the master is the oldest member, so it should down the other two members.

@kkolman
Author

kkolman commented Apr 16, 2018

@TanUkkii007 Thanks for the response. I tried to write a test to track this down but have not succeeded (yet?), as I'm not very fluent in Scala.

@TanUkkii007
Owner

@kkolman
I hope my implementation can help improve your Scala skills.
I created this split brain resolver because I wanted to learn how Akka Cluster works.
I am not an active developer now, but thanks to other contributors who kindly updated the Scala and Akka versions, I think this project is still a good playground.
