OldestAutoDowning behavior with multiple nodes going unreachable at the same time #22

kkolman · 2018-04-11T13:35:49Z

Hi!

I was experimenting with the OldestAutoDowning strategy and stumbled upon a problem when multiple nodes become unreachable at the "same" time, that is, within one "stable-after" window.

In that case OldestAutoDowning downs only one node; the events for the other nodes are never processed, and the cluster stays stuck in the "Leader can currently not perform its duties" state forever.

2018-04-11 15:23:04 [INFO ] [a.c.Cluster(akka://somecluster)]  - Cluster Node [akka.tcp://[email protected]:2552] - Leader can currently not perform its duties,
 reachability status: [
  akka.tcp://[email protected]:2552 -> akka.tcp://[email protected]:60202: Unreachable [Unreachable] (9),
  akka.tcp://[email protected]:2552 -> akka.tcp://[email protected]:60265: Unreachable [Unreachable] (8)
  ], member status: [
    akka.tcp://[email protected]:2552 Up seen=true,
    akka.tcp://[email protected]:60202 Up seen=false,
    akka.tcp://[email protected]:60265 Up seen=false
    ]

These are the settings:

akka.cluster.downing-provider-class = "tanukki.akka.cluster.autodown.OldestAutoDowning"

custom-downing {
  stable-after = 20s

  oldest-auto-downing {
    oldest-member-role = "master"
    down-if-alone = false
  }
}

It seems to me that CustomAutoDownBase#downPendingUnreachableMembers never gets called in this case, since it is only called from OldestAutoDownBase#onMemberRemoved, and for some reason that callback never fires.
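To make the suspected stall concrete, here is a small toy model in plain Scala. It is only a sketch of one possible reading of the report, not the library's code: `State`, `downFirstPending`, `leaderCanAct`, and `leaderTick` are hypothetical names, and the model assumes that (a) only one pending node is downed at first, (b) the next pending node is downed only as a reaction to a member removal, and (c) the leader removes members only once every unreachable member is already marked Down.

```scala
// Hypothetical toy model of the suspected stall; NOT the library's actual code.
object DowningStallModel {
  sealed trait Status
  case object Up extends Status
  case object Down extends Status

  // members: node name -> status; unreachable: nodes seen as unreachable;
  // pending: unreachable nodes whose downing has been deferred.
  final case class State(
      members: Map[String, Status],
      unreachable: Set[String],
      pending: List[String]
  )

  // Assumption (a): only the first pending node is marked Down.
  def downFirstPending(s: State): State = s.pending match {
    case head :: tail => s.copy(members = s.members.updated(head, Down), pending = tail)
    case Nil          => s
  }

  // Alternative for comparison: mark every pending node Down in one go.
  def downAllPending(s: State): State =
    s.copy(
      members = s.pending.foldLeft(s.members)((m, n) => m.updated(n, Down)),
      pending = Nil
    )

  // Assumption (c): the leader can act only if all unreachable members are Down.
  def leaderCanAct(s: State): Boolean =
    s.unreachable.forall(n => s.members.get(n).contains(Down))

  // Assumption (b): a removal is the only trigger for downing the next pending node.
  def leaderTick(s: State): State =
    if (!leaderCanAct(s)) s
    else {
      val removed = s.members.collect { case (n, Down) => n }.toSet
      val afterRemoval =
        s.copy(members = s.members -- removed, unreachable = s.unreachable -- removed)
      if (removed.nonEmpty) downFirstPending(afterRemoval) else afterRemoval
    }

  // Scenario from this report: nodeA and nodeB become unreachable together.
  val initial: State = State(
    members = Map("master" -> Up, "nodeA" -> Up, "nodeB" -> Up),
    unreachable = Set("nodeA", "nodeB"),
    pending = List("nodeA", "nodeB")
  )

  // One node downed, then the leader is blocked because nodeB is still Up.
  val stuck: State = leaderTick(downFirstPending(initial))

  // Downing both at once lets the leader remove them.
  val resolved: State = leaderTick(downAllPending(initial))

  def main(args: Array[String]): Unit = {
    println(s"stuck:    $stuck")
    println(s"resolved: $resolved")
  }
}
```

In this model `stuck` never changes however often `leaderTick` is applied, because the removal that would trigger the next downing cannot happen while nodeB is still Up and unreachable. Whether the real implementation follows these exact steps is an assumption that would need to be checked against the source.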

How to reproduce:

1. Start a node with the "master" role.
2. Start two other nodes that do not have the "master" role.
3. Kill the two nodes.

=> One gets downed, the other one does not, and the "Leader can currently not perform its duties" state persists forever.

Is this a bug or a feature?

Thx!


TanUkkii007 · 2018-04-14T08:53:50Z

I think this is a bug. In this case the master is the oldest, so the master should down the other two members.

kkolman · 2018-04-16T09:31:18Z

@TanUkkii007 thx for the response. I was trying to write a test to track this down but did not succeed (yet?), as I'm not so fluent in Scala.

TanUkkii007 · 2018-04-23T13:50:18Z

@kkolman
I hope my implementation can help improve your Scala skills.
My motivation for creating this split brain resolver was that I wanted to learn how Akka cluster works.
I am not an active developer now, but thanks to other contributors who kindly updated the Scala and Akka versions, I think this project is still a good playground.

kkolman changed the title ~~OldestAutoDowning~~ OldestAutoDowning behavior with multiple nodes going unreachable at the same time Apr 11, 2018

j5ik2o mentioned this issue Oct 29, 2019

fix issue 22 sisioh/akka-cluster-custom-downing#3

Merged
