Priam on new ASG instance tells Cassandra to gossip with dead ASG EC2 instance #686

amr46 · 2018-07-03T16:38:55Z

Setup:

Running Cassandra 3.11 and Priam 3.11.
ASG with 3 nodes, one in each AZ: 1a, 1b, 1c.
Manually terminated the node in 1c via the AWS console.
A new node comes up shortly after. Priam's log shows that it detects the old 1c node is dead.
is_replace_token is true, and get_replaced_ip returns the IP of the dead node.
The old node is marked as '-dead' in AWS SimpleDB (sdb) and shows as DN in nodetool status.
The new node starts Priam, which tries to start Cassandra but keeps telling Cassandra that the old 1c node is still in gossip. Cassandra cannot connect to the downed node and aborts on startup.
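
For anyone trying to reproduce this, a minimal sketch of how the DN state above could be checked from a script. It only parses text in the usual `nodetool status` layout; the sample output, addresses, and host IDs are made up for illustration, and the helper names `nodes_by_state` and `down_nodes` are hypothetical, not part of Priam or Cassandra.

```python
import re

# Illustrative sample only: addresses, loads, and host IDs are invented.
SAMPLE_STATUS = """\
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.1   103.5 KiB   1       66.7%             11111111-1111-1111-1111-111111111111  1a
UN  10.0.0.2   98.2 KiB    1       66.7%             22222222-2222-2222-2222-222222222222  1b
DN  10.0.0.3   101.1 KiB   1       66.6%             33333333-3333-3333-3333-333333333333  1c
"""

# A node row starts with a two-letter code: Up/Down followed by
# Normal/Leaving/Joining/Moving, then whitespace and the address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)\s")


def nodes_by_state(status_output):
    """Group node addresses by their two-letter status/state code."""
    result = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line)
        if match:
            result.setdefault(match.group(1), []).append(match.group(2))
    return result


def down_nodes(status_output):
    """Return the addresses that are reported as DN (down, normal)."""
    return nodes_by_state(status_output).get("DN", [])


if __name__ == "__main__":
    print(down_nodes(SAMPLE_STATUS))
```

In a real check the text would come from running `nodetool status` on one of the surviving nodes rather than from the sample string.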

Fix:

Running service tomcat8 restart on the new node fixes the problem.
On the restart, is_replace_token returns false, so no IP is replaced and no gossip with dead nodes occurs.
After the restart, nodetool status on the other nodes shows the new node in place of the 'DN' node.

Questions:

Why is Cassandra not able to replace the dead node?
Why, on the Priam restart, is Cassandra able to start successfully, ignoring the dead node?


amr46 · 2018-07-12T16:24:54Z

I think that the protocol used by Priam is incorrect:
If is_replace = true and it's attempting to replace a downed node, that node might be unavailable altogether. Priam has explicitly marked this downed node as dead, so there should be no expectation of any communication with it.

Cassandra, when started in replace mode, attempts to talk to the downed node and fails whenever the node doesn't exist in gossip. Hence the replace can never happen without manual intervention.
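
To make the concern concrete, here is a rough sketch of the kind of pre-start check I have in mind. This is not Priam's actual logic. The function `startup_plan` and its return shape are hypothetical, and `gossip_endpoints` is assumed to be the set of addresses that a live peer still lists in gossip (for example, collected from `nodetool gossipinfo` on a surviving node). The only real Cassandra piece is the `-Dcassandra.replace_address` JVM option.

```python
def startup_plan(is_replace_token, replaced_ip, gossip_endpoints):
    """Decide how a new node should start (hypothetical helper).

    is_replace_token: whether the token is being taken over from a dead node.
    replaced_ip: address of the dead node, or None.
    gossip_endpoints: addresses a live peer still knows about in gossip
        (assumed input; how it is gathered is outside this sketch).
    """
    if not is_replace_token or not replaced_ip:
        # Nothing to replace: start Cassandra normally.
        return {"mode": "normal", "jvm_opts": []}

    if replaced_ip in gossip_endpoints:
        # The dead node is still known to the cluster, so a replace
        # has something to take over from.
        return {
            "mode": "replace",
            "jvm_opts": ["-Dcassandra.replace_address=" + replaced_ip],
        }

    # Replace was requested but the peer no longer lists the address in
    # gossip; flag it instead of starting and aborting.
    return {
        "mode": "needs_attention",
        "jvm_opts": [],
        "reason": replaced_ip + " is not present in gossip on live peers",
    }


if __name__ == "__main__":
    peers = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
    print(startup_plan(True, "10.0.0.3", peers))
```

The point of the third branch is only to surface the mismatch early; what the right recovery action is in that case is exactly the open question here.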

@arunagrawal84 thx for helping out in the past, could you comment on this?

arunagrawal84 · 2018-07-17T20:53:08Z

@amr46 can you please confirm if other 2 nodes (in other AZ's), are marked as seed nodes as well?

amr46 · 2018-08-25T21:52:47Z

I will have to replicate the environment and get back to you ASAP in the week of 9/3 @arunagrawal84
