I ran into an issue while moving a delayed host under a peer that was not delayed.
The relocate command stops replication on peer.host.com and runs a START SLAVE UNTIL... command on delayed.host.com. That part completed: delayed.host.com (which is delayed by several hours) did wait, and did move below peer.host.com. However, peer.host.com did not restart replication. I assume this is because the connection timed out.
I would recommend having the connection retry if it times out.
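A minimal sketch of the suggested retry, in Python for illustration. `with_retry` is a hypothetical helper, not part of orchestrator. It assumes each attempt opens a fresh connection (the previous one may have been dropped while the delayed host was catching up) and that the operation, such as re-issuing START SLAVE on the peer, is safe to repeat.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(operation: Callable[[], T],
               attempts: int = 3,
               initial_delay: float = 1.0,
               retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError)) -> T:
    """Run `operation`, retrying with exponential backoff on transient errors.

    Hypothetical helper. Errors not listed in `retry_on` propagate at once;
    the last transient error is re-raised when all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            # `operation` is expected to open a new connection on every call.
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last timeout
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
    raise AssertionError("unreachable")
```

For example, `with_retry(lambda: restart_replication("peer.host.com"), attempts=5)`, where `restart_replication` is a hypothetical function that connects to the host and issues START SLAVE. Backing off exponentially avoids hammering a server that is only briefly unreachable.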
As a feature request, it might also make sense to let relocate change SQL_Delay temporarily, so that the host can catch up faster and be moved before a timeout ever happens. This would probably need to be optional, since the command doesn't know why the host is delayed, and un-delaying it could cause problems.
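One way such an opt-in override could look, as a hedged sketch: relocate would lower the delay before the move and restore the original value afterwards, and only when explicitly enabled, because the command cannot know why the host is delayed. `DelayOverride` and `delay_override_plan` are hypothetical, not existing orchestrator options. The sketch assumes MySQL 5.7 or later, where MASTER_DELAY can be changed with only the SQL thread stopped, and that the current delay is read from the SQL_Delay field of SHOW SLAVE STATUS.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DelayOverride:
    """Hypothetical opt-in setting for relocate (off by default)."""
    enabled: bool = False
    temporary_delay: int = 0  # seconds of delay to use while catching up


def _set_delay(seconds: int) -> List[str]:
    # MASTER_DELAY may be changed while only the SQL thread is stopped
    # (MySQL 5.7+; older versions need both threads stopped).
    return [
        "STOP SLAVE SQL_THREAD",
        f"CHANGE MASTER TO MASTER_DELAY = {int(seconds)}",
        "START SLAVE SQL_THREAD",
    ]


def delay_override_plan(current_sql_delay: int,
                        override: DelayOverride) -> Tuple[List[str], List[str]]:
    """Statements to run on the delayed host before the move, and after it."""
    if current_sql_delay < 0 or override.temporary_delay < 0:
        raise ValueError("delays must be non-negative")
    # Opt-in only, and never use the override to *increase* the delay.
    if not override.enabled or override.temporary_delay >= current_sql_delay:
        return [], []
    before = _set_delay(override.temporary_delay)
    after = _set_delay(current_sql_delay)  # restore the original delay
    return before, after
```

Returning statements instead of executing them keeps the sketch testable; a real implementation would also need to restore the delay if the move itself fails.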
Unfortunately I didn't have logging set up for this, so I can't be 100% sure my assessment is accurate. Read everything I've said as "this is how I saw it."
START SLAVE UNTIL suggests the server was relocated via a standard "move", i.e. by coordinating binlog positions, whereas we would have expected it to relocate via Pseudo-GTID.
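For context, a coordinates-based ("classic") move of a replica below its sibling roughly follows the steps sketched here. This is a simplified model, not orchestrator's code: it assumes both hosts replicate from the same master, the sibling has log_slave_updates enabled and has already been stopped, and `classic_move_below_sibling` and `reached` are hypothetical helpers that only build statements. It also shows why a delayed replica hurts this method: the sibling stays stopped until the replica has executed up to the sibling's position, which under SQL_Delay takes roughly as long as the delay.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True, order=True)
class BinlogCoordinates:
    # Compared as (file, pos) tuples: assumes file names share a prefix and a
    # fixed-width numeric suffix, e.g. "mysql-bin.000042".
    log_file: str
    log_pos: int


def reached(replica_exec: BinlogCoordinates, target: BinlogCoordinates) -> bool:
    """True once the replica's SQL thread has executed up to `target`."""
    return replica_exec >= target


def classic_move_below_sibling(replica: str, sibling: str,
                               sibling_exec: BinlogCoordinates,
                               sibling_self: BinlogCoordinates) -> List[Tuple[str, str]]:
    """(host, statement) steps to move `replica` below an already-stopped `sibling`.

    `sibling_exec`: the sibling's executed master position
    (Relay_Master_Log_File / Exec_Master_Log_Pos).
    `sibling_self`: the sibling's own binlog position (SHOW MASTER STATUS).
    """
    return [
        (replica, "STOP SLAVE"),
        # Apply the master's events only up to where the sibling stopped.
        (replica, "START SLAVE UNTIL MASTER_LOG_FILE = "
                  f"'{sibling_exec.log_file}', MASTER_LOG_POS = {sibling_exec.log_pos}"),
        # <wait until reached(replica_exec, sibling_exec)>: with SQL_Delay this
        # lasts about as long as the delay, and the sibling stays stopped.
        (replica, "STOP SLAVE"),
        (replica, f"CHANGE MASTER TO MASTER_HOST = '{sibling}', "
                  f"MASTER_LOG_FILE = '{sibling_self.log_file}', "
                  f"MASTER_LOG_POS = {sibling_self.log_pos}"),
        (replica, "START SLAVE"),
        # The step that, per the report, never ran on peer.host.com.
        (sibling, "START SLAVE"),
    ]
```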
When relocating via Pseudo-GTID, delayed replicas are no problem at all. Computing the coordinates from which they should replicate does take longer, because it involves a more exhaustive search of the binary logs, but otherwise it isn't a big deal.
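The Pseudo-GTID idea can be modeled with in-memory binlogs: find the last Pseudo-GTID entry the replica executed, find the same unique entry in the sibling's binlog, and continue right after the events the replica already has. This is a simplified sketch, not orchestrator's implementation. It assumes both hosts write replicated events to their own binlogs in the same order (log_slave_updates), the event lists stand in for SHOW BINLOG EVENTS output, and the `PSEUDO_GTID:` marker format and `pseudo_gtid_target` are hypothetical. The sibling does not have to wait for the delayed replica to catch up; the delay only means the matching entry sits further back in the sibling's binlogs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class BinlogEvent:
    log_file: str
    log_pos: int  # start position of the event in log_file
    info: str     # statement text, as in the Info column of SHOW BINLOG EVENTS


def _last_index(events: Sequence[BinlogEvent], pred) -> Optional[int]:
    # Scan backwards: the most recent match is the one we want.
    for i in range(len(events) - 1, -1, -1):
        if pred(events[i]):
            return i
    return None


def pseudo_gtid_target(replica_log: Sequence[BinlogEvent],
                       sibling_log: Sequence[BinlogEvent],
                       marker_prefix: str) -> Tuple[str, int]:
    """Sibling (file, pos) from which the replica should continue replicating."""
    # 1. Last Pseudo-GTID entry the replica wrote to its own binlog.
    last = _last_index(replica_log, lambda e: e.info.startswith(marker_prefix))
    if last is None:
        raise LookupError("no Pseudo-GTID entry in the replica's binlog")
    marker = replica_log[last].info
    tail = [e.info for e in replica_log[last + 1:]]

    # 2. The same unique entry in the sibling's binlog. For a delayed replica
    #    it lies further back, so this search covers more binlog.
    match = _last_index(sibling_log, lambda e: e.info == marker)
    if match is None:
        raise LookupError("Pseudo-GTID entry not found in the sibling's binlog")

    # 3. Skip the events the replica already executed after the marker,
    #    checking that both binlogs agree on them.
    after = [e.info for e in sibling_log[match + 1:match + 1 + len(tail)]]
    if after != tail:
        raise ValueError("binlogs diverge after the Pseudo-GTID entry")
    nxt = match + 1 + len(tail)
    if nxt >= len(sibling_log):
        # A real implementation would use the sibling's current binlog end here.
        raise LookupError("sibling has no event past the replica's position yet")
    return sibling_log[nxt].log_file, sibling_log[nxt].log_pos
```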
So the question is: why did orchestrator choose a "classic" move rather than Pseudo-GTID?