Moving SQL_Delay host connection issues #279

Open
tomkrouper opened this issue Oct 25, 2016 · 1 comment

Comments

@tomkrouper
Contributor

I ran into an issue while moving a delayed host under a peer that was not delayed.

Command:

orchestrator -c relocate -i delayed.host.com -d peer.host.com

The relocate command stopped replication on peer.host.com and ran a START SLAVE UNTIL ... command on delayed.host.com. That part worked: delayed.host.com (which is delayed by several hours) waited and was moved below peer.host.com. However, replication on peer.host.com was never restarted. I assume this is because the connection to peer.host.com timed out during the long wait.

I would recommend making the connection retry when it times out.
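The retry suggestion concerns orchestrator's own connection handling, which this thread does not show. As a rough caller-side approximation only, here is a minimal POSIX shell sketch of retry with exponential backoff. `retry` is a hypothetical helper, not part of orchestrator.

```shell
#!/bin/sh
# Hypothetical helper (not part of orchestrator).
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS attempts have failed,
# sleeping between attempts and doubling the delay after each failure.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry 5 2 orchestrator -c start-slave -i peer.host.com` would re-issue a start on the peer up to five times. This assumes your orchestrator version provides a `start-slave` command; check the command list for your release. It only papers over the symptom. The real fix would be for relocate itself to reconnect before restarting the peer.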

As a feature request, it might also make sense to let relocate temporarily change SQL_Delay so that the host can catch up faster and be moved before a timeout can even happen. This would probably need to be optional, since the command doesn't know why the host is delayed, and removing the delay could cause problems.
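A manual version of that idea could look roughly like the hypothetical helpers below, which only print SQL and run nothing themselves. They assume MySQL 5.7 or later, where `CHANGE MASTER TO MASTER_DELAY` is allowed with only the SQL thread stopped (MySQL 5.6 requires a full `STOP SLAVE`). They also assume you recorded the replica's original `SQL_Delay` value from `SHOW SLAVE STATUS` beforehand so it can be restored.

```shell
#!/bin/sh
# Hypothetical helpers (not part of orchestrator): print the statements that
# temporarily remove replication delay and later restore it.
# Assumes MySQL 5.7+, where MASTER_DELAY can change with only the SQL thread stopped.

undelay_sql() {
  printf '%s\n' 'STOP SLAVE SQL_THREAD;' \
    'CHANGE MASTER TO MASTER_DELAY = 0;' \
    'START SLAVE SQL_THREAD;'
}

# $1: the original delay in seconds (the SQL_Delay value recorded earlier).
redelay_sql() {
  case $1 in
    '' | *[!0-9]*)
      echo "redelay_sql: delay must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '%s\n' 'STOP SLAVE SQL_THREAD;' \
    "CHANGE MASTER TO MASTER_DELAY = $1;" \
    'START SLAVE SQL_THREAD;'
}
```

You might pipe `undelay_sql` into `mysql -h delayed.host.com` before relocating, then pipe `redelay_sql 14400` into the same host afterwards (14400 is a made-up original delay). Whether dropping the delay is safe remains the operator's decision, since the tool cannot know why the replica is delayed.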

Unfortunately I didn't have logging set up for this, so I can't be 100% sure my assessment is accurate. Please read everything I said as "this is how I saw it."

@shlomi-noach
Contributor

START SLAVE UNTIL suggests the server was relocated via a standard "move", i.e. by coordinating binlog positions, whereas we would have expected it to relocate via Pseudo-GTID.

When relocating via Pseudo-GTID, delayed replicas are not a problem at all. Computing the coordinates they should replicate from takes longer, because it involves a more exhaustive search of the binary logs, but otherwise it isn't a big deal.

So the real question is: why did orchestrator choose the "classic" move rather than Pseudo-GTID?
