[question] Understanding how to add new servers #440
Comments
If you bootstrap using a configuration that already contains all the servers you plan to have in the cluster, you shouldn't raft_add those servers again -- their membership has already been recorded in the Raft log and committed as part of the bootstrap process. You can just raft_start those servers, and the leader (the server that called raft_bootstrap) will contact them to replicate log entries, including the one that describes the initial configuration. You should never call raft_bootstrap on more than one node in the cluster.

An alternative approach, as used by dqlite for example, is to have the first server call raft_bootstrap with a configuration that contains only itself. Then as each other server starts up, you call raft_add on the bootstrap server to have it join the cluster. But you shouldn't mix this with the previous strategy.

Does that make sense? I'm happy to answer follow-up questions.
Or to keep it short...
Yes, this is what you should do on non-bootstrap nodes. If (and only if) the server was not part of the bootstrap configuration, you will have to raft_add it on the leader in order for it to participate in the cluster.
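For readers following along, here is a toy model in C of the membership change discussed in this thread: the leader adds the new server, then a role assignment turns it into a voter. It does not use the raft library. All names (toy_cluster, toy_add, toy_assign, toy_quorum, TOY_NON_VOTING, TOY_VOTER) are hypothetical, and the assumption that a freshly added server starts out non-voting is inferred from the question, which says raft_assign is needed to make it a voter.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_MEMBERS 8

/* Hypothetical role labels; these are not the raft library's constants. */
enum toy_role { TOY_NON_VOTING, TOY_VOTER };

struct toy_member {
    unsigned long long id;
    enum toy_role role;
};

/* Toy stand-in for the configuration held by the leader. */
struct toy_cluster {
    struct toy_member members[TOY_MAX_MEMBERS];
    size_t n;
};

/* Model of "add": append a new member as non-voting (assumption).
 * Returns 0 on success, -1 if the ID already exists or the table is full. */
int toy_add(struct toy_cluster *c, unsigned long long id)
{
    assert(c != NULL);
    for (size_t i = 0; i < c->n; i++) {
        if (c->members[i].id == id) {
            return -1;
        }
    }
    if (c->n == TOY_MAX_MEMBERS) {
        return -1;
    }
    c->members[c->n].id = id;
    c->members[c->n].role = TOY_NON_VOTING;
    c->n++;
    return 0;
}

/* Model of "assign": change the role of an existing member.
 * Returns 0 on success, -1 if the ID is not in the configuration. */
int toy_assign(struct toy_cluster *c, unsigned long long id, enum toy_role role)
{
    assert(c != NULL);
    for (size_t i = 0; i < c->n; i++) {
        if (c->members[i].id == id) {
            c->members[i].role = role;
            return 0;
        }
    }
    return -1;
}

/* Number of voters needed for a majority under the current configuration. */
size_t toy_quorum(const struct toy_cluster *c)
{
    size_t voters = 0;
    assert(c != NULL);
    for (size_t i = 0; i < c->n; i++) {
        if (c->members[i].role == TOY_VOTER) {
            voters++;
        }
    }
    return voters / 2 + 1;
}
```

In this model, adding a server leaves the quorum size unchanged, and only the later role assignment makes the new server count toward the majority. The joining server itself does nothing here; it only shows up in the leader's configuration.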
Just a clarification: you can actually call raft_bootstrap() on more than one node. The code that @mdorier posted, where he calls raft_bootstrap() on all the servers that are part of the initial configuration, is correct.
@freeekanayaka Thanks -- I guess I mistakenly transferred the dqlite requirement to have only one bootstrap node to raft in my head.
I was indeed calling it on more than one node. I think it works because they will all effectively write the same entry with the exact same configuration, so there will just be no need for the leader to send the configuration over to the followers. I think I understand better now. The pattern of starting with one server that bootstraps and adding servers one by one is what I'm trying to do. So I can just call
This is correct.
This is also correct.
Yes, precisely
No, that would lead to an inconsistent cluster. What raft_bootstrap() does is simply write the first entry in the log, which contains the initial configuration. All servers that are part of the initial configuration should call raft_bootstrap() with that configuration. If you call raft_bootstrap() on a new server and pass it a different configuration, then entry 1 of that server will differ from entry 1 on all other servers.
The new server will obtain the current list as part of log replication, since other servers will send it entry 1.
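To make the rule above concrete, here is a toy model in C. It does not use the raft library, and every name in it (toy_config, toy_server, toy_bootstrap, toy_replicate_entry1) is made up for illustration. It only models log entry 1. Bootstrapping writes a configuration as entry 1 of an empty log, and replication either copies the leader's entry 1 into an empty log or compares it with what is already there. What the real library does on a mismatch is not modelled; the sketch just reports it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SERVERS 8

/* Toy stand-in for a cluster configuration: a list of server IDs. */
struct toy_config {
    unsigned long long ids[TOY_MAX_SERVERS];
    size_t n;
};

/* Toy stand-in for one server; only entry 1 of its log is modelled. */
struct toy_server {
    unsigned long long id;
    bool has_entry1;          /* false while the log is empty */
    struct toy_config entry1; /* configuration stored in entry 1 */
};

/* True if both configurations list the same IDs in the same order. */
static bool toy_config_equal(const struct toy_config *a,
                             const struct toy_config *b)
{
    if (a->n != b->n) {
        return false;
    }
    for (size_t i = 0; i < a->n; i++) {
        if (a->ids[i] != b->ids[i]) {
            return false;
        }
    }
    return true;
}

/* Model of bootstrapping: write the configuration as entry 1.
 * Returns 0 on success, -1 if the log is not empty. */
int toy_bootstrap(struct toy_server *s, const struct toy_config *c)
{
    assert(c->n <= TOY_MAX_SERVERS);
    if (s->has_entry1) {
        return -1;
    }
    s->entry1 = *c;
    s->has_entry1 = true;
    return 0;
}

/* Model of the leader sending entry 1 to another server.
 * Returns 0 if the receiver ends up with the same entry 1,
 * -1 if the leader has nothing to send or the receiver already
 * holds a different entry 1. */
int toy_replicate_entry1(const struct toy_server *leader,
                         struct toy_server *follower)
{
    if (!leader->has_entry1) {
        return -1;
    }
    if (!follower->has_entry1) {
        follower->entry1 = leader->entry1; /* empty log: just copy */
        follower->has_entry1 = true;
        return 0;
    }
    return toy_config_equal(&leader->entry1, &follower->entry1) ? 0 : -1;
}
```

In this model, a server that bootstrapped with the same configuration as the leader and a server that never bootstrapped both end up consistent, while a server that bootstrapped with a configuration containing only itself is flagged as diverging.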
Yes, you can do that too if you will, which is what @cole-miller described and what dqlite does. But it's not mandatory, as long as you follow the rules above (which are the standard Raft rules from the paper).
FWIW, it's more an implementation detail than a requirement. In fact, we might want to support also the more-than-one-server-in-initial-configuration scenario in the future, and we wouldn't break any fundamental dqlite assumption. It's just there wasn't such a use case yet.
Thanks to both of you for the clarification! Closing the issue.
I'm going to re-open this issue as there are still some things I don't understand. I am implementing my own Following the discussion above I decided to try having just one process call On the leader (initial process), at start time:
The process prints the following:
On the joining process, at start time:
The process prints the following:
Then when the leader is informed of the new process, it calls
(the last line repeats several times, I think because it's an append entry call that's getting re-tried over and over even though the other process has crashed at this point) On the joining process, I see the following:
The log's
Since the only thing different between my current code and what I had before is that now only one process calls bootstrap and so the other processes (1) are not initially part of the cluster and (2) don't have anything in their log to begin with, and since the only function that's called on the log that gives information to RAFT is the Thanks for your help!
I'm not sure when I'll have time to look at the details of this, maybe @cole-miller or @MathieuBordere will be able to do that. However, before spending too much time on this, may I ask why you need to implement your own Implementing Also, beware that I think the current
That's pretty much the reason. We have HPC systems with specific network and storage hardware. I don't think it's that complicated; unfortunately, the precise semantics of each function aren't very well documented.
Oh I think I got it working. The Note: as I'm doing this, I'm writing down the exact semantics of each function of the raft_io structure (I have a bunch that already work right and are tested), including expectations about ownership of memory being passed to those functions. Once I get the full implementation working, I'll clean that up and send it to you guys (though if you plan to rework the
Great!
I think that would be valuable regardless, thanks. I'm not entirely sure if and when v1 will see life.
It's probably not too complicated to get something working. However, it's non-trivial to make it perform well and behave 100% correctly under the harshest conditions and most pathological scenarios, which is what our Jepsen-based dqlite test suite exercises. Anyway, if it's open source and you could share pointers, that would be great; we could include it in our README for folks with similar needs.
I'm trying to understand the procedure to add a new server to an existing cluster. Here is what I do to initialize the cluster, currently, on all its processes:
Now if I want to add a new server and convert it into a voter, I know I need to have the leader of the current cluster call raft_add with the ID and address of the new server (which I can easily communicate to the leader), then call raft_assign to make it a voter.

However, what is not clear to me is what the new server should be doing. Should I just call raft_init then raft_start without calling raft_bootstrap? Should I call raft_bootstrap with a configuration that only includes the new server? Or should the new server obtain the current list of servers some other way and run the above code as well?