P2P state exchange #2373

Open
roman-khimov opened this issue Mar 2, 2021 · 3 comments
Labels
Discussion Initial issue state - proposed but not yet accepted

Comments

roman-khimov (Contributor)

Summary or problem description
The need for P2P state exchange has been noted several times already; I'll just quickly recap the most important points:

  • as the chain grows, processing it from the genesis can become too resource-intensive a task to perform for every node that wants to join the network
  • old (as in older than MaxTraceableBlocks) blocks can be dropped, making block-by-block synchronization less available

In order to bootstrap a new full node we need to make its storage synchronized with other full nodes, and if we're not processing the whole chain this includes:

  1. retrieving and processing all headers (already not a problem)
  2. synchronizing contract storage contents (everything state root hash includes)
  3. retrieving and saving MaxTraceableBlocks number of last blocks (already possible, but hi, Lower MaxValidUntilBlockIncrement limit #2026)

The most problematic part is the second item, and that's what I'd like to discuss.

Do you have any solution you want to propose?
It's relatively easy to create a protocol for MPT traversal that will allow nodes to request/receive tree nodes and synchronize this way. There is also a proposal to do this exchange via full state snapshots (#1286), but MPT traversal seems to be simpler than that if we're to suppose that every node has an MPT. Unfortunately, that's not the case for the C# implementation, where nodes can have some state but not have a state root hash calculated over it and verified with SV signatures.

This can be solved with another node capability ("MPTNode", for example) so that the other nodes would know who can provide state data via P2P (the same is true if we're to go the snapshot route), but one problem still remains and I don't see any clear solution for it yet.

The problem of state trust
neo-go has a StateRootInHeader option, a protocol extension (hi, #1526) allowing it to put the state root hash into the block header. This means that any node receiving this header also receives a verified hash of the chain state as of the previous block, so if a new node processes headers from the genesis it reliably knows what state to request from its peers (be it MPT traversal or snapshot) and can verify the data against this hash.
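To illustrate (a minimal Go sketch with made-up types, not NeoGo's actual API), with in-header stateroots the check boils down to comparing a locally computed or received root with the hash committed in the already verified header chain:

```go
// Sketch only: types and field names are illustrative, not NeoGo's real API.
package statesync

import (
	"bytes"
	"errors"
)

// Header is a simplified block header carrying the state root of the
// previous block, as the StateRootInHeader extension does.
type Header struct {
	Index         uint32
	PrevStateRoot [32]byte // MPT root as of block Index-1
}

// VerifyStateAgainstHeaders checks that a state root computed (or received)
// for block `height` matches the hash committed in header `height+1`.
// Headers themselves are assumed to be already verified from the genesis.
func VerifyStateAgainstHeaders(headers map[uint32]*Header, height uint32, computedRoot [32]byte) error {
	next, ok := headers[height+1]
	if !ok {
		return errors.New("header not yet available")
	}
	if !bytes.Equal(next.PrevStateRoot[:], computedRoot[:]) {
		return errors.New("state root mismatch")
	}
	return nil
}
```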

Absent this option, we have state validators that are defined as a role in a native contract. They sign state root messages defined as [version, index, hash]. These messages can only be verified if the node already has the proper state, that is, its RoleManagement native contract storage contains the keys of state validators. But we're talking about nodes that want to synchronize state via P2P, so they don't have it yet.
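For reference, the message roughly looks like this (a simplified Go sketch; the real structure also carries a witness with SV signatures, and verifying that witness requires the SV keys from RoleManagement, which is exactly the data a fresh node doesn't have):

```go
// Simplified sketch of an SV-signed stateroot message; the actual wire
// format also includes a witness (invocation/verification scripts).
type StateRoot struct {
	Version uint8    // message format version
	Index   uint32   // block height this root corresponds to
	Root    [32]byte // MPT root hash of contract storage as of Index
	// Witness signed by the state validator role keys registered in the
	// RoleManagement native contract -- unverifiable without that state.
}
```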

And they can't reliably get this data. A node that doesn't process the chain from the genesis only has the protocol configuration and headers, and neither the configuration nor the headers contain SV keys. So we can't have P2P state exchange until we solve this problem.

There are several potential solutions to it:

  • implementing in-header state roots
  • adding initial state validators to the protocol configuration, but then we'll also need to add PrevHash and NextValidators fields to state root messages, making them form a proper secondary chain that would need to be synchronized along with the header chain (further complicating configuration and synchronization)
  • signing state messages (or state snapshots) with CN keys (rendering SVs useless)

The choice is rather obvious for me, but I'm not sure any of these options are really welcome as solutions. So maybe there is something else we can do? Or maybe going one of these routes is fine? Maybe we don't need P2P state exchange at all? I'm really interested in opinions on this.

Other considerations
Note that although this is related to #2346, it's not a substitute for it. Archiving is still important even if P2P state exchange is implemented.

Neo Version

  • Neo 3

Where in the software does this update apply?

  • Ledger
  • P2P (TCP)
roman-khimov added the Discussion (Initial issue state - proposed but not yet accepted) label on Mar 2, 2021
ZhangTao1596 commented Mar 5, 2021

With P2P state exchange, we don't even need full storage when persisting blocks. We can get the key-value pairs we need via P2P when executing transaction scripts if the speed is acceptable. I like this idea. I mentioned something similar before here: #1526 (comment).
This requires that we can get state proofs via P2P from other nodes. But StateService is only installed on a few nodes, I think. Besides, there is a FullState configuration in StateService, which means many nodes with StateService installed can only provide the latest storage proof. We should address these shortcomings first. In Neo 2 it would be easier.

roman-khimov (Contributor, Author) commented Mar 5, 2021

We can get the key-value pairs we need via P2P when executing transaction scripts if the speed is acceptable.

Well, there are problems with this:

  • you have 15 seconds to process the block and you have a number of state DB accesses to make
  • you don't know which of your neighbors can answer state requests
  • you only know the proper state root if you're lucky enough to catch the "stateroot" message for the current block (hi, How can a node synchronize state root signatures over P2P? neo-modules#536)
  • and you still can't use state roots without the RoleManagement contract state (to know the current SV list), which you don't have, since you have no state yet

So I'd rather do full state synchronization first.

Besides, there is a FullState configuration in StateService, which means many nodes with StateService installed can only provide the latest storage proof

That can easily be solved with a requirement to store some "checkpoint" states every N (say, 50K) blocks, so that at current height X the node will still store the two previous checkpoints even if it deletes old state data otherwise. A new node joining the network would then try to synchronize with the latest checkpoint and it would have at least N blocks of time to do that.

At the same time, from what I see in recent discussions, the state roots we have in Neo 3 are just not usable for this purpose, so we either need to adapt them for it, or use some different approach, or not bother with state exchange at all.

roman-khimov (Contributor, Author)

I'll outline a scheme here that can work with in-header stateroots (and we'll implement it as an extension in NeoGo), although I'm not sure how to adapt it to SV-signed state data. This scheme implies that nodes always compute the stateroot hash for their data and always check it against block headers.

So we want to synchronize state over the P2P protocol. This means that we should receive enough data to process new blocks and run transactions. But the actual node state is a moving target because it changes with every block, and a regular Neo node only keeps the latest state in its DB. It also has MPT state data, but it can delete old trees with the KeepOnlyLatestState option (thus keeping only the latest one). At the same time it can delete old (untraceable) blocks (RemoveUntraceableBlocks).

Implementing synchronization using the KV DB directly (requesting paths/ranges) can be more network-efficient, but it requires either keeping an old KV DB along with the new one (which is not space-efficient and complicates the logic) or creating a snapshot like in #1286 (which adds some load on CNs and needs some external storage or P2P propagation mechanism). Therefore it's considered to be impractical.

We have the same data in the MPT though, and by design it can easily store multiple versions of data (corresponding to different blocks). At the same time it is self-proving against a known stateroot, so we'll use that as the basis for state exchange.

Synchronization can take some time, so consistent state data corresponding to some block has to be available on the network even though the network advances as new blocks get added. Some heights are declared special for this reason and are known as state synchronization heights; nodes keep state data in the form of an MPT for these heights available for some time so that other nodes can get it.

Protocol changes

The proposed approach tries to make minimal changes to the protocol as well as to the node's logic.

Two new P2P messages are added:

  • getmptdata
    Which has the same payload as the inventory message, that is, a list of uint256 hashes. It's sent by the node trying to get state, starting with the state root hash and then descending down the tree. These are the nodes requested, but the peer is allowed to reply with more than that in its mptdata message. One request can only contain 32 hashes, because we expect answers to go deeper, so there is no need to request all hashes explicitly.

    It can technically be implemented as just another subtype of getdata, but since it doesn't use a 1:1 request-reply mapping and it's not about inventories that are broadcast through the network, a separate message is more appropriate.

  • mptdata
    A node receiving a getmptdata request should reply to it with an mptdata message containing an array of serialized MPT nodes (similar to the getproof RPC reply). It must start with the first node requested in getmptdata and then can contain its child and grandchild nodes down to the leaves, with the only requirement that the resulting message must fit into P2P size constraints. The node doesn't have to reply to all of the requested nodes, only the first one is mandatory (there might be too many lower nodes to fit the reply into one message). It is a protocol violation, though, to have nodes in an mptdata message that can't be verified (that is, that are not a part of the requested subtries).

    The message receiver must verify each node and add it to the local trie. Hash nodes are maintained in a set of nodes to request with the next getmptdata message.
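A rough sketch of both payloads in Go (names and layout are illustrative only; real serialization would follow the usual Neo P2P conventions):

```go
// Illustrative payload definitions for the proposed messages; real
// implementations would use the node's standard P2P (de)serialization.

const MaxGetMPTDataHashes = 32 // one request carries at most 32 node hashes

// GetMPTData asks peers for MPT nodes by hash, starting from the state
// root hash and descending as replies come in.
type GetMPTData struct {
	Hashes [][32]byte // up to MaxGetMPTDataHashes requested node hashes
}

// MPTData is the reply: serialized MPT nodes. The first one must be the
// first node requested; the rest may be its descendants, as long as the
// whole message fits into the P2P size limit and every node is verifiable
// as part of the requested subtries.
type MPTData struct {
	Nodes [][]byte // serialized MPT nodes
}
```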

An additional protocol configuration option is also added: StateSyncInterval, specified as the number of blocks between state heights available for synchronization. The particular value of this setting should take into account the time needed for complete synchronization (which should be well below SecondsPerBlock × StateSyncInterval) as well as the MaxTraceableBlocks setting. 20000-50000 seems to be a good choice for regular public networks with 15s inter-block time (about 40K blocks are produced weekly in this case).
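A quick back-of-the-envelope check of that range (example numbers only, not normative):

```go
package main

import "fmt"

func main() {
	const secondsPerBlock = 15
	const secondsPerWeek = 7 * 24 * 3600
	fmt.Println("blocks per week:", secondsPerWeek/secondsPerBlock) // 40320 at 15s per block

	const stateSyncInterval = 50000 // example value from the suggested range
	// Full synchronization has to finish well within one interval:
	fmt.Println("seconds per interval:", stateSyncInterval*secondsPerBlock) // ~8.7 days
	// and the interval must stay comfortably below MaxTraceableBlocks
	// (2102400 is used here purely as an example value).
	const maxTraceableBlocks = 2102400
	fmt.Println("interval below MaxTraceableBlocks:", stateSyncInterval < maxTraceableBlocks)
}
```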

Operation

All nodes maintain a local MPT and calculate the local state root hash. Some keep all stateroots, while others (with KeepOnlyLatestState) keep the latest one plus two previous state synchronization points: for current height N those are StateSyncInterval × (N ÷ StateSyncInterval) (known as P1) and StateSyncInterval × (N ÷ StateSyncInterval - 1) (known as P2). For this to work the chain height must be at least 2 × StateSyncInterval + 1, of course. Nodes with the KeepOnlyLatestState setting on remove old state synchronization points as they become "unreachable" (when passing new synchronization heights). All nodes also keep old blocks down to P2 - MaxTraceableBlocks.
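The same formulas in code, with integer division made explicit (a sketch, assuming uint32 heights):

```go
// SyncPoints returns the state synchronization heights kept for current
// height n: P1 (the most recent one) and P2 (the one before it).
// Division is integer division; the result is only meaningful once
// n >= 2*interval + 1.
func SyncPoints(n, interval uint32) (p1, p2 uint32) {
	p1 = interval * (n / interval)
	p2 = interval * (n/interval - 1)
	return
}

// Example: interval = 50000, n = 123456 gives p1 = 100000, p2 = 50000;
// blocks are then kept down to p2 - MaxTraceableBlocks (or the genesis
// if that would underflow).
```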

A new node that wants to perform synchronization fetches all block headers first, receiving signed stateroot data along with the headers. It then creates a list of currently unknown MPT hashes, puts the P1 stateroot hash into it and starts synchronizing state. On every iteration of this process it takes the currently unknown hashes and requests them from its peers with a getmptdata message. It receives some number of MPT nodes in reply and adds them to the local trie, removing now-known nodes from the request list and adding newly discovered unknown ones. When it receives leaf nodes it also adds their data to the local KV DB. This process repeats until all state data for P1 is received.
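In pseudo-Go the loop could look roughly like this; every type and method here is a hypothetical stand-in (not a real NeoGo API), shown only to make the flow explicit:

```go
// All types below are illustrative stand-ins for this sketch, not real
// NeoGo APIs.
package statesync

type Hash = [32]byte

// MPTNode is one deserialized and verified MPT node.
type MPTNode interface {
	Hash() Hash
	UnresolvedChildren() []Hash // referenced hash nodes not yet present locally
	IsLeaf() bool
	KeyValue() ([]byte, []byte) // meaningful for leaf nodes only
}

// Trie verifies raw nodes against already requested hashes and stores them.
type Trie interface {
	VerifyAndPut(raw []byte) (MPTNode, error)
}

// KVStore is the flat contract storage database.
type KVStore interface {
	Put(key, value []byte)
}

// Peer sends a single getmptdata request and returns the mptdata reply.
type Peer interface {
	RequestMPTData(hashes []Hash) ([][]byte, error)
}

// SyncState drives the P1 state synchronization: request unknown hashes
// (at most 32 per getmptdata message), verify replies, store leaf data,
// repeat until nothing is missing.
func SyncState(p1Root Hash, peer Peer, trie Trie, kv KVStore) error {
	unknown := map[Hash]bool{p1Root: true}
	for len(unknown) > 0 {
		batch := make([]Hash, 0, 32)
		for h := range unknown {
			if len(batch) == 32 {
				break
			}
			batch = append(batch, h)
		}
		nodes, err := peer.RequestMPTData(batch)
		if err != nil {
			return err // in practice: retry with another peer
		}
		for _, raw := range nodes {
			n, err := trie.VerifyAndPut(raw)
			if err != nil {
				return err // an unverifiable node is a protocol violation
			}
			delete(unknown, n.Hash())
			for _, child := range n.UnresolvedChildren() {
				unknown[child] = true
			}
			if n.IsLeaf() {
				k, v := n.KeyValue()
				kv.Put(k, v) // leaf data also goes into the flat KV DB
			}
		}
	}
	return nil // complete P1 state received
}
```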

If the node is interrupted during the process it must walk the local on-disk trie, rebuild the list of missing nodes and then start the process again.

The node then (technically, this can be done in parallel with state sync) fetches the MaxTraceableBlocks blocks preceding P1 and saves them locally. Once this is done, the node has the complete P1 state and is ready to process blocks normally, so starting with P1 + 1 it uses the regular fetching/processing mechanism that brings it to the current chain state.
