Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove extended Proposal payload before sending to remote signer (fixes buffer underflow) #240

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zarkone
Copy link

@zarkone zarkone commented Jul 18, 2024

This pull request addresses the problem of buffer underflow problem, which happens when tmkms signer is plugged over tcp to Sei.

Original issue, opened in tmkms repo iqlusioninc/tmkms#729

Even though the issue is opened in tmkms, I believe that fix should be applied in Sei.

What happens

This snippet illustrates the difference between Sei's Proposal and original Proposal structure.

// tendermint/cometbft proposal:
type Proposal struct {
	Type      SignedMsgType
	Height    int64
	Round     int32
	PolRound  int32
	BlockID   BlockID
	Timestamp time.Time
	Signature []byte
}
// vs sei-tendermint proposal
type Proposal struct {
	Type            SignedMsgType
	Height          int64
	Round           int32
	PolRound        int32
	BlockID         BlockID
	Timestamp       time.Time
	Signature       []byte

  // this is a list, and can be very long...
	TxKeys          []*TxKey
	Evidence        *EvidenceList
	LastCommit      *Commit
	Header          Header
	ProposerAddress []byte
}

As you can see, Tendermint protocol has a slight design flaw: Proposal is implicitly assumed to have a fixed length, but it doesn't declare it explicitly. Remote signer implementations assume that protobuf message, which is sent over the wire is always less than 1024 bytes.

And this is not an issue for networks where Proposal holds the original structure, but not for Sei. When Sei accumulates a certain amount of TxKeys, Proposal becomes bigger than 1024 bytes. When remote signer reads it, it reads 1024 bytes and tries to unmarshal it it fails with buffer underflow.

Why not to fix it in remote signer?

Yes, at first glance, it feels that remote signer can just aggregate multiple chunks and unmarshal it once it gets the all the chunks. Why not do just do that?

The reason why it is not possible (at least, in efficient way) is hidden in another issue with old Tendermint (and Sei) tcp signer implementation.

When protobuf message is being send over the wire, it suppose to have a length encoded in the beginning of the message. And Sei puts the length, however, it is not the length of the whole message -- this is the length of a chunk.

See here:

if dataMaxSize < len(data) {
chunk = data[:dataMaxSize]
data = data[dataMaxSize:]
} else {
chunk = data
data = nil
}
chunkLength := len(chunk)
binary.LittleEndian.PutUint32(frame, uint32(chunkLength))
copy(frame[dataLenSize:], chunk)

So instead of sending length in a first chunk, it sends a length of each chunk in each chunk, which makes it impossible to aggregate it on wire level. It becomes possible on protobuf level, in a very ugly way -- see the implementation here iqlusioninc/tmkms#903 .

In short, I've made a loop which aggregates chunks (with removed length) and tries to unmarshall it on each new chunk. Which means, if Proposal is e.g. 7kb it will try to unmarshal it 7 times: this solution is pretty fragile.

What happens in this PR

While I was debugging the issue, I've noticed that even though tmkms gets Sei's Proposal, since it doesn't have Sei's .proto files it uses original Tendermint .proto to unmrashal the Proposal. So it signs the original proposal, or, "canonical", if you will, but not the extended Sei's Proposal. However, it works fine on Sei Testnet without any issues. I dig further and found that local signer also uses stripped down Proposal, doing CanonicalizeProposal before signing (see privval/file.go#L445).

Note

I don't understand why only part of proposal is being signed -- maybe you could point me where to read about this, how does this work?

In any case, I figured that since Sei accepts short Proposal we can just remove additional fields. In this PR, I do it right before sending bytes because of the nature of protobuf: I don't see a sane way of how to send a different proposal without changing all the types in function definitions.

If you know a better way how to do that please let me know: for now, you can treat this PR as an expression of intention of what I want to happen, rather then complete implementation.

Misc

The described problem of chunk length instead of msg length is out of scope here. As far as I can see, it is the case for cometBFT as well, since the code looks the same: cometbft/p2p/conn/secret_connection.go#L221 -- however, I didn't try to break it, it is hard to do without an extended Proposal.

As for tmkms, even though I have a PR with workaround open, it is rather a hotfix for us and other validators.

The length problem should be addressed on a wire level, in tendermint-p2p: it doesn't do any aggregation and reads only one chunk now (see tendermint-rs/p2p/secret_connection.rs#L618). But it makes sense to fix only after it will be fixed in producers.

How to test?

Testing could be simple or could be tricky, depending on what tooling you already have. In order to see failure, you just need to plug Sei to tmkms over TCP. You need to have bigger transactions, so that TxKeys will become big enough to cause overflow. You can see observe it on testnet without any efforts, since oracle-feeder puts 30-50 txs almost in every block.

Locally, you can send a lot of transactions manually, something like

    ~/go/bin/seid tx bank send $admin_0 $admin_1 1usei --fees 4000usei -y &
    ~/go/bin/seid tx bank send $admin_1 $admin_2 1usei --fees 4000usei -y &
    ~/go/bin/seid tx bank send $admin_2 $admin_3 1usei --fees 4000usei -y &
    ~/go/bin/seid tx bank send $admin_3 $admin_4 1usei --fees 4000usei -y &
    # etc...

-- this works for me.

So you can check that without patch tmkms (and Sei) keep failing with buffer overflow, and with patch it works well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant