Broadcasting of invalid `voluntary_exit` messages to mesh peers #24

cortze · 2024-05-16T14:55:05Z

Description

We've seen that, after 2 to 2.5 hours of running, Hermes starts experiencing sudden spikes in GRAFT and PRUNE events across all topics.

Although we couldn't see any direct impact on the number of peers in each mesh, it is a clear concern: it could point to a decreasing peer score, which could prevent us from establishing stable connections with other nodes in the meshes.

Because we don't validate messages on each PubSub topic, our node may be forwarding invalid messages to our mesh peers, which decreases our score.

We have already seen this on our control Prysm node, where erigon/caplin peers have been sending invalid voluntary_exit messages.

time="2024-05-16 12:35:01" level=debug msg="Gossip message was rejected" agent="erigon/caplin" error="non-active validator cannot exit" gossipScore=-6182.725625534806 multiaddress="/ip4/120.31.71.167/tcp/55742" peerID=16Uiu2HAkzNLy2S3voLw3CFxET1kXYSZVLV6QwkHuP3RaDdGJSk2E prefix=sync topic="/eth2/6a95a1a9/voluntary_exit/ssz_snappy"
time="2024-05-16 12:35:01" level=debug msg="Gossip message was rejected" agent="erigon/caplin" error="non-active validator cannot exit" gossipScore=-6182.725625534806 multiaddress="/ip4/120.31.71.167/tcp/55742" peerID=16Uiu2HAkzNLy2S3voLw3CFxET1kXYSZVLV6QwkHuP3RaDdGJSk2E prefix=sync topic="/eth2/6a95a1a9/voluntary_exit/ssz_snappy"

Possible Solution

I suggest not subscribing to the voluntary_exit topic for now. Interest in debugging that particular topic is rather low, and the problem seems to be isolated to it.
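
A minimal sketch of that quick fix, assuming the tool builds its subscription list as plain topic strings before subscribing. `filterTopics` is a hypothetical helper and not part of Hermes; the topic strings follow the format seen in the Prysm logs above.

```go
package main

import (
	"fmt"
	"strings"
)

// filterTopics returns the topics whose path does not contain any of the
// excluded message names. Hypothetical helper for illustration only.
func filterTopics(topics []string, excluded ...string) []string {
	kept := make([]string, 0, len(topics))
	for _, t := range topics {
		skip := false
		for _, e := range excluded {
			// Match the name as a full path segment, e.g. ".../voluntary_exit/...".
			if strings.Contains(t, "/"+e+"/") {
				skip = true
				break
			}
		}
		if !skip {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	topics := []string{
		"/eth2/6a95a1a9/beacon_block/ssz_snappy",
		"/eth2/6a95a1a9/voluntary_exit/ssz_snappy",
		"/eth2/6a95a1a9/beacon_aggregate_and_proof/ssz_snappy",
	}
	// Only the remaining topics would be passed on to the subscription step.
	for _, t := range filterTopics(topics, "voluntary_exit") {
		fmt.Println(t)
	}
}
```

Filtering before subscribing means the node never joins the voluntary_exit mesh, so it cannot forward unvalidated messages on that topic.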


yiannisbot · 2024-05-16T18:23:01Z

Great catch, which definitely deserves a deeper look! Two quick questions:

- why would we see this behaviour only after 2-2.5hrs and not continuously? We suspect receipt of those invalid messages is a random event which happened to start after 2hrs of running our node/experiment?
- given that meshes and peer scores are per topic, why would our node get PRUNE'd from topics other than the voluntary_exit one?

guillaumemichel · 2024-05-17T06:58:01Z

We can unsubscribe from the voluntary_exits for now as a quick fix 👍🏻

In the long run, could we copy the validation logic for this topic over to Hermes as well?

cortze · 2024-05-17T07:30:05Z

replying to @yiannisbot

> why would we see this behaviour only after 2-2.5hrs and not continuously? We suspect receipt of those invalid messages is a random event which happened to start after 2hrs of running our node/experiment?

Voluntary exits are rather infrequent messages, as each one represents a validator voluntarily exiting the set of active validators. Thus, they are pretty sporadic.

> given that meshes and peer scores are per topic, why would our node get PRUNE'd from topics other than the voluntary_exit one?

If your score gets too low, it can actually affect other topics as well:

... The score is computed across all (configured) topics with a weighted mix, such that faulty behaviour in one topic percolates to other topics. ....
...
Heartbeat Maintenance
The score is checked explicitly during heartbeat maintenance such that:
- Peers with a negative score are pruned from all meshes.
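
The effect described in the excerpt above can be sketched as follows. This is a deliberately simplified model: the real score has more components, and the weights and per-topic scores below are made-up illustrative numbers, not values used by Hermes, Prysm, or any other client. `topicScore`, `peerScore`, and `shouldPruneFromAllMeshes` are hypothetical names.

```go
package main

import "fmt"

// topicScore is a simplified per-topic contribution: a topic weight and the
// score the peer has accumulated on that topic. Illustrative values only.
type topicScore struct {
	weight float64
	score  float64
}

// peerScore mixes all per-topic scores into one number using the topic
// weights, so a very bad topic can drag the overall score below zero.
func peerScore(topics map[string]topicScore) float64 {
	total := 0.0
	for _, ts := range topics {
		total += ts.weight * ts.score
	}
	return total
}

// shouldPruneFromAllMeshes mirrors the heartbeat rule quoted above:
// a peer with a negative overall score is pruned from every mesh.
func shouldPruneFromAllMeshes(score float64) bool {
	return score < 0
}

func main() {
	topics := map[string]topicScore{
		"beacon_block":   {weight: 0.5, score: 10},    // healthy topic
		"voluntary_exit": {weight: 0.25, score: -100}, // invalid messages forwarded
	}
	s := peerScore(topics)
	fmt.Println("overall score:", s)
	fmt.Println("pruned from all meshes:", shouldPruneFromAllMeshes(s))
}
```

In this toy example the healthy topic contributes 5 and the faulty one contributes -25, so the overall score is negative and the peer would be pruned everywhere, including from meshes where it behaved well.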

to @guillaumemichel

> We can unsubscribe from the voluntary_exits for now as a quick fix

I've applied the quick fix for the overnight run I did locally (fingers crossed). If that improves the mesh connectivity, I'll add a quick patch and think about a more long-term solution.

I thought that we could fetch the list of active validators from our trusted Prysm node right when the tool starts, and then judge whether each exit is valid, updating that list on the go 🤷🏽
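
A rough sketch of that idea, assuming the active validator indices have already been fetched from the trusted node (the fetch itself is not shown). `activeSet`, `newActiveSet`, and `processExit` are hypothetical names. The check below only covers the "is the validator active" condition from the Prysm log above; a full validator would also need to verify the signature and the other exit conditions.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotActive reuses the rejection reason seen in the Prysm logs.
var errNotActive = errors.New("non-active validator cannot exit")

// activeSet tracks the indices of validators that are currently active.
// It is not safe for concurrent use; a real tool would need locking.
type activeSet struct {
	active map[uint64]struct{}
}

// newActiveSet seeds the set with indices obtained at startup
// (assumed to come from the trusted node).
func newActiveSet(indices []uint64) *activeSet {
	s := &activeSet{active: make(map[uint64]struct{}, len(indices))}
	for _, i := range indices {
		s.active[i] = struct{}{}
	}
	return s
}

// processExit accepts an exit only for an active validator and then removes
// that validator from the set, so a repeated exit is rejected.
func (s *activeSet) processExit(validatorIndex uint64) error {
	if _, ok := s.active[validatorIndex]; !ok {
		return errNotActive
	}
	delete(s.active, validatorIndex)
	return nil
}

func main() {
	s := newActiveSet([]uint64{1, 2, 3})
	fmt.Println(s.processExit(2)) // first exit for validator 2
	fmt.Println(s.processExit(2)) // repeated exit for validator 2
}
```

The returned error could then be mapped to a reject decision in the topic's message validator, so that invalid exits are dropped instead of forwarded.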

cortze added a commit that referenced this issue May 21, 2024

unsubscribe to the Voluntary Exit topic to address #24

51bac9e

cortze mentioned this issue May 21, 2024

Temporary Fix: Unsubscribe to the voluntary_exits topic #27

Merged
