-
Notifications
You must be signed in to change notification settings - Fork 978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
protocols/gossipsub: Custom interval grows and is not performant #2497
Comments
Checked our metrics again now that some hours have passed and gains are consistent |
Also to note, the interval code refers to |
Oh, surprising. Thanks for the debugging work and the detailed report @AgeManning and @divagant-martian!
Agreed.
If I am not mistaken we moved to support running as WASM in NodeJS. See #2143 for details. I am surprised that this causing issues now, given that https://github.com/libp2p/rust-libp2p/blob/master/protocols/gossipsub/src/interval.rs is a port of I wonder whether we should port the recent version of tokio's interval implementation to rust-libp2p or even better contribute an @wngr @PhilippGackstatter @thomaseizinger any thoughts on this? |
just for completeness, the experiment code https://github.com/divagant-martian/interval-example |
Can't comment much on the issue itself, but out of those two, I think the latter is preferable from a compatibility point of view. The |
I agree in contributing if possible to this in the appropriate repository. Having a custom |
The current implementation as it stands has some serious problems (even though it's taken us a while to track it down). @divagant-martian has indicated that the For Lighthouse, we're probably going to a run a fork for the time being with the tokio implementation, and I suggest we use the It's probably worth looking for any other places where this implementation of |
## Issue Addressed Lighthouse gossiping late messages ## Proposed Changes Point LH to our fork using tokio interval, which 1) works as expected 2) is more performant than the previous version that actually worked as expected Upgrade libp2p ## Additional Info libp2p/rust-libp2p#2497
Removed the custom interval implementation and removes support for wasm32-unknown-unknown. See #2497 for details. Co-authored-by: Diva M <[email protected]> Co-authored-by: Max Inden <[email protected]>
#2506 is merged. I will leave this issue open to track support for Next step I think it is worth exploring contributing an I don't have the capacity / priority to own this any time soon. In case someone wants to run Backport
I am assuming, given that you already released a patch version of lighthouse (https://github.com/sigp/lighthouse/releases/tag/v2.1.3) that you don't need a backport of this patch? If you do need a backport, I am happy to release |
Yeah I dont think we need a backport at the moment. There's a few more small changes (metrics) to add to gossipsub, so happy to wait for next release cycle. :) |
Removed the custom interval implementation and removes support for wasm32-unknown-unknown. See libp2p/rust-libp2p#2497 for details. Co-authored-by: Diva M <[email protected]> Co-authored-by: Max Inden <[email protected]>
Should this issue be closed? |
Thanks @divagant-martian. |
@mxinden - Your input here might also be helpful.
After a bit of digging, we've found that the gossipsub heartbeat can be set to the order of 1s but grow to the order of many minutes. This heartbeat is crucial for gossipsub to maintain its peers and meshes as well as bound its memory cache and gossip messages on time.
A many-minute heartbeat can cause terrible conditions on the network and significantly grow memory use of gossipsub users.
We've tracked the issue down to the custom implementation of
Interval
in https://github.com/libp2p/rust-libp2p/blob/master/protocols/gossipsub/src/interval.rs.As an example, we have three nodes running, in one (green one) I've replaced the custom
Interval
with atokio::time::Interval
. The heartbeat is supposed to be 700ms.Halfway through these graphs I've switched the green node to
tokio::time::Interval
. The heartbeat interval (time between heartbeats) then stays steady on 700ms whereas the other nodes have grown and continue to do so.It's also useful to note that the heartbeat duration (time taken to do a heartbeat) itself is about 3-4x faster with
tokio::time::Interval
than with the custom implementation we are currently using.After a very quick skim, I think the issue of the growing interval lies in this calcualtion: https://github.com/libp2p/rust-libp2p/blob/master/protocols/gossipsub/src/interval.rs#L86
Where if we miss a interval the
fires_at
anddelay
can potentially become out of sync.It is useful to note that the heartbeat interval grows in multiples of the desired interval.
It would be nice to get the current implementation to work correctly and be as performant as the tokio version. Failing this, potentially we add a feature flag or something so that tokio users can benefit from the performance gains.
cc @divagant-martian @pawanjay176
The text was updated successfully, but these errors were encountered: