Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protocols/gossipsub: Custom interval grows and is not performant #2497

Closed
AgeManning opened this issue Feb 9, 2022 · 11 comments
Closed

protocols/gossipsub: Custom interval grows and is not performant #2497

AgeManning opened this issue Feb 9, 2022 · 11 comments

Comments

@AgeManning
Copy link
Contributor

@mxinden - Your input here might also be helpful.

After a bit of digging, we've found that the gossipsub heartbeat can be set to the order of 1s but grow to the order of many minutes. This heartbeat is crucial for gossipsub to maintain its peers and meshes as well as bound its memory cache and gossip messages on time.

A many-minute heartbeat can cause terrible conditions on the network and significantly grow memory use of gossipsub users.

We've tracked the issue down to the custom implementation of Interval in https://github.com/libp2p/rust-libp2p/blob/master/protocols/gossipsub/src/interval.rs.

As an example, we have three nodes running, in one (green one) I've replaced the custom Interval with a tokio::time::Interval. The heartbeat is supposed to be 700ms.

allg

Halfway through these graphs I've switched the green node to tokio::time::Interval. The heartbeat interval (time between heartbeats) then stays steady on 700ms whereas the other nodes have grown and continue to do so.
It's also useful to note that the heartbeat duration (time taken to do a heartbeat) itself is about 3-4x faster with tokio::time::Interval than with the custom implementation we are currently using.

After a very quick skim, I think the issue of the growing interval lies in this calcualtion: https://github.com/libp2p/rust-libp2p/blob/master/protocols/gossipsub/src/interval.rs#L86

Where if we miss a interval the fires_at and delay can potentially become out of sync.

It is useful to note that the heartbeat interval grows in multiples of the desired interval.

It would be nice to get the current implementation to work correctly and be as performant as the tokio version. Failing this, potentially we add a feature flag or something so that tokio users can benefit from the performance gains.

cc @divagant-martian @pawanjay176

@divagant-martian
Copy link
Contributor

Checked our metrics again now that some hours have passed and gains are consistent
image
The step increases are what @AgeManning mentions as increases by multiples of the heartbeat duration

@divagant-martian
Copy link
Contributor

Also to note, the interval code refers to wasm-timer as the original source. I wrote a small code that uses an Interval and adds a disruption, calculates de moving avg of the last 5 ticks. wasm-timer maintains the expected rolling average, the current implementation grows. Is there a reason to avoid using wasm-timer directly? (If any interest in the code arises I can upload it somewhere)

@mxinden
Copy link
Member

mxinden commented Feb 9, 2022

Oh, surprising. Thanks for the debugging work and the detailed report @AgeManning and @divagant-martian!

It would be nice to get the current implementation to work correctly and be as performant as the tokio version.

Agreed.

Is there a reason to avoid using wasm-timer directly?

If I am not mistaken we moved to support running as WASM in NodeJS. See #2143 for details.

I am surprised that this causing issues now, given that https://github.com/libp2p/rust-libp2p/blob/master/protocols/gossipsub/src/interval.rs is a port of wasm-timer interval.

I wonder whether we should port the recent version of tokio's interval implementation to rust-libp2p or even better contribute an Interval implementation to futures-timer upstream (see async-rs/futures-timer#60).

@wngr @PhilippGackstatter @thomaseizinger any thoughts on this?

@divagant-martian
Copy link
Contributor

just for completeness, the experiment code https://github.com/divagant-martian/interval-example

@PhilippGackstatter
Copy link
Contributor

port the recent version of tokio's interval implementation [...] or even better contribute an Interval implementation to futures-timer upstream

Can't comment much on the issue itself, but out of those two, I think the latter is preferable from a compatibility point of view. The futures-timer implementation could hopefully use gloo_timers::future::IntervalStream which has good Wasm compatibility (including node.js).

@divagant-martian
Copy link
Contributor

I agree in contributing if possible to this in the appropriate repository. Having a custom Interval implementation to give support to one target inside gossipsub puts the burden of maintenance on libp2p and particularly in libp2p-gossipsub, when it really is out of scope

@AgeManning
Copy link
Contributor Author

The current implementation as it stands has some serious problems (even though it's taken us a while to track it down).

@divagant-martian has indicated that the wasm-timer implementation doesn't grow the heartbeat interval and is therefore useable, however is not as performant as the tokio implementation.

For Lighthouse, we're probably going to a run a fork for the time being with the tokio implementation, and I suggest we use the wasm-timer implementation for gossipsub until a solution is found that can also support WASM in NodeJS.

It's probably worth looking for any other places where this implementation of Interval is used modifying it to wasm-timer (if the custom implementation was made for other parts of the codebase).

bors bot pushed a commit to sigp/lighthouse that referenced this issue Feb 10, 2022
## Issue Addressed
Lighthouse gossiping late messages

## Proposed Changes
Point LH to our fork using tokio interval, which 1) works as expected 2) is more performant than the previous version that actually worked as expected
Upgrade libp2p 

## Additional Info
libp2p/rust-libp2p#2497
mxinden added a commit that referenced this issue Feb 14, 2022
Removed the custom interval implementation and removes support for
wasm32-unknown-unknown. See #2497
for details.

Co-authored-by: Diva M <[email protected]>
Co-authored-by: Max Inden <[email protected]>
@mxinden
Copy link
Member

mxinden commented Feb 14, 2022

#2506 is merged. I will leave this issue open to track support for wasm-unknown-unknown for libp2p-gossipsub.

Next step

I think it is worth exploring contributing an Interval implementation to futures-timer (async-rs/futures-timer#60) using gloo_timers on WASM as suggested by @PhilippGackstatter above.

I don't have the capacity / priority to own this any time soon. In case someone wants to run libp2p-gossipsub on wasm-unknown-unknown and wants to pick this up, I am happy to help.

Backport

For Lighthouse, we're probably going to a run a fork for the time being with the tokio implementation, and I suggest we use the wasm-timer implementation for gossipsub until a solution is found that can also support WASM in NodeJS.

I am assuming, given that you already released a patch version of lighthouse (https://github.com/sigp/lighthouse/releases/tag/v2.1.3) that you don't need a backport of this patch? If you do need a backport, I am happy to release libp2p v0.42.3.

@AgeManning
Copy link
Contributor Author

Yeah I dont think we need a backport at the moment. There's a few more small changes (metrics) to add to gossipsub, so happy to wait for next release cycle. :)

santos227 added a commit to santos227/rustlib that referenced this issue Jun 20, 2022
Removed the custom interval implementation and removes support for
wasm32-unknown-unknown. See libp2p/rust-libp2p#2497
for details.

Co-authored-by: Diva M <[email protected]>
Co-authored-by: Max Inden <[email protected]>
@divagant-martian
Copy link
Contributor

Should this issue be closed?

@mxinden
Copy link
Member

mxinden commented May 28, 2023

Thanks @divagant-martian.

@mxinden mxinden closed this as completed May 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants