
Compared performance to gRPC with heavy payloads #674

Open
rchoffardet opened this issue Apr 22, 2024 · 4 comments
@rchoffardet

Hey 👋🏻

Firstly, thanks for the effort you've invested and for openly publishing your library.
This "issue" isn't really an issue; it's more of a conversation :)

My team and I are interested in alternatives to gRPC such as your library, and with the claimed performance improvements it looked very promising.

However, I played a bit with your benchmark and found that with heavier payloads (10kB and 100kB), Stl.Fusion performs worse than gRPC on my machine (the payload is just a constant random string), even though large payloads are a known weakness of gRPC.

Stl.Rpc:
Light : 354.64K 400.21K 405.08K 394.14K -> 405.08K calls/s
Medium : 52.71K 53.84K 52.92K 52.75K -> 53.84K calls/s
Heavy : 5.98K 6.04K 6.19K 5.95K -> 6.19K calls/s
gRPC:
Light : 138.20K 195.26K 226.40K 227.11K -> 227.11K calls/s
Medium : 57.93K 60.00K 60.61K 60.96K -> 60.96K calls/s
Heavy : 7.03K 7.11K 7.02K 7.20K -> 7.20K calls/s

Light is 1kB, Medium is 10kB and Heavy is 100kB.
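For context, the Heavy rows above already imply substantial raw throughput. A quick back-of-envelope check (Python used purely as a calculator, with the best-run numbers from the table above):

```python
# Wire throughput implied by the best "Heavy" (100 kB payload) runs above.
PAYLOAD_BYTES = 100_000

best_heavy = {"Stl.Rpc": 6_190, "gRPC": 7_200}  # calls/s

for name, calls_per_s in best_heavy.items():
    mb_per_s = calls_per_s * PAYLOAD_BYTES / 1e6
    print(f"{name}: ~{mb_per_s:.0f} MB/s of payload")
# Stl.Rpc: ~619 MB/s, gRPC: ~720 MB/s. Both far exceed the ~125 MB/s of
# gigabit Ethernet, so at these rates a 1 Gbit NIC, not the RPC library,
# would be the bottleneck over a real network.
```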

Do you observe similar relative numbers? Am I doing something wrong?

Regards

@alexyakunin
Collaborator

alexyakunin commented Aug 17, 2024

@rchoffardet hi, sorry, I only just read this comment. A couple of things:

  1. The most recent version of Fusion & ActualLab.Rpc (ex-Stl.Rpc) is here: https://github.com/ActualLab/Fusion

  2. I've made a number of improvements in ActualLab.Rpc since Dec 2023 (that's when I switched to the new repo), so it probably makes sense to repeat the test with the newest version. Though as far as I remember, none of them targeted the large-payload case specifically.

A few questions I'd like to ask:

  • Which serializer did you use? MemoryPack or MessagePack? The default is currently MemoryPack.
  • If it's MemoryPack, which string encoding is used? The default is UTF8, i.e. a large string payload means transcoding on every call.
  • Did you test this over network or locally? I.e. can network bandwidth be a bottleneck here?
  • Is WebSocket per-message compression enabled? (Make sure it's off.)
  • Is it over a single WebSocket channel or over multiple ones?

I'd do a couple extra things:

  • Profile & see what % of time is spent in UTF8 encoding/decoding
  • Switch to byte[] payload to rule out encoding throughput.
  • Similarly, try MemoryPack with UTF16 encoding (i.e. no transcoding for strings).
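The transcoding suspicion is easy to sanity-check outside any RPC stack. A minimal sketch of the idea (Python standing in for the equivalent .NET Encoding.UTF8 calls, so absolute numbers won't match .NET's):

```python
import time

payload = "x" * 100_000   # stand-in for the 100 kB "Heavy" string
N = 1_000

start = time.perf_counter()
for _ in range(N):
    encoded = payload.encode("utf-8")   # serialize: string -> UTF-8 bytes
    decoded = encoded.decode("utf-8")   # deserialize: UTF-8 bytes -> string
elapsed = time.perf_counter() - start

assert decoded == payload
per_call_us = elapsed / N * 1e6
print(f"~{per_call_us:.0f} us of transcoding per 100 kB encode+decode round trip")
# Compare against the ~160 us/call budget implied by 6.19K calls/s to see
# what fraction of it transcoding alone could consume.
```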

Overall, it makes sense to find out what the bottleneck is in each of these cases. That's easy with ActualLab.Rpc - all you need to do is profile your test. I'd bet that if your client & server are on the same machine, it's most likely UTF8 transcoding, and gRPC probably does it slightly faster. That doesn't mean it's faster in general, of course.

The larger your messages are, the less an RPC library can do to transmit them efficiently, assuming we're talking about real networking & the same wire payload size per message. I.e. the larger the payload, the less noticeable the overhead the RPC library adds, and the more likely it is that your network connection is the bottleneck.

And if we're talking about really large payloads, more likely than not you'll end up splitting them into chunks to start processing sooner, implementing resume-on-disconnect, etc. - i.e. more likely than not you'll end up streaming the payload with RpcStream<T> rather than pushing it all at once. That's why I also tested byte[] streaming in my tests.
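The chunking idea itself is independent of the RPC library. A minimal sketch (a plain Python generator standing in for feeding chunks to something like RpcStream<T>; the 16 kB chunk size is an arbitrary choice for illustration):

```python
from typing import Iterator

def chunked(payload: bytes, chunk_size: int = 16_384) -> Iterator[bytes]:
    """Split a large payload into fixed-size chunks for streaming.

    The receiver can process each chunk as soon as it arrives, and a
    resume-on-disconnect scheme only needs to track the last sent offset.
    """
    for offset in range(0, len(payload), chunk_size):
        yield payload[offset:offset + chunk_size]

payload = bytes(100_000)             # 100 kB, like the "Heavy" case
chunks = list(chunked(payload))
print(len(chunks), len(chunks[-1]))  # 7 chunks, the last one 1696 bytes
assert b"".join(chunks) == payload   # nothing lost, nothing duplicated
```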

Fusion also "pushes" you toward implementing APIs that produce small or medium-size payloads. The more precise your invalidations, and the less payload you have to re-send when one happens, the better.

So large payload tests are important only to some extent.

@alexyakunin
Collaborator

@rchoffardet if you end up continuing the investigation, please create a similar issue in the https://github.com/ActualLab/Fusion repo, and I'll be happy to help.

@alexyakunin
Collaborator

alexyakunin commented Aug 26, 2024

@rchoffardet just updated https://github.com/ActualLab/Fusion.Samples to the latest Fusion version, and even though the RPC performance there seems to be even higher now, I confirm that it degrades with larger stream items (compared to gRPC & SignalR) - i.e. it's noticeably faster on 100-byte items, but slightly slower on 10KB items.

I'll investigate.

@alexyakunin
Collaborator

alexyakunin commented Aug 27, 2024

@rchoffardet I investigated what's going on - overall, gRPC seems to do less memory copying while handling ByteString values. And even though ActualLab.Rpc & SignalR do a decent job, they still copy byte arrays at least one extra time. That's partly because the serializer is replaceable in both libraries, so e.g. ActualLab.Rpc produces an intermediate blob to make that work.

And if you think about the test that streams large byte arrays locally, copying is ~ all that matters there.
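How much one avoidable copy costs per message is easy to see in isolation. A rough illustration (Python bytearray copies standing in for the intermediate-blob copy in a .NET serializer, so absolute timings won't match .NET's):

```python
import time

payload = bytes(100_000)   # 100 kB message body
N = 10_000

def run(copies: int) -> float:
    """Time N messages, each passing through `copies` full buffer copies."""
    start = time.perf_counter()
    for _ in range(N):
        buf = payload
        for _ in range(copies):
            buf = bytearray(buf)   # one full memory copy of the buffer
    return time.perf_counter() - start

one, two = run(1), run(2)
extra_us = (two - one) / N * 1e6
print(f"1 copy: {one:.3f}s, 2 copies: {two:.3f}s, "
      f"extra copy ~{extra_us:.1f} us/message")
```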

Real-life scenarios are very different: NIC bandwidth or some other I/O is far more likely to be the bottleneck there.

I'll probably try to go a bit further in optimizing ActualLab.Rpc specifically for this scenario, but overall I don't see huge value in doing so: it's on par with SignalR on large byte arrays, and IMO the scenarios with small and medium-sized items are much more important.
