prb-wm – too many concurrent requests with many workers #1285

ngrewe · 2023-05-26T17:52:32Z

When the new prb-wm service handles more than ~250 workers with default settings, we start seeing the following kind of error:

stopped due to error: Rpc error: RPC error: Configured max number of request slots exceeded

This error comes from jsonrpsee and occurs because some client is built with a low limit on concurrent requests (the default is 256) and prb is using many of them. I suspect that it's this client here, but I'm not familiar enough with the architecture to be sure:

https://github.com/Phala-Network/phala-blockchain/blob/8fe05eb72b76f4939d0c03e62a3fc7e58b260a5c/standalone/prb/src/datasource.rs#LL555C1-L555C85

We seem to have worked around this (somewhat) by increasing CACHE_SIZE, but it would be nice to have the number of concurrent connections configurable.
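A minimal sketch of what a configurable limit could look like, assuming the value is read from the environment. This is not prb's actual code: the variable name `PRB_MAX_CONCURRENT_REQUESTS` and both function names are hypothetical, and 256 is the default mentioned above. The result would then be handed to the jsonrpsee client builder's `max_concurrent_requests` setting, which as far as I know is the knob behind this error.

```rust
use std::env;

/// jsonrpsee's default limit, as reported in this issue.
const DEFAULT_MAX_CONCURRENT_REQUESTS: usize = 256;

/// Parse a user-supplied limit, falling back to the default when the value
/// is missing, not a number, or zero.
fn parse_max_concurrent_requests(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_MAX_CONCURRENT_REQUESTS)
}

/// Read the limit from a hypothetical `PRB_MAX_CONCURRENT_REQUESTS`
/// environment variable (the name is made up for this sketch).
fn max_concurrent_requests_from_env() -> usize {
    let raw = env::var("PRB_MAX_CONCURRENT_REQUESTS").ok();
    parse_max_concurrent_requests(raw.as_deref())
}
```

Falling back to the default on bad input keeps a typo in the config from stopping the service, at the cost of silently ignoring it; logging a warning in that branch would be a reasonable addition.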


PHA-SYSOPS · 2023-06-20T09:43:31Z

I can confirm the problem exists in several farms, and it gets worse the more workers you have connected. I was looking for the specific code that triggers this, so good find @ngrewe ... I now have more to finally get the devs moving on this.

PHA-SYSOPS · 2023-06-20T10:06:19Z

Oh, I also noticed that using headers-cache solves it ... BUT you must set up HC correctly (synced, a proper bind address rather than 127.0.0.1, etc.). It then limits the load on the node enough to avoid hitting this client slots issue. So to reproduce, just remove HC from the config and use only 1 node as backend, with a WM having ~150 workers.

ngrewe · 2023-06-20T10:57:41Z

The thing about headers-cache is that the version in the latest docker image is a bit unreliable because it tends to also import unfinalized blocks, which causes sync issues. There's a fix for this in-tree, though... just not released.

jasl · 2023-06-21T07:55:14Z

> The thing about headers-cache is that the version in the latest docker image is a bit unreliable because it tends to also import unfinalized blocks, which causes sync issues. There's a fix for this in-tree, though... just not released.

I've made a release; could you help try it?

Just change the Docker image to:

jasl123/phala-headers-cache:23062301
DIGEST:sha256:c0479365396092bf066095fb6ce606c693617d7f3fe585aa70b95692b675f82f

If everything looks good, I'll move it to the phalanetwork org.

ngrewe · 2023-06-24T19:23:06Z

> I've made a release; could you help try it?

We'll deploy it in a test environment. I'll get back to you after it has soak-tested for a few days.

Nexus2k · 2023-08-12T13:43:56Z

Got the same issue. I'm using 1 node without headers-cache: just a single archive node, with 178 + 116 workers on two different PRBv3 WMs.
@krhougs can you refactor prb to reuse node connections across all managed workers instead of opening a new connection for each?

Nexus2k · 2023-08-20T18:49:34Z

Can someone maybe refactor the prb code so it just uses a small number of RPC clients instead of several per worker?
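To make the request concrete, here is a rough sketch of a small shared pool that hands out clients in round-robin order, so many workers reuse a handful of connections. It is only an illustration under assumptions: `ClientPool` and its methods are hypothetical names, and the type parameter `C` stands in for whatever RPC client type prb actually uses.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// A fixed set of shared clients handed out in round-robin order.
/// `C` is a placeholder for the real RPC client type.
struct ClientPool<C> {
    clients: Vec<Arc<C>>,
    next: AtomicUsize,
}

impl<C> ClientPool<C> {
    /// Build a pool from already-connected clients. Panics on an empty list.
    fn new(clients: Vec<C>) -> Self {
        assert!(!clients.is_empty(), "pool needs at least one client");
        Self {
            clients: clients.into_iter().map(Arc::new).collect(),
            next: AtomicUsize::new(0),
        }
    }

    /// Return the next client in rotation; many workers may share the same one.
    fn get(&self) -> Arc<C> {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.clients.len();
        Arc::clone(&self.clients[i])
    }

    /// Number of underlying connections, independent of the worker count.
    fn len(&self) -> usize {
        self.clients.len()
    }
}
```

With a pool like this, the number of node connections is fixed by the pool size rather than by the number of workers. Each shared client still has its own request-slot limit, so the pool size and the per-client limit would need to be tuned together.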

jasl · 2023-09-19T07:24:40Z

Sorry for the late response, but we're testing a workaround in #1388:

jasl123/phala-prb:23091801
DIGEST:sha256:9330afe8e474b1c709e9d4def0a04edfc94349d44726e15e4ec7490d15e770fd

Nexus2k · 2023-09-19T08:49:43Z

I've already patched my prb version to allow a higher connection count, but that just overloads the node more. Please implement connection pooling or some other way to avoid every worker establishing a new node connection. Other PRBv3 users have also mentioned that it's more stable when using a header cache, which is not mentioned on the public mining wiki.
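Since raising the limit only shifts the load onto the node, one alternative is to cap in-flight requests on the prb side and make callers wait for a free slot instead of failing. The following is a sketch of that idea using a blocking counting semaphore from the standard library. `RequestLimiter` and `Permit` are hypothetical names, and nothing here reflects how prb or the workaround in #1388 is actually implemented.

```rust
use std::sync::{Condvar, Mutex};

/// Counting semaphore that caps the number of in-flight requests.
struct RequestLimiter {
    available: Mutex<usize>,
    freed: Condvar,
}

/// A slot held for one request; dropping it returns the slot.
struct Permit<'a> {
    limiter: &'a RequestLimiter,
}

impl RequestLimiter {
    fn new(max_in_flight: usize) -> Self {
        Self {
            available: Mutex::new(max_in_flight),
            freed: Condvar::new(),
        }
    }

    /// Block until a slot is free, then take it.
    fn acquire(&self) -> Permit<'_> {
        let mut available = self.available.lock().unwrap();
        while *available == 0 {
            available = self.freed.wait(available).unwrap();
        }
        *available -= 1;
        Permit { limiter: self }
    }

    /// Take a slot only if one is free right now.
    fn try_acquire(&self) -> Option<Permit<'_>> {
        let mut available = self.available.lock().unwrap();
        if *available == 0 {
            return None;
        }
        *available -= 1;
        Some(Permit { limiter: self })
    }
}

impl Drop for Permit<'_> {
    /// Give the slot back and wake one waiting caller.
    fn drop(&mut self) {
        *self.limiter.available.lock().unwrap() += 1;
        self.limiter.freed.notify_one();
    }
}
```

A worker would call `acquire()` before sending a request and keep the returned permit alive until the response arrives. In async code an async semaphore (for example tokio's `Semaphore`) would be the more natural choice than this blocking version.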

Nexus2k mentioned this issue Sep 18, 2023

PRBv3 stake changes not submitted to chain #1376

Open
