chain: use batch rpc client in mempool poller #888
Conversation
Force-pushed 8d4880b to bfb8a83.
Did a first pass, looks really good! Hope this'll quite significantly reduce the load on bitcoind.
chain/mempool.go
Outdated
@@ -347,12 +376,21 @@ func (m *mempool) UpdateMempoolTxes(txids []*chainhash.Hash) []*wire.MsgTx {
	// Set all mempool txs to false.
	m.UnmarkAll()

	// TODO(yy): let the OS manage the number of goroutines?
I think it definitely makes sense to limit the number of goroutines here, especially if they end up creating RPC requests. Even if the requests are async and a call here would just create an entry in the queue, it probably still makes sense to limit the number of goroutines to prevent clogging on slower systems.
Or what's the reason for the TODO? Do we feel like this number isn't high enough currently?
The RPC requests are already processed sequentially in `btcd` though, so it won't have any effect here on that front. Unless we're concerned about the overhead of creating and destroying goroutines, I think it's better to let the runtime scheduler handle them. Also, `GOMAXPROCS` already defaults to `runtime.NumCPU`.

> it probably still makes sense to limit the number of goroutines to prevent clogging on slower systems.

Have we experienced this before?
> Have we experienced this before?

Yes, at least I think that in certain situations (e.g. during itests), when we ran too many things in parallel, some goroutines would just be starved (didn't get any CPU time). Not sure if that gets worse with the number of goroutines or just with the number of processes running on the system.
You mean when using `make itest-parallel`? I guess it's bypassing Go's scheduler since we're parallelizing it in the shell?
Yes. So when the CPU is already struggling, goroutines can get starved (at least it looked like it). But I'm not sure whether that gets worse with the total number of goroutines or whether it has no direct impact. Just some thoughts/observations to possibly help the decision here.
Oli is right. Since goroutines are scheduled like anything else, it's possible to have them starve if there are too many context switches going on. I'm not sure if that applies here if the # of goroutines is minimal.
For more info about goroutine scheduling, see the goroutine section here: https://dave.cheney.net/2015/08/08/performance-without-the-event-loop
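As an aside, the bounded-concurrency pattern under discussion (and visible in a later diff hunk in this conversation) can be sketched in isolation with `errgroup`. This is a standalone toy example, not the PR's actual fetch logic:

```go
package main

import (
	"fmt"
	"runtime"

	"golang.org/x/sync/errgroup"
)

func main() {
	var eg errgroup.Group

	// Cap the number of concurrently running goroutines; once the
	// limit is reached, further eg.Go calls block until a slot frees.
	eg.SetLimit(runtime.NumCPU())

	for i := 0; i < 100; i++ {
		i := i // capture loop variable (needed pre-Go 1.22)
		eg.Go(func() error {
			// Stand-in for fetching one raw transaction.
			fmt.Println("processing item", i)
			return nil
		})
	}

	// Wait blocks until all goroutines finish, returning the first error.
	if err := eg.Wait(); err != nil {
		fmt.Println("error:", err)
	}
}
```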
@@ -7,4 +7,13 @@ require (
	github.com/btcsuite/btcd/btcutil v1.1.1
	github.com/btcsuite/btcwallet/wallet/txrules v1.2.0
	github.com/btcsuite/btcwallet/wallet/txsizes v1.2.3
	github.com/fsnotify/fsnotify v1.5.4 // indirect
Reminder to push new tags for all those submodules after merging the PR.
Force-pushed a521070 to 48cdb29.
Very nice, looks very good!
Just a bunch of questions/comments.
var eg errgroup.Group
eg.SetLimit(runtime.NumCPU())

// newTxids stores a list of unseen txids found in the mempool.
newTxids := make([]*chainhash.Hash, 0)
nit: use `var newTxids []*chainhash.Hash` if we're not pre-allocating with a non-zero value.
Hmm, I feel like I've had this conversation with laolu before but now I cannot recall the details 😂 But why is this the preferred way again?
I guess it's just visually more apparent that there is no pre-allocation/initialization. It's easier to miss the `0` IMO, and the `make()` would subconsciously register as "oh, we're allocating something here" when scanning the code quickly. It's really just nit-picking at this point, but I guess it's nice if all code uses the same pattern.
Cool, yeah, let's stick to this pattern then.
Ah, also, I guess when it comes to unit tests and deep equality tests, there is a difference between `var newTxids []*chainhash.Hash` and `newTxids := make([]*chainhash.Hash, 0)`: the first will give you a `nil` slice, while the second will give you an empty `[]*chainhash.Hash{}` slice. So if that was a "decode" function, we'd get a diff on such a field from before encoding and after, if there were no elements present.
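A minimal standalone sketch of that `nil`-vs-empty distinction (a hypothetical example, not code from the PR):

```go
package main

import (
	"fmt"
	"reflect"
)

func main() {
	var a []int         // nil slice
	b := make([]int, 0) // empty, non-nil slice

	fmt.Println(a == nil, b == nil)      // true false
	fmt.Println(len(a), len(b))          // 0 0
	fmt.Println(reflect.DeepEqual(a, b)) // false: nil vs. empty differ
}
```

`encoding/json` behaves similarly: a `nil` slice marshals to `null` while an empty one marshals to `[]`.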
Hmm, so I had to revert the change to fix the linter:

```
Error: chain/mempool.go:411:2: Consider preallocating `newTxids` (prealloc)
var newTxids []*chainhash.Hash
```
Hmm, yeah, I see that the linter isn't very helpful here... Can we make any guess at how many new TXIDs we might get, so we don't have to pre-allocate with 0 (which silences the linter but doesn't really address its underlying concern)? Maybe something like `len(txids) - len(m.txs)` (with proper mutex lock and underflow checks, of course). Not sure if that makes a big difference, even if this is a quite performance-sensitive part of the code. So feel free to ignore.
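A sketch of that suggestion; `newTxidsCap` is a hypothetical helper, not part of the PR, and the floor at zero covers the case where we already track more txids than the poll returned:

```go
// newTxidsCap returns a capacity hint for the unseen-txids slice:
// the number of polled txids minus the number already tracked,
// floored at zero to guard against underflow.
func newTxidsCap(polled, known int) int {
	if hint := polled - known; hint > 0 {
		return hint
	}
	return 0
}
```

Usage would be something like `newTxids := make([]*chainhash.Hash, 0, newTxidsCap(len(txids), len(m.txs)))`, reading `len(m.txs)` under the mempool's mutex.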
@@ -22,8 +22,6 @@ github.com/btcsuite/btclog v0.0.0-20170628155309-84c8d2346e9f/go.mod h1:TdznJufo
github.com/btcsuite/btcutil v0.0.0-20190425235716-9e5f4b9a998d/go.mod h1:+5NJ2+qvTyV9exUAL/rxXi3DcLg2Ts+ymUAY5y4NvMg=
github.com/btcsuite/btcwallet/wallet/txrules v1.2.0 h1:BtEN5Empw62/RVnZ0VcJaVtVlBijnLlJY+dwjAye2Bg=
github.com/btcsuite/btcwallet/wallet/txrules v1.2.0/go.mod h1:AtkqiL7ccKWxuLYtZm8Bu8G6q82w4yIZdgq6riy60z0=
github.com/btcsuite/btcwallet/wallet/txsizes v1.2.2 h1:vtfutRuUoNsdgmKgYShuUgmBawNeC+lWu9jeucWkAII=
What version of Go did you run this on? I'm getting a diff in `wallet/{txauthor,txrules,txsizes}` with Go 1.18 or later.
Mine is `go version go1.20.5 darwin/arm64`. What are the diffs here? I think maybe it's due to my local `go work` setup, which also makes sure this repo is in sync with `lnd`.
Here's the diff I get when running `go mod tidy`:
diff.txt
Yeah, very weird. For instance, this line:

```
github.com/golang/protobuf v1.5.2/go.mod h1:XVQd3VNwM+JqD3oG2Ue2ip4fOMUkwXdXDdiuN0vRsmY=
```

looks like it is in sync with lnd's mod file, which also has this version.
Force-pushed 177b056 to 93ab1f8.
LGTM, pending resolution of the `go mod tidy` diff.
cc @ellemouton
cc @Crypt-iQ for review, thanks
Needs a rebase since the gettxspendingprevout change landed
This commit adds a quit signal such that the mempool can be torn down quickly once the upper layer decides to shut down.
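The commit message above describes a common Go teardown pattern; here is a minimal sketch under assumed names (`poller` is illustrative, not the actual mempool struct):

```go
package main

import (
	"sync"
	"time"
)

// poller illustrates the quit-signal pattern: long-running loops
// select on a quit channel so shutdown returns quickly.
type poller struct {
	quit chan struct{}
	wg   sync.WaitGroup
}

func (p *poller) start(interval time.Duration) {
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()

		ticker := time.NewTicker(interval)
		defer ticker.Stop()

		for {
			select {
			case <-ticker.C:
				// Poll the mempool here.
			case <-p.quit:
				// Quit signal received: exit immediately instead
				// of waiting out the full polling interval.
				return
			}
		}
	}()
}

func (p *poller) shutdown() {
	close(p.quit)
	p.wg.Wait() // blocks until the polling goroutine has exited
}

func main() {
	p := &poller{quit: make(chan struct{})}
	p.start(time.Second)
	p.shutdown()
}
```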
Force-pushed d0a835b to fc64ebc.
chain/mempool.go
Outdated

// getRawTxBatchSize specifies the number of requests to be batched
// before sending them to the bitcoind client.
getRawTxBatchSize = 10_000
I think this should be lowered. The maximum HTTP response body that bitcoind will send back is 33,554,432 bytes. It's unlikely but possible that 10,000 transactions of ~3.3 KB each are requested and cause the batch to fail. Additionally, I think this might cause other goroutines calling bitcoind to have to wait a long time for the batch to finish.
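A rough worst-case check of those figures (treating the cap as applying to one whole batch response is an assumption):

$$10{,}000 \,\text{txs} \times 3{,}355 \,\text{bytes/tx} = 33{,}550{,}000 \,\text{bytes} \approx 33{,}554{,}432 \,\text{bytes} = \text{MAX\_SIZE}$$

So an average transaction size just above ~3.35 KB would push a 10,000-entry batch over the limit.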
Decreased by 10x. Not sure if the 32 MB body size limit also applies to batch RPC calls?

> Additionally, I think this might cause other goroutines calling bitcoind to have to wait a long time for the batch to finish

Probably. I assume the lookup will be fast in `bitcoind`, sth like log(N) if indexed. And because we are sending 10,000x fewer requests here, it will actually make `bitcoind` faster to respond because of the overhead saved.
I think any gain would come from fewer HTTP headers. `bitcoind` will still call the same function N times and still acquire the main `cs_main` lock N times. I'd be curious to benchmark this in a reproducible way and see where any gain comes from, but I'm not sure how feasible that is.
The 32 MB body size limit applies to all incoming HTTP requests to bitcoind; it's set here: https://github.com/bitcoin/bitcoin/blob/744157ef1a0b61ceb714cc27c9ae158907aecdc9/src/httpserver.cpp#L457
@@ -135,13 +135,12 @@ func newBitcoindZMQEvents(cfg *ZMQConfig,

	zmqEvents := &bitcoindZMQEvents{
		cfg:    cfg,
		client: client,
`client` needs to be set here so that `gettxspendingprevout` can be called.
added back
chain/bitcoind_events.go
Outdated

// NewBitcoindEventSubscriber initialises a new BitcoinEvents object impl
// depending on the config passed.
func NewBitcoindEventSubscriber(cfg *BitcoindConfig,
-	client *rpcclient.Client) (BitcoindEvents, error) {
+	client, batchClient *rpcclient.Client) (BitcoindEvents, error) {
nit: rename `batchClient` to not match the interface name?
Done
if cfg.RPCBatchInterval == 0 {
	mCfg.batchWaitInterval = DefaultBatchWaitInterval
}
The `getRawTxBatchSize` & `batchWaitInterval` members of `mCfg` don't get set if a non-default value is used.
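A sketch of the fix that comment implies: always copy the configured value, and fall back to the default only when it's unset. The `cfg.RPCBatchSize` and `DefaultGetRawTxBatchSize` names are assumptions based on the snippet above:

```go
// Copy the configured values first; fall back to defaults only when
// the user left them unset (zero values). Names are illustrative.
mCfg.batchWaitInterval = cfg.RPCBatchInterval
if mCfg.batchWaitInterval == 0 {
	mCfg.batchWaitInterval = DefaultBatchWaitInterval
}

mCfg.getRawTxBatchSize = cfg.RPCBatchSize
if mCfg.getRawTxBatchSize == 0 {
	mCfg.getRawTxBatchSize = DefaultGetRawTxBatchSize
}
```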
whoops fixed
if cfg.RPCBatchInterval == 0 {
	mCfg.batchWaitInterval = DefaultBatchWaitInterval
}
Same as in the RPC case
fixed
Force-pushed cf3f51d to 6d01043.
LGTM besides nit
chain/bitcoind_events.go
Outdated

// NewBitcoindEventSubscriber initialises a new BitcoinEvents object impl
// depending on the config passed.
func NewBitcoindEventSubscriber(cfg *BitcoindConfig,
-	client *rpcclient.Client) (BitcoindEvents, error) {
+	client, bClient *rpcclient.Client) (BitcoindEvents, error) {
nit: I think `client` & `bClient` should be defined with their interfaces rather than both being `*rpcclient.Client`, which is a little confusing. Alternatively, renaming `bClient` to `batchClient` works.
Updated the method signature. Yeah, I also think it's a bit confusing and prone to errors, and I think this needs to be changed in `btcd` so we can have an `rpcclient.BatchClient`.
chain/bitcoind_rpc_events.go
Outdated

@@ -69,7 +77,7 @@ var _ BitcoindEvents = (*bitcoindRPCPollingEvents)(nil)
// newBitcoindRPCPollingEvents instantiates a new bitcoindRPCPollingEvents
// object.
func newBitcoindRPCPollingEvents(cfg *PollingConfig,
-	client *rpcclient.Client) *bitcoindRPCPollingEvents {
+	client, bClient *rpcclient.Client) *bitcoindRPCPollingEvents {
nit: same thing with the type of `bClient`.
This commit replaces the normal client with a batch client created from `rpcclient.NewBatch`. All the `getrawtransaction` requests are now batched with a default size of 10,000, and the batch will wait for 1 second before attempting the next one. This change is made so that less pressure is put on bitcoind: when the mempool is full, there will be loads of `getrawtransaction` calls made, resulting in slow responses for other RPC calls such as `getblockchaininfo` and causing issues in other systems.
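A minimal sketch of the batch-client flow with btcd's `rpcclient` (host and credentials are placeholders, and the real poller's wiring and batch-size handling differ):

```go
package main

import (
	"log"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/rpcclient"
)

func main() {
	// NewBatch returns a client that queues commands instead of
	// sending each one immediately.
	batchClient, err := rpcclient.NewBatch(&rpcclient.ConnConfig{
		Host:         "localhost:8332", // placeholder bitcoind endpoint
		User:         "user",           // placeholder credentials
		Pass:         "pass",
		HTTPPostMode: true,
		DisableTLS:   true,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer batchClient.Shutdown()

	// In the poller these would come from getrawmempool.
	var txids []*chainhash.Hash

	// Queue one getrawtransaction request per txid...
	futures := make([]rpcclient.FutureGetRawTransactionResult, 0, len(txids))
	for _, txid := range txids {
		futures = append(futures, batchClient.GetRawTransactionAsync(txid))
	}

	// ...then send the whole batch in a single HTTP round trip.
	if err := batchClient.Send(); err != nil {
		log.Fatal(err)
	}

	// Collect the individual results.
	for _, f := range futures {
		tx, err := f.Receive()
		if err != nil {
			continue // e.g. the tx already left the mempool
		}
		_ = tx.MsgTx()
	}
}
```

The gain is that all queued `getrawtransaction` requests share one HTTP round trip instead of one request each.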
Running the following commands:

```
go work sync
make tidy-module
```
This commit modifies the method `getRawTxIgnoreErr` to log more useful info. It also removes `*mempool` as the method receiver, as it's unnecessary to put it on `mempool`.
Force-pushed 6d01043 to 4acd62b.
chain: use batch rpc client in mempool poller
This PR replaces the normal client with a batch client created from `rpcclient.NewBatch`. All the `getrawtransaction` requests are now batched with a default size of 10,000, and the batch will wait for 1 second before attempting the next one.

This change is made so that less pressure is placed on bitcoind: when the mempool is full, there will be loads of `getrawtransaction` calls made, resulting in slow responses for other RPC calls such as `getblockchaininfo` and causing issues in other systems.

The mod files are also updated, and a similar check is added here (copied from `lnd`).