
Discussion: Keep Indexer Data on Disk #1634

Open
containerman17 opened this issue Oct 4, 2024 · 5 comments · May be fixed by #1660

@containerman17
Contributor

containerman17 commented Oct 4, 2024

I propose storing the indexer data on disk. Right now, it's all kept in RAM, then copied to disk, and later restored from disk to RAM during node startup. Speed shouldn't be a concern: modern SSDs are cheap and abundant, and with the OS page cache, performance should be solid. In production, a simple caching server could be placed in front for added speed.

I love having a built-in indexer. Indexers in EVM are a pain, so let's make this a fully functional one—we're already 99% there.

I also suggest removing the const maxBlockWindow uint64 = 1_000_000 limit on stored blocks. Since the data will be on disk, it’s no longer necessary. Instead, we can limit the size with an option like --max-indexer-size=4TB.
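
For illustration, here's a minimal sketch of what a size-based limit could look like. The --max-indexer-size flag, pruneToSize, and the deleteOldest hook (which would drop the lowest-height block and its index entries) are all hypothetical names, not existing HyperSDK code:

```go
package indexer

import (
	"os"
	"path/filepath"
)

// pruneToSize deletes the oldest blocks until the database directory fits
// within maxIndexerSize bytes (the budget a --max-indexer-size flag would set).
func pruneToSize(dbDir string, maxIndexerSize int64, deleteOldest func() error) error {
	for {
		size, err := dirSize(dbDir)
		if err != nil {
			return err
		}
		if size <= maxIndexerSize {
			return nil
		}
		// Drop the lowest-height block (and its ID/tx index entries).
		if err := deleteOldest(); err != nil {
			return err
		}
	}
}

// dirSize sums the sizes of all files under dir.
func dirSize(dir string) (int64, error) {
	var total int64
	err := filepath.Walk(dir, func(_ string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if !info.IsDir() {
			total += info.Size()
		}
		return nil
	})
	return total, err
}
```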

Indexer nodes shouldn't be validators, and validator nodes shouldn't index. That's how it works in EVM, and I envision the same for HyperSDK.

P.S. The only issue I see is that block history won't be syncable across nodes, but we've never discussed keeping the entire chain history anyway.

@aaronbuchwald
Collaborator

We do currently keep the full retention window on disk, but we explicitly avoid storing a blockID -> height mapping because it would produce a heavy random write workload. To support lookups by both height and blockID, we iterate over the full height-based mapping on startup and load the blockID -> height mapping into memory. This motivates setting an upper bound on the window to prevent prolonged load times on startup.
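
To make that startup scan concrete, a rough sketch follows; the heightIter interface is an assumed stand-in for the node's actual KV store iterator, not a real HyperSDK type:

```go
package indexer

// heightIter is an assumed stand-in for an iterator over the on-disk
// height -> block mapping.
type heightIter interface {
	Next() bool        // advance; false when exhausted
	Height() uint64    // height of the current entry
	BlockID() [32]byte // ID of the block at that height
}

// loadBlockIDIndex rebuilds the in-memory blockID -> height map by walking
// the full retention window. This is O(window) work on every startup, which
// is why an upper bound on the window keeps load times tolerable.
func loadBlockIDIndex(it heightIter) map[[32]byte]uint64 {
	idx := make(map[[32]byte]uint64)
	for it.Next() {
		idx[it.BlockID()] = it.Height()
	}
	return idx
}
```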

There are a couple of tradeoffs here:

  • do we support an on-disk blockID -> block height mapping (heavy random workload)?
  • do we set a maximum retention window?
  • how much do we rely on cache vs. read from disk? (if we set a low enough limit, no problem with relying on cache)
  • to what extent should we push users away from depending directly on HyperSDK node APIs (which would trap us into supporting them) and toward scalable services outside of the node?

The big question is where to draw the line: should this be an API served by the HyperSDK inside the node, a sidecar using code provided by the HyperSDK, or an external service built to scale out horizontally?

@containerman17
Contributor Author

HyperSDK should provide sufficient tooling for at least 80% of projects by default without needing any additional software, IMHO. Let me know if you disagree.

I'll run my benchmarks on NVMe and EBS and will get back to you with the results so we can continue the conversation with data.

@containerman17
Contributor Author

containerman17 commented Oct 8, 2024

I made a benchmark, and the performance is more than adequate.

The benchmark fills the database with 100k blocks, then queries them randomly.

Setup

The transactions are Transfer transactions with a TransferResult result, signed with a dummy key. This setup is as close to reality as possible. Each block contains 1000 transactions, so 100k blocks contain 100 million transactions total, occupying 1.2GB on disk.

Database structure

Blocks are stored as height -> block bytes pairs. To find blocks by ID, we store blockID -> height pairs. We don't store transactions separately; instead, we keep txID -> block height pairs in a separate database, and every transaction lookup fetches the entire block at that height and extracts the transaction from it.
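
Roughly, the transaction lookup path looks like the sketch below. It assumes a generic Get-style KV interface; the benchmark branch uses Pebble, but these types and names are illustrative:

```go
package indexer

import "encoding/binary"

// kvStore is an assumed minimal interface over the underlying database.
type kvStore interface {
	Get(key []byte) ([]byte, error)
}

// heightKey encodes a block height as a big-endian key for the
// height -> block bytes mapping.
func heightKey(h uint64) []byte {
	k := make([]byte, 8)
	binary.BigEndian.PutUint64(k, h)
	return k
}

// getTxBlock resolves a transaction: txID -> block height in txDB, then the
// whole block at that height from blockDB. Extracting the single transaction
// from the block bytes is omitted here.
func getTxBlock(txDB, blockDB kvStore, txID [32]byte) ([]byte, error) {
	heightBytes, err := txDB.Get(txID[:])
	if err != nil {
		return nil, err
	}
	h := binary.BigEndian.Uint64(heightBytes)
	return blockDB.Get(heightKey(h))
}
```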

Write benchmark

The benchmark calls indexer.Accept in batches of 1000 blocks at a time. Note: this isn't actual API-level batching, but rather calling indexer.Accept 1000 times with wait groups. On my Mac, it takes 250 to 500 ms to record 1000 blocks, containing 1 million transactions.
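
In pseudo-Go, that batching looks like the sketch below; Block and Indexer are stand-ins, and the real indexer.Accept signature may differ:

```go
package indexer

import (
	"log"
	"sync"
)

// Block and Indexer are illustrative stand-ins for the real HyperSDK types.
type Block struct{ Height uint64 }

type Indexer interface {
	Accept(*Block) error
}

// acceptBatch mirrors the benchmark's "batching": 1000 concurrent Accept
// calls coordinated with a WaitGroup, not real API-level batching.
func acceptBatch(idx Indexer, batch []*Block) {
	var wg sync.WaitGroup
	for _, blk := range batch {
		wg.Add(1)
		go func(b *Block) {
			defer wg.Done()
			if err := idx.Accept(b); err != nil {
				log.Println(err)
			}
		}(blk)
	}
	wg.Wait() // one batch of 1000 blocks takes 250-500 ms on the author's Mac
}
```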

Read benchmark

In the read benchmark, we retrieve (see the sketch after this list):

  • 2000 random blocks using random(0, chainHeight-1)
  • From these random blocks, get their IDs and then retrieve the same 2000 blocks by ID
  • From the random blocks, get one transaction per block, and retrieve those by ID
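
A sketch of those three steps, with assumed accessor names (the benchmark's real lookup methods may differ):

```go
package indexer

import "math/rand"

// readIndexer is an assumed read interface for this sketch.
type readIndexer interface {
	GetBlockByHeight(h uint64) (*IndexedBlock, error)
	GetBlockByID(id [32]byte) (*IndexedBlock, error)
	GetTxByID(id [32]byte) ([]byte, error)
}

// IndexedBlock is an illustrative stand-in carrying just what the bench needs.
type IndexedBlock struct {
	ID    [32]byte
	TxIDs [][32]byte
}

// readBench follows the three steps above for n random blocks.
func readBench(idx readIndexer, chainHeight uint64, n int) error {
	for i := 0; i < n; i++ {
		h := rand.Uint64() % chainHeight // random(0, chainHeight-1)
		blk, err := idx.GetBlockByHeight(h)
		if err != nil {
			return err
		}
		if _, err := idx.GetBlockByID(blk.ID); err != nil { // same block, by ID
			return err
		}
		if _, err := idx.GetTxByID(blk.TxIDs[0]); err != nil { // one tx per block
			return err
		}
	}
	return nil
}
```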

Overall results:

  • Retrieving blocks by ID: ~3300 requests per second
  • Retrieving blocks by height: ~3700 requests per second
  • Retrieving transactions by ID: ~1500 requests per second

Problems

  1. I often hit OOM when retrieving more than 2000 transactions at once, but that's probably due to the benchmark structure.
  2. Occasionally, I get a negative WaitGroup counter error in Pebble. Not sure if I'm using Pebble incorrectly or if it's a race condition bug.
  3. I haven't tested on network block storage (like default AWS EC2 storage), but I'm 100% sure performance will be significantly lower due to the high latency of network-based block storage.

Conclusions

For 100k TPS (writing 100 blocks per second), enabling the indexer could be a major slowdown for a validating node. However, for a node dedicated to indexing, even an underpowered machine such as a MacBook Air could sustain at least 2k blocks per second with 1000 transactions each, around 2 million TPS, which is more than enough.

Possible improvements

  • Batch writes to disk once per second (I believe this should yield at least a 10x improvement on ingestion; see the sketch after this list)
  • Add read caching
  • Keep the last 100 blocks in memory
  • Store duplicate copies of transactions. This would make tx lookups much faster but requires almost double the storage space
  • Add pruning
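
For the first item, a once-per-second write batcher could look like the sketch below; flush stands in for a single batched Pebble commit, and all names are illustrative. The tradeoff is that a crash loses up to a second of staged writes:

```go
package indexer

import (
	"sync"
	"time"
)

// batchedWriter accumulates writes in memory and flushes them once per
// second; flush is a stand-in for a single batched Pebble commit.
type batchedWriter struct {
	mu      sync.Mutex
	pending map[string][]byte
	flush   func(map[string][]byte) error
}

func newBatchedWriter(flush func(map[string][]byte) error) *batchedWriter {
	w := &batchedWriter{pending: make(map[string][]byte), flush: flush}
	go func() {
		for range time.Tick(time.Second) {
			w.mu.Lock()
			batch := w.pending
			w.pending = make(map[string][]byte)
			w.mu.Unlock()
			if len(batch) > 0 {
				_ = w.flush(batch) // error handling elided in this sketch
			}
		}
	}()
	return w
}

// Put stages a key/value pair for the next flush.
func (w *batchedWriter) Put(key string, value []byte) {
	w.mu.Lock()
	w.pending[key] = value
	w.mu.Unlock()
}
```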

Here's the branch with the benchmark: indexer-on-disk
Benchmark log

@containerman17
Contributor Author

From an offline conversation: we need to test databases of 100+ GB.

@containerman17
Contributor Author

Updated benchmark for an 11GB database:

Database size on disk: 11GB. Blocks are digested at around 400-500 blocks per second without parallelization enabled. Each block contains 1000 transactions, so that's 400-500k TPS. The read speed has decreased slightly but remains solid—2k+ RPS for whole blocks and 1k+ RPS for individual transactions. Overall, this benchmark processed over 2 million blocks with a rolling window of 1 million blocks.

2024/10/10 11:31:31 accepted 9359 blocks
2024/10/10 11:31:32 accepted 9691 blocks
2024/10/10 11:31:32 accepted 10k blocks containing 10m txs (height=2m) in 21.654585233s. Database occupies 11G on disk. 461k TPS
2024/10/10 11:31:33 Retrieved 1000 blocks by ID in 439.584919ms (2274.87 RPS)
2024/10/10 11:31:34 Retrieved 1000 blocks by height in 405.051033ms (2468.82 RPS)
2024/10/10 11:31:35 Retrieved 1000 transactions by ID in 821.499821ms (1217.29 RPS)
2024/10/10 11:31:38 accepted 359 blocks
2024/10/10 11:31:39 accepted 720 blocks

containerman17 linked a pull request Oct 11, 2024 that will close this issue