Support replaying with less memory #4401

kderme · 2024-06-13T21:00:58Z

When a block is reapplied, the ledger only computes the new state and doesn't validate the block or its transactions. This makes reapplication faster than normal application, but it still uses the same amount of memory, since the resulting ledger state is the same. This is a desired property for the node. However, clients that replay the chain only for its staking and governance data and events never use some parts of the ledger state. These include multi-assets, payment credentials, and other parts of the UTxO set, such as inline scripts and datums.

This feature request proposes adding a ledger flag that changes the ledger state produced by reapplication so that it omits this data.

An open question is whether this can be supported without major changes to the ledger. My assumption is that it can. BabbageTxOut has a number of constructors. The flag could force the use of only one or a couple of these constructors, while leaving fields like multi-assets and the stake credential empty. Hopefully a new constructor is not necessary, since it could affect the performance of the node.
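To make the constructor idea concrete, here is a minimal sketch on a toy output type. Everything in it (ToyTxOut, Slim, Full, trimTxOut and the field choices) is hypothetical and is not the real BabbageTxOut; which fields are safe to drop is an assumption taken from the list in the first paragraph.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins; none of these are cardano-ledger types.
type Lovelace = Integer
type AssetName = String
type Datum = String
newtype PaymentCred = PaymentCred String deriving (Eq, Show)
newtype StakeCred = StakeCred String deriving (Eq, Show)

-- A toy output with an existing compact constructor and a full one.
data ToyTxOut
  = -- Compact form: only what a staking-focused client is assumed to read.
    Slim (Maybe StakeCred) Lovelace
  | -- Full form: everything a validating node would need.
    Full PaymentCred (Maybe StakeCred) Lovelace (Map AssetName Integer) (Maybe Datum)
  deriving (Eq, Show)

-- Force every output into the compact constructor, dropping the rest.
-- No new constructor is introduced; the flag would only restrict which
-- of the existing ones gets used.
trimTxOut :: ToyTxOut -> ToyTxOut
trimTxOut (Full _ stake coin _ _) = Slim stake coin
trimTxOut out@(Slim _ _) = out

-- Example value for experimenting in GHCi.
exampleOut :: ToyTxOut
exampleOut =
  Full (PaymentCred "pay") (Just (StakeCred "stake")) 5
       (Map.fromList [("token", 1)]) (Just "datum")
```

Loading this in GHCi and evaluating `trimTxOut exampleOut` shows the output collapsing to the compact constructor; applying trimTxOut a second time changes nothing.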

To define more formally what is requested, given the existing block reapplication:

reapply :: State -> Block -> (State, Events)

We want to find a function t that trims the UTxO set as much as possible:

t :: State -> State

and define a new function

reapply' :: State -> Block -> (State, Events)

such that

if
(s', e) == reapply s b
then
(t s', e) == reapply' (t s) b
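
The property can be checked on a toy model. The sketch below is only an illustration under stated assumptions: State, Block, Out, Event and all function bodies are hypothetical stand-ins, not cardano-ledger definitions, and reapply' is implemented here by trimming the block's outputs before they enter the state, which is one possible design among others.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Toy model; every name below is hypothetical.
type TxIn = Int
type Cred = String

data Out = Out
  { outStake  :: Maybe Cred          -- kept by the trim
  , outCoin   :: Integer             -- kept by the trim
  , outAssets :: Map String Integer  -- dropped by the trim
  , outDatum  :: Maybe String        -- dropped by the trim
  } deriving (Eq, Show)

newtype State = State (Map TxIn Out) deriving (Eq, Show)

data Tx = Tx { txSpends :: [TxIn], txCreates :: [(TxIn, Out)] }
type Block = [Tx]

-- The only events in this toy model: stake moving to or from a credential.
data Event = StakeChange Cred Integer deriving (Eq, Show)

reapplyTx :: State -> Tx -> (State, [Event])
reapplyTx (State u) (Tx spends creates) = (State u', evs)
  where
    spent = [o | i <- spends, Just o <- [Map.lookup i u]]
    u'    = Map.union (Map.fromList creates) (foldr Map.delete u spends)
    evs   = [StakeChange c (negate (outCoin o)) | o <- spent, Just c <- [outStake o]]
         ++ [StakeChange c (outCoin o) | (_, o) <- creates, Just c <- [outStake o]]

-- Reapplication: no validation, just state transition plus events.
reapply :: State -> Block -> (State, [Event])
reapply s0 = foldl step (s0, [])
  where step (s, evs) tx = let (s', e) = reapplyTx s tx in (s', evs ++ e)

trimOut :: Out -> Out
trimOut o = o { outAssets = Map.empty, outDatum = Nothing }

-- t: trim every output in the UTxO set.
t :: State -> State
t (State u) = State (Map.map trimOut u)

-- reapply': trim new outputs up front so untrimmed data never enters the state.
reapply' :: State -> Block -> (State, [Event])
reapply' s b = reapply s [tx { txCreates = [(i, trimOut o) | (i, o) <- txCreates tx] } | tx <- b]

-- The requested property for one state and block.
propHolds :: State -> Block -> Bool
propHolds s b = let (s', e) = reapply s b in (t s', e) == reapply' (t s) b

exampleState :: State
exampleState = State (Map.fromList
  [(1, Out (Just "cred-a") 10 (Map.fromList [("token", 5)]) (Just "datum"))])

exampleBlock :: Block
exampleBlock = [Tx [1] [(2, Out (Just "cred-b") 10 (Map.fromList [("token", 5)]) Nothing)]]
```

In this model the property holds because events only read the stake credential and coin of an output, both of which survive the trim. Evaluating `propHolds exampleState exampleBlock` in GHCi checks it for one case; a property-based test over generated states and blocks would be the natural next step.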

Clients like db-sync would benefit the most from this. However, it could also be used to test an important property that the ledger and node rely on: The ledger reapplication should work and reach a correct ledger state without relying on unnecessary data. Also, making it easier to create a ledger state could be useful for debugging.

lehins · 2024-06-19T20:00:15Z

I am wondering if there is a real reason to pursue this approach with UTxOHD being around the corner? I assume it would solve this issue of unnecessary memory overhead, right?

> An open question is whether this can be supported without major changes to the ledger. My assumption is that it can.

Unfortunately, support for this would require significant changes. It could theoretically be done, but it would add quite a bit of complexity to the already complex rules, because we would have to track in the ledger rules all the parts that must not be applied to the ledger state when the flag is set. The biggest reason why I would oppose this change is that, besides ignoring certain state modifications, we would also have to perform some validations conditionally on this flag, which altogether sounds too dangerous to me.

I think I can suggest an alternative approach that might work for db-sync: employ a postprocessor that strips the unnecessary parts out of the state after applying a block. The only trick required for this to work is that all ledger validation has to be turned off, i.e. reapplyBlock is not sufficient, because it still performs some validations. So applyBlock has to be called with NoValidation, because otherwise predicate failures like ValueNotConservedUTxO will be triggered. In other words, it would work like this:

1. Apply a block without validations to a stripped-down state.
2. Look into the transactions in the block and collect the redundant data that they could have added to the state.
3. Remove that data from the state.
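
The three steps can be sketched on a toy model as follows. This is only an illustration: State, Block, applyUnchecked, addedOutputs and stripAt are hypothetical names, and applyUnchecked merely stands in for calling the real ledger with all validation turned off; none of this is the cardano-ledger API.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Toy model; every name below is hypothetical.
type TxIn = Int

data Out = Out
  { outCoin   :: Integer
  , outAssets :: Map String Integer  -- the "redundant" data in this sketch
  } deriving (Eq, Show)

newtype State = State (Map TxIn Out) deriving (Eq, Show)

data Tx = Tx { txSpends :: [TxIn], txCreates :: [(TxIn, Out)] }
type Block = [Tx]

-- Step 1: apply the block with no validation at all. On a stripped state
-- a check such as value conservation could not pass, because spent
-- outputs no longer carry their assets.
applyUnchecked :: State -> Block -> State
applyUnchecked = foldl applyTx
  where
    applyTx (State u) (Tx spends creates) =
      State (Map.union (Map.fromList creates) (foldr Map.delete u spends))

-- Step 2: collect the places where this block could have added redundant data.
addedOutputs :: Block -> [TxIn]
addedOutputs = concatMap (map fst . txCreates)

stripOut :: Out -> Out
stripOut o = o { outAssets = Map.empty }

-- Step 3: strip only those entries instead of traversing the whole UTxO set.
-- Map.adjust ignores keys that are absent, e.g. outputs created and spent
-- within the same block.
stripAt :: [TxIn] -> State -> State
stripAt ins (State u) = State (foldr (Map.adjust stripOut) u ins)

-- The postprocessor: steps 1 to 3 combined.
applyAndStrip :: State -> Block -> State
applyAndStrip s b = stripAt (addedOutputs b) (applyUnchecked s b)

exampleState :: State
exampleState = State (Map.fromList [(1, Out 10 Map.empty)])

exampleBlock :: Block
exampleBlock = [Tx [1] [(2, Out 10 (Map.fromList [("token", 5)]))]]
```

The design point is that the client, not the ledger, decides what counts as redundant, and the cost of stripping is proportional to the size of the block rather than the size of the state.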

My suggestion would let the ledger not worry about this optimization, which is not relevant to chain safety or performance, and it would let the team stay sane while working on those rules.

> The ledger reapplication should work and reach a correct ledger state without relying on unnecessary data. Also, making it easier to create a ledger state could be useful for debugging.

This use case is not accurate at all. We do need all data in order to reach the correct state. It is just that db-sync's view differs from the one that node and ledger have about what correct state actually means 🙂

There is plenty of data on chain that is not relevant for the ledger, e.g. aux data, anchors, etc. Almost none of it is stored in the ledger state, precisely because it is not relevant for the ledger.
