Backfilling history within RPC's retention window #323
-
+1
Does backfilling need a Captive Core option? I would actually argue that backfill should be CDP only. Reasons being
I'm in favor of option 3.
-
My vote is for option 1, syncing via Captive Core by default, with an option to use CDP instead.
-
There are a few challenges with option 3, Backfill offline via a different command (a la horizon reingest):
+1 for making it configurable.
This could be challenging from an operator-experience perspective because there are two sources of ledgers (Captive Core and CDP) and two phases of ingestion (backfill and live). So in total there are four different combinations that they could consider:
To minimize confusion, I am inclined to allow the operator to choose only the source of ledgers (Captive Core vs. CDP) and not allow them to customize the ingestion source for each phase of ingestion.
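As a rough illustration of the combination count above, here is a minimal sketch that enumerates every source/phase pairing and then the reduced set left when one source is used for both phases. The string labels are illustrative only and are not real RPC configuration values.

```python
from itertools import product

# Illustrative labels only; these are not real configuration values.
SOURCES = ("captive-core", "cdp")
PHASES = ("backfill", "live")

# Every way to assign a ledger source to each ingestion phase.
per_phase_choices = [
    dict(zip(PHASES, combo)) for combo in product(SOURCES, repeat=len(PHASES))
]

# Letting the operator pick a single source for both phases
# leaves one configuration per source.
single_source_choices = [
    choice for choice in per_phase_choices if choice["backfill"] == choice["live"]
]

for choice in per_phase_choices:
    print(choice)
```

With a single source setting, the operator faces two configurations instead of four.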
-
I'm inclined to support option (1), synchronous backfill. I'll echo the problems others have brought up with the other options. In (2), asynchronous backfill is untenable with Captive Core, since you'd need one instance in catchup mode and one for live ingestion. I could get on board with asynchronous CDP backfill if we can nail down the multiple-writer complexity of SQLite, but that also introduces the divergent functionality paths Tamir mentions. In (3), you enter the world of state-machine management hell that we have in Horizon, which includes dealing with gaps, as Tamir mentioned. With (1) we have the cleanest possible approach, and we could theoretically avoid some of the issues outlined by Molly:
I'm pretty sure that if we synced to the database on every ledger, we could safely allow endpoints to return results within the in-progress backfill window. This would at least enable historical queries quickly, with the window increasing as backfill progresses.
Only if backfill is enabled! Since it's opt-in, people will be aware of the performance risks. To avoid gaps and retention window issues, I think we should allow only a single parameter that specifies the number of ledgers backwards from the current tip to backfill on startup. If this parameter is set, it becomes equivalent to the retention window, meaning only one of these can be used at a time:
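The single-parameter idea could look roughly like the sketch below. The field names `retention_window` and `backfill_ledgers` are hypothetical, not existing RPC options, and the clamp assumes ledger sequences start at 1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HistoryConfig:
    # Both field names are hypothetical, not real RPC configuration options.
    retention_window: Optional[int] = None  # ledgers to retain (forward fill)
    backfill_ledgers: Optional[int] = None  # ledgers to backfill from the tip

    def effective_window(self) -> int:
        """Return the window size, rejecting configs that set both or neither."""
        if (self.retention_window is None) == (self.backfill_ledgers is None):
            raise ValueError("set exactly one of retention_window or backfill_ledgers")
        if self.backfill_ledgers is not None:
            value = self.backfill_ledgers
        else:
            value = self.retention_window
        if value <= 0:
            raise ValueError("window must be a positive number of ledgers")
        return value


def backfill_range(tip: int, cfg: HistoryConfig) -> Optional[Tuple[int, int]]:
    """Inclusive ledger range to backfill on startup, or None if not enabled."""
    window = cfg.effective_window()  # also validates the config
    if cfg.backfill_ledgers is None:
        return None
    # Assumption: ledger sequences start at 1, so clamp the lower bound.
    start = max(1, tip - window + 1)
    return (start, tip)
```

Because the backfill count doubles as the retention window, the backfilled range and the retained range cannot disagree.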
-
I agree with everyone else that doing it synchronously is the better option.
I think this option would be ideal and I don't think that it would be a lot more complicated than doing it synchronously (apart from coordinating it with reaping). But it does require running a second copy of Captive Core. Unless, of course, CDP is used. Here is the Epic I created a while ago about it: #196
We should. What ingestion backend we use shouldn't make a difference. EDIT: BTW, regardless of the option we choose, I think we can reuse some of the tickets I created for #196. For instance, #203 will be necessary regardless of what we choose.
-
I personally prefer (1). It may be very simple for multi-node users to use. Until the backfill completes, the node should report itself as unhealthy; once it completes, the node should report healthy, allowing us to direct traffic to the new node.
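A minimal sketch of that health gating, assuming a hypothetical `BackfillHealth` tracker that a load balancer could poll. The class, its methods, and the status shape are illustrative and do not reflect RPC's actual health response.

```python
class BackfillHealth:
    """Hypothetical tracker: reports healthy only once backfill is complete."""

    def __init__(self, target_ledgers: int) -> None:
        if target_ledgers < 0:
            raise ValueError("target_ledgers must be non-negative")
        self.target_ledgers = target_ledgers
        self.ingested = 0

    def record_ingested(self, count: int = 1) -> None:
        # Cap progress at the target so the counter never overshoots.
        self.ingested = min(self.target_ledgers, self.ingested + count)

    @property
    def healthy(self) -> bool:
        return self.ingested >= self.target_ledgers

    def status(self) -> dict:
        # Illustrative shape only, not the real health endpoint payload.
        return {
            "status": "healthy" if self.healthy else "unhealthy",
            "backfilled": self.ingested,
            "target": self.target_ledgers,
        }
```

A multi-node operator would keep the new node out of rotation while `healthy` is false and add it once the flag flips.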
-
To summarize the discussions thus far (comment if you disagree):
-
What
RPC currently populates its retention window only via "forward fill": it starts out with no data and does not prune any old data until it fills up its retention window. This means that when spinning up a new node, an operator needs to wait the full duration of their retention window before that RPC is actually retaining the amount of history it is configured to retain. We'd propose changing that, so that an RPC can have a full retention window immediately (or quickly) on startup.
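To put a number on that wait, here is a small worked example. It assumes an average ledger close time of about 5 seconds and a hypothetical retention window of 120,960 ledgers; neither figure comes from this discussion.

```python
from datetime import timedelta


def forward_fill_wait(retention_ledgers: int, close_time_seconds: float = 5.0) -> timedelta:
    """Time for forward fill alone to populate the whole retention window.

    close_time_seconds is an assumed average ledger close time.
    """
    return timedelta(seconds=retention_ledgers * close_time_seconds)


# Hypothetical window of 120,960 ledgers at the assumed 5 s per ledger.
print(forward_fill_wait(120_960))  # 7 days, 0:00:00
```

Under those assumptions a fresh node would take a full week before it retains its configured history.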
Why
There have been several contexts in which this has come up, some discussed internally and some from various provider use-cases. Some of these include:
Note that I haven't actually seen/heard anyone complain about this
@overcat has indicated they'd want this
How
How exactly we should implement this from a product perspective is debatable, and we have a few different options. For example:
Backfill offline via a different command (a la horizon reingest), where backfill is mutually exclusive with "live" ingestion
I think either of options 1/3 would be reasonable, but I'm open to other thoughts. Other things/options to consider: