diff --git a/.vscode/settings.json b/.vscode/settings.json new file mode 100644 index 00000000..aae9a96f --- /dev/null +++ b/.vscode/settings.json @@ -0,0 +1,6 @@ +{ + "[markdown]": { + "editor.wordWrap": "wordWrapColumn" + }, + "editor.wordWrap": "off" +} \ No newline at end of file diff --git a/docs/eigenda/architecture/README.mdx b/docs/eigenda/architecture/README.mdx new file mode 100644 index 00000000..b4f4a573 --- /dev/null +++ b/docs/eigenda/architecture/README.mdx @@ -0,0 +1,129 @@ +--- +sidebar_position: 2 +title: Architecture +--- + +## Introduction + +EigenDA is a Data Availability (DA) service, implemented as an actively validated service (AVS) on EigenLayer, that provides secure and scalable DA for L2s on Ethereum. + +### What is DA? + +In informal terms, DA is a guarantee that a given piece of data will be available to anyone who wishes to retrieve it. + +A DA system accepts blobs of data (via some interface) and then makes them available to retrievers (through another interface). + +Two important aspects of a DA system are: + +1. Security: The security of a DA system constitutes the set of conditions which are sufficient to ensure that all data blobs certified by the system as available are indeed available for honest retrievers to download. +2. Throughput: The throughput of a DA system is the rate at which the system is able to accept blobs of data, typically measured in bytes/second. + +### An EigenLayer AVS for DA + +EigenDA is implemented as an actively validated service on EigenLayer, which is a restaking protocol for Ethereum. + +Because of this, EigenDA makes use of the EigenLayer state, which is stored on Ethereum, for consensus about the state of operators and as a callback for consensus about the availability of data. This means that EigenDA can be simpler in implementation than many existing DA solutions: EigenDA doesn't need to build its own chain or consensus protocol; it rides on the back of Ethereum. + +### A first of its kind, horizontally scalable DA solution + +Among extant DA solutions, EigenDA takes an approach to scalability which is unique in that it yields true horizontal scalability: Every additional unit of capacity contributed by an operator can increase the total system capacity. + +This property is achieved by using a Reed-Solomon erasure encoding scheme to shard the blob data across the DA nodes. While other systems such as Celestia and Danksharding (planned) also make use of Reed-Solomon encoding, they do so only for the purpose of supporting certain observability properties of Data Availability Sampling (DAS) by light nodes. On the other hand, all incentivized/full nodes of the system download, store, and serve the full system bandwidth. + +Horizontal scalability means that DA capacity can continually track demand instead of being capped by fixed technological bottlenecks, which has enormous implications for Layer 2 ecosystems. + +### Security Model + +EigenDA produces a DA attestation which asserts that a given blob or collection of blobs is available. Attestations are anchored to one or more "Quorums," each of which defines a set of EigenLayer stakers who underwrite the security of the attestation. Quorums should be considered redundant: Each quorum linked to an attestation provides an independent guarantee of availability as if the other quorums did not exist. + +Each attestation is characterized by safety and liveness tolerances: + +- Liveness tolerance: Conditions under which the system will produce an availability attestation.
+- Safety tolerance: Conditions under which an availability attestation implies that data is indeed available. + +EigenDA defines two properties of each blob attestation which relate to its liveness and safety tolerance: + +- Liveness threshold: The liveness threshold defines the minimum percentage of stake which an attacker must control in order to mount a liveness attack on the system. +- Safety threshold: The safety threshold defines the total percentage of stake which an attacker must control in order to mount a first-order safety attack on the system. + +The term "first-order attack" alludes to the fact that exceeding the safety threshold may represent only a contingency rather than an actual safety failure due to the presence of recovery mechanisms that would apply during such a contingency. Discussion of such mechanisms is outside of the scope of the current documentation. + +Safety thresholds can translate directly into cryptoeconomic safety properties for quorums consisting of tokens which experience toxicity in the event of publicly observable attacks by a large coalition of token holders. This and other discussions of cryptoeconomic security are also beyond the scope of this technical documentation. We restrict the discussion to illustrating how the protocol preserves the given safety and liveness thresholds. + +## System Architecture + +![image](./assets/architecture.png) + +### Core Components + +- **DA nodes** are the service providers of EigenDA, storing chunks of blob data for a predefined time period and serving these chunks upon request. +- The **disperser** is responsible for encoding blobs, distributing them to the DA nodes, and aggregating their digital signatures into a DA attestation. As the disperser is currently centralized, it is trusted for system liveness; the disperser will be decentralized over time. +- The disperser and the DA nodes both depend on the **Ethereum L1** for shared state about DA node registration and stake delegation. The L1 is also currently used to bridge DA attestations to L2 end-user applications such as rollup chains. + +### Essential flows + +**Dispersal**. This is the flow by which data is made available; it consists of the following steps (a sketch follows at the end of this section): + +1. The disperser receives a collection of blobs, encodes them, constructs a batch of encoded blobs and headers, and sends the sharded batch to the DA nodes. +2. The DA nodes validate their shares of the batch, and return an attestation consisting of a BLS signature of the batch header. +3. The disperser collects the attestations from the DA nodes and aggregates them into a single aggregate attestation. + +**Bridging**. For a DA attestation to be consumed by the L2 end-user (e.g. a rollup), it must be bridged to a chain from which the L2 can read. This might simply be the Ethereum L1 itself, but in many cases it is more economical to bridge directly into the L2 since this drastically decreases signature verification costs. For the time being, all attestations are bridged to the L1 by the disperser. + +**Retrieval**. Interested parties such as rollup challengers that want to obtain rollup blob data can retrieve a blob by downloading the encoded chunks from the DA nodes and decoding them. The blob lookup information contained in the retrieval request to the DA nodes is obtained from the bridged attestation.
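+The following Go sketch summarizes the dispersal steps above. It is only an illustration: the `DANode`, `Aggregator`, and `Assignment` types are hypothetical stand-ins introduced for this example (the real node API does expose a `StoreChunks` endpoint, but the signatures and types shown here are not the production ones).
+
+```go
+// Hypothetical sketch of the dispersal flow; not the production EigenDA interfaces.
+package dispersal
+
+import "errors"
+
+// Chunk is an encoded share of a blob assigned to a single DA node.
+type Chunk []byte
+
+// BatchHeader commits to the blob headers contained in a batch.
+type BatchHeader struct {
+	BatchRoot      []byte // Merkle root over the blob headers
+	ReferenceBlock uint64 // Ethereum block whose operator/stake state is used
+}
+
+// Signature stands in for a BLS signature over the batch header.
+type Signature []byte
+
+// DANode captures the node behavior the disperser relies on: validate the
+// assigned chunks, then sign the batch header.
+type DANode interface {
+	StoreChunks(header BatchHeader, chunks []Chunk) (Signature, error)
+}
+
+// Aggregator combines the per-node signatures into one aggregate attestation.
+type Aggregator interface {
+	Aggregate(sigs []Signature) (Signature, error)
+}
+
+// Assignment pairs a DA node with the chunks allocated to it.
+type Assignment struct {
+	Node   DANode
+	Chunks []Chunk
+}
+
+// Disperse carries out steps 1-3: the blobs have already been encoded into
+// per-node chunk assignments; each node validates and signs its share, and the
+// signatures are aggregated into a single attestation to be bridged.
+func Disperse(header BatchHeader, assignments []Assignment, agg Aggregator) (Signature, error) {
+	var sigs []Signature
+	for _, a := range assignments {
+		sig, err := a.Node.StoreChunks(header, a.Chunks)
+		if err != nil {
+			continue // a non-signing node only reduces the attested stake
+		}
+		sigs = append(sigs, sig)
+	}
+	if len(sigs) == 0 {
+		return nil, errors.New("no DA node signed the batch")
+	}
+	return agg.Aggregate(sigs)
+}
+```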
+ +## Protocol Overview + +For expositional purposes, we will divide the protocol into two conceptual layers: + +- Attestation Layer: Modules to ensure that whenever a DA attestation is accepted by an end-user (e.g. a rollup), then the data is indeed available. More specifically, the attestation layer ensures that the system observes the safety and liveness tolerances defined in the [Security Model](#security-model) section. +- Network Layer: The communications protocol which ensures that the liveness and safety of the protocol are robust against network-level events and threats. + +![image](./assets/attestation-layer.png) + +![image](./assets/network-layer.png) + +## Attestation Layer + +The attestation layer is responsible for ensuring that when the network-level assumptions and safety and liveness tolerances are observed, the system properly makes data available. + +The primary responsibility of the attestation layer is to enable consensus about whether a given blob of data is fully within the custody of a set of honest nodes. (Here, what can be taken to be a set of honest nodes is defined by the system safety tolerance; the assurance that these honest nodes will be able to transmit the data to honest retrievers is handled by the network layer.) Since EigenDA is an EigenLayer AVS, it does not need its own actual consensus protocol, but can instead piggy-back off of Ethereum's consensus. As a result, the attestation layer decomposes into two fairly straightforward pieces: + +- **Attestation Logic**: The attestation logic allows us to answer the question of whether a given blob is available, given both a DA attestation and the validator state at the associated Ethereum block. The attestation logic can be understood as simply a function of these inputs which outputs yes or no, depending on whether these inputs imply that data is available. Naturally, this function is grounded upon assumptions about the behavior of honest nodes, which must perform certain validation actions as part of the attestation layer. The attestation logic further decomposes into two major modules: + - *Encoding*: The encoding module defines a procedure for blobs to be encoded in such a way that their successful reconstruction can be guaranteed given a large enough collection of unique encoded chunks. The procedure also allows for the chunks to be trustlessly verified against a blob commitment so that the disperser cannot violate the protocol. + - *Assignment*: The assignment module provides a deterministic mapping from validator state to an allocation of encoded chunks to DA nodes. The mapping is designed to uphold safety and liveness properties with minimal data inefficiency. +- **Bridging**: Bridging describes how the attestation is bridged to the consumer protocol, such as that of the rollup. In principle, bridging can be performed in one of several different ways in order to optimize efficiency and composability. At the moment, only bridging via the Ethereum L1 is directly supported. + +![image](./assets/attestation-layer-parts.png) + +The desired behavior of the attestation logic can be formally described as follows (ignore this if you're happy with the high-level ideas): Let $\alpha$ denote the safety threshold, i.e. the maximum proportion of adversarial stake that the system is able to tolerate. Likewise, let $\beta$ represent the amount of stake that we require to be held by the signing operators in order to accept an attestation, i.e. one minus the liveness threshold.
Also, let $O$ denote the set of EigenDA operators. + +We need to guarantee that for any set of signing operators $U_q \subseteq O$ such that + +$$ \sum_{i \in U_q} S_i \ge \beta \sum_{i \in O}S_i$$ + +and any set of adversarial operators $U_a \subseteq U_q$ such that + +$$ \sum_{i \in U_a} S_i \le \alpha \sum_{i \in O}S_i$$ + +we can reconstruct the original data blob from the chunks held by $U_q \setminus U_a$. + +### Encoding Module + +The encoding module defines a procedure for blobs to be encoded in such a way that their successful reconstruction can be guaranteed given a large enough collection of unique encoded chunks. The procedure also allows for the chunks to be trustlessly verified against a blob commitment so that the disperser cannot violate the protocol. + +[Read more](./encoding.md) + +### Assignment Module + +The assignment module is nothing more than a rule which takes in the Ethereum chain state and outputs an allocation of chunks to DA operators. + +[Read more](./assignment.md) + +### Signature verification and bridging + +[Read more](./bridging.md) + +## Network Layer + +This section is under construction. \ No newline at end of file diff --git a/docs/eigenda/architecture/amortized-proving.md b/docs/eigenda/architecture/amortized-proving.md new file mode 100644 index 00000000..b9aa027d --- /dev/null +++ b/docs/eigenda/architecture/amortized-proving.md @@ -0,0 +1,60 @@ +# Amortized KZG Prover Backend + +It is important that the encoding and commitment tasks can be performed in seconds and that the dominating complexity of the computation is nearly linear in the degree of the polynomial. This is done using algorithms based on the Fast Fourier Transform (FFT). + +This document describes how the KZG-FFT encoder backend implements the `Encode(data [][]byte, params EncodingParams) (BlobCommitments, []*Chunk, error)` interface, which 1) transforms the blob into a list of `params.NumChunks` `Chunks`, where each chunk is of length `params.ChunkLength`, and 2) produces the associated polynomial commitments and proofs. + +We will also highlight the additional constraints on the Encoding interface which arise from the KZG-FFT encoder backend. + +## Deriving the polynomial coefficients and commitment + +As described in the [Encoding Module Specification](../spec/protocol-modules/storage/encoding.md), given a blob of data, we convert the blob to a polynomial $p(X) = \sum_{i=0}^{m-1} c_iX^i$ by simply slicing the data into a string of symbols, and interpreting this list of symbols as the tuple $(c_i)_{i=0}^{m-1}$. + +In the case of the KZG-FFT encoder, the polynomial lives on the field associated with the BN254 elliptic curve, which has order [TODO: fill in order]. + +Given this polynomial representation, the KZG commitment can be calculated as in [KZG polynomial commitments](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html). + +## Polynomial Evaluation with the FFT + +In order to use a Discrete Fourier Transform (DFT) to evaluate a polynomial, the indices of the polynomial evaluations which will make up the Chunks must be members of a cyclic group, which we will call $S$. A cyclic group is the group generated by taking all of the integer powers of some generator $v$, i.e., $\{v^k | k \in \mathbb{Z} \}$ (for this reason, the elements of a cyclic group $S$ of order $|S|=m$ will sometimes be referred to as the $m$'th roots of unity). Notice that since our polynomial lives on the BN254 field, the group $S$ must be a subgroup of that field (i.e.
all of its elements must lie within that field). + +Given a cyclic group $S$ of order $m$, we can evaluate a polynomial $p(X)$ with $n$ coefficients at the indices contained in $S$ via the DFT, + +$$ +p_k = \sum_{i=0}^{n-1}c_i (v^k)^i +$$ + +where $p_k$ gives the evaluation of the polynomial at $v^k \in S$. Letting $c$ denote the vector of polynomial coefficients and $p$ the vector of polynomial evaluations, we can use the shorthand $p = DFT[c]$. The inverse relation also holds, i.e., $c = DFT^{-1}[p]$. + +To evaluate the DFT programmatically, we want $m = n$. Notice that we can achieve this when $m > n$ by simply padding $c$ with zeros to be of length $m$. + +The use of the FFT can levy an additional requirement on the size of the group $S$. In our implementation, we require the size of $S$ to be a power of 2. For this, we can make use of the fact that the prime field associated with BN254 contains a subgroup of order $2^{28}$, which in turn contains subgroups of orders spanning every power of 2 less than $2^{28}$. + +As the encoding interface calls for the construction of `NumChunks` Chunks of length `ChunkLength`, our application requires that $S$ be of size `NumChunks*ChunkLength`, which in turn must be a power of 2. + +## Amortized Multireveal Proof Generation with the FFT + +The construction of the multireveal proofs can also be performed using a DFT (as in [“Fast Amortized Kate Proofs”](https://eprint.iacr.org/2023/033.pdf)). Leaving the full details of this process to the referenced document, we describe here only 1) the index-assignment scheme used by the amortized multiproof generation approach and 2) the constraints that this creates for the overall encoder interface. + +Given the group $S$ corresponding to the indices of the polynomial evaluations and a cyclic group $C$ which is a subgroup of $S$, the cosets of $C$ in $S$ are given by + +$$ +s+C = \{s+c : c \in C\} \text{ for } s \in S. +$$ + +Each coset $s+C$ has size $|C|$, and there are $|S|/|C|$ unique and disjoint cosets. + +Given a polynomial $p(X)$ and the groups $S$ and $C$, the Amortized Kate Proofs approach generates $|S|/|C|$ different KZG multi-reveal proofs, where each proof is associated with the evaluation of $p(X)$ at the indices contained in a single coset $s+C$ for $s \in S$. Because the Amortized Kate Proofs approach uses the FFT under the hood, $C$ itself must have an order which is a power of 2. + +For the purposes of the KZG-FFT encoder, this means that we must choose $S$ to be of size `NumChunks*ChunkLength` and $C$ to be of size `ChunkLength`, each of which must be powers of 2. + +## Worked Example + +As a simple illustrative example, suppose that `AssignmentCoordinator` provides the following parameters in order to meet the security requirements of a given blob: + +- `ChunkLength` = 3 +- `NumChunks` = 4 + +Supplied with these parameters, `Encoder.ParamsFromMins` will upgrade `ChunkLength` to the next highest power of 2, i.e., `ChunkLength` = 4, and leave `NumChunks` unchanged. The following figure illustrates how the indices will be assigned across the chunks in this scenario.
+ +![Worked example of chunk indices for ChunkLength=4, NumChunks=4](./assets/encoding-groups.png) diff --git a/docs/eigenda/architecture/assets/architecture.png b/docs/eigenda/architecture/assets/architecture.png new file mode 100644 index 00000000..0668d219 Binary files /dev/null and b/docs/eigenda/architecture/assets/architecture.png differ diff --git a/docs/eigenda/architecture/assets/assignment-module.png b/docs/eigenda/architecture/assets/assignment-module.png new file mode 100644 index 00000000..5775eb10 Binary files /dev/null and b/docs/eigenda/architecture/assets/assignment-module.png differ diff --git a/docs/eigenda/architecture/assets/attestation-layer-parts.png b/docs/eigenda/architecture/assets/attestation-layer-parts.png new file mode 100644 index 00000000..1fa03894 Binary files /dev/null and b/docs/eigenda/architecture/assets/attestation-layer-parts.png differ diff --git a/docs/eigenda/architecture/assets/attestation-layer.png b/docs/eigenda/architecture/assets/attestation-layer.png new file mode 100644 index 00000000..c39f37cf Binary files /dev/null and b/docs/eigenda/architecture/assets/attestation-layer.png differ diff --git a/docs/eigenda/architecture/assets/batcher.png b/docs/eigenda/architecture/assets/batcher.png new file mode 100644 index 00000000..fbc30f81 Binary files /dev/null and b/docs/eigenda/architecture/assets/batcher.png differ diff --git a/docs/eigenda/architecture/assets/bridging-module.png b/docs/eigenda/architecture/assets/bridging-module.png new file mode 100644 index 00000000..7b747131 Binary files /dev/null and b/docs/eigenda/architecture/assets/bridging-module.png differ diff --git a/docs/eigenda/architecture/assets/disperser-components.png b/docs/eigenda/architecture/assets/disperser-components.png new file mode 100644 index 00000000..bb4f6115 Binary files /dev/null and b/docs/eigenda/architecture/assets/disperser-components.png differ diff --git a/docs/eigenda/architecture/assets/disperser.png b/docs/eigenda/architecture/assets/disperser.png new file mode 100644 index 00000000..6a032e28 Binary files /dev/null and b/docs/eigenda/architecture/assets/disperser.png differ diff --git a/docs/eigenda/architecture/assets/encoder.png b/docs/eigenda/architecture/assets/encoder.png new file mode 100644 index 00000000..086fce21 Binary files /dev/null and b/docs/eigenda/architecture/assets/encoder.png differ diff --git a/docs/eigenda/architecture/assets/encoding-groups.png b/docs/eigenda/architecture/assets/encoding-groups.png new file mode 100644 index 00000000..337b90a6 Binary files /dev/null and b/docs/eigenda/architecture/assets/encoding-groups.png differ diff --git a/docs/eigenda/architecture/assets/encoding-module.png b/docs/eigenda/architecture/assets/encoding-module.png new file mode 100644 index 00000000..bef722b6 Binary files /dev/null and b/docs/eigenda/architecture/assets/encoding-module.png differ diff --git a/docs/eigenda/architecture/assets/network-layer.png b/docs/eigenda/architecture/assets/network-layer.png new file mode 100644 index 00000000..10d54f1f Binary files /dev/null and b/docs/eigenda/architecture/assets/network-layer.png differ diff --git a/docs/eigenda/architecture/assets/overview.png b/docs/eigenda/architecture/assets/overview.png new file mode 100644 index 00000000..04162999 Binary files /dev/null and b/docs/eigenda/architecture/assets/overview.png differ diff --git a/docs/eigenda/architecture/assignment.md b/docs/eigenda/architecture/assignment.md new file mode 100644 index 00000000..d09e9c98 --- /dev/null +++ 
b/docs/eigenda/architecture/assignment.md @@ -0,0 +1,80 @@ +# Assignment Module + +The assignment module is essentially a rule which takes in the Ethereum chain state and outputs an allocation of chunks to DA operators. This can be generalized to a function that outputs a set of valid allocations. + +A chunk assignment has the following parameters: + +1) **Indices**: the chunk indices that will be assigned to each DA node. Some DA nodes receive more than one chunk. +2) **ChunkLength**: the length of each chunk (measured in number of symbols, as defined by the encoding module). We currently require all chunks to be of the same length, so this parameter is a scalar. + +The assignment module is implemented by the `AssignmentCoordinator` interface. + +![image](./assets/assignment-module.png) + +## Assignment Logic + +The standard assignment coordinator implements a very simple logic for determining the number of chunks per node and the chunk length, which we describe here. + +**Chunk Length** + +Chunk lengths must be sufficiently small that operators with a small proportion of stake will be able to receive a quantity of data commensurate with their stake share. For each operator $i$, let $S_i$ signify the amount of stake held by that operator. + +We require that the chunk size $C$ satisfy + +$$ +C \le \text{NextPowerOf2}\left(\frac{B}{\gamma}\max\left(\frac{\min_jS_j}{\sum_jS_j}, \frac{1}{M_\text{max}} \right) \right) +$$ + +where $B$ is the blob size (in symbols) and $\gamma = \beta-\alpha$, with $\alpha$ and $\beta$ the adversary and quorum thresholds as defined in the [Overview](../overview.md). + +This means that as long as an operator has a stake share of at least $1/M_\text{max}$, then the encoded data that they will receive will be within a factor of 2 of their share of stake. Operators with less than $1/M_\text{max}$ of stake will receive no more than a $1/M_\text{max}$ fraction of the encoded data. $M_\text{max}$ represents the maximum number of chunks that the disperser can be required to encode per blob. This limit is included because proving costs scale somewhat super-linearly with the number of chunks. + +In the future, additional constraints on chunk length may be added; for instance, the chunk length may be set in order to maintain a fixed number of chunks per blob across all system states. Currently, the protocol does not mandate a specific value for the chunk length, but will accept any value in the range satisfying the above constraint. The `CalculateChunkLength` function is provided as a convenience function that can be used to find a chunk length satisfying the protocol requirements. + +**Index Assignment** + +For each operator $i$, let $S_i$ signify the amount of stake held by that operator. We want the number of chunks $m_i$ assigned to operator $i$ to satisfy + +$$ +\frac{\gamma m_i C}{B} \ge \frac{S_i}{\sum_j S_j} +$$ + +To satisfy this, we let + +$$ +m_i = \text{ceil}\left(\frac{B S_i}{C\gamma \sum_j S_j}\right)\tag{1} +$$ + +**Correctness** +Let's show that for any sets $U_q$ and $U_a$ satisfying the constraints in the [Consensus Layer Overview](../overview.md#consensus-layer), the data held by the operators $U_q \setminus U_a$ will constitute an entire blob.
The amount of data held by these operators is given by + +$$ +\sum_{i \in U_q \setminus U_a} m_i C +$$ + +We have from (1) and from the definitions of $U_q$ and $U_a$ that + +$$ +\sum_{i \in U_q \setminus U_a} m_i C \ge \frac{B}{\gamma}\sum_{i \in U_q \setminus U_a}\frac{S_i}{\sum_j S_j} = \frac{B}{\gamma}\frac{\sum_{i \in U_q} S_i - \sum_{i \in U_a} S_i}{\sum_jS_j} \ge B \frac{\beta-\alpha}{\gamma} = B \tag{2} +$$ + +Since the unique data held by these operators is at least the size of a blob, the encoding module ensures that the original blob can be reconstructed from this data. + +## Validation Actions + +Validation with respect to assignments is performed at different layers of the protocol: + +### DA Nodes + +When the DA node receives a `StoreChunks` request, it performs the following validation actions relative to each blob header: + +- It uses `ValidateChunkLength` to validate that the `ChunkLength` for the blob satisfies the above constraints. +- It uses `GetOperatorAssignment` to calculate the chunk indices for which it is responsible, and verifies that each of the chunks that it has received lies on the polynomial at these indices (see [Encoding validation actions](./encoding.md#validation-actions)). + +This step ensures that each honest node has received the blobs for which it is accountable. + +Since the DA nodes will allow a range of `ChunkLength` values, as long as they satisfy the constraints of the protocol, it is necessary for there to be consensus on the `ChunkLength` that is in use for a particular blob and quorum. For this reason, the `ChunkLength` is included in the `BlobQuorumParam` which is hashed to create the Merkle root contained in the `BatchHeaderHash` signed by the DA nodes. + +### Rollup Smart Contract + +When the rollup confirms its blob against the EigenDA batch, it checks that the `ConfirmationThreshold` for the blob is greater than the `AdversaryThreshold`. This means that if the `ChunkLength` determined by the disperser is invalid, the batch cannot be confirmed as a sufficient number of nodes will not sign. diff --git a/docs/eigenda/architecture/bridging.md b/docs/eigenda/architecture/bridging.md new file mode 100644 index 00000000..c31afb0d --- /dev/null +++ b/docs/eigenda/architecture/bridging.md @@ -0,0 +1,37 @@ +# Signature verification and bridging + +![image](./assets/bridging-module.png) + +### L1 Bridging + +Bridging a DA attestation for a specific blob requires the following stages: + +- *Bridging the batch attestation*. This involves checking the aggregate signature of the DA nodes for the batch, and tallying up the total amount of stake held by the signing nodes. +- *Verifying the blob inclusion*. Each batch contains the root of a Merkle tree whose leaves correspond to the blob headers contained in the batch. To verify blob inclusion, the associated Merkle proof must be supplied and evaluated. Furthermore, the specific quorum threshold requirement for the blob must be checked against the total amount of signing stake for the batch. + +For the first stage, EigenDA makes use of EigenLayer's default utilities for managing operator state, verifying aggregate BLS signatures, and checking the total stake held by the signing operators. + +For the second stage, EigenDA provides a utility contract with a `verifyBlob` method which rollups would typically integrate into their fraud proof pathway in the following manner: + +1. The rollup sequencer posts all lookup data needed to verify a blob against a batch to the rollup inbox contract. +2.
To initiate a fraud proof, the challenger must call the `verifyBlob` method with the supplied lookup data. If the blob does not verify correctly, the blob is considered invalid. + +#### Reorg behavior (this section is outdated) + +One aspect of the chain behavior of which the attestation protocol must be aware is that of chain reorganization. The following requirements relate to chain reorganizations: + +1. Signed attestations should remain valid under reorgs so that a disperser never needs to resend the data and gather new signatures. +2. If an attestation is reorged out, a disperser should always be able to simply resubmit it after a specific waiting period. +3. Payloads constructed by a disperser and sent to DA nodes should never be rejected due to reorgs. + +These requirements result in the following design choices: + +- Chunk allotments should be based on registration state from a finalized block. +- If an attestation is reorged out and if the transaction containing the header of a batch is not present within `BLOCK_STALE_MEASURE` blocks since `referenceBlockNumber` and the block that is `BLOCK_STALE_MEASURE` blocks since `referenceBlockNumber` is finalized, then the disperser should start a new dispersal with that blob of data. Otherwise, the disperser must not re-submit another transaction containing the header of a batch associated with the same blob of data. +- Payment payloads sent to DA nodes should only take into account finalized attestations. + +The first and second decisions satisfy requirements 1 and 2. The three decisions together satisfy requirement 3. + +Whenever the `confirmBatch` method of `ServiceManager.sol` is called, the following checks are used to ensure that only finalized registration state is utilized: + +- Stake staleness check. The `referenceBlockNumber` is verified to be within `BLOCK_STALE_MEASURE` blocks before the confirmation block. This is to make sure that batches using outdated stakes are not confirmed. It is assured that stakes from within `BLOCK_STALE_MEASURE` blocks before confirmation are valid by delaying removal of stakes by `BLOCK_STALE_MEASURE + MAX_DURATION_BLOCKS`. diff --git a/docs/eigenda/architecture/components.mdx b/docs/eigenda/architecture/components.mdx new file mode 100644 index 00000000..85b1e953 --- /dev/null +++ b/docs/eigenda/architecture/components.mdx @@ -0,0 +1,68 @@ +# Components + +## Operator Node + +EigenDA Nodes are responsible for storing blobs and responding to retrieval requests. A decentralized set of operators run EigenDA nodes in exchange for payment. The [API](https://github.com/Layr-Labs/eigenda/blob/master/api/proto/node/node.proto#L9) of an operator node consists of three endpoints: `StoreChunks()`, `RetrieveChunks()` and `GetBlobHeader()`. +{/**/} + +## Disperser Backend + +

+Description +

+ +The disperser is an untrusted API hosted by Eigen Labs responsible for encoding, chunking and writing blobs to nodes. It returns a DA certificate to the client as a result, which can be used to verify that the data was dispersed correctly and later used to retrieve the blob and verify its accuracy. + +The full dispersal process involves registering a batch of blobs on Ethereum in a relatively expensive transaction, so Eigen Labs runs a disperser to amortize this fee across clients. Since the disperser is untrusted, any client can run their own disperser if they wish. + +Under the hood, the disperser is made up of an API service, several encoder services, and a batcher service. + +### Disperser-API Service + +The API service hosts the +[interface](https://github.com/Layr-Labs/eigenda/blob/master/api/proto/disperser/disperser.proto#L7) +through which clients interact with the disperser and by extension the EigenDA +operator set. + +Clients submit blobs to the `DisperseBlob()` or `DisperseBlobAuthenticated()` endpoints, receiving a request ID in response. The client can use the request ID with the `GetBlobStatus()` endpoint to track the progress of the blob's dispersal. Once the blob has been dispersed, this endpoint returns a DA certificate, which is used for later retrieving the blob with the `RetrieveBlob()` disperser/retriever endpoint. + +The following diagram shows the request flow when a blob is dispersed: + +disperser-encoder service + +{/*Link to details of DA certificate.*/} + +### Disperser-Encoder Service + +The encoder runs in a loop encoding all blobs that are enqueued for encoding. The following diagram describes the steps of this encoding process: + +disperser-batcher + +### Disperser-Batcher Service + +Periodically the batcher assembles a transaction with the blobs that were recently dispersed to submit to the manager contract onchain. The following diagram lays out the steps: + +dispserser-batcher + +## Manager Contract + +The manager contract on Ethereum is the central trusted location where +blobs are registered as available. This contract maintains a map in storage of +all of the blob batches that have ever been dispersed. This makes it easy for +the client contract to later verify whether a DA certificate that is being +submitted to it is referencing a blob that was made available. + +## Retriever Service + +The `RetrieveBlob()` endpoint available on the disperser is also made available +for convenience on a standalone service called the "retriever", which clients +can run if they wish. By performing network reads through a self-hosted +retriever sidecar, clients can avoid potential performance bottlenecks inherent +to the Eigen Labs disperser. + + is a standalone API service for reading, assembling, +and decoding blobs from the EigenDA operator set. For +convenience the disperser hosted by Eigen Labs also encapsulates a retriever +API. + +{/* */} diff --git a/docs/eigenda/architecture/encoding.md b/docs/eigenda/architecture/encoding.md new file mode 100644 index 00000000..703b6e7c --- /dev/null +++ b/docs/eigenda/architecture/encoding.md @@ -0,0 +1,57 @@ +# Encoding Module + +The encoding module defines a procedure for blobs to be encoded in such a way that their successful reconstruction can be guaranteed given a large enough collection of unique encoded chunks. The procedure also allows for the chunks to be trustlessly verified against a blob commitment so that the disperser cannot violate the protocol. 
+ +![image](./assets/encoding-module.png) + +One way to think of the encoding module is that it must satisfy the following security requirements: + +1. *Adversarial tolerance for DA nodes*: We need to have tolerance to arbitrary adversarial behavior by any number of DA nodes up to some threshold. Note that while simple sharding approaches such as duplicating slices of the blob data have good tolerance to random node dropout, they have poor tolerance to worst-case adversarial behavior. +2. *Adversarial tolerance for disperser*: We do not want to put trust assumptions on the encoder or rely on fraud proofs to detect if an encoding is done incorrectly. + +## Trustless Encoding via KZG and Reed-Solomon + +EigenDA uses a combination of Reed-Solomon (RS) erasure coding and KZG polynomial commitments to perform trustless encoding. In this section, we provide a high-level overview of how the EigenDA encoding module works and how it achieves these properties. + +### Reed-Solomon Encoding + +Basic RS encoding is used to achieve the first requirement of *Adversarial tolerance for DA nodes*. This looks like the following: + +1. The blob data is represented as a string of symbols, where each symbol is an element of a certain finite field. The number of symbols is called the `BlobLength`. +2. These symbols are interpreted as the coefficients of a polynomial of degree `BlobLength`-1. +3. This polynomial is evaluated at `NumChunks`*`ChunkLength` distinct indices. +4. Chunks are constructed, where each chunk consists of the polynomial evaluations at `ChunkLength` distinct indices. + +Notice that given any number of chunks $M$ such that $M \times$`ChunkLength` >= `BlobLength`, via [polynomial interpolation](https://en.wikipedia.org/wiki/Polynomial_interpolation) it is possible to reconstruct the original polynomial, and therefore its coefficients which represent the original blob. + +### Validation via KZG + +Addressing the requirement *Adversarial tolerance for disperser* using RS encoding alone requires fraud proofs: a challenger must download all of the encoded chunks and check that they lie on a polynomial corresponding to the blob commitment. + +To avoid the need for fraud proofs, EigenDA follows the trail blazed by the Ethereum DA sharding roadmap in using [KZG polynomial commitments](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html). + +**Chunk Validation** + +Blobs sent to EigenDA are identified by their KZG commitment (which can be calculated by the disperser and easily validated by the rollup sequencer). When the disperser generates the encoded blob chunks, it also generates a collection of opening proofs which the DA nodes can use to trustlessly verify that their chunks fall on the blob polynomial at the correct indices (note: the indices are jointly derived by the disperser and DA nodes from the chain state using the logic in the Assignment module to ensure that the evaluation indices for each node are unique). + +**Blob Size Verification** +KZG commitments can also be used to verify the degree of the original polynomial, which in turn corresponds to the size of the original blob. Having a trustlessly verifiable upper bound on the size of the blob is necessary for DA nodes to verify the correctness of the chunk assignment defined by the assignment module. + +The KZG commitment relies on a structured reference string (SRS) containing a generator point $G$ multiplied by all of the powers of some secret field element $\tau$, up to some maximum power $n$.
This means that it is not possible to use this SRS to commit to a polynomial of degree greater than $n$. A consequence of this is that if $p(x)$ is a polynomial of degree greater than $m$, it will not be possible to commit to the polynomial $x^{n-m}p(x)$. A "valid" commitment to the polynomial $x^{n-m}p(x)$ thus constitutes a proof that the polynomial $p(x)$ is of degree less than or equal to $m$. + +In practice, this looks like the following: + +1. If the disperser wishes to claim that the polynomial $p(x)$ is of degree less than or equal to $m$, they must provide, along with the commitment $C_1$ to $p$, a commitment $C_2$ to $q(x) = x^{n-m}p(x)$. +2. The verifier then performs the pairing check $e(C_1,[x^{n-m}]_2) = e(C_2,H)$, where $H$ is the G2 generator and $[x^{n-m}]_2$ is the $n-m$'th power of $\tau$. This pairing will only evaluate correctly when $C_2$ was constructed as described above and $\deg(p) \le m$. + +Note: The blob length verification here allows for the blob length to be upper-bounded; it cannot be used to prove the exact blob length. + +### Prover Optimizations + +EigenDA makes use of the results of [Fast Amortized Kate Proofs](https://github.com/khovratovich/Kate/blob/master/Kate_amortized.pdf), developed for Ethereum's sharding roadmap, to reduce the computational complexity for proof generation. + +See the [full discussion](./amortized-proving.md). + +### Verifier Optimizations + +Without any optimizations, the KZG verification complexity can lead to a computational bottleneck for the DA nodes. Fortunately, the [Universal Verification Equation](https://ethresear.ch/t/a-universal-verification-equation-for-data-availability-sampling/13240) developed for Danksharding data availability sampling dramatically reduces the complexity. EigenDA has implemented this optimization to eliminate this bottleneck for the DA nodes. diff --git a/docs/eigenda/blob-explorer.md b/docs/eigenda/blob-explorer.md index 40e32bf6..8b883b9d 100644 --- a/docs/eigenda/blob-explorer.md +++ b/docs/eigenda/blob-explorer.md @@ -1,5 +1,5 @@ --- -sidebar_position: 3 +sidebar_position: 4 --- # Blob Explorer diff --git a/docs/eigenda/operator-guides/_category_.json b/docs/eigenda/operator-guides/_category_.json index f8664f58..76ef0144 100644 --- a/docs/eigenda/operator-guides/_category_.json +++ b/docs/eigenda/operator-guides/_category_.json @@ -1,4 +1,4 @@ { "label": "Operator Guides", - "position": 6 + "position": 5 } \ No newline at end of file diff --git a/docs/eigenda/performance-metrics.md b/docs/eigenda/performance-metrics.md index 0aac7103..bc97deb4 100644 --- a/docs/eigenda/performance-metrics.md +++ b/docs/eigenda/performance-metrics.md @@ -1,5 +1,5 @@ --- -sidebar_position: 2 +sidebar_position: 3 --- # Performance Metrics