Rob Behnke
December 28th, 2023
Blockchain technology is designed to create a decentralized digital ledger where no participant is required to trust any other participant. For this to be possible, each node in the blockchain network needs to be able to independently validate each new block before adding it to its copy of the digital ledger. To do so, a node checks each transaction in the block to ensure, for example, that the sender has the required balance in their account and that the funds have not already been spent.
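The per-transaction checks described above can be sketched in a few lines of Python. This is a minimal illustration under an account-based model, not any particular chain's validation logic: the `Tx` fields, the per-sender `nonce` used to catch replayed (double-spent) transactions, and the `validate_block` function are all hypothetical names chosen for the example, and signature checks are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    sender: str
    recipient: str
    amount: int
    nonce: int  # hypothetical per-sender counter; a reused value signals a replay


def validate_block(txs, balances, nonces):
    """Return True only if every transaction in the block is valid.

    The checks run against copies of the state, so the caller's ledger
    is left untouched when the block is rejected.
    """
    balances = dict(balances)
    nonces = dict(nonces)
    for tx in txs:
        if tx.amount <= 0:
            return False
        # Double-spend / replay check: the nonce must be the next expected one.
        if tx.nonce != nonces.get(tx.sender, 0):
            return False
        # Balance check: the sender must be able to cover the transfer.
        if balances.get(tx.sender, 0) < tx.amount:
            return False
        balances[tx.sender] -= tx.amount
        balances[tx.recipient] = balances.get(tx.recipient, 0) + tx.amount
        nonces[tx.sender] = tx.nonce + 1
    return True
```

Note that every check reads the transaction's contents. A node that cannot obtain the transaction data cannot run any of them, which is why data availability matters.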
Data availability refers to the availability of transaction data to the nodes in the blockchain network. As blockchain technology evolves and new solutions are built on top of Layer-1 blockchains, the problem of ensuring data availability becomes more complex.
Traditional Layer-1 blockchains have built-in support for data availability. When a block header is produced, it includes the root of a Merkle tree, or a similar commitment, that securely summarizes the transaction data contained within that block. The actual transaction data is contained in the body of the block, and the two chunks of data are distributed together. With access to both the block header and the transaction data, any node in the network can check the validity of every transaction and of the block as a whole.
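To make the header/body relationship concrete, here is a minimal sketch of how a node might recompute a Merkle root from a block body and compare it with the value in the header. It assumes single SHA-256 hashing and duplication of the last hash on odd-sized levels. Real chains use their own variations (Bitcoin, for example, hashes with double SHA-256), and the function names are illustrative.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(txs: list[bytes]) -> bytes:
    """Fold a list of serialized transactions into a single 32-byte root."""
    if not txs:
        return sha256(b"")  # assumption: an empty body commits to the hash of nothing
    level = [sha256(tx) for tx in txs]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last hash
        # Hash each adjacent pair to build the next level up.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def body_matches_header(header_root: bytes, txs: list[bytes]) -> bool:
    """True if the transaction data in the body is what the header commits to."""
    return merkle_root(txs) == header_root
```

If a single transaction in the body is altered or withheld, the recomputed root no longer matches the header, so a node holding both pieces can detect the mismatch.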
Under this model, full nodes store complete copies of the transaction data included in each block, which imposes a significant storage and bandwidth burden. Light nodes, which only download block headers, can query the full nodes for transaction data on any blocks that they are interested in.
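One common way for a light node to check a full node's answer is a Merkle inclusion proof: the full node returns the transaction together with the sibling hashes on the path to the root, and the light node recomputes the root and compares it with the one in the header it already holds. The sketch below assumes a simple SHA-256 tree with the last hash repeated on odd levels. The helper names are hypothetical, and this is not a specific client's protocol.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(txs):
    """Return every level of the tree, leaves first, root last (txs must be non-empty)."""
    level = [h(tx) for tx in txs]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels by repeating the last hash
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(txs, index):
    """Full-node side: collect the sibling hash at each level for txs[index]."""
    proof = []
    for level in build_levels(txs)[:-1]:
        sibling = index ^ 1  # the neighbour within the same pair
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify_proof(tx, proof, root):
    """Light-node side: recompute the root from one transaction and its proof."""
    node = h(tx)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The light node needs only the header's root and a proof whose length grows with the logarithm of the block's transaction count, but it still depends on some full node actually holding and serving the data.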
The data availability challenge emerges as new layers are built on top of these blockchains. Most traditional Layer-1 blockchains have limits on block size, which constrain the rate at which they can process transactions. If demand for block space exceeds supply, overflow transactions wait in mempools until space becomes available for them.
Layer-2 solutions such as rollups address this issue by moving transactions off-chain. A rollup bundles a set of off-chain transactions together and publishes only limited data to the blockchain, including the state update that results from executing the transactions. Since this bundled data is smaller than the full set of transactions included in the bundle, rollups improve blockchain throughput.
However, rollups also introduce challenges around data availability. A blockchain node needs to be able to verify that the off-chain transactions contained in a bundle are valid before it performs the state update triggered by the bundle. Typically, on Ethereum, this is accomplished by recording off-chain transaction data as calldata, which costs less gas than recording the original transactions on-chain.
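As a rough illustration of why calldata is attractive, the sketch below estimates the gas charged for a batch posted as calldata, using the per-byte prices set by EIP-2028 (4 gas per zero byte, 16 gas per non-zero byte). The comparison against contract storage uses an approximate figure of 20,000 gas per fresh 32-byte slot, which is an assumption for illustration and ignores access-list and refund rules. The function names are hypothetical.

```python
ZERO_BYTE_GAS = 4        # calldata price per zero byte (EIP-2028)
NONZERO_BYTE_GAS = 16    # calldata price per non-zero byte (EIP-2028)
SSTORE_NEW_SLOT_GAS = 20_000  # assumption: approximate cost of writing a fresh 32-byte slot


def calldata_gas(data: bytes) -> int:
    """Gas charged for carrying `data` as transaction calldata."""
    zeros = data.count(0)
    return zeros * ZERO_BYTE_GAS + (len(data) - zeros) * NONZERO_BYTE_GAS


def storage_gas(data: bytes) -> int:
    """Rough gas to persist `data` in contract storage, one 32-byte slot at a time."""
    slots = (len(data) + 31) // 32
    return slots * SSTORE_NEW_SLOT_GAS
```

For a 32-byte payload made up of 10 zero bytes and 22 non-zero bytes, `calldata_gas` gives 10 × 4 + 22 × 16 = 392 gas, against roughly 20,000 gas under the storage assumption. Neither figure includes the 21,000 gas base cost of the transaction itself.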
Traditionally, blockchains have been fairly monolithic systems. For example, the node that produces a new block in the blockchain is responsible for selecting the transactions, building a valid block, and distributing that block to the rest of the blockchain network. Additionally, full nodes are responsible for maintaining their own copies of the distributed ledger and may respond to requests for information from other parties such as light nodes.
Blockchain technology is increasingly moving toward a more modular future where the various functions within the blockchain ecosystem are broken up into separate layers.
Some of the key layers include:
Execution: At the execution layer, a transaction is executed, and the resulting state change is applied. For rollups, this means applying the state update of the bundle as a whole.
Settlement: The settlement layer is where transaction data is organized, recorded, and finalized. This helps to ensure the immutability of the blockchain’s distributed ledger.
Consensus: The consensus layer runs the consensus algorithm, which brings the blockchain network into agreement about the state of the ledger, including the transactions included in blocks and their ordering.
Data Availability: In a modular blockchain architecture, data availability is broken out into its own layer. This ensures that nodes in the network can access transaction data without being required to store it themselves.
Historical State: On Ethereum, proposals such as Proto-Danksharding allow nodes to prune rollup transaction data after a particular retention window. A historical state layer is a third-party solution for maintaining this pruned transaction data after the window has expired.
As blockchains move toward a more modular architecture, off-chain data availability solutions have begun to emerge. These solutions help to ensure that the transaction data for off-chain transactions performed on rollups is still available to nodes in the network and other interested parties.
In general, the main offerings for data availability fall into two groups:
General: General data availability protocols such as Celestia and Polygon Avail offer decentralized data availability solutions for blockchains. In some cases, these are Layer-1 blockchains whose sole purpose is to ensure data availability for other blockchain solutions. Since these aren’t solution-specific, any Layer-1 or Layer-2 could use them for data availability.
Rollup-Provided: Rollup-provided data availability solutions, such as StarkEx DAC, zkPorter, and Arbitrum Nova, are specific to a particular rollup. In these cases, users of the protocol rely on the rollup provider to keep the data and make it accessible. Without that data, they may be unable to withdraw their funds from the rollup.
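To illustrate the trust model behind committee-based, rollup-provided data availability, the sketch below accepts a batch only if enough committee members have attested that they hold its data. It is a generic illustration, not the actual StarkEx, zkPorter, or Arbitrum Nova implementation. The function names and the quorum parameter are hypothetical, and HMAC tags stand in for the public-key signatures a real committee would use.

```python
import hashlib
import hmac


def attest(member_key: bytes, batch: bytes) -> bytes:
    """Committee-member side: attest to holding `batch` by tagging its hash.

    Assumption: an HMAC tag is used purely as a stand-in for a real signature.
    """
    batch_hash = hashlib.sha256(batch).digest()
    return hmac.new(member_key, batch_hash, hashlib.sha256).digest()


def has_quorum(batch_hash, attestations, committee_keys, quorum):
    """Verifier side: count valid attestations from known committee members."""
    valid = 0
    for member, tag in attestations.items():
        key = committee_keys.get(member)
        if key is None:
            continue  # ignore attestations from outside the committee
        expected = hmac.new(key, batch_hash, hashlib.sha256).digest()
        if hmac.compare_digest(expected, tag):
            valid += 1
    return valid >= quorum
```

The quorum check shows only that committee members claimed to hold the data at attestation time. Users still depend on those members to serve it later, which is the reliance on the provider described above.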
Data availability is of critical importance for blockchain technology. Whether a transaction is performed on-chain on a Layer-1 or off-chain on a Layer-2 rollup, nodes in the network need to be able to verify its validity before recording it on their copies of the digital ledger. To do so, they need access to either the original transaction data or a zero-knowledge proof (ZKP) that demonstrates its validity.
As blockchain solutions grow more modular and new solutions are built on top of the blockchain, the data availability layer has emerged to provide access to transaction data. Solutions such as Celestia or Polygon Avail offer solution-agnostic data availability that can be used by a range of Layer-2 solutions, unlike the platform-specific data availability support offered by some rollup solutions.