Rob Behnke
November 3rd, 2022
One of the essential attributes of blockchain technology is the dispersion of data among distributed and transparent ledgers instead of centralized, permissioned databases characteristic of Web2 architectures. By disseminating transactional records globally, blockchains have changed how people think about data ownership, access, and storage. But this design is not without limitations. When data is duplicated across nodes, it creates a storage headache, which worsens as networks grow. This, in turn, leads to problems with scalability, performance, and availability.
The issue of storage is one of the most commonly discussed challenges facing blockchains today. All blockchain transactions are recorded and preserved on the network’s ledger. As more transactions are executed on the network, more data is created, necessitating an increase in storage capacity. Moreover, blockchains are immutable, meaning that storage requirements constantly grow because nothing is ever deleted from the ledger.
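To make the growth argument concrete, here is a rough back-of-the-envelope sketch in Python. The block size and block interval passed in are illustrative assumptions, not measurements of any particular network, and the function names are made up for this example. Because nothing is ever deleted, the projected size only moves in one direction.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def yearly_growth_gb(avg_block_size_mb: float, block_interval_s: float) -> float:
    """Estimate how much an append-only ledger grows per year, in decimal GB."""
    blocks_per_year = SECONDS_PER_YEAR / block_interval_s
    return blocks_per_year * avg_block_size_mb / 1000


def projected_ledger_gb(current_gb: float, avg_block_size_mb: float,
                        block_interval_s: float, years: float) -> float:
    """Project ledger size after `years`, assuming constant block size and interval."""
    return current_gb + years * yearly_growth_gb(avg_block_size_mb, block_interval_s)


if __name__ == "__main__":
    # Hypothetical inputs: 1 MB blocks produced every 600 seconds,
    # starting from a 100 GB ledger.
    for years in (1, 5, 10):
        size = projected_ledger_gb(100.0, 1.0, 600, years)
        print(f"after {years:>2} years: {size:.1f} GB")
```

The model is deliberately simple: it ignores indexes, state data, and changes in usage over time, all of which would push real storage needs higher.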
In this article, we’ll examine blockchain’s storage constraints and some potential solutions to the problem.
Blockchain data is hosted on globally distributed machines referred to as nodes. Nodes essentially run software to validate and store information about the network’s state. There are various types of nodes serving different functions. Some may retain a full copy of the ledger, while others store only the most recent blocks. Although this architecture may vary from one network to another, a full node typically stores the entire network state, which is a complete history of transactions executed on the blockchain. Running a network node requires meeting some minimum hardware requirements. In the case of Bitcoin, among other requirements, a device must have at least 500 GB of free storage space with a minimum read/write speed of 100 MB/s to run a node.
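The difference between a node that keeps the full ledger and one that keeps only recent blocks can be sketched as a toy model. The `Block`, `FullNode`, and `PrunedNode` classes below are hypothetical names invented for illustration; real clients prune in more involved ways and still need to validate incoming blocks, which this sketch omits.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    height: int
    size_bytes: int


class FullNode:
    """Keeps every block it has ever received."""

    def __init__(self) -> None:
        self.blocks: list[Block] = []

    def add_block(self, block: Block) -> None:
        self.blocks.append(block)

    def stored_bytes(self) -> int:
        return sum(b.size_bytes for b in self.blocks)


class PrunedNode:
    """Keeps only the most recent `keep_last` blocks; older ones are dropped."""

    def __init__(self, keep_last: int) -> None:
        # A deque with maxlen discards the oldest item when a new one is appended.
        self.blocks: deque[Block] = deque(maxlen=keep_last)

    def add_block(self, block: Block) -> None:
        self.blocks.append(block)

    def stored_bytes(self) -> int:
        return sum(b.size_bytes for b in self.blocks)


if __name__ == "__main__":
    full, pruned = FullNode(), PrunedNode(keep_last=3)
    for height in range(10):
        block = Block(height=height, size_bytes=100)  # made-up block size
        full.add_block(block)
        pruned.add_block(block)
    print("full node bytes:  ", full.stored_bytes())
    print("pruned node bytes:", pruned.stored_bytes())
```

The full node's storage grows with every block, while the pruned node's storage is bounded by its retention window. The trade-off is that the pruned node cannot serve historical blocks to its peers.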
As Ethereum co-founder Vitalik Buterin argues, storage limitations impose a severe constraint on blockchain scalability. In an ideal scenario, considerably more users on blockchain networks would run their own nodes, but this requires significant hardware and bandwidth resources (a minimum of 1TB of SSD storage is needed to run Eth 2.0 full nodes), a cost that is prohibitively high for the average user. A quick peek at Etherscan shows an average of fewer than 10,000 nodes running on the Ethereum network over the past 30 days. This has raised questions about computational limits for blockchains and just how decentralized networks might be in the future.
With growing hardware requirements comes the need for specialized projects running blockchain nodes as a service. Infura and Alchemy are two leading projects maintaining nodes for Web3 protocols and developers. But these services have raised concerns as they centralize blockchain data in the hands of specialized service providers, creating a single point of failure (SPOF) and privacy risks.
Several solutions have been developed to tackle the blockchain storage problem.
Blockchains are designed to be fault-tolerant systems. This means that they remain highly available even in the absence of some network participants. However, serious limitations on on-chain storage could significantly impact network performance. As transaction data grows, so do storage needs. Achieving decentralization amid this ever-growing demand requires a highly distributed infrastructure that users can afford to run. By lowering hardware requirements, blockchains make it feasible for more participants to run nodes, which in turn strengthens security and decentralization.