What is Data Availability in Blockchain? A 5-Minute Explainer
Ethereum has long struggled with one fundamental roadblock: the blockchain trilemma. The tradeoff between security, decentralization, and scalability forces developers to prioritize two at the cost of the third. But when it comes to resolving scalability, one often-overlooked issue is data availability.
Data availability in a blockchain ensures that the transaction data included in each produced block is accessible to every node in the network. This preserves the integrity of the blockchain and the trust placed in it, allowing everyone to independently verify the validity of transactions. Hence the saying "don't trust, verify."
However, guaranteeing data availability presents its own set of problems.
In this blog, we'll look at what data availability is, how it affects the user and developer experience, and which data availability solutions help address the scalability issue.
What is data availability?
Data availability refers to the guarantee that the data needed to verify a block in the blockchain is actually available to all network participants.
This concept is fundamental to the proper functioning of blockchains, as it allows for full validation of the blockchain's history and current state by any participant, thereby maintaining the decentralized and trustless nature of the network.
Without guaranteed data availability, participants could not independently verify the legitimacy of transactions and blocks, which could lead to issues like fraud or censorship within the network.
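To make this concrete, here is a minimal sketch of why verification depends on the data being available. It assumes a simplified block header that commits to its transactions with a binary SHA-256 Merkle root; real chains use different structures (Ethereum, for instance, uses Keccak-based tries), and the names `merkle_root` and `can_verify_block` are ours, purely for illustration.

```python
import hashlib
from typing import List, Optional


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Binary Merkle root over transaction bytes (odd levels duplicate the last node)."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def can_verify_block(header_root: bytes, transactions: Optional[List[bytes]]) -> bool:
    """A node can only check the header's commitment if it actually holds the data."""
    if transactions is None:  # data withheld: nothing to recompute the root from
        return False
    return merkle_root(transactions) == header_root


txs = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
header_root = merkle_root(txs)

print(can_verify_block(header_root, txs))       # True: data available and matches
print(can_verify_block(header_root, None))      # False: data withheld
print(can_verify_block(header_root, txs[:-1]))  # False: data incomplete
```

The header alone proves nothing to a node. Only a participant holding the full transaction data can recompute the commitment and confirm the block is what it claims to be.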
Importance of data availability
Ensures trustlessness: Data availability enables network participants to independently verify the blockchain's dataset.
Facilitates decentralization: Equal access to the complete ledger data prevents centralization of control, ensuring no single entity can dominate the network.
Improves security: Since blockchain data is readily available, the network of nodes can detect and rectify inconsistent records.
However, data availability can turn into a scalability bottleneck as the network and data volume grow.
Scaling solutions like rollups ease this bottleneck by moving processing overhead off the mainnet, but they still depend on the underlying transaction data being accessible, which again underscores the importance of data availability.
Data availability in rollups
Rollup scaling solutions such as Optimistic and ZK-rollups reduce the transaction load on Ethereum by processing transactions off-chain and publishing them to the mainnet in batches.
This increases Ethereum's throughput and reduces gas costs, but it requires rollups to guarantee data availability so that the correctness of off-chain processing can be checked.
Optimistic rollups ensure data availability by posting compressed transaction data to the Ethereum mainnet as calldata. Verifiers use this data to validate transactions or to challenge them with fraud proofs during the challenge window.
Zero-knowledge rollups, on the other hand, use cryptographic validity proofs to guarantee the correctness of state transitions. While the proof ensures the validity of state updates, it does not by itself make the underlying data available, and a ZK-rollup may publish only part of the transaction data (for example, state differences). It therefore still needs to guarantee data availability.
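The optimistic-rollup challenge flow described above can be sketched roughly as follows. This is a toy model, not any rollup's actual protocol: the balance-transfer state, the hash-based `state_root`, the `PostedBatch` type, and the seven-day `CHALLENGE_WINDOW_SECONDS` are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple

Transfer = Tuple[str, str, int]  # (sender, recipient, amount)
CHALLENGE_WINDOW_SECONDS = 7 * 24 * 60 * 60  # assumed window length


def state_root(balances: Dict[str, int]) -> str:
    """Stand-in for a real state root: a hash of the sorted balances."""
    encoded = json.dumps(balances, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def apply_batch(balances: Dict[str, int], batch: List[Transfer]) -> Dict[str, int]:
    """Re-execute a batch, skipping transfers the sender cannot afford."""
    result = dict(balances)
    for sender, recipient, amount in batch:
        if result.get(sender, 0) >= amount:
            result[sender] -= amount
            result[recipient] = result.get(recipient, 0) + amount
    return result


@dataclass
class PostedBatch:
    transactions: List[Transfer]  # published on L1, so anyone can re-execute
    claimed_root: str             # post-state root asserted by the sequencer
    posted_at: int                # timestamp in seconds


def should_challenge(pre_state: Dict[str, int], batch: PostedBatch, now: int) -> bool:
    """A verifier challenges if its own re-execution disagrees and the window is open."""
    within_window = now - batch.posted_at < CHALLENGE_WINDOW_SECONDS
    recomputed = state_root(apply_batch(pre_state, batch.transactions))
    return within_window and recomputed != batch.claimed_root


pre_state = {"alice": 10, "bob": 0}
txs = [("alice", "bob", 4)]
honest = PostedBatch(txs, state_root(apply_batch(pre_state, txs)), posted_at=0)
fraudulent = PostedBatch(txs, state_root({"alice": 10, "bob": 100}), posted_at=0)

print(should_challenge(pre_state, honest, now=3600))      # False
print(should_challenge(pre_state, fraudulent, now=3600))  # True
```

Note that `should_challenge` cannot run at all without `batch.transactions`. If the sequencer withheld that data, verifiers would have no way to recompute the state root, which is exactly why fraud proofs depend on data availability.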
Issues concerning blockchain data availability
Here are some key issues concerning blockchain data availability:
Throughput
Full nodes in monolithic blockchains are required to download and store complete block data, which limits throughput and scalability as the blockchain grows larger. This slows down decentralized applications, and the resulting long confirmation times and high gas fees create a poor user experience.
Storage cost
Storage requirements to meet on-chain data availability can be expensive, and relying on off-chain storage introduces additional costs for maintaining data integrity and security. This can create a financial burden for both developers and users.
Trustlessness
Data availability is crucial for maintaining trust in a blockchain system. If block producers or sequencers (for rollups) withhold the transaction data, it can compromise the transparency and reliability of the blockchain.
This can make the system susceptible to withholding attacks and fraudulent transactions, and undermine its overall security.
Exploring data availability solutions
Data availability solutions follow two approaches: blockchain-level (on-chain) solutions and off-chain data availability solutions. Let's discuss them below:
Blockchain-level solutions
Blockchain-level solutions change how data is stored and verified on-chain to address the limits that full-node storage places on decentralization and scalability.
Data-availability sampling (DAS)
Data availability sampling requires each node to download only a small, randomly selected subset of the block data to confirm its availability, and to alert other nodes if any data is missing.
Benefit: By reducing the amount of data each node has to download, DAS significantly improves scalability and enables more transactions to be processed within the same time period.
Drawback: Sampling on its own does not guarantee that all data will always be accessible, so it can be vulnerable to data withholding attacks. These happen when block producers or sequencers don't publish all the transaction data they're supposed to include in a block.
To protect against this, DAS employs erasure coding.
Erasure coding: This adds redundant pieces of information to the data so that if some of the original data is missing, it can be reconstructed from the extra information.
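Here is a minimal sketch of the idea, assuming a Reed-Solomon-style code built from polynomial interpolation over a small prime field. The modulus `P` and the function names are chosen only for illustration; production systems use heavily optimized codes and different fields.

```python
from typing import Dict, List

P = 2**31 - 1  # a prime modulus, chosen only for illustration


def lagrange_eval(points: Dict[int, int], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def extend(data: List[int]) -> List[int]:
    """Encode k values as 2k shares: the originals plus k redundant evaluations."""
    k = len(data)
    points = dict(enumerate(data))
    return data + [lagrange_eval(points, x) for x in range(k, 2 * k)]


def reconstruct(shares: Dict[int, int], k: int) -> List[int]:
    """Recover the k original values from any k surviving shares (index -> value)."""
    if len(shares) < k:
        raise ValueError("not enough shares to reconstruct")
    known = dict(list(shares.items())[:k])
    return [lagrange_eval(known, x) for x in range(k)]


data = [11, 22, 33, 44]
shares = extend(data)                            # 8 shares in total
surviving = {i: shares[i] for i in (1, 3, 4, 6)} # half of the shares are lost
print(reconstruct(surviving, k=len(data)))       # [11, 22, 33, 44]
```

Because any half of the shares is enough to rebuild the block in this sketch, a block producer can no longer hide a single transaction by withholding a small piece of data. It would have to withhold more than half of the shares.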
The Ethereum scalability roadmap introduces DAS as a follow-up to the EIP-4844 upgrade (proto-danksharding). Together, these aim to significantly increase the network's capacity by enabling efficient data verification and reducing storage needs for individual nodes.
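To get a feel for why a handful of random samples is enough, here is a back-of-the-envelope sketch. It assumes the block has been erasure-coded to twice its size, so that any half of the shares can reconstruct it and an attacker must withhold more than half to make the data unrecoverable. The share count of 512 is hypothetical.

```python
from math import comb


def miss_probability(total_shares: int, withheld: int, samples: int) -> float:
    """Chance that `samples` distinct random shares all avoid the withheld ones."""
    if not 0 <= withheld <= total_shares or not 0 <= samples <= total_shares:
        raise ValueError("withheld and samples must be between 0 and total_shares")
    available = total_shares - withheld
    # Hypergeometric: ways to pick only available shares / ways to pick any shares.
    return comb(available, samples) / comb(total_shares, samples)


TOTAL_SHARES = 512                # hypothetical number of shares after a 2x extension
WITHHELD = TOTAL_SHARES // 2 + 1  # the least an attacker must hide to block reconstruction

for samples in (5, 10, 20, 30):
    p = miss_probability(TOTAL_SHARES, WITHHELD, samples)
    print(f"{samples:>2} samples -> node is fooled with probability {p:.2e}")
```

Under these assumptions each additional sample at least halves the chance that a node is fooled, so a light node downloading a few dozen small shares gets strong assurance without downloading the whole block.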
Off-chain data availability solutions
These reduce the storage burden on nodes by storing block data in off-chain storage solutions.
Data availability committees (DACs)
Data availability committees are sets of trusted nodes or entities that store data off-chain and make it available on request. During transaction processing, block producers and sequencers send the data behind their state transitions directly to the DAC, whose members then attest that the data is available.
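The attestation step can be sketched as a simple quorum check. Everything here is an assumption for illustration: the three-member committee, the quorum of two, and the use of HMAC with per-member secrets as a stand-in for the public-key signatures a real committee would use.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical committee: member name -> secret key (stand-in for a signing key).
COMMITTEE_KEYS = {"member-a": b"key-a", "member-b": b"key-b", "member-c": b"key-c"}
QUORUM = 2  # assumed number of attestations required


def attest(member: str, data: bytes) -> bytes:
    """A member signs the hash of the data it has stored."""
    digest = hashlib.sha256(data).digest()
    return hmac.new(COMMITTEE_KEYS[member], digest, hashlib.sha256).digest()


def has_quorum(data_hash: bytes, attestations: Dict[str, bytes]) -> bool:
    """Accept the batch only if at least QUORUM known members signed this data hash."""
    valid = 0
    for member, signature in attestations.items():
        key = COMMITTEE_KEYS.get(member)
        if key is None:
            continue  # ignore attestations from outside the committee
        expected = hmac.new(key, data_hash, hashlib.sha256).digest()
        if hmac.compare_digest(signature, expected):
            valid += 1
    return valid >= QUORUM


batch = b"batch-42 transaction data"
batch_hash = hashlib.sha256(batch).digest()
sigs = {m: attest(m, batch) for m in ("member-a", "member-b")}

print(has_quorum(batch_hash, sigs))                            # True
print(has_quorum(batch_hash, {"member-a": sigs["member-a"]}))  # False
```

The check only proves that enough members claimed to hold the data. Users still have to trust that those members will actually serve it when asked, which is the core trust assumption of a DAC.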
Benefits:
- Keeping data off the mainnet improves Ethereum scalability, while spreading storage across several committee members avoids relying on a single operator to keep data available for users.
- DACs typically have a limited set of permissioned entities, which makes them easier to implement and cost-effective.
StarkWare's StarkEx can run as a Validium that relies on a permissioned DAC to ensure data availability. It processes transactions and generates a validity proof, sending transaction data to the DAC. The committee members provide attestations, which function as proof of data availability and are verified alongside the validity proof on Ethereum.
Some other DAC solutions are Arbitrum Nova and Immutable X.
Data availability protocols
Data availability protocols are permissionless networks that offer off-chain storage solutions without relying on trusted third parties. They are similar to DACs, but rather than relying on permissioned entities, these protocols use proof-of-stake validator systems.
Anyone can become a validator or data availability manager and store data off-chain by depositing a 'bond' (staked tokens) into a smart contract. If a validator node withholds the data or acts maliciously, the system slashes its bond, maintaining data integrity.
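The bond-and-slash mechanism can be sketched as a small registry. This is a toy model in plain Python rather than a smart contract, and the `MIN_BOND` value, the 50% slashing fraction, and the class and method names are all hypothetical. How a withholding fault is actually proven differs between protocols and is out of scope here.

```python
from dataclasses import dataclass, field
from typing import Dict

MIN_BOND = 100  # hypothetical minimum stake, in tokens


@dataclass
class DataAvailabilityRegistry:
    bonds: Dict[str, int] = field(default_factory=dict)

    def register(self, validator: str, bond: int) -> None:
        """Anyone may join, as long as they lock up at least MIN_BOND."""
        if bond < MIN_BOND:
            raise ValueError("bond below minimum")
        self.bonds[validator] = bond

    def is_active(self, validator: str) -> bool:
        """A validator stays active only while its remaining bond covers MIN_BOND."""
        return self.bonds.get(validator, 0) >= MIN_BOND

    def slash(self, validator: str, fraction: float = 0.5) -> int:
        """Burn part of the bond after a proven withholding fault; return the amount slashed."""
        bond = self.bonds[validator]
        penalty = int(bond * fraction)
        self.bonds[validator] = bond - penalty
        return penalty


registry = DataAvailabilityRegistry()
registry.register("validator-1", bond=150)
print(registry.is_active("validator-1"))  # True

registry.slash("validator-1")             # validator-1 was caught withholding data
print(registry.bonds["validator-1"])      # 75
print(registry.is_active("validator-1"))  # False
```

The economic logic is that withholding data costs a validator more than it could gain, so honest behavior is enforced by stake rather than by trusting a fixed set of known entities.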
Benefits:
- These protocols offer a more decentralized and secure environment than permissioned DACs.
- Off-chain storage reduces the load on Ethereum mainnet, allowing quicker transaction confirmation and lower gas fees.
- The larger participation in data availability protocols makes them more reliable, reducing the risk of malicious attacks or failure.
Some notable data availability protocols are NEAR, Celestia, and Polygon Avail.
The future of data availability and its solutions
Data availability is a crucial component of achieving high blockchain scalability. It ensures that stored data remains accessible and enables efficient and secure transaction verification by all participants.
Web3 communities and advocates are actively working to overcome these challenges, and recent advancements such as data availability layers and danksharding are paving the way to higher scalability without sacrificing data integrity.
We hope this blog helps you understand the criticality of data availability, its problems, and solutions.
If you have any questions, join 40,000+ other builders in our Discord community or reach out to the team directly for more info on how to get started with building Web3 projects using thirdweb APIs.