
6 steps to how blockchain storage works

See how blockchain technology has the potential to provide a secure and reliable enterprise data storage environment with our step-by-step guide to blockchain data storage.

By Robert Sheldon

Published: 10 Jan 2019

Storing data in large, centralized data centers comes with performance, availability and scalability issues, as well as high capital or operational expenses. Centralized data is also an open invitation to sophisticated cyberattacks. For these reasons, companies are looking for ways to decentralize data storage. Blockchain storage is one way to do that.

Blockchain storage is still a relatively young technology, but its popularity is growing. Enterprise use cases aimed at improving data storage security and reliability have started to emerge. Understanding how the technology works is a critical first step in determining whether it's the right approach for your organization.

The blockchain storage process

Blockchain is a distributed ledger technology for recording transactions between two or more parties. Until recently, the technology had been used primarily to support cryptocurrencies, such as bitcoin, but it's now gaining ground in other areas.

The blockchain ledger serves as a decentralized database that maintains details about each transaction. The transactions are added to the ledger in chronological order and stored as a series of blocks. Each block references the preceding block to form an interconnected chain.
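The way each block references its predecessor can be illustrated with a minimal Python sketch. It assumes SHA-256 as the hash function and a JSON encoding of the block contents; the names `Block`, `append_block` and `chain_is_valid` are hypothetical and do not come from any particular blockchain platform.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    index: int
    transactions: tuple  # transaction records, in chronological order
    prev_hash: str       # hash of the preceding block

    def hash(self) -> str:
        # Hash a canonical JSON encoding of the block's contents (an assumption).
        payload = json.dumps(
            {
                "index": self.index,
                "transactions": list(self.transactions),
                "prev_hash": self.prev_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, transactions: list) -> list:
    """Return a new chain with one more block that references the last block."""
    prev_hash = chain[-1].hash() if chain else "0" * 64  # placeholder for the first block
    return chain + [Block(len(chain), tuple(transactions), prev_hash)]


def chain_is_valid(chain: list) -> bool:
    """Check that every block stores the hash of the block before it."""
    return all(
        current.prev_hash == previous.hash()
        for previous, current in zip(chain, chain[1:])
    )
```

Because each block stores the hash of the one before it, changing the contents of an earlier block changes its hash, and the next block's reference no longer matches.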


The ledger is distributed across multiple nodes, with each node maintaining a complete copy. Blockchain automatically synchronizes and validates the transactions across all nodes. The ledger is transparent to and verifiable by all participating members, eliminating the need for a central authority or third-party verification service.
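As a rough illustration of how participants can verify that their copies of the ledger agree, the following Python sketch compares a digest of each node's copy against the digest held by most nodes. This is a simplification under stated assumptions: real blockchain platforms rely on consensus protocols rather than a simple majority comparison, and `ledger_digest` and `out_of_sync_nodes` are hypothetical names.

```python
import hashlib
import json
from collections import Counter


def ledger_digest(ledger: list) -> str:
    """Return a SHA-256 digest of a ledger copy (a list of JSON-serializable records)."""
    encoded = json.dumps(ledger, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def out_of_sync_nodes(copies: dict) -> list:
    """Return the nodes whose ledger copy differs from the most common copy."""
    if not copies:
        return []
    digests = {node: ledger_digest(ledger) for node, ledger in copies.items()}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(node for node, digest in digests.items() if digest != majority_digest)
```

A node flagged by this check would need to resynchronize its copy before its view of the ledger could be trusted.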

Because of its distributed nature, blockchain is being touted as a natural fit for peer-to-peer (P2P), decentralized storage. In this scenario, blockchain provides the structure necessary to create a logical storage pool of geographically dispersed storage resources that serve as the blockchain nodes.

The following figure provides an overview of how blockchain storage works.

Figure: A step-by-step look at how blockchain storage works

A blockchain-based storage system prepares the data for storage and then distributes it across a decentralized infrastructure, a process that can be broken into the six steps that follow:

1. Create data shards. The storage system breaks the data into smaller, manageable segments that can be distributed across multiple nodes, a process called sharding. The exact approach to sharding depends on the type of data and the application doing the sharding. Sharding a relational database is different from sharding a NoSQL database or sharding files on a file share.
2. Encrypt each shard. The storage system then encrypts each data shard on the local system. The content owner has complete control over this process. The goal is to ensure that no one other than the content owner can view or access the data in a shard, wherever the data is located and whether that data is at rest or in motion.
3. Generate a hash for each shard. The blockchain storage system generates a unique hash -- a fixed-length output string produced by a one-way hash function -- based on the shard's data or encryption keys. The hash is added to both the ledger and the shard metadata to link transactions to the stored shards. The exact approach to generating hashes varies from one system to the next.
4. Replicate each shard. The storage system replicates each shard so there are enough redundant copies to ensure availability and performance and to protect against degradation and data loss. The content owner chooses how many copies to make of each shard and where those copies are located. As part of this process, the content owner should establish a threshold for the minimum number of copies to maintain to guard against data loss.
5. Distribute the replicated shards. A P2P network distributes the replicated shards to geographically dispersed storage nodes, either regionally or globally. Multiple organizations or individuals -- sometimes referred to as farmers -- own the storage nodes, leasing extra storage space in exchange for some type of compensation, usually cryptocurrency. No one entity owns all the storage resources or controls the storage infrastructure. Only content owners have full access to all their data, no matter where the nodes that hold it are located.
6. Record transactions to the ledger. The storage system records all transactions in the blockchain ledger and syncs that information across all nodes. The ledger stores details relevant to the transaction, such as the shard location, shard hash and leasing costs. Because the ledger is based on blockchain technology, it's transparent, verifiable, traceable and tamper-resistant.
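The sharding, hashing, replication and recording steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any specific product works: it assumes fixed-size shards, SHA-256 hashes and round-robin placement of replicas, and every function name is hypothetical. The encryption step is omitted because Python's standard library does not include a symmetric cipher; in practice, the bytes being hashed and distributed would be the encrypted shards.

```python
import hashlib


def make_shards(data: bytes, shard_size: int) -> list:
    """Split data into fixed-size shards; the last shard may be shorter."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def shard_hash(shard: bytes) -> str:
    """Return a fixed-length SHA-256 hex digest of the shard's bytes."""
    return hashlib.sha256(shard).hexdigest()


def place_replicas(shard_index: int, nodes: list, copies: int) -> list:
    """Choose `copies` distinct nodes for one shard (round-robin, an assumption)."""
    if not 1 <= copies <= len(nodes):
        raise ValueError("copies must be between 1 and the number of nodes")
    return [nodes[(shard_index + k) % len(nodes)] for k in range(copies)]


def build_ledger_records(shards: list, nodes: list, copies: int) -> list:
    """Build one ledger record per shard with its hash and replica locations."""
    return [
        {"shard": i, "hash": shard_hash(s), "nodes": place_replicas(i, nodes, copies)}
        for i, s in enumerate(shards)
    ]


def shards_below_threshold(records: list, online_nodes: set, min_copies: int) -> list:
    """Return indexes of shards with fewer than min_copies replicas on online nodes."""
    return [
        record["shard"]
        for record in records
        if sum(node in online_nodes for node in record["nodes"]) < min_copies
    ]
```

The last function corresponds to the minimum-copy threshold described in the replication step: when storage nodes go offline, it identifies the shards that should be re-replicated.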

Although recording transactions is listed as step six, blockchain integration is an ongoing process, and the exact approach depends on the storage system. For example, a system might record the transaction in the blockchain ledger when the storage process first begins and then update it with information, such as the unique hash or node-specific details, as that information becomes available. After the participating nodes have verified the transaction, the system marks it as final within the ledger and locks it to prevent changes.
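One way to picture this lifecycle is as a small state machine. The Python sketch below assumes that a transaction becomes final only after every participating node has confirmed it; real systems typically use a consensus protocol with its own rules. The class name `StorageTransaction` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StorageTransaction:
    tx_id: str
    details: dict = field(default_factory=dict)       # shard hash, node details and so on
    confirmations: set = field(default_factory=set)   # nodes that have verified the transaction
    final: bool = False

    def update(self, **info) -> None:
        """Add details as they become available; refused once the transaction is final."""
        if self.final:
            raise PermissionError("transaction is final and locked")
        self.details.update(info)

    def confirm(self, node_id: str, participating_nodes: set) -> None:
        """Record one node's verification; finalize when all nodes have confirmed."""
        if node_id not in participating_nodes:
            raise ValueError(f"{node_id} is not a participating node")
        self.confirmations.add(node_id)
        if self.confirmations >= set(participating_nodes):
            self.final = True
```

Once `final` is set, any further call to `update` raises an error, which mirrors the locking behavior described above.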

The six steps described here are meant as a way to conceptualize the blockchain storage process. The exact approach will depend on how the specific storage system is implemented for a given use case and how that data storage is managed.
