Storage for blockchain and modern distributed database processing
IBM Blockchain is transforming enterprise capabilities across industries, fostering new insights and decisions: from trust and transparency in our food using IBM Food Trust™, to digitization and transparency of trade using TradeLens, to transforming digital identity into trusted identity using Identity, to redefining access to money for people and businesses everywhere using World Wire. The IBM Blockchain Platform and IBM Blockchain Solutions are rapidly innovating to make commercial operations more efficient and secure.
Pioneered by cryptocurrencies, blockchain is a distributed and shared ledger that enables the decentralized processing and storage of transactions. Participating firms join a network (or consortium) and interact with the network via a node (also known as a peer). Each node holds a subset, or its view, of the shared ledger, and each transaction has an identifying cryptographic signature that enables a secure peer-to-peer network and fabric and meets data sovereignty requirements.
Enterprises are increasingly managing data for blockchain solutions in both the public and private cloud, and IBM Storage enables flexible data management capabilities across hybrid, multicloud deployment models. Data for a blockchain solution can either be stored on-chain as part of the core ledger managed by the blockchain protocol, or off-chain, using more traditional data stores.
There are several reasons why a blockchain solution may store data off-chain, and deciding what to keep off-chain is one of the major considerations when architecting a blockchain solution:
- Off-chain data stores can be used to store large documents or application artifacts when the only shared value of the application is the evidence of the artifact state at a point in time. Many blockchain solutions allow businesses to digitize hard-copy forms, with the blockchain holding the evidence and digital signature of the form. For example, a retail blockchain network establishes a blockchain for clients to purchase goods across a consortium of retailers. When a customer purchases a widget from company A, the company states the aspects of the product and commits to delivering it on a specific date. The customer registers a photograph of the widget when it arrives as proof of delivery of the order. The purchase and delivery agreement are transaction data captured on-chain, but the photograph of the delivery person and the product in the customer’s possession is stored in an off-chain content management system with on-chain evidence. The ledger includes cryptographic hashes identifying the corresponding data residing on the off-chain datastore.
- Another common use case of off-chain storage is to support a cache of the most recent values of the state of on-chain data, or to leverage fit-for-purpose technology, such as advanced search and analytics to guide the blockchain application’s interaction with the blockchain network.
- Sensitive data can be stored off-chain since, by definition, on-chain data cannot be tampered with and cannot be deleted. Storing only cryptographic signatures on-chain permits the deletion of off-chain data while preserving the advantages of blockchain trust and transparency. Read this article for more details on privacy considerations and techniques for managing privacy.
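The on-chain evidence pattern described in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any IBM product or Hyperledger Fabric implements it: `OffChainStore` and `Ledger` are hypothetical in-memory stand-ins for a content management system and the shared ledger, and SHA-256 is assumed as the hash function.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


class OffChainStore:
    """Hypothetical stand-in for an off-chain content management system."""

    def __init__(self):
        self._blobs = {}

    def put(self, doc_id: str, content: bytes) -> None:
        self._blobs[doc_id] = content

    def get(self, doc_id: str):
        return self._blobs.get(doc_id)

    def delete(self, doc_id: str) -> None:
        self._blobs.pop(doc_id, None)


class Ledger:
    """Hypothetical append-only record list standing in for on-chain data."""

    def __init__(self):
        self._records = []

    def append(self, record: dict) -> None:
        self._records.append(dict(record))

    def find(self, doc_id: str):
        # Return the most recent record for this document, if any.
        for record in reversed(self._records):
            if record["doc_id"] == doc_id:
                return record
        return None


def register_document(ledger, store, doc_id: str, content: bytes) -> str:
    """Store the content off-chain and only its hash on-chain."""
    store.put(doc_id, content)
    digest = sha256_hex(content)
    ledger.append({"doc_id": doc_id, "sha256": digest})
    return digest


def verify_document(ledger, store, doc_id: str) -> bool:
    """Check that the off-chain content still matches the on-chain hash."""
    record = ledger.find(doc_id)
    content = store.get(doc_id)
    if record is None or content is None:
        return False
    return sha256_hex(content) == record["sha256"]
```

In this sketch, deleting the photograph from the off-chain store leaves the ledger record untouched: the hash remains as evidence that a specific artifact existed, while the sensitive content itself is gone.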
Most blockchain peers use local databases to manage ledger data. The Linux Foundation’s Hyperledger Fabric has a pluggable architecture and currently supports both CouchDB and LevelDB for the state database. Additionally, Hyperledger Fabric has built-in support for managing off-chain transactional data within the protocol, called private data collections.
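Conceptually, a state database holds the most recent value of each key, and that view can be rebuilt by replaying the transaction history. The sketch below illustrates only that idea; it does not use the Hyperledger Fabric, CouchDB, or LevelDB APIs, and the `(key, value)` transaction shape and the `build_state` name are assumptions made for the example.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical transaction shape: (key, value). A value of None marks a delete.
Transaction = Tuple[str, Optional[Any]]


def build_state(ledger: List[Transaction]) -> Dict[str, Any]:
    """Replay an ordered transaction list into a latest-value-per-key view."""
    state: Dict[str, Any] = {}
    for key, value in ledger:
        if value is None:
            state.pop(key, None)  # a delete removes the key from the view
        else:
            state[key] = value    # later writes overwrite earlier ones
    return state
```

Because the view is derived entirely from the ordered history, it can be treated as a cache: if it is lost, replaying the ledger reproduces it.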
To date, enterprises have deployed nodes (peers) and their supporting data primarily in the public cloud. You can learn more about this from IBM Blockchain Services. Increasingly, firms are deploying peers and managing supporting data on-premises, as part of their blockchain service and hybrid cloud deployment models.
IBM Storage underpins on-chain and off-chain data for distributed on-premises peers, as well as public cloud peers running on IBM Blockchain as a Service. IBM Storage Solutions for IBM Blockchain supports unstructured off-chain data with the IBM Spectrum Scale high-performance scale-out filesystem and structured off-chain and on-chain data with the NVMe-accelerated FlashSystem 9100, and it supports IBM Cloud Private or bare-metal deployments on IBM ZLinux. IBM also enables storage and data protection with IBM Cloud Object Storage, and backup and recovery via snapshot or continuous data synchronization, using native storage snapshots and IBM Spectrum Protect Plus.
IBM Storage Solutions for IBM Blockchain includes:
- IBM Spectrum Scale high-performance scale-out filesystem for unstructured off-chain data
- NVMe-accelerated FlashSystem 9100 for structured off-chain and on-chain ledger data
- Archive with IBM Cloud Object Storage
- Backup-and-recovery via snapshot or continuous data synchronization with storage product native functions and with IBM Spectrum Protect
- Support for IBM Cloud Private or bare-metal deployments on IBM ZLinux
Note that even though each peer has a copy of the ledger, it is also highly recommended to back up your ledger on secure storage and have the ability to restore it safely and quickly.
IBM Storage Best Practices for Blockchain provides planning and deployment guidance for performance, capacity planning and data protection with the following use cases:
Distributed peer: As blockchain solutions and networks mature, some consortiums are starting to support the deployment of peers anywhere. This diversity can help keep data unaltered and preserved, but it can also widen the attack surface through which intruders could read data. Each peer must be deployed in a secure environment to maintain the overall security of the system. Because a blockchain is only as good as its diversity, every peer needs a highly secured environment regardless of where or how it is hosted within the blockchain network.
Off-chain data growth within the on-premises distributed peer: Builds on the distributed peer use case and focuses on the data management of your off-chain storage. While clients don’t need to store the data on-chain, the evidence of that data needs to be present on-chain. Caching a copy of the data locally makes the process more efficient by offloading processing from the blockchain peer and using the native capabilities of a fit-for-purpose off-chain store. Whether the data resides on your local storage or is moving between hot and cold storage, you need to ensure that no one has tampered with it. This creates the need for solutions that facilitate or automate this data synchronization in a resilient and secure way.
Off-chain extension: Addresses firms that already run a peer in the cloud or on-premises and need more off-chain storage. The client base expands as other nodes, and their clients, need to access the off-chain data storage. One client may choose to set up a new off-chain data store while another may choose to leverage an existing one. In either case, the focus is on the performance, security and uniformity of off-chain storage as it connects to the blockchain network. While the underlying architecture may differ, the off-chain datastore must deliver its part of the consortium’s service-level agreement. In other words, extend the “decentralized trust” concept beyond Hyperledger and into your off-chain data store.
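One way to picture the tamper check when data moves between hot and cold storage is sketched below. It is a minimal illustration under assumptions, not a description of IBM Spectrum Scale or any other product: the two tiers are modeled as plain Python dictionaries, `move_between_tiers` is a hypothetical helper, and `expected_sha256` stands for the hash previously recorded on-chain.

```python
import hashlib


def move_between_tiers(doc_id, source, target, expected_sha256):
    """Move one object between dict-like tiers, verifying it on both sides.

    expected_sha256 is assumed to come from the on-chain evidence record.
    """
    data = source[doc_id]

    # Refuse to migrate data that no longer matches the on-chain hash.
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError(f"{doc_id}: source copy does not match on-chain hash")

    target[doc_id] = data

    # Re-read from the target tier and verify before dropping the source copy.
    if hashlib.sha256(target[doc_id]).hexdigest() != expected_sha256:
        del target[doc_id]
        raise ValueError(f"{doc_id}: target copy does not match on-chain hash")

    del source[doc_id]
```

The source copy is removed only after the target copy has been verified, so a failed or corrupted transfer never leaves the data set without a copy that matches the on-chain evidence.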
Storage technologies such as NVMe reduce the processor cycles required for storage, increasing the cycles available for cryptosecurity-related calculations. Moreover, regardless of the type of storage used for your off-chain data store, IBM Spectrum Virtualize can transform the various off-chain data stores into one uniform storage environment with data management capabilities.
Organizations and blockchain consortia are adopting containers more broadly and expanding data management and availability requirements. Storage solutions for blockchain need to provide secure and automated workflows that allow streamlined creation and management of data copies throughout the data lifecycle.
Learn more about IBM’s blockchain leadership and more than 500 client engagements and check out the IBM Storage for Blockchain POV.