Yes. Yes it does!
Quite often, clients ask us, “Do channels scale?” Or, “I heard Hyperledger Fabric can only scale to N nodes; is this true?” They say they have read or heard that Fabric and its channel model of privacy do not scale horizontally.
When I ask where they got this impression, I am frequently told about write-ups of dated explorations/research reports which used older versions of The Linux Foundation’s Hyperledger Fabric, or ran on constrained environments.
The Hyperledger Fabric (Fabric) maintainers readily acknowledge that prior to the release of Hyperledger Fabric v1.1.0, performance was not great. Our objective for v1.0.0 was to get a functioning version of our new architecture available to users. As with all engineering endeavors, premature optimization is a well-recognized anti-pattern. However, since that time, we have invested considerably in performance improvements.
A tale of two dimensions
It is important to appreciate that there are two distinct dimensions to this topic of scale. One is performance at scale in terms of the number of organizations running peers in a Fabric “network” and the total number of channels. The other is the operational scalability of channels and the associated chaincode as the number of these that need to be managed increases.
I used scare quotes around the term “network” because, in reality, a Fabric channel defines a network of organizations (and their respective peer nodes connected to the channel). Organizations can participate in multiple channels, and those channels can be hosted on multiple distinct ordering services. In reality, the Fabric architecture enables a “network of overlapping networks”. While this is technically possible, it does present some business challenges for companies that offer managed Fabric componentry as a service as part of a blockchain solution.
I mention this last point because it means the architecture of Fabric naturally supports sharding, where the shards are the channels themselves hosted on independent ordering services. This means, as you approach the limits of an ordering service’s ability to adequately handle the transaction load of its current set of channels, you can spin up a new ordering service to host the new channels.
Operational scaling of channels
Let’s start with the operations of channels and chaincode at scale. I’ll stipulate that as of the current release (v1.4.0 LTS), managing Hyperledger Fabric channels and associated chaincode at scale can indeed be a bit tedious. Okay, maybe a lot tedious. To be honest, there isn’t at present a lot of built-in tooling for administering channels and their associated chaincode.
Many of the Fabric-based managed services, such as the IBM Blockchain Platform, have implemented their own tooling around channel management operations to make the user experience less cumbersome. Absent the vendor-specific tooling, managing a large number of channels using out-of-the-box Hyperledger Fabric can indeed be a pain. The administration can be scripted fairly easily using one of the software development kits (SDKs) or the command line interface (CLI), but that remains an exercise for the user, at present.
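As a rough illustration of what such scripting looks like, the sketch below generates the `peer channel create` and `peer channel join` CLI invocations for a set of bilateral channels. The organization and orderer names are hypothetical, and the exact flags may vary by Fabric version; this just shows how quickly the commands multiply with pairwise channels.

```python
from itertools import combinations

# Hypothetical org and orderer names; real ones come from your network config.
banks = [f"bank{i}" for i in range(1, 5)]
orderer = "orderer.example.com:7050"

def channel_commands(org_a: str, org_b: str):
    """Build Fabric v1.4-style CLI invocations that create a bilateral
    channel and join one peer from each organization to it."""
    name = f"{org_a}-{org_b}"
    cmds = [f"peer channel create -o {orderer} -c {name} -f ./{name}.tx"]
    for org in (org_a, org_b):
        # Each join is run with the environment (CORE_PEER_ADDRESS, MSP
        # config) pointed at the joining org's peer.
        cmds.append(f"peer channel join -b ./{name}.block  # run as {org}")
    return name, cmds

channels = [channel_commands(a, b) for a, b in combinations(banks, 2)]
print(len(channels))  # 4 orgs pairwise -> 6 channels
```

Even four organizations need six channels; at 26 organizations that becomes 325, which is why scripting (or vendor tooling) is essentially mandatory today.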
The good news is that the Fabric development community is working hard to improve things in this regard. The v2.0.0-beta release expected to be available in the next month or so will bring some welcome changes. We’ve been working on a major refactor of the way that the chaincode lifecycle is managed. This will significantly simplify some of the coordination complexity associated with managing channels once they are created, as the associated chaincode needs to be updated.
Performance of Fabric and channels at scale
Now let’s tackle the elephant in the room. Does Hyperledger Fabric performance suffer with a proliferation of channels? If you have a need for many, many channels — on the order of hundreds or more — does Fabric suffer a performance degradation? The short answer is: not that we have observed with the latest versions of Fabric, v1.4.0 and v1.4.1.
We’ve been running a number of performance and scale tests on Fabric to see how far we can push performance and scale, as well as to identify any bottlenecks that if removed would improve performance.
See my previous blog post for some best practices to improve the performance of Hyperledger Fabric that we gleaned from this experimentation.
One experiment that we ran last summer sought to establish that we can indeed scale a Fabric “network” to considerable dimensions in terms of organizations, peer nodes and number of channels. In this experiment we set out to deploy a 32-organization network comprised of 26 “banks” and six “auditors”. We would establish a channel for each pair of “bank” organizations and add to it one of the “auditor” organizations. For each organization, whether “bank” or “auditor”, we configured a cluster of four peer nodes, each capable of serving as an endorser for the channels in which that organization participated.
This creates a network of 128 peer nodes and 325 channels (the function is N(N – 1) / 2 to determine the number of pairwise channels amongst a set of organizations). The experiment was quite successful, and it has now become a staple of our performance and scale testing regime for each release.
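The topology arithmetic above can be checked in a few lines of Python, using the figures from the experiment (26 banks, six auditors, four peers per organization):

```python
# Topology from the experiment: 26 "banks", 6 "auditors", 4 peers per org.
banks, auditors, peers_per_org = 26, 6, 4

orgs = banks + auditors              # 32 organizations in total
peers = orgs * peers_per_org         # 128 peer nodes
channels = banks * (banks - 1) // 2  # pairwise channels: N(N - 1) / 2

print(orgs, peers, channels)  # 32 128 325
```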
We have been working on the ability to deploy and run this test on demand, and the most recent results are fairly consistent with the throughput that can be achieved with a pair of peers on a single channel. The results below are running on the IBM Cloud Kubernetes Service (IKS) with worker nodes configured at 16 CPU x 16 GB with encrypted disk, allocating up to 6 CPUs and 4 GB of memory per container. We used LevelDB for this experiment.
2 peers, 1 channel using the LevelDB ledger: ~1,000 TPS
128 peers, 325 channels using the LevelDB ledger: ~13,000 TPS (note: the choppiness at the end of the run is due to some test runners delaying execution until resources were freed up).
Considering that each peer is processing 25 channels at a transaction rate approaching the rate of a peer running a single channel (13,000/325 * 25 = 1,000), we conclude that there is no meaningful performance degradation of the peer itself based on having 25 channels versus one.
In the coming weeks, I hope to be able to run additional experiments that push the number of channels even further. There is likely a point where the sheer number of channels on a given peer node becomes a problem. The good news is that you don’t need to have all of an organization’s channels running through all of an organization’s peer nodes. If it starts to get too crowded, you can shard the channels across the cluster(s) of peer nodes belonging to an organization.
Do I really need all these channels?
Now, while we have demonstrated that we can indeed scale the performance of Hyperledger Fabric in the face of many channels, we also need to look at why we are creating channels in the first place, and what will (or won’t) be shared on them. If you are managing content external to the blockchain, and simply recording hashes so that state can be validated and so that access to that external state can be controlled by the blockchain’s smart contracts, you may not need to create a bunch of bilateral channels to preserve privacy.
One option is to use Identity Mixer to mask the identity of the transacting parties, and to submit random noise transactions to mask any patterns. The hashes themselves don’t reveal much, and access to the state can be access controlled in the chaincode.
Private data collections
Additional changes are making their way into Hyperledger Fabric that will reduce the need for channels as the exclusive means of providing privacy of transactions between two or more organizations.
You might consider leveraging private data collections for small amounts of metadata or small transactions — though for large datasets/objects, such as might be the case for a document store, this may not be a viable option as we designed this for small amounts of related metadata about transactions.
The Private Data Collections feature has been available in Fabric since v1.2.0. This simplifies things a bit, because now you can have one channel with many participants yet engage in transactions that limit the exchange of data to only those organizations that have a need to share the information. However, setting up private collections requires some administrative setup before they can be used.
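To give a sense of that administrative setup, a collection definition is supplied as JSON when the chaincode is instantiated. The sketch below is a minimal example; the collection, organization, and MSP names are hypothetical, and the right values for the peer counts and `blockToLive` depend on your use case.

```json
[
  {
    "name": "bankA-bankB-details",
    "policy": "OR('BankAMSP.member', 'BankBMSP.member')",
    "requiredPeerCount": 1,
    "maxPeerCount": 2,
    "blockToLive": 0,
    "memberOnlyRead": true
  }
]
```

Here `policy` limits dissemination of the private data to the two named organizations, while the rest of the channel sees only a hash on the ledger; `blockToLive` of 0 means the private data is never purged.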
We are working on a new feature we call implicit collections where the client can choose any N channel members to which to distribute the private data without a priori configuration needed.
Thus, we will reduce (but not eliminate) the need for channels as the exclusive means of delivering privacy amongst a set of organizations. However, this feature isn’t likely to be delivered before the end of 2019.
More good news
I have one last bit of news to share this month. As you may know, we are releasing support for use of Raft consensus with the ordering service. This brings the potential for decentralized operations of the ordering service nodes, and it also simplifies the operational footprint of a Hyperledger Fabric network because there are two fewer components to manage: Kafka and ZooKeeper.
The good news from a performance perspective is that with Raft consensus, we also eliminate a layer of network interaction that adds to the overall latency of transactions making their way through the ordering service.
Initial observations on performance of the Raft ordering service are quite promising! Additionally, because you can start with a single ordering service node and scale up to add others over time without bringing down the network, it will make transition from small-scale testing to full-blown testing much easier — for one thing, you don’t need to start from scratch and create a whole new network topology.
As an example, we ran the same topology (one org, two peers, LevelDB) on IKS workers configured as above, once with Raft consensus and once with Kafka. Because we eliminate that extra network hop, we reduce the overall latency, and this enables us to push the transaction ingest rate higher while retaining a reasonable latency of confirmation.
The above is running the Kafka based ordering service (but also running a stable build of Hyperledger Fabric v1.4.1).
The above is with the Raft consensus ordering service that will be available in Hyperledger Fabric 1.4.1 real soon now.
As you can see, the latency of the Raft experiment remains under 1 second, while the Kafka-based orderer climbs to over five seconds over the course of the run. Yet, the Raft experiment yielded almost 2x the performance! It is too early to make a sweeping statement about the performance with the new orderer service, but the early results do seem quite promising.
As hinted, we continue to work on various aspects of performance as we continue our cadence of quarterly releases of Hyperledger Fabric. This next release cycle will focus on the Raft consensus mentioned above, but we will also be making an alpha release of v2.0 available next month that will include a state cache that should realize an overall performance improvement in accessing the state database.
Following on that, we will be working on releasing the lock on the state database once the cache has been updated to reduce lock contention and enable even greater throughput. We are learning a lot from members of the community that are focused on the performance of Fabric, and gradually we hope to leverage that learning in subsequent releases this year.