In the cloud era, a compute cluster that once took months to build out can now be created and ready to use in minutes. In this blog post, we will discuss all the pieces that come together to make this near-instant infrastructure a reality. From there, we will show how this infrastructure and file system fulfill the promise of performance right out of the box.
IBM Spectrum Scale
The express path to a high-performance distributed file system and compute cluster begins with the IBM Spectrum Scale catalog tile. Follow the link, and the IBM Cloud Schematics interface offers a straightforward process for filling out the parameters to configure your cloud-based storage and compute cluster. After you provide all the configuration details, your input is stored as a Schematics workspace. This workspace contains your infrastructure specification, and upon your command, the workspace connects with the Terraform and Ansible code contained in the repository to create your cloud-based infrastructure.
The IBM Cloud VPC infrastructure used by the Spectrum Scale catalog tile can employ storage nodes based on bare metal instances with NVMe devices or use virtual instances with instance storage. For this post, we will be using bare metal instances that offer the following:
Eight 3.2 TB NVMe storage devices
48 physical cores (96 vCPUs) from Intel Xeon 8260 processors
192 – 1536 GB of memory
The number and configuration of the compute nodes are up to the user, with virtual instance profiles ranging as follows:
From 2 to 176 vCPUs
From 2 GB to 2.5 TB of memory
In addition to the storage and compute nodes, the automation provisions and configures a bastion node that helps to secure the cluster’s VPC in several ways:
Serves as an SSH jump host allowing secure command line access to the cluster’s VPC
Isolates the cluster VPC from the internet by closing non-essential ports
Restricts access to the cluster to approved remote IP addresses or CIDR blocks
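The CIDR-based restriction in the last bullet can be illustrated with a minimal Python sketch. This is not the automation's own code; the `ALLOWED_CIDRS` list and the `is_allowed` helper are hypothetical names, and the addresses come from the ranges reserved for documentation examples.

```python
import ipaddress

# Hypothetical allow-list; these are documentation-only example ranges,
# not values taken from the catalog tile.
ALLOWED_CIDRS = ["203.0.113.0/24", "198.51.100.17/32"]


def is_allowed(remote_ip, allowed_cidrs=ALLOWED_CIDRS):
    """Return True if remote_ip falls inside any approved CIDR block."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


if __name__ == "__main__":
    for ip in ("203.0.113.45", "198.51.100.18"):
        print(ip, "allowed" if is_allowed(ip) else "blocked")
```

A `/32` entry admits exactly one address, while a wider block such as `/24` admits a whole office or VPN range.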
IBM Spectrum Scale file system
IBM Spectrum Scale is a high-performance clustered file system that provides concurrent access to a shared file system from multiple nodes. It can be used in a wide variety of hardware and software configurations. For our purposes, it is configured as a collection of nodes built from both bare metal servers for storage and virtual instances for compute. Each bare metal instance has direct-attached NVMe storage serving as NSD volumes and a 100 Gbps network interface. We’ll have more to say about this in the performance section.
The tile automation scripts build a cluster that employs simple and effective security practices to get you started:
User-supplied SSH keys
A login (bastion) node jump host
Firewall with only the SSH port open and restricted to your specified CIDR
All nodes in the cluster can only be accessed from within the VPC
From there, it is expected that you employ the rich set of tools supplied by IBM Cloud and Spectrum Scale to implement the level of security that meets your needs.
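To make the jump-host pattern concrete, here is a small sketch that builds an OpenSSH command using the `-J` (ProxyJump) option. The user name, the IP addresses and the `jump_ssh_command` helper are assumptions made for illustration, and the sketch assumes your SSH key is already loaded in an SSH agent.

```python
import shlex


def jump_ssh_command(bastion_ip, node_ip, user="example-user"):
    """Build an ssh argv that reaches a cluster node through the bastion.

    Assumes OpenSSH 7.3 or later (for -J) and a key available via ssh-agent.
    The user name is a placeholder, not one defined by the tile.
    """
    return ["ssh", "-J", f"{user}@{bastion_ip}", f"{user}@{node_ip}"]


if __name__ == "__main__":
    # Placeholder addresses: a public bastion IP and a private node IP.
    argv = jump_ssh_command("203.0.113.10", "10.241.0.5")
    print(shlex.join(argv))
```

The same list can be handed to `subprocess.run` if you want to script access to nodes inside the VPC.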
As discussed earlier, before it is rendered in real hardware, the cluster exists as a specification stored in a Schematics workspace. This workspace can be thought of as a form of infrastructure that incurs no cost and consumes no energy while in storage.
Assuming the cluster is already configured, the process of bringing it to life begins with invoking the “apply” command, which executes the pre-existing and well-tested Terraform scripts from the Schematics repository to provision the cloud resources. Whenever possible, the provisioning steps are carried out in parallel. In the case of our largest example, a cluster of 10 storage nodes and 64 compute nodes, there can be close to 100 discrete cloud operations in flight at one time. For example, the 64 compute nodes are provisioned concurrently and complete in a little over 1 minute. The same approach applies to subnets, security rules, the bastion node, storage nodes and so on. Once the hardware is in place, Ansible scripts are kicked off to install and configure the software.
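The benefit of keeping many operations in flight can be sketched in a few lines of Python. This is only an analogy, not how Terraform or Schematics is implemented: `provision` is a stand-in that sleeps instead of calling a cloud API, and the 0.05-second duration and the `max_in_flight` limit are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision(resource, seconds=0.05):
    """Stand-in for one cloud operation; sleeps instead of calling a real API."""
    time.sleep(seconds)
    return resource


def provision_all(resources, max_in_flight=100):
    """Run independent provisioning steps concurrently.

    Returns the list of finished resources and the elapsed wall-clock time.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        done = list(pool.map(provision, resources))
    return done, time.perf_counter() - start


if __name__ == "__main__":
    compute_nodes = [f"compute-{i}" for i in range(64)]
    done, elapsed = provision_all(compute_nodes)
    print(f"{len(done)} nodes in {elapsed:.2f}s (serial would take about {64 * 0.05:.1f}s)")
```

Because the 64 simulated operations overlap, the elapsed time stays close to the duration of a single operation rather than 64 times that duration.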
Time required to create a Spectrum Scale cluster
The following timings were measured on varying cluster configurations in real experiments and can be used as a guideline. As always, your results may vary to some degree. Three different cluster sizes were tested, and the times needed to create them were broken down to give an idea of how long various operations take.
Table 1: Cluster creation times (hh:mm:ss). Columns: Schematics Time, Controller Terraform Time, Controller Ansible Time, Total Time. [Table values not available in this copy.]
“Schematics time” is the amount of time spent running Terraform scripts in a Schematics container. This time is spent provisioning a login node and a “controller” node, to which we transfer the responsibility for finishing the cluster. We make this transition so that execution moves to a node that we own and control. We can also size that node to speed up the work it runs: first the Terraform scripts that provision resources, and later the Ansible scripts that install software and configure the cluster.
In the table above, the time spent on the controller node is split into the Controller Terraform and Controller Ansible components. The “Total Time” column is the elapsed time from “apply” to the cluster being ready to get to work. It is interesting to note how the performance varies as we scale up the cluster size. Schematics time is essentially invariant because this phase does the same amount of work regardless of cluster size. The Controller Terraform time illustrates how successfully we can parallelize the Terraform provisioning steps: the time needed to provision 74 nodes (10 storage + 64 compute) is less than 5% longer than the time needed to provision 6. In contrast, the Ansible-based configuration steps run serially in many cases, so the time needed is proportional to the number of nodes in the cluster.
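The scaling behavior described above can be captured in a rough model: a fixed part for the phases that are parallelized and a part that grows with node count for the serial Ansible work. The sketch below is an assumption-laden simplification (it treats Terraform time as constant rather than growing by a few percent), and every number in `PLACEHOLDER` is invented for illustration; none of them are measurements from Table 1.

```python
def estimated_create_seconds(nodes, schematics_s, terraform_s,
                             ansible_base_s, ansible_per_node_s):
    """Estimate cluster creation time under a simple fixed-plus-linear model."""
    # Phases whose duration is modeled as independent of cluster size.
    fixed_part = schematics_s + terraform_s
    # Serial configuration work, modeled as growing linearly with node count.
    serial_part = ansible_base_s + ansible_per_node_s * nodes
    return fixed_part + serial_part


# Placeholder coefficients in seconds; NOT measured values.
PLACEHOLDER = dict(schematics_s=300, terraform_s=240,
                   ansible_base_s=120, ansible_per_node_s=20)

if __name__ == "__main__":
    for n in (6, 74):
        print(n, "nodes:", estimated_create_seconds(n, **PLACEHOLDER), "seconds")
```

In this model, the entire difference between a 6-node and a 74-node cluster comes from the per-node Ansible term, which matches the qualitative picture in the text.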
Table 2: Cluster destroy times (mm:ss). Columns: Bootstrap Destroy Time, Schematics Destroy Time, Total Destroy Time. [Table values not available in this copy.]
We also tested the time needed to destroy a cluster, and the results are in Table 2 above. The total time is made up of two separate operations because the Terraform work is split: some of it runs in the Schematics container, while the bulk of the work is carried out on the bootstrap instance.
These two operations run sequentially, so the total time is the sum of the two. Regardless of cluster size, it takes approximately 10 minutes to free all the resources and return them to the cloud. Just as in resource creation, we take advantage of the ability to run Terraform operations in parallel to keep the total time down.
Spectrum Scale storage resiliency
Out of the box, our cluster offers resiliency that allows for the loss of a storage node and the loss of a storage block.
This level of redundancy requires two settings that are applied at cluster creation time:
The storage cluster consists of a minimum of three nodes
A write replication factor of two is set
The above settings can be seen as providing the basic level of resiliency that befits a large, clustered file system. Beyond this, and depending on your needs, Spectrum Scale and IBM Cloud can be customized to provide resiliency and security at very high levels.
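To see why a replication factor of two tolerates the loss of a storage node, consider the toy model below. It is not Spectrum Scale's actual placement algorithm and does not model quorum or failure groups; `place_replicas` and `survives_loss` are hypothetical helpers that simply put each block's copies on distinct nodes and check what remains after one node fails.

```python
def place_replicas(num_blocks, nodes, copies=2):
    """Round-robin placement that puts each block's copies on distinct nodes."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")
    return {
        block: {nodes[(block + k) % len(nodes)] for k in range(copies)}
        for block in range(num_blocks)
    }


def survives_loss(placement, failed_node):
    """True if every block still has at least one copy after failed_node is lost."""
    return all(replicas - {failed_node} for replicas in placement.values())


if __name__ == "__main__":
    nodes = ["storage-1", "storage-2", "storage-3"]
    replicated = place_replicas(12, nodes, copies=2)
    single = place_replicas(12, nodes, copies=1)
    print("2 copies:", all(survives_loss(replicated, n) for n in nodes))
    print("1 copy:  ", all(survives_loss(single, n) for n in nodes))
```

The cost of this protection is capacity: with two copies of every block, usable space is at most half of the raw space.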
Spectrum Scale storage performance
Table 3: Storage performance
Table 3 provides an overview of Scale file system performance for a few key metrics. The testing was performed on a system with the following characteristics:
10 storage nodes
80 NVMe drives
256 TB of raw storage capacity
100 Gbps network in each storage node
A single 107 TB file system provided by Spectrum Scale 5.1.4
Digging into the results in Table 3, it should be evident that these are very good numbers for a clustered file system. The read bandwidth of 112 GiB/sec is essentially all the bandwidth supplied by the ten 100 Gbps network adapters, which means that when it comes to read bandwidth, the Scale software and IBM Cloud network infrastructure are leaving nothing on the table. Write bandwidth is also good, given the constraints imposed by replication. The 5.4 million read IOPS are also impressive. In short, this is a very high-performance offering out of the box.
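The arithmetic behind the test configuration and the read bandwidth claim can be checked with a short calculation. It assumes 1 Gbps means 10^9 bits per second and ignores protocol overhead, so the line rate it computes is an upper bound rather than an achievable figure.

```python
STORAGE_NODES = 10
NVME_PER_NODE = 8
NVME_GB = 3200            # one 3.2 TB device, expressed in GB
NIC_GBPS = 100            # one adapter per storage node
MEASURED_READ_GIB_S = 112  # read bandwidth quoted in the text

# Raw capacity: 80 drives of 3.2 TB each.
raw_tb = STORAGE_NODES * NVME_PER_NODE * NVME_GB / 1000

# Aggregate network line rate, converted from bits/s to GiB/s.
line_rate_bytes = STORAGE_NODES * NIC_GBPS * 1e9 / 8
line_rate_gib = line_rate_bytes / 2**30

# Fraction of the theoretical line rate reached by the measured reads.
read_fraction = MEASURED_READ_GIB_S / line_rate_gib

if __name__ == "__main__":
    print(f"raw capacity: {raw_tb:.0f} TB")
    print(f"aggregate line rate: {line_rate_gib:.1f} GiB/s")
    print(f"reads reach {read_fraction:.0%} of line rate")
```

Under these assumptions, ten adapters supply roughly 116 GiB/sec, so 112 GiB/sec of reads is about 96% of the theoretical maximum.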
All the results listed above were achieved “out of the box.” As with any high-performance computing system, the cluster has benefited from testing and tuning, but that work was done over the course of our development and performance testing, and the resulting tuning is now applied automatically when a cluster is built from the tile.
The IBM Cloud Spectrum Scale catalog tile has been designed and built to offer you the shortest possible path to a high-performance compute and storage cluster. In less than one hour, you can build a compute/storage cluster to your specification, with up to a 100 TB distributed file system and as much compute capacity as you desire, all tuned to extract maximum performance from the underlying hardware. We invite you to try out our offering and embrace the cloud-based future of high-performance computing today.