Introduction to IBM Storage Scale Erasure Code Edition

IBM Storage Scale Erasure Code Edition provides IBM Storage Scale RAID as software, allowing customers to create IBM Storage Scale clusters that use scale-out storage on any hardware that meets the minimum hardware requirements.

All of the benefits of IBM Storage Scale and IBM Storage Scale RAID can be realized by using your own commodity hardware.

For example, IBM Storage Scale Erasure Code Edition provides:

Reed-Solomon highly fault-tolerant declustered Erasure Coding, protecting against individual drive failures and node failures.
Disk Hospital to identify issues before they become disasters.
End-to-end checksum to identify and correct errors that are introduced by the network or the storage media.

IBM Storage Scale Erasure Code Edition uses the same software and most of the same concepts that are used in the Elastic Storage Server (ESS). Elastic Storage Server (ESS) is a solution that consists of two I/O (storage) servers and one or more JBOD disk enclosures, with each storage device (pdisk) attached to both servers. Elastic Storage Server (ESS) has two recovery groups (RGs). Each RG takes half of the drives in each enclosure, across all enclosures. Under normal conditions, each I/O server supports one of the two RGs. If either I/O server fails, the remaining I/O server takes over and supports both RGs.

Figure 1. IBM Storage Scale Erasure Code Edition architecture

IBM Storage Scale Erasure Code Edition can have one or more recovery groups in a cluster and each storage server belongs to only one RG. All of the storage servers in a recovery group must have a matching configuration, including identical CPU, memory, network, and storage device configurations. The storage devices (pdisks) are directly attached to only one storage server. Each storage server typically serves two log groups and each log group manages one half of the virtual disks (vdisk NSDs) assigned to a server. If a storage server fails, the log groups and vdisk NSDs are distributed to the remaining storage servers. Any storage server failure causes the remaining storage servers to serve at most one more log group.
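The failover behavior described above can be illustrated with a small model. This is a minimal sketch, not how IBM Storage Scale RAID actually chooses placement: the server names, the log group names, and the least-loaded selection rule are all assumptions made for illustration. It only shows that when one server out of several fails, its two log groups can be spread so that no surviving server takes on more than one extra log group.

```python
def initial_assignment(servers):
    """Give every server two log groups (hypothetical names '<server>-lg1/2')."""
    return {s: [f"{s}-lg1", f"{s}-lg2"] for s in servers}


def fail_over(assignment, failed):
    """Return a new assignment after `failed` goes down.

    Each orphaned log group goes to a different surviving server, so no
    survivor gains more than one log group. The least-loaded-first rule
    is an assumption for this sketch, not the product's algorithm.
    """
    survivors = [s for s in assignment if s != failed]
    orphans = assignment[failed]
    if len(survivors) < len(orphans):
        raise ValueError("not enough surviving servers to take one log group each")

    result = {s: list(assignment[s]) for s in survivors}
    # sorted() is stable, so ties keep the original server order.
    targets = sorted(survivors, key=lambda s: len(result[s]))[: len(orphans)]
    for server, log_group in zip(targets, orphans):
        result[server].append(log_group)
    return result


before = initial_assignment(["n1", "n2", "n3", "n4"])
after = fail_over(before, "n2")
for server, log_groups in after.items():
    print(server, log_groups)
```

In this model a four-server recovery group that loses one server ends up with two servers serving three log groups and one server still serving two.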

In both Elastic Storage Server (ESS) and IBM Storage Scale Erasure Code Edition, the placement of data is topology aware by using a failure domain hierarchy of rack, node, enclosure, and storage device (pdisk). The RAID code makes placement decisions to maximize fault tolerance, depending on the RAID level you choose. IBM Storage Scale Erasure Code Edition supports the following erasure codes and replication levels: 16+2p, 16+3p, 8+2p, 8+3p, 4+2p, 4+3p, 3WayReplication, and 4WayReplication.
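To compare the codes listed above, it can help to work out the nominal trade-off between capacity and fault tolerance. The following is a minimal sketch that reads a code such as 8+2p as 8 data strips plus 2 parity strips, and N-way replication as 1 data copy plus N-1 extra copies. The function and table names are hypothetical, and the figures ignore spare space, metadata, and other overhead, so real usable capacity is lower.

```python
from fractions import Fraction

# (data strips, redundancy strips) for each code named in the text.
CODES = {
    "16+2p": (16, 2),
    "16+3p": (16, 3),
    "8+2p": (8, 2),
    "8+3p": (8, 3),
    "4+2p": (4, 2),
    "4+3p": (4, 3),
    "3WayReplication": (1, 2),
    "4WayReplication": (1, 3),
}


def usable_fraction(code):
    """Nominal share of raw capacity that holds user data."""
    data, redundancy = CODES[code]
    return Fraction(data, data + redundancy)


def fault_tolerance(code):
    """Number of strips of a stripe that can be lost without data loss."""
    return CODES[code][1]


for name in CODES:
    print(
        f"{name:16s} tolerates {fault_tolerance(name)} lost strips, "
        f"nominal usable capacity {float(usable_fraction(name)):.1%}"
    )
```

Wider codes such as 16+2p give more usable capacity for the same fault tolerance, but each stripe spans more strips, which in turn needs more failure domains to place them on.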

With IBM Storage Scale Erasure Code Edition, it is possible for either IBM Storage Scale Cluster Export Services with protocol software or customer applications to run directly on the storage servers if sufficient hardware resources are available. Customer applications must run in a constrained environment by using Linux® cgroups or Docker containers. For protocol workloads with high-performance requirements, the Cluster Export Services must run on separate nodes.

In both Elastic Storage Server (ESS) and IBM Storage Scale Erasure Code Edition, the IBM Storage Scale file system and file system features are independent of the storage configuration. A file system can be composed of NSDs provided by more than one recovery group, and the recovery groups can be from Elastic Storage Server (ESS) or IBM Storage Scale Erasure Code Edition or a combination of both. All of the IBM Storage Scale file system features can be used in a cluster with IBM Storage Scale Erasure Code Edition storage servers, but there are strict guidelines as to where the various components can run.

For an overview of IBM Storage Scale RAID, see the Introducing IBM Spectrum® Scale RAID topic in the IBM Storage Scale RAID: Administration.

Minimum hardware requirements

At a high level, you must have a limited number of storage servers per recovery group, and each server must run Red Hat® Enterprise Linux. The storage configuration must be identical for all storage servers. The supported storage types are SAS-attached HDD or SSD drives that use specified LSI adapters, or enterprise-class NVMe drives. Each storage server must have at least one SSD or NVMe drive, which is used for a fast write cache and user data storage. When you set up IBM Storage Scale Erasure Code Edition on an IBM Z® server, the minimum supported OS version is Red Hat Enterprise Linux version 8.1. Enterprise-class NVMe drives are the only supported storage type in this environment. For more information about hardware requirements, see Minimum hardware requirements and precheck.

Maximum storage nodes in a cluster

There can be up to 256 IBM Storage Scale Erasure Code Edition storage nodes in an IBM Storage Scale cluster. For example, a cluster can have 8 RGs with 32 nodes each, 16 RGs with 16 nodes each, or any other combination that results in no more than 256 total storage nodes.
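The arithmetic behind these examples is simple to check for a planned layout. The following is a minimal sketch that only tests the cluster-wide total stated above. The function names are hypothetical, and the sketch does not check any per-recovery-group limits.

```python
MAX_STORAGE_NODES = 256  # cluster-wide limit stated in the text


def total_storage_nodes(nodes_per_rg):
    """Sum the storage nodes across all planned recovery groups."""
    return sum(nodes_per_rg)


def fits_cluster_limit(nodes_per_rg):
    """True if the planned layout stays within the cluster-wide limit."""
    return total_storage_nodes(nodes_per_rg) <= MAX_STORAGE_NODES


layouts = {
    "8 RGs x 32 nodes": [32] * 8,
    "16 RGs x 16 nodes": [16] * 16,
    "9 RGs x 32 nodes": [32] * 9,
}
for label, layout in layouts.items():
    print(label, total_storage_nodes(layout), fits_cluster_limit(layout))
```

The first two layouts reach exactly 256 nodes and fit. The third totals 288 nodes and exceeds the limit.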

Network configurations

The network can be either Ethernet or InfiniBand for x86_64 and ARM, and must provide at least 25 Gbps of bandwidth, with an average latency of 1.0 msec or less between any two storage nodes. A dedicated network is recommended for storage server traffic. In most cases, the overall storage performance is dictated by network bandwidth and latency. Your performance requirements must be carefully considered when you select the network hardware and the network architecture for your IBM Storage Scale Erasure Code Edition cluster. For more information about networking requirements, see Network requirements and precheck.
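As a rough illustration of the two thresholds above, measured link figures can be compared against them before installation. This is a minimal sketch and not the product's network precheck. The LinkMeasurement type, the node names, and the sample values are hypothetical, and the measurements themselves would have to come from a benchmarking tool of your choice.

```python
from dataclasses import dataclass

MIN_BANDWIDTH_GBPS = 25.0  # minimum bandwidth stated in the text
MAX_AVG_LATENCY_MS = 1.0   # maximum average latency stated in the text


@dataclass
class LinkMeasurement:
    """Hypothetical record of one measured node-to-node link."""
    node_a: str
    node_b: str
    bandwidth_gbps: float
    avg_latency_ms: float


def failing_links(measurements):
    """Return the links that miss either threshold."""
    return [
        m for m in measurements
        if m.bandwidth_gbps < MIN_BANDWIDTH_GBPS
        or m.avg_latency_ms > MAX_AVG_LATENCY_MS
    ]


samples = [
    LinkMeasurement("n1", "n2", 98.0, 0.05),
    LinkMeasurement("n1", "n3", 24.0, 0.05),  # too little bandwidth
    LinkMeasurement("n2", "n3", 100.0, 1.4),  # too much latency
]
for link in failing_links(samples):
    print(f"{link.node_a} <-> {link.node_b} does not meet the requirements")
```

Because the requirement applies between any two storage nodes, every node pair needs a measurement, not only a sample of links.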

Administration and maintenance procedures

IBM Storage Scale Erasure Code Edition administration and maintenance procedures are similar to those for Elastic Storage Server (ESS), but not identical. With IBM Storage Scale Erasure Code Edition, the customer is responsible for managing the storage server hardware and software. For example, the customer is responsible for updating any firmware and the operating system, including security updates, when needed. Most IBM Storage Scale RAID maintenance tasks are performed by using the mmvdisk command. For more information about IBM Storage Scale Erasure Code Edition administration and maintenance procedures, see Administering IBM Storage Scale Erasure Code Edition.

Health monitoring and problem determination

IBM Storage Scale Erasure Code Edition health monitoring and problem determination procedures rely on IBM Storage Scale mmhealth capabilities and IBM Storage Scale RAID troubleshooting guidelines. For more information, see Troubleshooting.