The mmvdisk administration methodology

The mmvdisk command structures IBM Storage Scale RAID around a well-defined collection of objects: node classes, servers, recovery groups, vdisk sets, and file systems.

Node classes

An mmvdisk node class is a regular IBM Storage Scale node class with the restriction that it can only be altered by the mmvdisk command.

For the paired recovery groups of Elastic Storage Server (ESS), the mmvdisk node class contains the pair of servers responsible for a recovery group pair. The mmvdisk command maintains an association between the node class and each of the paired recovery groups.

For the scale-out recovery groups of IBM Storage Scale Erasure Code Edition, the mmvdisk node class contains from 4 to 32 identically equipped servers for a single scale-out recovery group. The mmvdisk command maintains an association between the node class and the scale-out recovery group.

For the shared recovery groups of IBM Elastic Storage® System 3000, the mmvdisk node class contains the two servers that share the single recovery group. The mmvdisk command maintains an association between the node class and the shared recovery group.

There can be no overlap among mmvdisk node classes. A server cannot be in two mmvdisk node classes, and the servers in an mmvdisk node class must either be the primary and backup servers for two related paired recovery groups or must all serve the same scale-out or shared recovery group.
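
The no-overlap rule can be illustrated with a small Python model. This is only a sketch of the rule as stated above, not how mmvdisk implements it; the function name and the node class and server names are hypothetical.

```python
from collections import Counter


def find_overlapping_servers(node_classes):
    """Return the servers that appear in more than one node class.

    node_classes maps a (hypothetical) node class name to its server names.
    """
    counts = Counter(
        server
        for servers in node_classes.values()
        for server in set(servers)  # ignore duplicates within one class
    )
    return sorted(server for server, n in counts.items() if n > 1)


# Hypothetical cluster layout: ess02 is wrongly listed in two node classes.
node_classes = {
    "nc_pair": ["ess01", "ess02"],
    "nc_scaleout": ["ece01", "ece02", "ece03", "ece04"],
    "nc_bad": ["ess02", "ems01"],
}
print(find_overlapping_servers(node_classes))
```

An empty result means that every server belongs to at most one node class, which is the condition the text requires.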

Servers

The servers within an mmvdisk node class are expected to be homogeneous. Each server is of the same type, with the same processor, memory, network, and storage capability. The mmvdisk command explicitly enforces that each server in an mmvdisk node class has the same total real memory and the same server disk topology. The server disk topology is the collection of disks available for IBM Storage Scale RAID on a server. For scale-out recovery group servers, it is the number and type of disks exclusive to the individual server. For shared recovery groups and paired recovery groups, it is the configuration of twin-tailed enclosures shared by the server pair.
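
The two properties that the text says are explicitly enforced (total real memory and server disk topology) can be modeled as a simple comparison. The following Python sketch is illustrative only: the `Server` type, its fields, and the topology labels are assumptions made for the example and do not reflect mmvdisk internals or output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    memory_bytes: int    # total real memory
    disk_topology: str   # hypothetical label summarizing the disk topology


def mismatched_servers(servers):
    """Return names of servers that differ from the first server in the class."""
    if not servers:
        return []
    ref = (servers[0].memory_bytes, servers[0].disk_topology)
    return [
        s.name for s in servers[1:]
        if (s.memory_bytes, s.disk_topology) != ref
    ]


GIB = 1024 ** 3
node_class = [
    Server("ece01", 256 * GIB, "12 x NVMe"),
    Server("ece02", 256 * GIB, "12 x NVMe"),
    Server("ece03", 128 * GIB, "12 x NVMe"),  # less memory than its peers
    Server("ece04", 256 * GIB, "12 x NVMe"),
]
print(mismatched_servers(node_class))
```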

A server can belong to only one mmvdisk node class. For scale-out and shared recovery groups, a server belongs only to one recovery group. For paired recovery groups, a server belongs to the two recovery groups of a related recovery group pair.

For all recovery group types, the mmvdisk command uses the mmvdisk node class to maintain the IBM Storage Scale RAID configuration settings for the recovery group servers. The mmvdisk command never uses cluster-wide settings for recovery group servers since different recovery groups can use different settings. If recovery group server settings are ever made independently of mmvdisk, the same practice of using the mmvdisk node class should be followed.

Recovery groups

The mmvdisk command manages the three types of IBM Storage Scale RAID recovery groups:
  1. The paired recovery groups of Elastic Storage Server. The two recovery groups in a paired recovery group node class are also called a recovery group pair.
  2. The scale-out recovery groups of IBM Storage Scale Erasure Code Edition.
  3. The shared recovery groups of IBM Elastic Storage System 3000.

A scale-out recovery group is created by supplying mmvdisk with a recovery group name and an mmvdisk node class containing 4 to 32 servers with the same independent disk configuration.

A recovery group pair is created by supplying mmvdisk with two recovery group names and an mmvdisk node class containing exactly two servers that share the same twin-tailed enclosure configuration.

A shared recovery group is created by supplying mmvdisk with a single recovery group name and an mmvdisk node class containing exactly two servers that share the same twin-tailed enclosure.
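
The creation requirements for the three recovery group types can be summarized as a small validation routine. This Python sketch only restates the counts given above; the function name and the type labels are hypothetical, and it does not check the disk or enclosure configuration.

```python
def check_recovery_group_request(rg_type, rg_names, server_count):
    """Return a list of problems with a proposed recovery group creation."""
    expected_names = {"paired": 2, "scale-out": 1, "shared": 1}
    if rg_type not in expected_names:
        raise ValueError(f"unknown recovery group type: {rg_type!r}")

    problems = []
    if len(rg_names) != expected_names[rg_type]:
        problems.append(
            f"{rg_type} needs {expected_names[rg_type]} recovery group "
            f"name(s), got {len(rg_names)}"
        )
    if rg_type == "scale-out":
        # Scale-out node classes contain 4 to 32 servers.
        if not 4 <= server_count <= 32:
            problems.append(
                f"scale-out needs 4 to 32 servers, got {server_count}"
            )
    elif server_count != 2:
        # Paired and shared node classes contain exactly two servers.
        problems.append(f"{rg_type} needs exactly 2 servers, got {server_count}")
    return problems


print(check_recovery_group_request("paired", ["rgL", "rgR"], 2))
print(check_recovery_group_request("scale-out", ["rg1"], 3))
```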

The association between a recovery group and an mmvdisk node class is established when the recovery group or recovery group pair is created.

The disk devices from the servers become the recovery group's pdisks. The pdisks are sorted into declustered arrays. A declustered array is a collection of pdisks that all have the same capacity in bytes and the same hardware type (SSD devices, NVMe devices, or HDD devices with the same rotation rate). Each pdisk in a declustered array is expected to have the same indexing and performance characteristics.
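
As an illustration of the grouping rule, the following Python sketch sorts pdisks into groups keyed by capacity and hardware type. The `Pdisk` type, the pdisk names, and the hardware type labels are hypothetical, and the sketch does not model how mmvdisk names or sizes declustered arrays.

```python
from collections import defaultdict
from typing import NamedTuple


class Pdisk(NamedTuple):
    name: str
    capacity_bytes: int
    hardware_type: str  # hypothetical labels, e.g. "NVMe", "SSD", "HDD-7200"


def sort_into_declustered_arrays(pdisks):
    """Group pdisk names by (capacity, hardware type)."""
    arrays = defaultdict(list)
    for pdisk in pdisks:
        arrays[(pdisk.capacity_bytes, pdisk.hardware_type)].append(pdisk.name)
    return dict(arrays)


TB = 10 ** 12
pdisks = [
    Pdisk("e1s01", 8 * TB, "HDD-7200"),
    Pdisk("e1s02", 8 * TB, "HDD-7200"),
    Pdisk("e1s03", 800 * 10 ** 9, "SSD"),
    Pdisk("e1s04", 8 * TB, "HDD-7200"),
]
for key, members in sort_into_declustered_arrays(pdisks).items():
    print(key, members)
```

Two HDDs with the same rotation rate but different capacities would land in different groups, because both the capacity and the hardware type must match.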

The space within a declustered array is allocated to vdisks. Vdisks are declustered RAID logical units that are balanced across the pdisks of a declustered array. There are two types of vdisks: log vdisks, which are used for IBM Storage Scale RAID transaction logging; and user vdisks, which become the vdisk NSDs of IBM Storage Scale file systems.

A recovery group is sub-divided into log groups. A log group is a collection of vdisks that share a common RAID transaction log (a log home vdisk). In the case of a recovery group pair, there is just one log group per recovery group, and in this case a paired recovery group is equivalent to its one and only log group. Each of the two servers has exclusive primary responsibility for the vdisks of one log group. In the case of a scale-out or shared recovery group, the recovery group is sub-divided into equally sized log groups, and each server is responsible for the vdisks of two log groups. Log groups act to equitably sub-divide and balance vdisks among the servers of a recovery group or recovery group pair.

The mmvdisk command considers the log vdisks of a recovery group as an essential component of the recovery group, and the formatting of log vdisks is part of recovery group creation.

Vdisk sets

File system vdisk NSDs are managed collectively as members of vdisk sets. A vdisk set is a collection of identical vdisk NSDs from one or more recovery groups. Each log group of a recovery group included in a vdisk set contributes one member vdisk NSD. This means that when a vdisk set is defined using paired recovery groups, there will be one member vdisk NSD from each recovery group because a paired recovery group is equivalent to a single log group. When a vdisk set is defined using scale-out or shared recovery groups, there will be one member vdisk NSD from each log group of the recovery group, which is equivalent to two vdisk NSDs per server of the scale-out or shared recovery group.
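
The member count of a vdisk set follows directly from the log group counts described above. The following Python sketch works through that arithmetic; the function names and type labels are hypothetical, and the counts are taken only from the rules stated in this topic.

```python
def log_group_count(rg_type, server_count):
    """Log groups that contribute a member vdisk NSD for one recovery group."""
    if rg_type == "paired":
        # A paired recovery group is equivalent to its single log group.
        return 1
    if rg_type in ("scale-out", "shared"):
        # Each server is responsible for the vdisks of two log groups.
        return 2 * server_count
    raise ValueError(f"unknown recovery group type: {rg_type!r}")


def vdisk_set_member_count(recovery_groups):
    """Total member vdisk NSDs for (rg_type, server_count) tuples."""
    return sum(log_group_count(t, n) for t, n in recovery_groups)


# A recovery group pair: two paired recovery groups, one vdisk NSD each.
print(vdisk_set_member_count([("paired", 2), ("paired", 2)]))
# One scale-out recovery group with six servers: two vdisk NSDs per server.
print(vdisk_set_member_count([("scale-out", 6)]))
# One shared recovery group served by its two servers.
print(vdisk_set_member_count([("shared", 2)]))
```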

A vdisk set must be managed as a unit: All member vdisk NSDs of a vdisk set must belong to the same file system. Defining a vdisk set across multiple recovery groups helps to ensure that a file system is balanced using identical vdisk NSDs from the recovery groups in the vdisk set.

Once recovery groups are created, the mmvdisk vdiskset command is used to define vdisk sets across the recovery groups. A vdisk set definition is a specification template that permits administrators to preview and evaluate how the vdisk sets are sized within the servers and declustered arrays of the recovery groups.

When the vdisk sets have been defined and sized satisfactorily in the desired recovery groups, mmvdisk is used to create the vdisk sets. This instantiates a vdisk set definition into a real collection of vdisk NSDs.

The created vdisk sets are then used as the units from which mmvdisk builds IBM Storage Scale file systems.

File systems

An mmvdisk file system is an IBM Storage Scale file system where all of the vdisk NSDs in the file system are from vdisk sets.

It is possible for an mmvdisk file system to contain non-vdisk NSDs, but the vdisk NSDs must all be members of vdisk sets, and non-vdisk NSDs and vdisk NSDs cannot reside in the same file system storage pool.

It is also possible for a file system to contain some vdisk NSDs that are from vdisk sets and some vdisk NSDs that are not from vdisk sets. In this case, the file system is not an mmvdisk file system. A cluster could be in the process of being converted to mmvdisk administration, so that some of the vdisk NSDs are from converted recovery groups and some are not. The file system will not be an mmvdisk file system until the last legacy recovery group represented in the file system is converted to mmvdisk administration.
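
The file system rules above can be expressed as two checks. This Python sketch is a model of the definitions in this topic, not of any mmvdisk interface; the `Nsd` type and all names are hypothetical, and treating a file system with no vdisk NSDs as not being an mmvdisk file system is an assumption of the sketch.

```python
from typing import NamedTuple, Optional


class Nsd(NamedTuple):
    name: str
    pool: str
    is_vdisk: bool
    vdisk_set: Optional[str] = None  # None: not a member of any vdisk set


def is_mmvdisk_file_system(nsds):
    """True if every vdisk NSD in the file system is from a vdisk set."""
    vdisk_nsds = [n for n in nsds if n.is_vdisk]
    # Assumption: at least one vdisk NSD must be present.
    return bool(vdisk_nsds) and all(n.vdisk_set is not None for n in vdisk_nsds)


def mixed_pools(nsds):
    """Return storage pools that hold both vdisk and non-vdisk NSDs."""
    kinds = {}
    for n in nsds:
        kinds.setdefault(n.pool, set()).add(n.is_vdisk)
    return sorted(pool for pool, k in kinds.items() if len(k) == 2)


converted = [
    Nsd("RG001VS001", "system", True, "vs1"),
    Nsd("RG002VS001", "system", True, "vs1"),
    Nsd("san_lun01", "archive", False),
]
partly_converted = converted + [Nsd("legacy_vdisk01", "system", True)]

print(is_mmvdisk_file_system(converted), mixed_pools(converted))
print(is_mmvdisk_file_system(partly_converted))
```

In the second case the file system contains a vdisk NSD from a legacy recovery group, so it does not qualify until that recovery group is converted to mmvdisk administration.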