Partitioning function matrix in automatic deployment

Each data disk is divided into two parts. One part holds an ext4 file system that stores map or reduce intermediate data, while the other part is used as a data disk in the IBM Spectrum Scale file system. Only data disks can be partitioned. Meta disks cannot be partitioned.

If a node is not selected as a NodeManager for Yarn, no map or reduce tasks run on that node. In this case, partitioning the disks of the node is not recommended because the local partition will not be used.

The following table describes the partitioning function matrix:

Table 1. IBM® Spectrum Scale partitioning function matrix
Node manager host list	Specify the standard NSD file	Specify the simple NSD file without the -meta label	Specify the simple NSD file with the -meta label
#1: <node manager host list> == <IBM Spectrum Scale NSD server nodes> The node manager host list is equal to the IBM Spectrum Scale NSD server nodes.	No partitioning. Create the NSD directly with the specified NSD file.	Partition and select the meta disks for the customer according to Disk-partitioning algorithm and Failure Group selection rules.	No partitioning. All disks marked with the -meta label are used for metadata NSD disks. All others are marked as data NSDs.
#2: <node manager host list> > <IBM Spectrum Scale NSD server nodes> Some node manager hosts are not in the IBM Spectrum Scale NSD server nodes but all IBM Spectrum Scale NSD server nodes are in the node manager host list.	No partitioning. Create the NSD directly with the specified NSD file.	No partitioning, but select the meta disks for the customer according to Disk-partitioning algorithm and Failure Group selection rules.	No partitioning. All disks marked with the -meta label are used for metadata NSD disks. All others are marked as data NSDs.
#3: <node manager host list> < <IBM Spectrum Scale NSD server nodes> Some IBM Spectrum Scale NSD server nodes are not in the node manager host list but all node manager hosts are in the IBM Spectrum Scale NSD server nodes.	No partitioning. Create the NSD directly with the specified NSD file.	No partitioning, but select the meta disks for the customer according to Disk-partitioning algorithm and Failure Group selection rules.	No partitioning. All disks marked with the -meta label are used for metadata NSD disks. All others are marked as data NSDs.
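The matrix can be read as a lookup on two inputs: how the node manager host list relates to the NSD server nodes, and which kind of NSD file is specified. The following Python sketch only illustrates that reading of the table. It is not part of the deployment tooling, and the function name and the nsd_file_kind labels ('standard', 'simple', 'simple-meta') are hypothetical.

```python
def partitioning_action(node_managers, nsd_servers, nsd_file_kind):
    """Return (case_number, action) following Table 1.

    nsd_file_kind uses hypothetical labels:
      'standard'     standard NSD file
      'simple'       simple NSD file without the -meta label
      'simple-meta'  simple NSD file with the -meta label
    """
    nm, nsd = set(node_managers), set(nsd_servers)
    if nm == nsd:
        case = 1
    elif nm > nsd:      # every NSD server is a node manager, plus extra node managers
        case = 2
    elif nm < nsd:      # every node manager is an NSD server, plus extra NSD servers
        case = 3
    else:
        # Neither list contains the other; Table 1 does not describe this layout.
        raise ValueError("host lists are not covered by the matrix")

    if nsd_file_kind == "standard":
        action = "no partitioning; create NSDs directly from the NSD file"
    elif nsd_file_kind == "simple-meta":
        action = "no partitioning; -meta disks are metadata NSDs, all others data NSDs"
    elif nsd_file_kind == "simple":
        if case == 1:
            action = "partition data disks; meta disks selected automatically"
        else:
            action = "no partitioning; meta disks selected automatically"
    else:
        raise ValueError("unknown NSD file kind: %r" % nsd_file_kind)
    return case, action
```

Read this way, partitioning happens only in case #1 with a simple NSD file that carries no -meta label. Every other combination creates the NSDs without partitioning.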

For standard NSD files, or simple NSD files with the -meta label, the IBM Spectrum Scale NSD and file system are created directly.

To specify the disks that must be used for metadata and also have the data disks partitioned, first partition the disks with the partition_disks_general.sh script, and then specify the partitions that are used for GPFS NSDs in a simple NSD file.

Send an email to scale@us.ibm.com to request the partition_disks_general.sh script.

For example:

$ cat /var/lib/ambari-server/resources/gpfs_nsd

DISK|compute001.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute002.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute003.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute005.private.dns.zone:/dev/sdb-meta,/dev/sdc2,/dev/sdd2
DISK|compute006.private.dns.zone:/dev/sdb,/dev/sdc2,/dev/sdd2
DISK|compute007.private.dns.zone:/dev/sdb,/dev/sdc2,/dev/sdd2
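To check such a file before deployment, it can help to list which devices on each host would become metadata NSDs and which would become data NSDs. The following Python sketch assumes only the line layout shown in the example (DISK|host:device,device,...) and the rule from Table 1 that devices without the -meta label are treated as data disks. The function name is hypothetical, and the parser used by the deployment itself may behave differently.

```python
def parse_simple_nsd(text):
    """Map each host to its metadata and data devices.

    Assumes lines of the form  DISK|<host>:<dev>[,<dev>...]  where a
    device carrying the -meta suffix is a metadata disk.
    """
    layout = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("DISK|"):
            continue  # ignore blank lines and anything that is not a DISK entry
        host, _, devices = line[len("DISK|"):].partition(":")
        entry = layout.setdefault(host, {"meta": [], "data": []})
        for dev in devices.split(","):
            dev = dev.strip()
            if not dev:
                continue
            if dev.endswith("-meta"):
                entry["meta"].append(dev[:-len("-meta")])  # strip the label
            else:
                entry["data"].append(dev)
    return layout


if __name__ == "__main__":
    with open("/var/lib/ambari-server/resources/gpfs_nsd") as f:
        for host, disks in parse_simple_nsd(f.read()).items():
            print(host, "meta:", disks["meta"], "data:", disks["data"])
```

Under this reading, the entries for compute006 and compute007 in the example contribute no metadata disks, because /dev/sdb on those hosts has no -meta label.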

After deployment completes in this mode, manually update the yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs properties to contain the list of directories on the disk partitions that are used for map or reduce intermediate data.
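Both properties take a comma-separated list of directories. As a minimal sketch, the following Python function builds the two values from the mount points of the ext4 partitions. The mount points (/hadoop/sdc1, /hadoop/sdd1) and the yarn/local and yarn/logs subdirectory names are assumptions made for illustration; substitute the paths that exist on your nodes.

```python
import posixpath


def yarn_dir_properties(mount_points, local_subdir="yarn/local", log_subdir="yarn/logs"):
    """Build comma-separated values for the two NodeManager directory properties.

    mount_points: where the ext4 partitions are mounted (hypothetical paths).
    local_subdir / log_subdir: illustrative subdirectory names, not defaults
    taken from the product.
    """
    local_dirs = [posixpath.join(mp, local_subdir) for mp in mount_points]
    log_dirs = [posixpath.join(mp, log_subdir) for mp in mount_points]
    return {
        "yarn.nodemanager.local-dirs": ",".join(local_dirs),
        "yarn.nodemanager.log-dirs": ",".join(log_dirs),
    }


if __name__ == "__main__":
    props = yarn_dir_properties(["/hadoop/sdc1", "/hadoop/sdd1"])
    for name, value in props.items():
        print(name + "=" + value)
```

The resulting strings can then be entered as the property values in the Yarn configuration.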