If you want to deploy Db2® Warehouse in an MPP
environment, you must set up a POSIX-compliant cluster file system. One of the storage technologies
that you can use for this file system is IBM Spectrum
Scale File Placement Optimizer (FPO), formerly known as IBM® General Parallel File System (GPFS) FPO.
Before you begin
Note: If you are using IBM Spectrum Scale FPO
for your local storage, you might, depending on the configuration, experience remote I/O performance
degradation during high-availability (HA), scale-in, and scale-out operations.
Set up one or more storage devices that support IBM Spectrum Scale FPO,
such as IBM Cloud® block storage.
For information about how to set up IBM Cloud block storage, see Setting up IBM Cloud block storage for IBM Db2 Warehouse.
Obtain the IBM Spectrum Scale FPO software and license.
About this task
The following procedure uses an example of a three-node system, and applies to the CentOS and Red Hat
operating systems. The procedure for other operating systems or for other numbers of nodes is similar but
not identical.
For more information about all of the mm* commands (such as
mmcrcluster) that are used in the following procedure, see the IBM Spectrum Scale documentation.
Ensure that any files that you create using the following procedure are in the same working
directory.
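For example, you might create a dedicated working directory on the head node and run all of the
following commands from it (the directory name here is only an illustration):
mkdir -p /root/fpo-setup
cd /root/fpo-setup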
Procedure
-
On each node, create an /etc/hosts file that has the host names and IP
addresses of all the nodes.
An example follows:
[root@bluhelix16 ~]# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.114.36.226 bluhelix16
10.114.36.220 bluhelix17
10.114.36.227 bluhelix18
-
Configure passwordless SSH as follows. The nodes in the cluster must be able to communicate
with each other without the use of a password for the root user and without the remote shell
displaying any extraneous output.
- Generate an SSH key for the system by issuing the following command:
ssh-keygen -b 2048 -t rsa -f ~/.ssh/id_rsa -q -N ""
- Copy the public key to the ~/.ssh/authorized_keys file by issuing the following command:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Change the permissions of the ~/.ssh/authorized_keys file by issuing
the following command:
chmod 600 ~/.ssh/authorized_keys
- Copy the ~/.ssh directory to all nodes in the cluster.
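For example, by using the host names from the /etc/hosts example, you can copy the directory from the
head node by issuing the scp command:
scp -r ~/.ssh bluhelix17:~/
scp -r ~/.ssh bluhelix18:~/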
- Test the SSH setup to ensure that all nodes can communicate with all other nodes. Test by using
short names, fully qualified names, and IP addresses. For example, assume that the environment has
three nodes: node1.mydomain.com:10.0.0.1, node2.mydomain.com:10.0.0.2, and
node3.mydomain.com:10.0.0.3. Repeat the following test by using the short names (node1, node2, and
node3), the fully qualified names (node1.mydomain.com, node2.mydomain.com, and node3.mydomain.com),
and the IP addresses:
#!/bin/bash
# Edit node list
nodes="node1 node2 node3"
# Test ssh configuration
for i in $nodes
do for j in $nodes
do echo -n "Testing ${i} to ${j}: "
ssh ${i} "ssh ${j} date"
done
done
Sample output follows:
Testing node1 to node1 Wed Oct 15 10:14:34 CDT 2016
Testing node1 to node2 Wed Oct 15 10:14:34 CDT 2016
Testing node1 to node3 Wed Oct 15 10:14:35 CDT 2016
Testing node2 to node1 Wed Oct 15 10:14:35 CDT 2016
Testing node2 to node2 Wed Oct 15 10:14:36 CDT 2016
Testing node2 to node3 Wed Oct 15 10:14:36 CDT 2016
Testing node3 to node1 Wed Oct 15 10:14:37 CDT 2016
Testing node3 to node2 Wed Oct 15 10:14:37 CDT 2016
Testing node3 to node3 Wed Oct 15 10:14:38 CDT 2016
-
Partition the disks that you will use as Network Shared Disks (NSDs).
For example, if the disk on each node is /dev/xvdc, issue the following commands:
parted /dev/xvdc mklabel gpt
parted /dev/xvdc -s -a optimal mkpart primary 0% 100%
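If the device name is the same on every node, you can issue the partitioning commands from the head
node over SSH. The following loop is only a sketch that reuses the example host names and device:
for node in bluhelix16 bluhelix17 bluhelix18
do
ssh ${node} "parted /dev/xvdc mklabel gpt; parted /dev/xvdc -s -a optimal mkpart primary 0% 100%"
done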
-
On all nodes, install the IBM Spectrum Scale FPO binary file.
The commands in the following example apply to a CentOS or Red Hat operating system; commands
for other operating systems might be different:
yum -y install unzip ksh perl libaio.x86_64 net-tools m4 kernel-devel gcc-c++ psmisc.x86_64 kernel-devel.x86_64
unzip mpp-gpfs-4.1.1.5-16.02.18.zip
rpm -ivh gpfs/*.rpm
mv gpfsUpdates/gpfs.hadoop-connector-2.7.0-5.x86_64.rpm gpfsUpdates/gpfs.hadoop-connector-2.7.0-5.x86_64.rpm.backup
rpm -Uvh gpfsUpdates/*.rpm
sed -i 's/)/) Red Hat Enterprise Linux/g' /etc/redhat-release
/usr/lpp/mmfs/bin/mmbuildgpl
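Optionally, confirm on each node that the IBM Spectrum Scale packages installed correctly; for example:
rpm -qa | grep gpfs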
-
On the head node only, configure IBM Spectrum
Scale FPO by performing the following substeps:
- Issue the following command:
export PATH=$PATH:/usr/lpp/mmfs/bin
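The export applies only to the current shell session. If you want the mm* commands to remain on the
path in later sessions, you might also append the same line to the root user's profile (assuming a
bash login shell):
echo 'export PATH=$PATH:/usr/lpp/mmfs/bin' >> ~/.bash_profile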
- Create a nodes file. An example follows. For more information about the format of the file, see
the description of the NodeFile option for the -N parameter of
the mmcrcluster
command.
cat gpfs-fpo-nodefile
bluhelix16:quorum-manager:
bluhelix17:quorum-manager:
bluhelix18:quorum-manager:
- Create the cluster by issuing the mmcrcluster command with the
-N, -p, -s, -C, -A, -r,
and -R parameters. An example follows:
mmcrcluster -N gpfs-fpo-nodefile -p bluhelix16 -s bluhelix17 -C gpfs-fpo-cluster -A -r /usr/bin/ssh -R /usr/bin/scp
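Optionally, verify that the cluster was created with the expected nodes by issuing the following
command:
mmlscluster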
- Set the license mode for each node by issuing the mmchlicense command. Use a
server or client license setting as appropriate. In the following example, all three nodes are
servers:
mmchlicense server --accept -N bluhelix16
mmchlicense server --accept -N bluhelix17
mmchlicense server --accept -N bluhelix18
- Start the cluster by issuing the following command:
mmstartup -a
- Issue the following command. Ensure that the output shows that all quorum nodes are
active:
mmgetstate -a -L
Sample output follows:
[root@bluhelix16 ~]# mmgetstate -a -L
Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
------------------------------------------------------------------------------------
1 bluhelix16 1 1 3 active quorum node
2 bluhelix17 1 1 3 active quorum node
3 bluhelix18 1 1 3 active quorum node
- Create a storage pool file. Examples follow. For more information about the format of the file,
see the description of the -F parameter of the mmcrnsd command.
- For SSD:
cat fpo-poolfile
%pool:
pool=system
blockSize=1024K
usage=dataAndMetadata
layoutMap=cluster
allowWriteAffinity=yes
writeAffinityDepth=1
blockGroupFactor=10
%nsd: nsd=bluhelix16_ssd_1 device=/dev/xvdc servers=bluhelix16 failureGroup=1,0,1 pool=system
%nsd: nsd=bluhelix17_ssd_1 device=/dev/xvdc servers=bluhelix17 failureGroup=2,0,1 pool=system
%nsd: nsd=bluhelix18_ssd_1 device=/dev/xvdc servers=bluhelix18 failureGroup=3,0,1 pool=system
- For HDD:
cat fpo-poolfile
%pool:
pool=pool1
blockSize=1M
usage=dataOnly
layoutMap=cluster
allowWriteAffinity=yes
writeAffinityDepth=1
blockGroupFactor=10
%nsd: nsd=bluhelix16_hdd_1 device=/dev/xvdc servers=bluhelix16 failureGroup=1,0,1 pool=pool1
%nsd: nsd=bluhelix17_hdd_1 device=/dev/xvdc servers=bluhelix17 failureGroup=2,0,1 pool=pool1
%nsd: nsd=bluhelix18_hdd_1 device=/dev/xvdc servers=bluhelix18 failureGroup=3,0,1 pool=pool1
- Configure the disks for the cluster by issuing the mmcrnsd command. A sample
command follows:
mmcrnsd -F fpo-poolfile
Sample output follows:
mmcrnsd: Processing disk xvdc
mmcrnsd: Processing disk xvdc
mmcrnsd: Processing disk xvdc
mmcrnsd: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
- Verify that the disks were added by issuing the following
command:
mmlsnsd -m
Sample output follows:
[root@bluhelix16 ~]# mmlsnsd -m
Disk name NSD volume ID Device Node name Remarks
---------------------------------------------------------------------------------------
bluhelix16_ssd_1 0A7224E256572545 /dev/xvdc bluhelix16 server node
bluhelix17_ssd_1 0A7224DC56572547 /dev/xvdc bluhelix17 server node
bluhelix18_ssd_1 0A7224E35657254B /dev/xvdc bluhelix18 server node
- Create the cluster file system by issuing the mmcrfs command. An example
follows. The parameters in the example are recommended, but ensure that they are appropriate for
your setup.
mmcrfs clusterfs -F fpo-poolfile -j scatter -B 1048576 -L 16M -A yes -i 4096 -m 3 -M 3 -n 32 -r 3 -R 3 -S relatime -E no -T /mnt/clusterfs
Sample output follows:
The following disks of clusterfs will be formatted on node bluhelix16:
bluhelix16_ssd_1: size 30720 MB
bluhelix17_ssd_1: size 30720 MB
bluhelix18_ssd_1: size 30720 MB
Formatting file system ...
Disks up to size 275 GB can be added to storage pool system.
Creating Inode File
94 % complete on Thu Nov 26 09:32:33 2015
100 % complete on Thu Nov 26 09:32:34 2015
Creating Allocation Maps
Creating Log Files
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
Completed creation of file system /dev/clusterfs.
mmcrfs: Propagating the cluster configuration data to all
affected nodes. This is an asynchronous process.
-
Mount the cluster file system by issuing the following command. If you plan to load data from
the mounted file system, use /mnt/clusterfs/scratch as the mount point.
Otherwise, use /mnt/clusterfs.
mmmount clusterfs mount_point -N all
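For example, to mount the file system at /mnt/clusterfs on all nodes:
mmmount clusterfs /mnt/clusterfs -N all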
-
Verify that the cluster file system is mounted on all nodes by issuing the following
commands:
df -h
Sample output follows:
Filesystem Size Used Avail Use% Mounted on
/dev/clusterfs 90G 4.1G 86G 5% /mnt/clusterfs
[root@bluhelix16 ~]# mount | grep gpfs
Sample output follows:
/dev/clusterfs on /mnt/clusterfs type gpfs (rw,relatime)
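Because the df and mount commands report only the node on which they are run, you can also list the
nodes that have the file system mounted from a single node by issuing the following command:
mmlsmount clusterfs -L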