Setting up IBM Spectrum Scale (GPFS) FPO for IBM Db2 Warehouse

If you want to deploy Db2® Warehouse in an MPP environment, you must set up a POSIX-compliant cluster file system. One of the storage technologies that you can use for this file system is IBM Spectrum Scale File Placement Optimizer (FPO), formerly known as IBM® General Parallel File System (GPFS) FPO.

Before you begin

Note: If you are using IBM Spectrum Scale FPO for your local storage, you might, depending on the configuration, experience remote I/O performance degradation during high-availability (HA), scale-in, and scale-out operations.

Set up one or more storage devices that support IBM Spectrum Scale FPO, such as IBM Cloud® block storage. For information about how to set up IBM Cloud block storage, see Setting up IBM Cloud block storage for IBM Db2 Warehouse.

Obtain the IBM Spectrum Scale FPO software and license.

About this task

The following procedure uses an example of a three-node system and applies to CentOS and Red Hat Enterprise Linux operating systems. The procedure for other operating systems or other numbers of nodes is similar but not identical.

For more information about all of the mm* commands (such as mmcrcluster) that are used in the following procedure, see the IBM Spectrum Scale documentation.

Ensure that any files that you create using the following procedure are in the same working directory.

Procedure

  1. On each node, create an /etc/hosts file that has the host names and IP addresses of all the nodes.
    An example follows:
    
    [root@bluhelix16 ~]# cat /etc/hosts 
    127.0.0.1 localhost.localdomain localhost 
    10.114.36.226 bluhelix16 
    10.114.36.220 bluhelix17 
    10.114.36.227 bluhelix18
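    To confirm that the host names resolve as expected, you can optionally query each entry. The host names in this sketch follow the example above:
    
    getent hosts bluhelix16 bluhelix17 bluhelix18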
  2. Configure passwordless SSH as follows. The nodes in the cluster must be able to communicate with each other without the use of a password for the root user and without the remote shell displaying any extraneous output.
    1. Generate an SSH key for the system by issuing the following command:
      
      ssh-keygen -b 2048 -t rsa -f ~/.ssh/id_rsa -q -N "" 
    2. Append the public key to the ~/.ssh/authorized_keys file by issuing the following command:
      
      cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
      
    3. Change the permissions of the ~/.ssh/authorized_keys file by issuing the following command:
      
      chmod 600 ~/.ssh/authorized_keys
    4. Copy the ~/.ssh directory to all nodes in the cluster.
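      For example, from the node where you generated the key, you can copy the directory with scp. The node names node2 and node3 in this sketch are illustrative; substitute your own node names. You are prompted for the root password of each target node until the keys are in place:
      
      # Distribute the SSH configuration to the other nodes (example node names)
      for node in node2 node3; do
        scp -r ~/.ssh ${node}:~/
      done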
    5. Test the SSH setup to ensure that all nodes can communicate with all other nodes. Test by using short names, fully qualified names, and IP addresses. For example, assume that the environment has three nodes: node1.mydomain.com:10.0.0.1, node2.mydomain.com:10.0.0.2, and node3.mydomain.com:10.0.0.3. Repeat the following test by using the short names (node1, node2, and node3), the fully qualified names (node1.mydomain.com, node2.mydomain.com, and node3.mydomain.com), and the IP addresses:
      
      #!/bin/bash
      
      # Edit node list
      nodes="node1 node2 node3"
      
      # Test ssh configuration
      for i in $nodes; do
        for j in $nodes; do
          echo -n "Testing ${i} to ${j}: "
          ssh ${i} "ssh ${j} date"
        done
      done
      Sample output follows:
      
      Testing node1 to node1: Wed Oct 15 10:14:34 CDT 2016
      Testing node1 to node2: Wed Oct 15 10:14:34 CDT 2016
      Testing node1 to node3: Wed Oct 15 10:14:35 CDT 2016
      Testing node2 to node1: Wed Oct 15 10:14:35 CDT 2016
      Testing node2 to node2: Wed Oct 15 10:14:36 CDT 2016
      Testing node2 to node3: Wed Oct 15 10:14:36 CDT 2016
      Testing node3 to node1: Wed Oct 15 10:14:37 CDT 2016
      Testing node3 to node2: Wed Oct 15 10:14:37 CDT 2016
      Testing node3 to node3: Wed Oct 15 10:14:38 CDT 2016
  3. Partition the data disks that you will use as Network Shared Disks (NSDs).
    For example, if the disk is /dev/xvdc, issue the following commands:
    
    parted /dev/xvdc mklabel gpt 
    parted /dev/xvdc -s -a optimal mkpart primary 0% 100%
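    To confirm that the partition was created, you can optionally print the partition table:
    
    parted /dev/xvdc print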
  4. On all nodes, install the IBM Spectrum Scale FPO binary file.
    The commands in the following example apply to a CentOS or Red Hat Enterprise Linux operating system; commands for other operating systems might be different:
    
    # Install prerequisite packages
    yum -y install unzip ksh perl libaio.x86_64 net-tools m4 kernel-devel gcc-c++ psmisc.x86_64 kernel-devel.x86_64
    # Extract the IBM Spectrum Scale FPO package and install the base RPMs
    unzip mpp-gpfs-4.1.1.5-16.02.18.zip
    rpm -ivh gpfs/*.rpm
    # Set aside the Hadoop connector update so that it is not applied
    mv gpfsUpdates/gpfs.hadoop-connector-2.7.0-5.x86_64.rpm gpfsUpdates/gpfs.hadoop-connector-2.7.0-5.x86_64.rpm.backup
    # Apply the remaining update RPMs
    rpm -Uvh gpfsUpdates/*.rpm
    # Append "Red Hat Enterprise Linux" to the release string so that the GPFS build recognizes the distribution on CentOS
    sed -i 's/)/) Red Hat Enterprise Linux/g' /etc/redhat-release
    # Build and install the GPFS portability layer
    /usr/lpp/mmfs/bin/mmbuildgpl
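    Optionally, you can confirm that the IBM Spectrum Scale packages are installed by listing them:
    
    rpm -qa | grep -i gpfs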
  5. On the head node only, configure IBM Spectrum Scale FPO by performing the following substeps:
    1. Issue the following command:
      export PATH=$PATH:/usr/lpp/mmfs/bin
    2. Create a nodes file. An example follows. For more information about the format of the file, see the description of the NodeFile option for the -N parameter of the mmcrcluster command.
      
      cat gpfs-fpo-nodefile
      bluhelix16:quorum-manager: 
      bluhelix17:quorum-manager:  
      bluhelix18:quorum-manager: 
      
    3. Create the cluster by issuing the mmcrcluster command with the -N, -p, -s, -C, -A, -r, and -R parameters. An example follows:
      mmcrcluster -N gpfs-fpo-nodefile -p bluhelix16 -s bluhelix17 -C gpfs-fpo-cluster -A -r /usr/bin/ssh -R /usr/bin/scp
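      Optionally, you can confirm the cluster definition by issuing the mmlscluster command:
      
      mmlscluster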
    4. Set the license mode for each node by issuing the mmchlicense command. Use a server or client license setting as appropriate. In the following example, all three nodes are servers:
      
      mmchlicense server --accept -N bluhelix16   
      mmchlicense server --accept -N bluhelix17 
      mmchlicense server --accept -N bluhelix18
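      Optionally, you can verify the license designations by issuing the mmlslicense command:
      
      mmlslicense -L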
    5. Start the cluster by issuing the following command:
      mmstartup -a
    6. Issue the following command. Ensure that the output shows that all quorum nodes are active:
      mmgetstate -a -L
      Sample output follows:
      
      [root@bluhelix16 ~]# mmgetstate -a -L
      
       Node number  Node name       Quorum  Nodes up  Total nodes  GPFS state  Remarks
      ------------------------------------------------------------------------------------
             1      bluhelix16         1        1          3       active      quorum node
             2      bluhelix17         1        1          3       active      quorum node
             3      bluhelix18         1        1          3       active      quorum node
    7. Create a storage pool file. Examples follow. For more information about the format of the file, see the description of the -F parameter of the mmcrnsd command.
      • For SSD:
        
        cat fpo-poolfile 
        %pool:
        pool=system
        blockSize=1024K
        usage=dataAndMetadata
        layoutMap=cluster
        allowWriteAffinity=yes
        writeAffinityDepth=1
        blockGroupFactor=10 
        %nsd: nsd=bluhelix16_ssd_1 device=/dev/xvdc servers=bluhelix16 failureGroup=1,0,1 pool=system
        %nsd: nsd=bluhelix17_ssd_1 device=/dev/xvdc servers=bluhelix17 failureGroup=2,0,1 pool=system
        %nsd: nsd=bluhelix18_ssd_1 device=/dev/xvdc servers=bluhelix18 failureGroup=3,0,1 pool=system 
        
      • For HDD:
        
        cat fpo-poolfile 
        %pool:
        pool=pool1
        blockSize=1M
        usage=dataOnly
        layoutMap=cluster
        allowWriteAffinity=yes
        writeAffinityDepth=1
        blockGroupFactor=10
        %nsd: nsd=bluhelix16_hdd_1 device=/dev/xvdc servers=bluhelix16 failureGroup=1,0,1 pool=pool1
        %nsd: nsd=bluhelix17_hdd_1 device=/dev/xvdc servers=bluhelix17 failureGroup=2,0,1 pool=pool1
        %nsd: nsd=bluhelix18_hdd_1 device=/dev/xvdc servers=bluhelix18 failureGroup=3,0,1 pool=pool1
    8. Configure the disks for the cluster by issuing the mmcrnsd command. A sample command follows:
      mmcrnsd -F fpo-poolfile
      Sample output follows:
      
      mmcrnsd: Processing disk xvdc 
      mmcrnsd: Processing disk xvdc 
      mmcrnsd: Processing disk xvdc 
      mmcrnsd: Propagating the cluster configuration data to all 
       affected nodes. This is an asynchronous process.
    9. Verify that the disks were added by issuing the following command:
      mmlsnsd -m
      Sample output follows:
      
      [root@bluhelix16 ~]# mmlsnsd -m 
      Disk name        NSD volume ID    Device    Node name  Remarks
      ---------------------------------------------------------------------------------------
      bluhelix16_ssd_1 0A7224E256572545 /dev/xvdc bluhelix16 server node  
      bluhelix17_ssd_1 0A7224DC56572547 /dev/xvdc bluhelix17 server node  
      bluhelix18_ssd_1 0A7224E35657254B /dev/xvdc bluhelix18 server node
    10. Create the cluster file system by issuing the mmcrfs command. An example follows. The parameters in the example are recommended, but ensure that they are appropriate for your setup.
      mmcrfs clusterfs -F fpo-poolfile -j scatter -B 1048576 -L 16M -A yes -i 4096 -m 3 -M 3 -n 32 -r 3 -R 3 -S relatime -E no -T /mnt/clusterfs
      Sample output follows:
      
      The following disks of clusterfs will be formatted on node bluhelix16:    
        bluhelix16_ssd_1: size 30720 MB     
        bluhelix17_ssd_1: size 30720 MB    
        bluhelix18_ssd_1: size 30720 MB 
      Formatting file system ... 
      Disks up to size 275 GB can be added to storage pool system. 
      Creating Inode File   
       94 % complete on Thu Nov 26 09:32:33 2015  
       100 % complete on Thu Nov 26 09:32:34 2015 
      Creating Allocation Maps 
      Creating Log Files 
      Clearing Inode Allocation Map 
      Clearing Block Allocation Map 
      Formatting Allocation Map for storage pool system 
      Completed creation of file system /dev/clusterfs. 
      mmcrfs: Propagating the cluster configuration data to all   
       affected nodes.  This is an asynchronous process.
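      Optionally, you can review the attributes of the new file system by issuing the mmlsfs command:
      
      mmlsfs clusterfs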
  6. Mount the cluster file system by issuing the following command. If you plan to load data from the mounted file system, use /mnt/clusterfs/scratch as the mount point. Otherwise, use /mnt/clusterfs.
    mmmount clusterfs mount_point -N all
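    For example, if you do not plan to load data from the mounted file system, issue the following command:
    mmmount clusterfs /mnt/clusterfs -N all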
  7. Verify that the cluster file system is mounted on all nodes by issuing the following commands:
    df -h
    Sample output follows:
    
    Filesystem     Size Used Avail Use% Mounted on
    /dev/clusterfs 90G  4.1G   86G   5% /mnt/clusterfs
    mount | grep gpfs
    Sample output follows:
    
    /dev/clusterfs on /mnt/clusterfs type gpfs (rw,relatime)
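    To run the same check on every node from a single session, you can optionally loop over the nodes by using SSH. The node names in this sketch follow the example cluster; adjust the mount point if you used /mnt/clusterfs/scratch:
    
    for node in bluhelix16 bluhelix17 bluhelix18; do
      echo "=== ${node} ==="
      ssh ${node} "df -h /mnt/clusterfs"
    done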