Installing the ESS software
This topic includes information about installing and configuring the ESS software.
It covers the installation and configuration procedure for an ESS 4.0 system with one or more building blocks. To complete this procedure, you need a working knowledge of Power Systems™ servers, IBM Spectrum Scale™, and xCAT.
For information about known issues, mitigation, and workarounds, see ESS 4.0.0 issues. Depending on which fix level you are installing, these might or might not apply to you.
For information about upgrading to ESS 4.0, see Upgrading the Elastic Storage Server.
Networking requirements
- Service network
This network connects the flexible service processor (FSP) on the management server and I/O server nodes with the HMC, as shown in yellow in Figure 1. The HMC runs the Dynamic Host Configuration Protocol (DHCP) server on this network. If the HMC is not included in the solution order, a customer-supplied HMC is used.
- Management and provisioning network
This network connects the management server to the I/O server nodes and HMCs, as shown in blue in Figure 1. The management server runs DHCP on the management and provisioning network. If a management server is not included in the solution order, a customer-supplied management server is used.
- Clustering network
This high-speed network is used for clustering and client node access. It can be a 10 Gigabit Ethernet (GbE), 40 GbE, or InfiniBand network. It might not be included in the solution order.
- External and campus management network
This public network is used for external and campus management of the management server, the HMC, or both.
The management and provisioning network and the service network must run as two non-overlapping networks implemented as two separate physical networks or two separate virtual local-area networks (VLANs).
The HMC, the management server, and the switches (1 GbE switches and high-speed switches) might not be included in a solution order in which an existing or customer-supplied HMC or management server is used. Perform any advance planning tasks that might be needed to access and use these solution components.

Installing the ESS 4.0 software
Preparing for the installation
- Obtain the current ESS 4.0 installation code from the Fix Central website.
To download from Fix Central, you must have entitlement for the given installation package. Check with your IBM® representative if you have questions.
- Obtain a Red Hat Enterprise Linux 7.1 ISO image (RHEL 7.1 Binary DVD) file or DVD for 64-bit IBM Power Systems architecture, for example:
rhel-server-7.1-ppc64-dvd.iso
For more information, see the Red Hat Enterprise Linux website.
Perform the following tasks and gather all required information before starting the installation process. Table 1 includes information about components that must be set up before you start installing the ESS 4.0 software.
For tips about how to name nodes, see Node name considerations.
| ESS component | Description | Required actions | System settings |
|---|---|---|---|
| 1. Service network | This private network connects the HMC with the management server's FSP and the I/O server nodes. The service network must not be visible to the OS running on the node being managed (that is, the management server or the I/O server node). The HMC uses this network to discover the management server and the I/O server nodes and to perform hardware management tasks such as creating and managing logical partitions, allocating resources, controlling power, and rebooting. | Perform any advance planning tasks that might be needed to access and use the HMC if it is not part of the solution order and a customer-supplied HMC will be used. Set up this network if it has not been set up already. | Set the HMC to be the DHCP server for the service network. |
| 2. Management and provisioning network | This network connects the management server node with the HMC and the I/O server nodes. It typically runs over 1 GbE. | Perform any advance planning tasks that might be needed to access and use the management server if it is not part of the solution order and a customer-supplied management server will be used. Set up this network if it has not been set up already. | |
| 3. Clustering network | This network is for high-performance data access and, in most cases, client node access. It is typically composed of 10GbE, 40GbE, or InfiniBand networking components. | Set up this network if it has not been set up already. | |
| 4. Management network domain | The management server uses this domain for the proper resolution of hostnames. | Set the domain name using lowercase characters. Do not use any uppercase characters. | Example: |
| 5. HMC node (IP address and hostname) | The IP address of the HMC node on the management network has a console name, which consists of a hostname and a domain name. | Set the fully qualified domain name (FQDN) and the hostname using lowercase characters. Do not use any uppercase characters. Do not use a suffix of -enx, where x is any character. Do not use an _ (underscore) in the hostname. | Example: |
| 6. Management server node (IP address) | The IP address of the management server node has an FQDN and a hostname. | Set the FQDN and hostname using lowercase characters. Do not use any uppercase characters. Do not use a suffix of -enx, where x is any character. Do not use an _ (underscore) in the hostname. | Example: |
| 7. I/O server nodes (IP addresses) | The IP addresses of the I/O server nodes have FQDNs and hostnames. | Set the FQDNs and hostnames using lowercase characters. These names must match the names of the partitions created for these nodes using the HMC. Do not use any uppercase characters. Do not use a suffix of -enx, where x is any character. Do not use an _ (underscore) in the hostname. | Example: I/O server 1: I/O server 2: |
| 8. Management server node (management network interface) | The management network interface of the management server node must have the IP address that you set in item 6 assigned to it. This interface must have only one IP address assigned. | To obtain this address, run: | Example: |
| 9. HMC (hscroot password) | Set the password for the hscroot user ID. | Example: This is the default password. | |
| 10. I/O servers (user IDs and passwords) | The user IDs and passwords of the I/O servers are assigned during deployment. | Example: User ID: root Password: cluster (this is the default password) | |
| 11. Clustering network (hostname prefix or suffix) | This high-speed network is implemented on a 10Gb Ethernet, 40Gb Ethernet, or InfiniBand network. | Set a hostname for this network. It is customary to use hostnames for the high-speed network that use the prefix and suffix of the actual hostname. Do not use a suffix of -enx, where x is any character. | Examples: Suffixes: -bond0, -ib, -10G, -40G Hostnames with a suffix: gssio1-ib, gssio2-ib |
| 12. High-speed cluster network (IP address) | The IP addresses of the management server nodes and I/O server nodes on the high-speed cluster network have FQDNs and hostnames. In the example, 172.10.0.11 is the IP address that the GPFS™ daemon uses for clustering. The corresponding hostname and FQDN are gssio1-ib and gssio1-ib.data.net, respectively. | Set the FQDNs and hostnames. Do not make changes in the /etc/hosts file for the high-speed network until the deployment is complete. Do not create or enable the high-speed network interface until the deployment is complete. | Example: Management server: I/O server 1: I/O server 2: |
| 13. Red Hat Enterprise Linux 7.1 | The Red Hat Enterprise Linux 7.1 DVD or ISO file is used to create a temporary repository for the xCAT installation. xCAT uses it to create a Red Hat Enterprise Linux repository on the management server node. | Obtain and download this DVD or ISO file. For more information, see the Red Hat Enterprise Linux website. | Example: |
| 14. Management network switch | The switch that implements the management network must allow the Bootstrap Protocol (BOOTP) to go through. | Obtain the IP address and access credentials (user ID and password) of this switch. Some switches generate many Spanning Tree Protocol (STP) messages, which interfere with the network boot process. Disable STP to mitigate this. | |
| 15. Target file system | You need to provide information about the target file system that is created using storage in the ESS building blocks. | Set the target file system name, the mount point, the block size, the number of data NSDs, and the number of metadata NSDs. | Example: |
The following is an example of a typical /etc/hosts file.
[root@ems1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.45.131 hmc1.gpfs.net hmc1
192.168.45.20 ems1.gpfs.net ems1
192.168.45.21 gssio1.gpfs.net gssio1
192.168.45.22 gssio2.gpfs.net gssio2
172.16.45.20 ems1-hs.gpfs.net ems1-hs
172.16.45.21 gssio1-hs.gpfs.net gssio1-hs
172.16.45.22 gssio2-hs.gpfs.net gssio2-hs
Set up the HMC and the management server (MS)
For information about setting up the HMC network for use by xCAT, see the xCAT website.
- Make sure the POWER8® servers are powered on in standby mode.
- Connect the ESS I/O server nodes and the management server (if it is part of the order) to the HMC. If the HMC is not part of the order, you will need to provide it.
- Verify that the partitions of the I/O servers and the management server (if it is part of the order) are visible on the HMC. (The HMC might prompt you for the FSP password. The default password is abc123.) The HMC discovers the I/O server and management server nodes automatically when the nodes are powered on. If this does not happen, power cycle the nodes.
- Typically, server names, or central processor complex (CPC) names, are derived from the serial number. It is recommended that you do not change the server name. Make sure the server name and the logical partition (LPAR) name are not identical.
- The default partition names follow.
- Management server: ems1
- I/O server 1: gssio1
- I/O server 2: gssio2
- If there are more building blocks in the same order, the additional I/O server node partition names are: gssio3, gssio4, gssio5, ... gssion, where n is the total number of I/O servers.
- The management server nodes and I/O server nodes are shipped from IBM with Red Hat Enterprise Linux 7.1 installed in an R10 disk
array. The I/O server nodes are redeployed (including reinstallation
of Red Hat Enterprise Linux 7.1)
at the customer location from the management server. Typically, this
process takes approximately 30 minutes to complete. Completion of
this process ensures that the installation is consistent with various
site-specific parameters. It also minimizes configuration
mismatches and incompatibilities between the management server
nodes and I/O server nodes.
There is no need to reinstall the management server. It is reinstalled only if the OS cannot boot any more due to hardware damage or failure. See Installing Red Hat Enterprise Linux on the management server to reinstall the management server if needed.
- Verify that you can access the management server console using the HMC. After network connectivity is established to the management server node (see the next section), it is recommended that you access the management server over the network using an available secure shell (SSH) client such as PuTTY.
Configure an IP address for the xCAT network on the management server using the HMC console
- Log in to the system as root. The default root password from IBM is cluster.
- List the available interfaces, which should begin with a prefix of enP7:
ip link show | egrep "P7.*state UP"
If you do not see any interfaces with a state of UP, check your network connections before proceeding. Also, verify that the correct interface is UP.
- Select the interface that ends with a suffix of f0, for example:
enP7p128s0f0
By default, enP7p128s0f0 is C10-port 0 and is configured at IBM with an IP address of 192.168.45.10, 192.168.45.11, or 192.168.45.20. If enP7p128s0f0 is not up and another link is up, move the cable.
- Edit the network configuration for this interface and change it as needed. The file name is:
/etc/sysconfig/network-scripts/ifcfg-enP7p128s0f0
In this file, change the value of BOOTPROTO from dhcp to static and set the value of ONBOOT to yes if it is not set already:
BOOTPROTO=static
ONBOOT=yes
- Add or change the management server's IP address and netmask as needed (a complete example of this file follows this list). For example:
IPADDR=192.168.45.20
NETMASK=255.255.255.0
- Restart network services if the address is changed:
systemctl restart network
- Verify that the management server's management network interface is up. For example, run:
ping 192.168.45.20
- After the interface is configured, you can log in to the management server node using an SSH client.
Command sequence overview
If you are familiar with the ESS, review the Elastic Storage Server: Quick Deployment Guide for instructions on how to deploy and upgrade. This document, Deploying the Elastic Storage Server, provides detailed instructions and information about the steps involved.
- Obtain the packed, compressed ESS 4.0 software. Unpack and uncompress the software. For example, run:
tar zxvf gss_install-4.0.0_ppc64_advanced_20160126T001311Z.tgz
The name of your ESS 4.0 software tar (.tgz) file could differ based on the IBM Spectrum Scale edition you are using and the fix levels of the ESS release you are installing.
- Check the MD5 checksum:
md5sum -c gss_install-4.0.0_ppc64_advanced_20160126T001311Z.md5
- To make sure the /opt/ibm/gss/install directory is clean, run:
/bin/sh gss_install-4.0.0_ppc64_advanced_20160126T001311Z --remove
- Obtain the ESS 4.0 license, accept the license, and run this command to extract the software:
/bin/sh gss_install-4.0.0_ppc64_advanced_20160126T001311Z --text-only
- Clean up the current xCAT installation and associated configuration:
gssdeploy -c
- Install the ESS 4.0 packages on the management server node:
gssinstall -m manifest -u
- Customize the gssdeploy script and run it to configure xCAT:
gssdeploy -x
In this case, the gssdeploy script runs one step at a time, which is recommended, and waits for user responses.
- Update the management server node:
updatenode ems1 -P gss_updatenode
- If indicated by the previous step, reboot the management server node to apply the changes from the management server update. After rebooting, run updatenode again if instructed to do so.
- Update OFED on the management server node:
updatenode ems1 -P gss_ofed
- Reboot the management server node to apply the changes from the OFED update.
- Deploy on the I/O server nodes:
gssdeploy -d
- Reboot the I/O server nodes after the deployment is complete before proceeding with the hardware check.
Detailed installation steps follow.
Obtain the ESS 4.0 installation software and install it on the management server node
- Obtain the software from the IBM Spectrum Scale Fix Central website. The name of your ESS 4.0 software tar (.tgz) file could differ based on the edition you are using and the fix levels of the ESS release you are installing.
- Unpack and uncompress the file to create the installation software and the MD5 checksum of the installation software file. To unpack and uncompress the file, run this command:
tar zxvf gss_install-4.0.0_ppc64_advanced_20160126T001311Z.tgz
The system displays output similar to this:
[root@gems5 deploy]# tar zxvf gss_install-4.0.0_ppc64_advanced_20160126T001311Z.tgz
gss_install-4.0.0_ppc64_advanced_20160126T001311Z
gss_install-4.0.0_ppc64_advanced_20160126T001311Z.md5
- To verify the MD5 checksum of the software, run:
md5sum -c gss_install-4.0.0_ppc64_advanced_20160126T001311Z.md5
The system displays output similar to this:
[root@gems5 deploy]# md5sum -c gss_install-4.0.0_ppc64_advanced_20160126T001311Z.md5
gss_install-4.0.0_ppc64_advanced_20160126T001311Z: OK
- To make sure the /opt/ibm/gss/install directory is clean, run:
/bin/sh gss_install-4.0.0_ppc64_advanced_20160126T001311Z --remove
- Use the gss_install* command to accept the ESS 4.0 product license and install the ESS 4.0 software package. The ESS 4.0 installation software is integrated with the product license acceptance tool. To install the ESS 4.0 software, you must accept the product license. To accept the license and install the package, run the gss_install* command (for example, /bin/sh gss_install-4.0.0_ppc64_advanced_20160126T001311Z) with the appropriate options. The gss_install* command you run could differ based on the IBM Spectrum Scale edition you are using and the fix levels of the ESS release you are installing. For example, run:
/bin/sh gss_install-4.0.0_ppc64_advanced_20160126T001311Z --text-only
See gss_install* command for more information about this command.
- Clean the current xCAT installation and associated configuration:
gssdeploy -c
- By default, the product license acceptance tool places the code in the following directory:
/opt/ibm/gss/install
You can use the -dir option to specify a different directory.
- Run the change directory command:
cd /opt/ibm/gss/install
- Use the gssinstall script to install the ESS 4.0 packages on the management server node. This script is in the /opt/ibm/gss/install/installer directory. For example, run:
/opt/ibm/gss/install/installer/gssinstall -m /opt/ibm/gss/install/manifest -u
The system displays output similar to this:
# /opt/ibm/gss/install/installer/gssinstall -m /opt/ibm/gss/install/manifest -u
[INFO]: GSS package installer
[INFO]: Using LOG: /var/log/gss/gssinstall.log
[INFO]: [EMS] Audit Summary:
[INFO]: [EMS] Manifest Ver: 4.0.0-20160126T001311Z_ppc64_advanced
[INFO]: [EMS] Group gpfs RPMs: Not Inst: 10, Current: 0, New: 0, Old: 0
[INFO]: [EMS] Group gss RPMs: Not Inst: 2, Current: 0, New: 0, Old: 0
[INFO]: [EMS] Group gui RPMs: Not Inst: 3, Current: 0, New: 0, Old: 0
[INFO]: [EMS] Group ofed RPMs: Not Inst: 1, Current: 0, New: 0, Old: 0
[INFO]: [EMS] Group xcat-core RPMs: Not Inst: 5, Current: 0, New: 0, Old: 0
[INFO]: [EMS] Group xcat-dfm RPMs: Not Inst: 2, Current: 0, New: 0, Old: 0
[RESP]: Install EMS software repositories? [y/n]: y
[INFO]: Installing EMS software repository to (/install/gss)
[INFO]: Creating yum repo data for gss pkgs (Please wait...)
[INFO]: GSS package installer - Update complete.
See gssinstall script for more information about this script.
Configure the installed packages on the management server node and prepare for deployment
- Copy the gssdeploy script from the /opt/ibm/gss/install/samples directory to another directory and then customize the copy to match your environment. You need to make changes to several lines at the top of your copy of this script for the target configuration, as shown in the following example. For ESS 4.0, DEPLOY_OSIMAGE must be set to rhels7.1-ppc64-install-gss. You might see other OSIMAGE values that correspond to earlier releases (xCAT command lsdef -l osimage, for example).
#########################################################################
#
# Customize/change following to your environment
#
#########################################################################
#[RHEL]
# Set to Y if RHEL DVD is used otherwise iso is assumed.
RHEL_USE_DVD="N"
# Device location of RHEL DVD used instead of iso
RHEL_DVD="/dev/cdrom"
# Mount point to use for RHEL media.
RHEL_MNT="/opt/ibm/gss/mnt"
# Directory containing ISO.
RHEL_ISODIR=/opt/ibm/gss/iso
# Name of ISO file.
RHEL_ISO="RHEL-7.1-20150219.1-Server-ppc64-dvd1.iso"
#[EMS]
# Hostname of EMS
EMS_HOSTNAME="ems1"
# Network interface for xCAT management network
EMS_MGTNETINTERFACE="enP7p128s0f0"
#[HMC]
# Hostname of HMC
HMC_HOSTNAME="hmc1"
# Default userid of HMC
HMC_ROOTUID="hscroot"
# Default password of HMC
HMC_PASSWD="Passw0rd"
#[IOSERVERS]
# Default userid of IO Server.
IOSERVERS_UID="root"
# Default password of IO Server.
IOSERVERS_PASSWD="cluster"
# Array of IO servers to provision and deploy.
IOSERVERS_NODES=(gssio1 gssio2)
#[DEPLOY]
# OSIMAGE stanza to deploy to IO servers.
DEPLOY_OSIMAGE="rhels7.1-ppc64-install-gss"
########################################################################
#
# End of customization
#
########################################################################
The gssdeploy script can be run in interactive mode or non-interactive ("silent") mode. Running gssdeploy in interactive mode is recommended.
The gssdeploy script is run in two phases. In the first phase, it is run with the -x option to set up the management server and xCAT. In the second phase, it is run with the -d option to deploy on the I/O server node.
See gssdeploy script for more information about this script.
Every step of the gssdeploy script shows the current step to be run and a brief description of the step. The command to be run appears on the [CMD] line and the response of the command on the [CMD_RESP] lines, for example:
[STEP]: Deploy 4 of 7, Set osimage attributes for the nodes so current values will be used for rnetboot or updatenode
[CMD]: => nodeset gss_ppc64 osimage=rhels7.1-ppc64-install-gss
Enter 'r' to run [CMD], 's' to skip this step, or 'e' to exit this script
Enter response: r
[CMD_RESP]: gssio1: install rhels7.1-ppc64-gss
[CMD_RESP]: gssio2: install rhels7.1-ppc64-gss
[CMD_RESP]: RC: 0
To configure xCAT and the management server node, you will run gssdeploy -x. If xCAT is installed on the node already, the script will fail. If it fails, clean the previous xCAT installation by running gssdeploy -c.
Suppose your modified gssdeploy script is in the /home/deploy directory. Run:
/home/deploy/gssdeploy -x
The script goes through several steps and configures xCAT on the management server node. Some of the steps (those in which copycds or getmacs is run, for example) take some time to complete.
- Run the updatenode ManagementServerNodeName -P gss_updatenode command. For example, run:
updatenode ems1 -P gss_updatenode
The system displays output similar to this:
[root@ems1 deploy]# updatenode ems1 -P gss_updatenode
ems1: Mon Jun 15 18:02:50 CDT 2015 Running postscript: gss_updatenode
ems1: gss_updatenode [INFO]: Using LOG: /var/log/xcat/xcat.log
ems1: gss_updatenode [INFO]: Performing update on ems1
ems1: gss_updatenode [INFO]: Erasing gpfs rpms
ems1: gss_updatenode [INFO]: Erase complete
ems1: gss_updatenode [INFO]: Updating ospkgs on ems1 (Please wait...)
ems1: gss_updatenode [INFO]: Version unlocking kernel for the update
ems1: gss_updatenode [INFO]: Disabling repos:
ems1: gss_updatenode [INFO]: Updating otherpkgs on ems1 (Please wait...)
ems1: gss_updatenode [INFO]: Enabling repos:
ems1: gss_updatenode [INFO]: Version locking kernel
ems1: gss_updatenode [INFO]: Checking that GPFS GPL layer matches running kernel
ems1: gss_updatenode [INFO]: GPFS GPL layer matches running kernel
ems1: gss_updatenode [INFO]: Checking that OFED ISO supports running kernel
ems1: gss_updatenode [INFO]: Upgrade complete
ems1: Postscript: gss_updatenode exited with code 0
ems1: Running of postscripts has completed.
This step could take some time to complete if vpdupdate is run before the actual update. To determine whether you are waiting for vpdupdate, run this command:
ps -ef | grep vpd
The system displays output similar to this:
[root@ems1 ~]# ps -ef | grep vpd
root 75272 75271 0 17:05 ? 00:00:00 /usr/sbin/lsvpd
root 75274 75272 0 17:05 ? 00:00:00 sh -c /sbin/vpdupdate >/dev/null 2>&1
root 75275 75274 2 17:05 ? 00:00:03 /sbin/vpdupdate
root 76106 73144 0 17:08 pts/0 00:00:00 grep --color=auto vpd
After the updatenode command completes, you should see an exit code of 0.
- Reboot the node only if you are instructed to do so. Also, run the script again if you rebooted.
- Run the OFED update using updatenode ManagementServerNodeName -P gss_ofed. Your version of OFED may be different than what is shown here. For the OFED update, run:
updatenode ems1 -P gss_ofed
The system displays output similar to this:
[root@ems1 deploy]# updatenode ems1 -P gss_ofed
ems1: Mon Jun 15 18:20:54 CDT 2015 Running postscript: gss_ofed
ems1: Starting to install OFED.....
ems1: Mellanox controller found, install Mellanox OFED
ems1: Unloading HCA driver:[ OK ]
ems1: Mounting OFED ISO...
ems1: /tmp //xcatpost
ems1: mount: /dev/loop0 is write-protected, mounting read-only
ems1: Loaded plugins: product-id, subscription-manager, versionlock
ems1: This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
ems1: Error: Error: versionlock delete: no matches
ems1: Installing OFED stack...
ems1: TERM environment variable not set.
ems1: Logs dir: /tmp/MLNX_OFED_LINUX-3.1-1.0.0.2.logs
ems1:
ems1: Log File: /tmp/MLNX_OFED_LINUX-3.1-1.0.0.2.logs/fw_update.log
ems1: Unloading HCA driver:[ OK ]
ems1: Loading HCA driver and Access Layer:[ OK ]
ems1: Loaded plugins: product-id, subscription-manager, versionlock
ems1: This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
ems1: Adding versionlock on: 0:dapl-devel-2.1.3mlnx-OFED.2.4.37.gb00992f
ems1: Adding versionlock on: 0:srptools-1.0.1-OFED.2.4.40.g68b353c-OFED.2.3.47.gc8011c5
.
.
.
ems1: Adding versionlock on: 0:opensm-devel-4.3.0.MLNX20141222.713c9d5-0.1
ems1: versionlock added: 60
ems1: //xcatpost
ems1: Postscript: gss_ofed exited with code 0
ems1: Running of postscripts has completed.
- Reboot the node after the OFED update is complete.
- To make sure the OFED is updated and reflects the installed kernel, run this command:
ofed_info | grep -e kernel | grep ppc64
The system displays output similar to this:
[root@ems1 deploy]# ofed_info | grep -e kernel | grep ppc64
kernel-mft-3.8.0-3.10.0_229.el7.ppc64.ppc64
kernel-ib-devel-2.4-3.10.0_229.el7.ppc64_OFED.2.4.1.0.2.1.ge234f2b.ppc64
kernel-ib-2.4-3.10.0_229.el7.ppc64_OFED.2.4.1.0.2.1.ge234f2b.ppc64
- If you rebooted the management server node, run the updatenode ManagementServerNodeName -P gss_updatenode command again. For example, run:
updatenode ems1 -P gss_updatenode
Deploy the nodes
- Close all console (rcons) sessions on the management server and on the HMC.
- If the switch is supplied by the customer (that is, not shipped from IBM), make sure all nodes can communicate using BOOTP and there are no excessive STP messages. BOOTP could fail in the presence of excessive STP messages. You might consider enabling PortFast on the ports that are connected to the I/O server node.
- Make sure no other DHCP server is acting on the network.
- Make sure the external JBOD storage is powered off or disconnected.
When these prerequisites are met, start the deployment of the I/O server nodes by running:
gssdeploy -d
At this point, the I/O server nodes are restarted and the OS and other software packages are installed on them.
Monitoring the I/O server node installation process
Use the remote console feature of xCAT to monitor the installation process. The preferred method for monitoring the progress is to watch the console logs using the Linux tailf command.
tailf /var/log/consoles/gssio1
You can also open a remote console to a node by running:
rcons NodeName
If you connect to the console when the Red Hat installer, called Anaconda, is running, you are sent to a menu system. To display various menus, press <Ctrl-b> n, where n is the number of the menu you want to view. For example, if you press <Ctrl-b> 2, you are placed in the Anaconda shell. It is recommended that you not perform any actions using the Anaconda menu unless instructed to do so.
If the remote console is not available for a node, regenerate the console server configuration by running:
makeconservercf
To check the deployment status of the I/O server nodes, run:
nodestat gss_ppc64
The system displays output similar to this: [root@ems1 ~]# nodestat gss_ppc64
gssio1: installing post
gssio2: installing post
[root@ems1 ~]# nodestat gss_ppc64
gssio1: sshd
gssio2: sshd
xdsh gss_ppc64 "ps -eaf | grep -v grep | grep xcatpost"
If there are any processes still running, wait for them
to complete. It is possible that the installation could fail due to network boot issues. If the installation fails, run makeconservercf before trying it again. Retry the installation at least three times and see if that fixes the issue.
For example, to restart the installation on gssio2, run:
nodeset gssio2 osimage=rhels7.1-ppc64-install-gss
rnetboot gssio2 -V
This command sequence restarts the installation process on gssio2. Monitor the console using tailf or rcons. Check the messages that are displayed during the initial phase of the boot process. Most issues occur during this phase.
Check for synchronization files
As part of the operating system and I/O server code installation, xCAT runs post-installation scripts. These scripts install the required RPMs, upgrade and configure the networks (10 GbE, 40GbE, and InfiniBand), and configure the SAS adapters.
xdsh gss_ppc64 "ls /install/gss/sync"
The system displays output similar to this:
gssio1: mofed
gssio2: mofed
updatenode gss_ppc64 -F
Check for post-installation scripts
updatenode gss_ppc64 -V -P gss_ofed,gss_sashba
The updatenode command could take some time to complete. This is because updatenode calls vpdupdate on the node. You can check by running ps -ef | grep vpd on each node. If you see vpdupdate running, the updatenode command is waiting for it to complete.
Apply Red Hat updates
After deployment is complete, you can apply Red Hat updates as needed. Note that kernel and OFED components are matched with the ESS software stack and are therefore locked during deployment to prevent unintended changes during update.
See Red Hat Enterprise Linux update considerations for additional considerations.
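If you want to confirm which packages are locked before applying updates, the following is a minimal sketch. It assumes that the yum versionlock plugin set up during deployment is still active and that the node has access to a Red Hat repository; a plain yum update then leaves the locked kernel and OFED packages untouched.
yum versionlock list
yum update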
Check the system hardware
- gssstoragequickcheck checks the server, adapter, and storage configuration quickly.
- gssfindmissingdisks checks the disk paths and connectivity.
- gsscheckdisks checks for disk errors under various I/O operations.
Power on JBODs
After the I/O server nodes have been installed successfully, power on the JBODs. Wait approximately 5 to 10 minutes after powering on for the disks to be discovered before moving on to the next step.
System check 1: run gssstoragequickcheck
gssstoragequickcheck -G gss_ppc64
The system displays output similar to this: [root@ems1 deploy]# gssstoragequickcheck -G gss_ppc64
2015-06-15T20:17:07.036867 Start of storage quick configuration check
2015-06-15T20:17:08.745084 nodelist: gssio1 gssio2
gssio1: Machine Type: 8247-22L
gssio2: Machine Type: 8247-22L
gssio1: Valid SAS Adapter Configuration. Number of Adapter(s) found 3
gssio1: Valid Network Adapter Configuration. Number of Adapter(s) found: 3
gssio2: Valid SAS Adapter Configuration. Number of Adapter(s) found 3
gssio2: Valid Network Adapter Configuration. Number of Adapter(s) found: 3
gssio1: Enclosure DCS3700 found 2
gssio1: Disk ST2000NM0023 found 116
gssio1: SSD PX02SMF040 found 2
gssio1: Total disk found 116, expected 116
gssio1: Total SSD found 2, expected 2
gssio2: Enclosure DCS3700 found 2
gssio2: Disk ST2000NM0023 found 116
gssio2: SSD PX02SMF040 found 2
gssio2: Total disk found 116, expected 116
gssio2: Total SSD found 2, expected 2
2015-06-15T20:17:25.670645 End of storage quick configuration check
xdsh gss_ppc64 "modprobe mpt2sas"
After running modprobe, run gssstoragequickcheck again. See gssstoragequickcheck command for more information about this command.
System check 1a: run lsifixnv
xdsh gss_ppc64 "/xcatpost/gss_sashba"
System check 1b: Check the RAID firmware
xdsh ems1,gss_ppc64 "for IOA in \$(lsscsi -g | grep SISIOA | awk '{print \$NF}');
do iprconfig -c query-ucode-level \$IOA; done"
The system displays output similar to this: [root@ems1 deploy]# xdsh ems1,gss_ppc64 "for IOA in \$(lsscsi -g | grep SISIOA |
awk '{print \$NF}'); do iprconfig -c query-ucode-level \$IOA; done"
ems1: 12511700
gssio2: 12511700
gssio1: 12511700
If this system is upgraded from a previous version, you
might see a RAID firmware level of 12511400. If the RAID adapter firmware is not at the correct level, contact the IBM Support Center for update instructions.
System check 1c: Make sure 64-bit DMA is enabled for InfiniBand slots
xdsh gss_ppc64,bgqess-mgt1 journalctl -b | grep 64-bit | grep -v dma_rw | grep mlx
The system displays output similar to this: [root@ems1 gss]# xdsh gss_ppc64,bgqess-mgt1 journalctl -b | grep 64-bit | grep -v dma_rw | grep mlx
gssio1: Feb 13 09:28:34 bgqess-gpfs02.scinet.local kernel: mlx5_core 0000:01:00.0: Using 64-bit direct DMA at offset 800000000000000
gssio1: Feb 13 09:29:02 bgqess-gpfs02.scinet.local kernel: mlx5_core 0004:01:00.0: Using 64-bit direct DMA at offset 800000000000000
gssio1: Feb 13 09:29:30 bgqess-gpfs02.scinet.local kernel: mlx5_core 0009:01:00.0: Using 64-bit direct DMA at offset 800000000000000
gssio2: Jan 30 16:46:55 bgqess-gpfs01.scinet.local kernel: mlx5_core 0000:01:00.0: Using 64-bit direct DMA at offset 800000000000000
gssio2: Jan 30 16:47:23 bgqess-gpfs01.scinet.local kernel: mlx5_core 0004:01:00.0: Using 64-bit direct DMA at offset 800000000000000
gssio2: Jan 30 16:47:50 bgqess-gpfs01.scinet.local kernel: mlx5_core 0009:01:00.0: Using 64-bit direct DMA at offset 800000000000000
mgt1: Jan 26 16:55:41 bgqess-mgt1 kernel: mlx5_core 0004:01:00.0: Using 64-bit direct DMA at offset 800000000000000
Make sure you see all of the InfiniBand devices in this list. This sample output includes the following device numbers: 0000:01:00.0, 0004:01:00.0, and 0009:01:00.0. The slot-to-device assignments for the Connect-IB adapter follow:
| Slot | Device |
|---|---|
| C5 | 0009:01:00.0 |
| C6 | 0004:01:00.0 |
| C7 | 0000:01:00.0 |
If 64-bit DMA is not enabled for these slots, enable I/O Adapter Enlarged Capacity as follows:
- Make sure the OS or partition is shut down.
- On the HMC GUI, select the server and then click Operations -> Launch ASM.
- On the Welcome pane, specify your user ID and password. The default user ID is admin. The default password is abc123.
- In the navigation area, expand System Configuration -> System -> I/O Adapter Enlarged Capacity.
- Select Enable and set I/O Adapter Enlarged Capacity to 11. This covers all slots, because the I/O server nodes have 11 slots.
- Save your settings.
- Restart the server so the changes will take effect.
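After the restart, you can repeat the earlier journalctl check to confirm that 64-bit direct DMA is now reported for all of the InfiniBand devices. This reuses the command shown above, without the extra management node from that example:
xdsh gss_ppc64 journalctl -b | grep 64-bit | grep -v dma_rw | grep mlx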
System check 2: run gssfindmissingdisks
Run the gssfindmissingdisks command to verify that the I/O server nodes are cabled properly. This command reports the status of the disk paths. See gssfindmissingdisks command for more information about this command.
gssfindmissingdisks -G gss_ppc64
The
system displays output similar to this: [root@ems1 deploy]# gssfindmissingdisks -G gss_ppc64
2015-06-15T20:27:18.793026 Start find missing disk paths
2015-06-15T20:27:20.556384 nodelist: gssio1 gssio2
2015-06-15T20:27:20.556460 May take long time to complete search of all drive paths
2015-06-15T20:27:20.556501 Checking missing disk paths from node gssio1
gssio1 Enclosure SV45221140 (number 1):
gssio1 Enclosure SV45222733 (number 2):
gssio1: GSS configuration: 2 enclosures, 2 SSDs, 2 empty slots, 118 disks total, 6 NVRAM partitions
2015-06-15T20:27:37.698284 Checking missing disk paths from node gssio2
gssio2 Enclosure SV45221140 (number 1):
gssio2 Enclosure SV45222733 (number 2):
gssio2: GSS configuration: 2 enclosures, 2 SSDs, 2 empty slots, 118 disks total, 6 NVRAM partitions
2015-06-15T20:27:54.827175 Finish search for missing disk paths. Number of missing disk paths: 0
When there are missing drive paths, the command reports
possible configuration or hardware errors: [root@ems1 setuptools]# ./gssfindmissingdisks -G gss_ppc64
2014-10-28T04:23:45.714124 Start finding missing disks
2014-10-28T04:23:46.984946 nodelist: gssio1 gssio2
2014-10-28T04:23:46.985026 Checking missing disks from node gssio1
gssio1: Enclosure SV24819545 (number undetermined): 4-7
gssio1: Enclosure SV24819545 (number undetermined): 4-9
gssio1: Enclosure SV32300072 (number undetermined): 5-5
2014-10-28T04:25:10.587857 Checking missing disks from node gssio2
gssio2: Enclosure SV24819545 (number undetermined): 2-9
gssio2: Enclosure SV24819545 (number undetermined): 3-4
gssio2: Enclosure SV24819545 (number undetermined): 4-6
2014-10-28T04:26:33.253075 Finish search for missing disks. Number of missing disks: 6
In this example, the missing disk paths differ between the two I/O server nodes, so the missing drives are shown from each node's view. This is most likely not a physical drive issue, but rather a cable or other subsystem issue.
scsi3[19.00.00.00] U78CB.001.WZS0043-P1-C2-T1
scsi4[19.00.00.00] U78CB.001.WZS0043-P1-C2-T2 [P1 SV32300072 ESM A (sg67)] [P2 SV24819545 ESM B (sg126)]
scsi5[19.00.00.00] U78CB.001.WZS0043-P1-C3-T1
scsi6[19.00.00.00] U78CB.001.WZS0043-P1-C3-T2 [P2 SV24819545 ESM A (sg187)]
scsi1[19.00.00.00] U78CB.001.WZS0043-P1-C11-T1
scsi2[19.00.00.00] U78CB.001.WZS0043-P1-C11-T2 [P2 SV32300072 ESM B (sg8)]
For information about hardware ports, cabling, PCIe adapter installation, and SSD placement, see Cabling the Elastic Storage Server.
System check 2a: run mmgetpdisktopology
Use the gssfindmissingdisks command to verify the I/O server JBOD disk topology. If gssfindmissingdisks shows one or more errors, run the mmgetpdisktopology and topsummary commands to obtain more detailed information about the storage topology for further analysis. These commands are run from the I/O server nodes. It is a best-practice recommendation to run these commands once on each I/O server node.
For more information about mmgetpdisktopology and topsummary, see IBM Spectrum Scale RAID: Administration.
mmgetpdisktopology | topsummary
The
system displays output similar to this: [root@gssio1 ~]# mmgetpdisktopology | topsummary
/usr/lpp/mmfs/bin/topsummary: reading topology from standard input
GSS enclosures found: SV45221140 SV45222733
Enclosure SV45221140 (number 1):
Enclosure SV45221140 ESM A sg188[039A][scsi6 port 2] ESM B sg127[039A][scsi4 port 2]
Enclosure SV45221140 Drawer 1 ESM sg188 12 disks diskset "10026" ESM sg127 12 disks diskset "10026"
Enclosure SV45221140 Drawer 2 ESM sg188 12 disks diskset "51918" ESM sg127 12 disks diskset "51918"
Enclosure SV45221140 Drawer 3 ESM sg188 12 disks diskset "64171" ESM sg127 12 disks diskset "64171"
Enclosure SV45221140 Drawer 4 ESM sg188 12 disks diskset "02764" ESM sg127 12 disks diskset "02764"
Enclosure SV45221140 Drawer 5 ESM sg188 12 disks diskset "34712" ESM sg127 12 disks diskset "34712"
Enclosure SV45221140 sees 60 disks
Enclosure SV45222733 (number 2):
Enclosure SV45222733 ESM A sg68[039A][scsi4 port 1] ESM B sg9[039A][scsi2 port 2]
Enclosure SV45222733 Drawer 1 ESM sg68 11 disks diskset "28567" ESM sg9 11 disks diskset "28567"
Enclosure SV45222733 Drawer 2 ESM sg68 12 disks diskset "04142" ESM sg9 12 disks diskset "04142"
Enclosure SV45222733 Drawer 3 ESM sg68 12 disks diskset "29724" ESM sg9 12 disks diskset "29724"
Enclosure SV45222733 Drawer 4 ESM sg68 12 disks diskset "31554" ESM sg9 12 disks diskset "31554"
Enclosure SV45222733 Drawer 5 ESM sg68 11 disks diskset "13898" ESM sg9 11 disks diskset "13898"
Enclosure SV45222733 sees 58 disks
GSS configuration: 2 enclosures, 2 SSDs, 2 empty slots, 118 disks total, 6 NVRAM partitions
scsi3[20.00.02.00] U78CB.001.WZS06M2-P1-C2-T1
scsi4[20.00.02.00] U78CB.001.WZS06M2-P1-C2-T2 [P1 SV45222733 ESM A (sg68)] [P2 SV45221140 ESM B (sg127)]
scsi5[20.00.02.00] U78CB.001.WZS06M2-P1-C3-T1
scsi6[20.00.02.00] U78CB.001.WZS06M2-P1-C3-T2 [P2 SV45221140 ESM A (sg188)]
scsi0[20.00.02.00] U78CB.001.WZS06M2-P1-C11-T1
scsi2[20.00.02.00] U78CB.001.WZS06M2-P1-C11-T2 [P2 SV45222733 ESM B (sg9)]
Depending on the model and configuration, you may see references to enclosure numbers up to 6. This summary is produced by analyzing the SAS physical topology.
- The first line is a list of the enclosure mid-plane serial numbers for the enclosure type (DCS3700, for example). This serial number does not appear anywhere on the enclosure itself. The second line shows the enclosure ordering based on the cabling. A system with incorrect cabling will show that the enclosure number is undetermined. The third line shows the enclosure's serial number, then ESM A and ESM B, each followed by a SCSI generic device number that is assigned by the host:
Enclosure SV45221140 ESM A sg188[039A][scsi6 port 2] ESM B sg127[039A][scsi4 port 2]
The number in the first set of brackets is the code level of the ESM. The ports of the SCSI device are enclosed in the second set of brackets. The SCSI generic device number (sg188 or sg127, for example) also appears in the gsscheckdisks path output of drive performance and error counters.
- Enclosures are numbered physically from bottom to top within a building block. Enclosure 1 is the bottom enclosure; enclosure 6 is the top enclosure.
- Analyze the output:
Enclosure SV45221140 (number 1):
Enclosure SV45221140 ESM A sg188[039A][scsi6 port 2] ESM B sg127[039A][scsi4 port 2]
Enclosure SV45221140 Drawer 1 ESM sg188 12 disks diskset "10026" ESM sg127 12 disks diskset "10026"
Each line shows two disk-set numbers, one from ESM A and the other from ESM B. The disk-set number is the checksum of the serial numbers of the drives seen on that path. Checksums that do not match indicate an issue with that path involving an adapter, SAS cable, enclosure ESM, or expanders in the enclosures. If only one disk set is shown, this indicates a complete lack of a path, such as a missing cable or ESM.
scsi3[20.00.02.00] U78CB.001.WZS06M2-P1-C2-T1
scsi4[20.00.02.00] U78CB.001.WZS06M2-P1-C2-T2 [P1 SV45222733 ESM A (sg68)] [P2 SV45221140 ESM B (sg127)]
scsi5[20.00.02.00] U78CB.001.WZS06M2-P1-C3-T1
scsi6[20.00.02.00] U78CB.001.WZS06M2-P1-C3-T2 [P2 SV45221140 ESM A (sg188)]
scsi0[20.00.02.00] U78CB.001.WZS06M2-P1-C11-T1
scsi2[20.00.02.00] U78CB.001.WZS06M2-P1-C11-T2 [P2 SV45222733 ESM B (sg9)]
The first two lines represent the SAS adapter in slot C2. There are two SAS 2300 SCSI controllers in each adapter card, indicated by T1 and T2.
T1 P1 = Port 0
T1 P2 = Port 1
T2 P1 = Port 2
T2 P2 = Port 3
This shows that Port 2 of the adapter in slot C2 is connected to ESM A of enclosure SV45222733. Similarly, Port 2 of the adapter in slot C11 is connected to ESM B of enclosure SV45222733. See Figure 1 and Figure 2 for the physical location of ports and ESMs.
System check 3: run gsscheckdisks
The gsscheckdisks command initiates I/O to the drives and can be used to identify marginal drives. This command must be run on a system where there is no GPFS cluster configured. If it is run with a write test on a system where a GPFS cluster is already configured, it will overwrite the cluster configuration data stored in the disk, resulting in cluster and data loss. This command can be run from the management server node or from an I/O server node. The default duration is to run for 30 seconds for each I/O test for each path. For a more thorough test, set the duration to run for 5 minutes (300 seconds) or more.
For example, if gsscheckdisks is run with the write test enabled outside of the install or manufacturing environment (that is, without setting GSSENV), the command does not run the test:
[root@ems1 deploy]# gsscheckdisks -G gss_ppc64 --disk-list sdx,sdc --iotest a --write-enable
2015-06-15T20:35:53.408621 Start running check disks
gsscheckdisks must run in INSTALL or MFG environment. It may result in data loss
if run in a configured system.
Please rerun with environment GSSENV=INSTALL or GSSENV=MFG to indicate that it is
run in install or manufacturing environment.
Example:
GSSENV=INSTALL gsscheckdisks -N gss_ppc64 --show-enclosure-list
Run gsscheckdisks to verify that disks are in a good state.
GSSENV=INSTALL gsscheckdisks -G gss_ppc64 --encl all --iotest a --write-enable
The system displays output similar to this: [root@gssio1 ~]# GSSENV=INSTALL gsscheckdisks -G gss_ppc64 --encl all --iotest a --write-enable
2014-11-26T05:30:42.401514 Start running check disks
List of Enclosures found
SV32300072
SV24819545
Taking inventory of disks in enclosure SV32300072.
Taking inventory of disks in enclosure SV24819545.
2014-11-26T05:34:48.317358 Starting r test for 118 of 118 disks. Path: 0, duration 30 secs
2014-11-26T05:35:25.216815 Check disk analysis for r test Complete
2014-11-26T05:35:25.218802 Starting w test for 118 of 118 disks. Path: 0, duration 30 secs
2014-11-26T05:36:02.247192 Check disk analysis for w test Complete
2014-11-26T05:36:02.249225 Starting R test for 118 of 118 disks. Path: 0, duration 30 secs
2014-11-26T05:36:39.384888 Check disk analysis for R test Complete
2014-11-26T05:36:39.386868 Starting W test for 118 of 118 disks. Path: 0, duration 30 secs
2014-11-26T05:37:16.515254 Check disk analysis for W test Complete
2014-11-26T05:37:16.517218 Starting r test for 118 of 118 disks. Path: 1, duration 30 secs
2014-11-26T05:37:53.407486 Check disk analysis for r test Complete
2014-11-26T05:37:53.409601 Starting w test for 118 of 118 disks. Path: 1, duration 30 secs
2014-11-26T05:38:30.421883 Check disk analysis for w test Complete
2014-11-26T05:38:30.423763 Starting R test for 118 of 118 disks. Path: 1, duration 30 secs
2014-11-26T05:39:07.548179 Check disk analysis for R test Complete
2014-11-26T05:39:07.550328 Starting W test for 118 of 118 disks. Path: 1, duration 30 secs
2014-11-26T05:39:44.675574 Check disk analysis for W test Complete
gsscheckdisks displays an error count if any of the drives under test (and path) experience I/O errors. If there are errors on any disks, the output identifies the failing disks. The output details the performance and errors seen by the drives and is saved in the /tmp/checkdisk directory of the management server node (or I/O server node, if it is called from there) for further analysis. There are three files in this directory.
- hostdiskana[0-1].csv contains summary results of disk I/O throughput of each device every second and a one-line summary of each device showing throughput and error count.
- diskiostat.csv contains details of the /proc/iostat data for every second for offline detailed analysis of disk performance. The format of the data is: column 1: time epoch, column 2: node where run, column 3: device. Columns 4 through 11 are a dump of /proc/iostat.
- deviceerr.csv contains the drive error count. The format of the data is: column 1: time epoch, column 2: node where run, column 3: device, column 4: I/O issued, column 5: I/O completed, column 6: I/O error.
Note: With a default test duration of 30 seconds for each test case and a batch size of 60 drives, it can take up to 20 minutes per node for a GL4 system.
See gsscheckdisks command for more information about this command.
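As a quick way to scan the saved results for problem drives, you could filter deviceerr.csv for nonzero error counts. This is a minimal sketch that assumes the file is comma-separated with the column layout described above:
awk -F, '$6 != 0 {print $2, $3, $6}' /tmp/checkdisk/deviceerr.csv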
Set up high-speed networking
Set up the high-speed network that will be used for cluster data communication. See Networking: creating a bonded interface for more information.
- Choose the hostname that will be associated with the high-speed network IP address. Typically, the hostname associated with the high-speed network is derived from the xCAT hostname using the prefix and suffix. Before you create the GPFS cluster, high-speed networking must be configured with the proper IP address and hostname. See Node name considerations for more information.
- Update your /etc/hosts with high-speed network entries showing the high-speed IP address and corresponding hostname. Copy the modified /etc/hosts to the I/O server nodes of the cluster.
- Add the high-speed network to the xCAT networks table (see the sketch after this list), and then run:
makedns
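The following is a minimal sketch of how the high-speed network might be defined in the xCAT networks table using mkdef before running makedns. The network object name, subnet, and netmask are illustrative values based on the example /etc/hosts shown earlier, so adjust them to your environment; tabdump lets you confirm the result.
mkdef -t network -o highspeed net=172.16.45.0 mask=255.255.255.0
tabdump networks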
Set up the high-speed network
With an Ethernet high-speed network, you can use the gssgennetworks script to create a bonded Ethernet interface over the active (up) high-speed network interfaces. You cannot use gssgennetworks for IPoIB configurations. See Appendix A: Installation: reference for creating a bonded network interface with IP over InfiniBand.
- To see the current set of active (up) interfaces on all nodes, run:
gssgennetworks -G ems1,gss_ppc64 --suffix=-hs
- To create a bonded Ethernet interface on all nodes, run:
gssgennetworks -G ems1,gss_ppc64 --suffix=-hs --create-bond
The script sets miimon to 100, the bonding mode to 802.3ad (LACP), and xmit_hash_policy to layer3+4. The other bond options keep their default values, including lacp_rate (the default is slow). For proper network operation, the Ethernet switch settings in the networking infrastructure must match the I/O server node interface bond settings. A sketch for verifying the bond settings follows.
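To confirm that the bond came up with the expected settings, you can inspect the kernel bonding status on each node. This sketch assumes the bonded interface is named bond0; the name your configuration uses might differ.
xdsh gss_ppc64 cat /proc/net/bonding/bond0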
Check the installed software and firmware
Run the gssinstallcheck command to check the installed software and firmware.
See gssinstallcheck command for more information about this command.
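No example invocation is shown here, but to check all of the I/O server nodes at once, something like the following should work, assuming gssinstallcheck accepts the same -G node group syntax as the other gss commands used in this topic:
gssinstallcheck -G gss_ppc64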
Create the GPFS cluster
Run the gssgencluster command on the management server to create the cluster. This command creates a GPFS cluster using all of the nodes in the node group if you specify the -G option. You can also provide a list of names using the -N option. The command assigns server licenses to each I/O server node, so it prompts for license acceptance (or use the --accept-license option). It applies the best-practice IBM Spectrum Scale configuration attributes for an NSD server based on IBM Spectrum Scale RAID. At the end of cluster creation, the SAS adapter firmware, storage enclosure firmware, and drive firmware are upgraded if needed. To bypass the firmware update, specify the --no-fw-update option.
Note: This command could take some time to run.
See gssgencluster command for more information about this command.
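No example invocation of gssgencluster is shown in this topic, so the following is only a sketch of what the command might look like for the node group and suffix used throughout these examples. The -C cluster-name option is an assumption, so check the gssgencluster command reference before running it:
gssgencluster -C test01 -G gss_ppc64 --suffix=-hs --accept-license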
To verify the cluster configuration, run:
mmlscluster
The system displays output similar to this: [root@gssio1 ~]# mmlscluster
GPFS cluster information
========================
GPFS cluster name: test01.gpfs.net
GPFS cluster id: 14599547031220361759
GPFS UID domain: test01.gpfs.net
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
-------------------------------------------------------------------------
1 gssio1-hs.gpfs.net 172.45.45.23 gssio1-hs.gpfs.net quorum-manager
2 gssio2-hs.gpfs.net 172.45.45.24 gssio2-hs.gpfs.net quorum-manager
Verify that the GPFS cluster is active
mmgetstate -a
The system displays output similar to this: [root@gssio1 ~]# mmgetstate -a
Node number Node name GPFS state
------------------------------------------
1 gssio1-hs active
2 gssio2-hs active
After the /etc/hosts file is properly set with high-speed IP addresses and corresponding hostnames, you can use the gssgennetworks script to create a bonded Ethernet network. Note that this script cannot be used to create a bond with IP over an InfiniBand network.
To see the current set of active (up) interfaces, run:
gssgennetworks -G gss_ppc64
To create a bonded interface, run:
gssgennetworks -G gss_ppc64 --create-bond
The script sets miimon to 100, the bonding mode to 802.3ad (LACP), and xmit_hash_policy to layer3+4. The other bond options keep their default values, including lacp_rate (the default is slow). For proper network operation, the Ethernet switch settings in the networking infrastructure must match the I/O server node interface bond settings.
Create the recovery groups
The gssgenclusterrgs command creates the recovery groups (RGs) and declustered arrays (DAs), as well as the associated log tip vdisk, log backup vdisk, and log home vdisk. For each RG, three arrays are created: NVRAM, SSD, and DAn. By default for ESS 3.5, only one DA is created, in which all HDDs (and SSDs for SSD models) belong to this single DA (DA1, for example). If you want to use multiple DAs (assuming there are enough disks), specify the --multi-da option.
The gssgenclusterrgs command can create NSDs and file systems for simple configurations that require one file system. More flexibility can be achieved using gssgenclusterrgs to create the recovery groups only and using gssgenvdisks (the preferred method) to create data vdisks, metadata vdisks, NSDs, and file systems. For backward compatibility, the gssgenclusterrgs command continues to support vdisk, NSD, and file system creation.
The gssgenclusterrgs command creates and saves the stanza files for the data and metadata vdisks and NSD. The stanza files are located in the /tmp directory of the first node of the first building block with names node1_node2_vdisk.cfg.save and node1_node2_nsd.cfg.save. These files can be edited for further customization.
If a customized recovery stanza file is available, it can be used to create the recovery group. The files must be located on the first node (in the node list) of each building block in /tmp. Their names must be in the format xxxxL.stanza and yyyyR.stanza, where L is for the left recovery group and R is for the right recovery group. The name of the recovery group is derived from the I/O server node's short name (with prefix and suffix) by adding a prefix of rg_. When the --create-nsds option is specified, by default, 1% of the space is left as reserved and the remaining space is used to create the NSDs. The amount of reserved space is user-selectable and the default is 1% of the total raw space. Note that the percentage of reserved space is based on the total raw space (not on the available space) before any redundancy overhead is applied.
If the system already contains recovery groups and log vdisks (created in the previous steps), their creation can be skipped using the appropriate options. This can be useful when NSDs are recreated (for a change in the number of NSDs or block size, for example).
Note 1: This command could take some time to complete.
Note 2: NSDs in a building block are assigned to the same failure group by default. If you have multiple building blocks, the NSDs defined in each building block will have a different failure group for each building block. Carefully consider this information and change the failure group assignment when you are configuring the system for metadata and data replication.
gssgenclusterrgs -G gss_ppc64 --suffix=-hs
The system displays output similar to this: [root@ems1 ~]# gssgenclusterrgs -G gss_ppc64 --suffix=-hs
2015-06-16T00:12:22.176357 Determining peer nodes
2015-06-16T00:12:23.786661 nodelist: gssio1 gssio2
2015-06-16T00:12:23.786749 Getting pdisk topology from node to create partner list gssio1
2015-06-16T00:12:38.933425 Getting pdisk topology from node to create partner list gssio2
2015-06-16T00:12:54.049202 Getting pdisk topology from node for recoverygroup creation. gssio1
2015-06-16T00:13:06.466809 Getting pdisk topology from node for recoverygroup creation. gssio2
2015-06-16T00:13:25.289541 Stanza files for node pairs gssio1 gssio2
/tmp/SV45221140L.stanza /tmp/SV45221140R.stanza
2015-06-16T00:13:25.289604 Creating recovery group rg_gssio1-hs
2015-06-16T00:13:48.556966 Creating recovery group rg_gssio2-hs
2015-06-16T00:14:17.627686 Creating log vdisks in recoverygroup rg_gssio1-hs
2015-06-16T00:15:14.117554 Creating log vdisks in recoverygroup rg_gssio2-hs
2015-06-16T00:16:30.267607 Task complete.
See gssgenclusterrgs command for more information about this command.
Verify the recovery group configuration
mmlsrecoverygroup
The system displays output similar to this: [root@gssio1 ~]# mmlsrecoverygroup
declustered
arrays with
recovery group vdisks vdisks servers
------------------ ----------- ------ -------
rg_gssio1-hs 3 3 gssio1-hs.gpfs.net,gssio2-hs.gpfs.net
rg_gssio2-hs 3 3 gssio2-hs.gpfs.net,gssio1-hs.gpfs.net
- NVR contains the NVRAM devices used for the log tip vdisk.
- SSD contains the SSD devices used for the log backup vdisk.
- DA1 contains the SSD or HDD devices used for the log home vdisk and file system data.
- If you used the --multi-da option
with the gssgenclusterrgs command, you might
see one or more additional DAs:
DAn, where n > 1 (depending on the ESS model), contains the SSD or HDD devices used for file system data.
mmlsrecoverygroup rg_gssio1-hs -L
The system displays output similar to this: [root@gssio1 ~]# mmlsrecoverygroup rg_gssio1-hs -L
declustered
recovery group arrays vdisks pdisks format version
----------------- ----------- ------ ------ --------------
rg_gssio1-hs 3 3 61 4.1.0.1
declustered needs replace scrub background activity
array service vdisks pdisks spares threshold free space duration task progress priority
----------- ------- ------ ------ ------ --------- ---------- -------- -------------------------
SSD no 1 1 0,0 1 372 GiB 14 days scrub 4% low
NVR no 1 2 0,0 1 3648 MiB 14 days scrub 4% low
DA1 no 1 58 2,31 2 101 TiB 14 days scrub 0% low
declustered checksum
vdisk RAID code array vdisk size block size granularity state remarks
------------------ ------------------ ----------- ---------- ---------- ----------- ----- -------
rg_gssio1_hs_logtip 2WayReplication NVR 48 MiB 2 MiB 4096 ok logTip
rg_gssio1_hs_logtipbackup Unreplicated SSD 48 MiB 2 MiB 4096 ok logTipBackup
rg_gssio1_hs_loghome 4WayReplication DA1 20 GiB 2 MiB 4096 ok log
config data declustered array VCD spares actual rebuild spare space remarks
------------------ ------------------ ------------- --------------------------------- ----------------
rebuild space DA1 31 35 pdisk
config data max disk group fault tolerance actual disk group fault tolerance remarks
------------------ --------------------------------- --------------------------------- ----------------
rg descriptor 4 drawer 4 drawer limiting fault tolerance
system index 1 enclosure + 1 drawer 4 drawer limited by rg descriptor
vdisk max disk group fault tolerance actual disk group fault tolerance remarks
------------------ --------------------------------- --------------------------------- ----------------
rg_gssio1_hs_logtip 1 pdisk 1 pdisk
rg_gssio1_hs_logtipbackup 0 pdisk 0 pdisk
rg_gssio1_hs_loghome 1 enclosure + 1 drawer 3 drawer limited by rg descriptor
active recovery group server servers
----------------------------------------------- -------
gssio1-hs.gpfs.net gssio1-hs.gpfs.net,gssio2-hs.gpfs.net
Create the vdisk stanza
Use gssgenvdisks to create the vdisk stanza file. By default, the vdisk stanza is stored in /tmp/vdisk1.cfg. Optionally, gssgenvdisks can be used to create vdisks, NSDs, and the file system on existing recovery groups. If no recovery groups are specified, all available recovery groups are used. If the command is run on the management server node (or any other node) that is not part of the cluster, a contact node that is part of the cluster must be specified. The contact node must be reachable from the node (the management server node, for example) where the command is run.
You can use this command to add a suffix to vdisk names, which is useful when creating multiple file systems: a unique suffix associates a set of vdisks with a particular file system (examples follow). The default reserved space is 1%. If the data vdisk block size is less than 8 MB, increase the reserved space as the block size decreases (see Reserved space considerations).
See the gssgenvdisks command for more information.
This command can be used to create a shared-root file system for IBM Spectrum Scale protocol nodes. See Adding IBM Spectrum Scale nodes to an ESS cluster for more information.
Note: NSDs that are in the same building block are given the same failure group by default. If file system replication is set to 2 (m=2 or r=2), there must be more than one building block, or the failure groups of the NSDs must be adjusted accordingly.
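If you do need to adjust failure groups after the file system is created, one way is the mmchdisk command with the change option and an NSD stanza. This is a sketch only; the file system name, NSD name, and failure group value shown are illustrative:
echo "%nsd: nsd=rg_gssio2_hs_Data_8M_2p_1_fs1 failureGroup=2" > /tmp/fg.stanza
mmchdisk fs1 change -F /tmp/fg.stanza
mmlsdisk fs1 -L
The mmlsdisk command confirms the new failure group assignment.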
In ESS 3.0 and later, the gssgenvdisks command includes options for specifying the data vdisk size and the metadata vdisk size in GiB. Because NSDs map one-to-one to vdisks, the metadata vdisk size determines the metadata NSD size. If both the metadata vdisk size and the metadata percentage are specified, the metadata vdisk size takes precedence.
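As a sketch of how explicit sizes might be given (the --data-vdisk-size option appears in Example 1 below; the name of the metadata counterpart, shown here as --metadata-vdisk-size, is an assumption, so confirm it in the gssgenvdisks command description for your release):
gssgenvdisks --contact-node gssio1 --create-vdisk --create-nsds --create-filesystem --data-vdisk-size 10240 --metadata-vdisk-size 512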
Reserved space considerations
When all available space is allocated, the reserved space should be increased as the data vdisk block size decreases. The default reserved space of 1% works well for the default 8 MB block size. For a 4 MB block size, reserve 2%; for a 1 MB block size, increase the reserved space to 3%. Examples 2a and 2b illustrate these settings.
Example 1:
Create two file systems: fs1 with 20 TB (two vdisks, 10 TB each) using the default RAID code, and fs2 with 40 TB (two vdisks, 20 TB each) with a RAID code of 8+3p.
gssgenvdisks --contact-node gssio1 --create-vdisk --create-nsds --create-filesystem
--vdisk-suffix=_fs1 --filesystem-name fs1 --data-vdisk-size 10240
The system displays output similar to this: [root@ems1 ~]# gssgenvdisks --contact-node gssio1 --create-vdisk --create-nsds --create-filesystem
--vdisk-suffix=_fs1 --filesystem-name fs1 --data-vdisk-size 10240
2015-06-16T00:50:37.254906 Start creating vdisk stanza
vdisk stanza saved in gssio1:/tmp/vdisk1.cfg
2015-06-16T00:50:51.809024 Generating vdisks for nsd creation
2015-06-16T00:51:27.409034 Creating nsds
2015-06-16T00:51:35.266776 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-06-16T00:51:46.688937 Applying data placement policy
2015-06-16T00:51:51.637243 Task complete.
The df command output then shows the new file system:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 246G 2.9G 244G 2% /
devtmpfs 60G 0 60G 0% /dev
tmpfs 60G 0 60G 0% /dev/shm
tmpfs 60G 43M 60G 1% /run
tmpfs 60G 0 60G 0% /sys/fs/cgroup
/dev/sda2 497M 161M 336M 33% /boot
/dev/fs1 21T 160M 21T 1% /gpfs/fs1
The last line shows that file system fs1 was created.
Now create the second file system with a RAID code of 8+3p:
gssgenvdisks --contact-node gssio1 --create-vdisk --create-nsds --create-filesystem
--vdisk-suffix=_fs2 --filesystem-name fs2 --data-vdisk-size 20480 --raid-code 8+3p
The system displays output similar to this: [root@ems1 ~]# gssgenvdisks --contact-node gssio1 --create-vdisk --create-nsds --create-filesystem
--vdisk-suffix=_fs2 --filesystem-name fs2 --data-vdisk-size 20480 --raid-code 8+3p
2015-06-16T01:06:59.929580 Start creating vdisk stanza
vdisk stanza saved in gssio1:/tmp/vdisk1.cfg
2015-06-16T01:07:13.019100 Generating vdisks for nsd creation
2015-06-16T01:07:56.688530 Creating nsds
2015-06-16T01:08:04.516814 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-06-16T01:08:16.613198 Applying data placement policy
2015-06-16T01:08:21.637298 Task complete.
The df command output then shows both file systems:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 246G 2.9G 244G 2% /
devtmpfs 60G 0 60G 0% /dev
tmpfs 60G 0 60G 0% /dev/shm
tmpfs 60G 43M 60G 1% /run
tmpfs 60G 0 60G 0% /sys/fs/cgroup
/dev/sda2 497M 161M 336M 33% /boot
/dev/fs1 21T 160M 21T 1% /gpfs/fs1
/dev/fs2 41T 160M 41T 1% /gpfs/fs2
The last line shows that file system fs2 was created.
mmlsvdisk
The system displays output similar to this: [root@gssio1 ~]# mmlsvdisk
declustered block size
vdisk name RAID code recovery group array in KiB remarks
------------------ --------------- ------------------ ----------- ---------- -------
rg_gssio1_hs_Data_8M_2p_1_fs1 8+2p rg_gssio1-hs DA1 8192
rg_gssio1_hs_Data_8M_3p_1_fs2 8+3p rg_gssio1-hs DA1 8192
rg_gssio1_hs_MetaData_8M_2p_1_fs1 3WayReplication rg_gssio1-hs DA1 1024
rg_gssio1_hs_MetaData_8M_3p_1_fs2 4WayReplication rg_gssio1-hs DA1 1024
rg_gssio1_hs_loghome 4WayReplication rg_gssio1-hs DA1 2048 log
rg_gssio1_hs_logtip 2WayReplication rg_gssio1-hs NVR 2048 logTip
rg_gssio1_hs_logtipbackup Unreplicated rg_gssio1-hs SSD 2048 logTipBackup
rg_gssio2_hs_Data_8M_2p_1_fs1 8+2p rg_gssio2-hs DA1 8192
rg_gssio2_hs_Data_8M_3p_1_fs2 8+3p rg_gssio2-hs DA1 8192
rg_gssio2_hs_MetaData_8M_2p_1_fs1 3WayReplication rg_gssio2-hs DA1 1024
rg_gssio2_hs_MetaData_8M_3p_1_fs2 4WayReplication rg_gssio2-hs DA1 1024
rg_gssio2_hs_loghome 4WayReplication rg_gssio2-hs DA1 2048 log
rg_gssio2_hs_logtip 2WayReplication rg_gssio2-hs NVR 2048 logTip
rg_gssio2_hs_logtipbackup Unreplicated rg_gssio2-hs SSD 2048 logTipBackup
Example 2a:
Create a file system using a 1 MB data block size with 3% reserved space:
gssgenvdisks --contact-node gssio1 --create-vdisk --create-filesystem --data-blocksize 1M
--reserved-space 3
The system displays output similar to this:
[root@ems1 ~]# gssgenvdisks --contact-node gssio1 --create-vdisk --create-filesystem --data-blocksize 1M
--reserved-space 3
2015-06-16T01:49:07.963323 Start creating vdisk stanza
vdisk stanza saved in gssio1:/tmp/vdisk1.cfg
2015-06-16T01:49:21.210383 Generating vdisks for nsd creation
2015-06-16T01:52:19.688953 Creating nsds
2015-06-16T01:52:27.766494 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-06-16T01:52:47.249103 Applying data placement policy
2015-06-16T01:52:51.896720 Task complete.
Example 2b:
Create a file system using a 4 MB data block size with 2% reserved space:
gssgenvdisks --contact-node gssio1 --create-vdisk --create-filesystem --data-blocksize 4M --reserved-space 2
The system displays output similar to this:
[root@ems1 ~]# gssgenvdisks --contact-node gssio1 --create-vdisk --create-filesystem --data-blocksize 4M
--reserved-space 2
2015-06-16T01:25:54.455588 Start creating vdisk stanza
vdisk stanza saved in gssio1:/tmp/vdisk1.cfg
2015-06-16T01:26:07.443263 Generating vdisks for nsd creation
2015-06-16T01:27:46.671050 Creating nsds
2015-06-16T01:27:54.296765 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-06-16T01:28:07.279192 Applying data placement policy
2015-06-16T01:28:11.836822 Task complete.
Example 3:
Suppose you want to create three file systems. The first file system is called fsystem0. Keep 66% of the space reserved for future file system creation. For the second file system, fsystem1, keep 33% reserved. For the third file system, fsystem2, keep 1% reserved. Because you are going to create multiple file systems, you must specify a unique suffix for vdisk creation. Specify _fs0 as the suffix of the vdisk name for the first file system. Specify a RAID code of 8+3p for data vdisks.
gssgenvdisks --create-vdisk --vdisk-suffix _fs0 --raid-code 8+3p --create-filesystem
--filesystem-name fsystem0 --reserved-space-percent 66
The system displays output similar to this: [root@ems1 ~]# gssgenvdisks --create-vdisk --vdisk-suffix _fs0 --raid-code 8+3p --create-filesystem
--filesystem-name fsystem0 --reserved-space-percent 66
2015-03-13T07:04:12.703294 Start creating vdisk stanza
2015-03-13T07:04:12.703364 No contact node provided. Using current node. ems1
vdisk stanza saved in ems1:/tmp/vdisk1.cfg
2015-03-13T07:04:33.088067 Generating vdisks for nsd creation
2015-03-13T07:05:44.648360 Creating nsds
2015-03-13T07:05:53.517659 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-03-13T07:06:07.416392 Applying data placement policy
2015-03-13T07:06:12.748168 Task complete.
gssgenvdisks --create-vdisk --vdisk-suffix _fs1 --raid-code 8+3p --create-filesystem
--filesystem-name fsystem1 --reserved-space-percent 33
The system displays output similar to this: [root@ems1 ~]# gssgenvdisks --create-vdisk --vdisk-suffix _fs1 --raid-code 8+3p --create-filesystem
--filesystem-name fsystem1 --reserved-space-percent 33
2015-03-13T07:11:14.649102 Start creating vdisk stanza
2015-03-13T07:11:14.649189 No contact node provided. Using current node. ems1
vdisk stanza saved in ems1:/tmp/vdisk1.cfg
2015-03-13T07:11:34.998352 Generating vdisks for nsd creation
2015-03-13T07:12:46.858365 Creating nsds
2015-03-13T07:12:55.416322 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-03-13T07:13:09.488075 Applying data placement policy
2015-03-13T07:13:14.756651 Task complete.
gssgenvdisks --create-vdisk --vdisk-suffix _fs2 --raid-code 8+3p --create-filesystem --filesystem-name
fsystem2 --reserved-space-percent 1
The system displays output similar to this: [root@ems1 ~]# gssgenvdisks --create-vdisk --vdisk-suffix _fs2 --raid-code 8+3p --create-filesystem
--filesystem-name fsystem2 --reserved-space-percent 1
2015-03-13T07:13:37.191809 Start creating vdisk stanza
2015-03-13T07:13:37.191886 No contact node provided. Using current node. ems1
vdisk stanza saved in ems1:/tmp/vdisk1.cfg
2015-03-13T07:13:57.548238 Generating vdisks for nsd creation
2015-03-13T07:15:08.838311 Creating nsds
2015-03-13T07:15:16.666115 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-03-13T07:15:30.532905 Applying data placement policy
2015-03-13T07:15:35.876333 Task complete.
mmlsvdisk
The system displays output similar to this:
[root@ems1 ~]# mmlsvdisk
declustered block size
vdisk name RAID code recovery group array in KiB remarks
------------------ --------------- ------------------ ----------- ---------- -------
rg_gssio1_hs_Data_8M_3p_1_fs0 8+3p rg_gssio1-hs DA1 8192
rg_gssio1_hs_Data_8M_3p_1_fs1 8+3p rg_gssio1-hs DA1 8192
rg_gssio1_hs_Data_8M_3p_1_fs2 8+3p rg_gssio1-hs DA1 8192
rg_gssio1_hs_MetaData_8M_3p_1_fs0 4WayReplication rg_gssio1-hs DA1 1024
rg_gssio1_hs_MetaData_8M_3p_1_fs1 4WayReplication rg_gssio1-hs DA1 1024
rg_gssio1_hs_MetaData_8M_3p_1_fs2 4WayReplication rg_gssio1-hs DA1 1024
rg_gssio1_hs_loghome 4WayReplication rg_gssio1-hs DA1 2048 log
rg_gssio1_hs_logtip 2WayReplication rg_gssio1-hs NVR 2048 logTip
rg_gssio1_hs_logtipbackup Unreplicated rg_gssio1-hs SSD 2048 logTipBackup
rg_gssio2_hs_Data_8M_3p_1_fs0 8+3p rg_gssio2-hs DA1 8192
rg_gssio2_hs_Data_8M_3p_1_fs1 8+3p rg_gssio2-hs DA1 8192
rg_gssio2_hs_Data_8M_3p_1_fs2 8+3p rg_gssio2-hs DA1 8192
rg_gssio2_hs_MetaData_8M_3p_1_fs0 4WayReplication rg_gssio2-hs DA1 1024
rg_gssio2_hs_MetaData_8M_3p_1_fs1 4WayReplication rg_gssio2-hs DA1 1024
rg_gssio2_hs_MetaData_8M_3p_1_fs2 4WayReplication rg_gssio2-hs DA1 1024
rg_gssio2_hs_loghome 4WayReplication rg_gssio2-hs DA1 2048 log
rg_gssio2_hs_logtip 2WayReplication rg_gssio2-hs NVR 2048 logTip
rg_gssio2_hs_logtipbackup Unreplicated rg_gssio2-hs SSD 2048 logTipBackup
mmlsnsd
The system displays output similar to this: [root@ems1 ~]# mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
fsystem0 rg_gssio1_hs_Data_8M_3p_1_fs0 gssio1-hs,gssio2-hs
fsystem0 rg_gssio1_hs_MetaData_8M_3p_1_fs0 gssio1-hs,gssio2-hs
fsystem0 rg_gssio2_hs_Data_8M_3p_1_fs0 gssio2-hs,gssio1-hs
fsystem0 rg_gssio2_hs_MetaData_8M_3p_1_fs0 gssio2-hs,gssio1-hs
fsystem1 rg_gssio1_hs_Data_8M_3p_1_fs1 gssio1-hs,gssio2-hs
fsystem1 rg_gssio1_hs_MetaData_8M_3p_1_fs1 gssio1-hs,gssio2-hs
fsystem1 rg_gssio2_hs_Data_8M_3p_1_fs1 gssio2-hs,gssio1-hs
fsystem1 rg_gssio2_hs_MetaData_8M_3p_1_fs1 gssio2-hs,gssio1-hs
fsystem2 rg_gssio1_hs_Data_8M_3p_1_fs2 gssio1-hs,gssio2-hs
fsystem2 rg_gssio1_hs_MetaData_8M_3p_1_fs2 gssio1-hs,gssio2-hs
fsystem2 rg_gssio2_hs_Data_8M_3p_1_fs2 gssio2-hs,gssio1-hs
fsystem2 rg_gssio2_hs_MetaData_8M_3p_1_fs2 gssio2-hs,gssio1-hs
Check the file system configuration
mmlsfs all
The system displays output similar to this: [root@gssio1 ~]# mmlsfs all
File system attributes for /dev/gpfs0:
======================================
flag value description
------------------- ------------------------ -----------------------------------
-f 32768 Minimum fragment size in bytes (system pool)
262144 Minimum fragment size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 1 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 2 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 1048576 Block size (system pool)
8388608 Block size (other pools)
-Q none Quotas accounting enabled
none Quotas enforced
none Default quotas enabled
--perfileset-quota No Per-fileset quota enforcement
--filesetdf No Fileset df enabled?
-V 14.10 (4.1.0.4) File system version
--create-time Tue Jun 16 02:49:45 2015 File system creation time
-z No Is DMAPI enabled?
-L 4194304 Logfile size
-E Yes Exact mtime mount option
-S No Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea Yes Fast external attributes enabled?
--encryption No Encryption enabled?
--inode-limit 134217728 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned Yes is4KAligned?
--rapid-repair Yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
-P system;data Disk storage pools in file system
-d rg_gssio1_hs_Data_8M_2p_1; Disks in file system
rg_gssio1_hs_MetaData_8M_2p_1;
rg_gssio2_hs_Data_8M_2p_1;
rg_gssio2_hs_MetaData_8M_2p_1
-A yes Automatic mount option
-o none Additional mount options
-T /gpfs/gpfs0 Default mount point
--mount-priority 0 Mount priority
Mount the file system
mmmount device -a
where device is the name of the file system. The default file system name is gpfs0. For example, run:
mmmount gpfs0 -a
To check whether the file system is mounted properly, run:
mmlsmount gpfs0 -L
The system displays output similar to this: [root@gssio1 ~]# mmlsmount gpfs0 -L
File system gpfs0 is mounted on 2 nodes:
172.45.45.23 gssio1-hs
172.45.45.24 gssio2-hs
To check file system space usage, run: df
The system displays output similar to this: [root@gssio1 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda3 257922000 2943152 254978848 2% /
devtmpfs 62265728 0 62265728 0% /dev
tmpfs 62302080 0 62302080 0% /dev/shm
tmpfs 62302080 43584 62258496 1% /run
tmpfs 62302080 0 62302080 0% /sys/fs/cgroup
/dev/sda2 508588 164580 344008 33% /boot
/dev/gpfs0 154148405248 163840 154148241408 1% /gpfs/gpfs0
Initially after creation, the file system usage might temporarily show 99%.
Test the file system using gpfsperf
/usr/lpp/mmfs/samples/perf/gpfsperf create seq /gpfs/gpfs0/testfile1 -n 200G -r 16M -th 32
The system displays output similar to this: [root@gssio1 ~]# /usr/lpp/mmfs/samples/perf/gpfsperf create seq /gpfs/gpfs0/testfile1 -n 200G -r 16M -th 32
/usr/lpp/mmfs/samples/perf/gpfsperf create seq /gpfs/gpfs0/testfile1
recSize 16M nBytes 200G fileSize 16G
nProcesses 1 nThreadsPerProcess 32
file cache flushed before test
not using direct I/O
offsets accessed will cycle through the same file segment
not using shared memory buffer
not releasing byte-range token after open
no fsync at end of test
Data rate was 4689394.83 Kbytes/sec, thread utilization 0.925
The block size must match the data vdisk block size. To verify that the ESS is operating as expected, you can use gpfsperf (/usr/lpp/mmfs/samples/perf/gpfsperf) to run other I/O tests, such as read and write.
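For example, a sequential read test of the file created above might look like this sketch; adjust the record size and thread count to match your configuration:
/usr/lpp/mmfs/samples/perf/gpfsperf read seq /gpfs/gpfs0/testfile1 -r 16M -th 32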
Add nodes to the cluster
The management server node and additional I/O server nodes can be added to the ESS cluster using the gssaddnode command. The management server node is updated with the required RPMs during deployment and prepared to join the cluster if needed.
The I/O server nodes must be deployed properly and the high-speed network must be configured before gssaddnode can be used to add these nodes to the ESS cluster. gssaddnode adds the nodes to the cluster, runs the product license acceptance tool, configures the nodes (using gssServerConfig.sh or gssClientConfig.sh), and updates the host adapter, enclosure, and drive firmware. Do not use gssaddnode to add nodes other than ESS I/O server and management server nodes to the cluster; use mmaddnode instead.
On the gssaddnode command, the -N ADD-NODE-LIST option specifies the list of nodes that are being added. For the management server node, this is that node's hostname. The --nodetype option specifies the type of node that is being added; for the management server node, the value is ems. This command must be run on the management server node when that node is being added. It can also be used to add I/O server nodes to an existing cluster.
See gssaddnode command for more information about this command, including an example.
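As a minimal sketch that uses only the options described above (it assumes the management server hostname is ems1; your environment might require additional options, so check the gssaddnode command description):
gssaddnode -N ems1 --nodetype ems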
mmlscluster
The system displays output similar to this: [root@ems1 ~]# mmlscluster
GPFS cluster information
========================
GPFS cluster name: test01.gpfs.net
GPFS cluster id: 14599547031220361759
GPFS UID domain: test01.gpfs.net
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
-------------------------------------------------------------------------
1 gssio1-hs.gpfs.net 172.45.45.23 gssio1-hs.gpfs.net quorum-manager
2 gssio2-hs.gpfs.net 172.45.45.24 gssio2-hs.gpfs.net quorum-manager
5 ems1-hs.gpfs.net 172.45.45.22 ems1-hs.gpfs.net quorum
Check the installed software
Run the gssinstallcheck command to verify that the key components are installed correctly. See gssinstallcheck command for more information about this command.
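For example, to check the I/O server nodes in the gss_ppc64 node group (a sketch; the -G node-group option is assumed here, consistent with the other gss commands used in this procedure):
gssinstallcheck -G gss_ppc64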
Run a stress test
After the system is configured correctly and all marginal components are out of the system, run a stress test to stress the disk and network elements. Use the gssstress command to run a stress test on the system.
Note: gssstress is not a performance tool; the throughput numbers it reports should not be interpreted as a measure of system performance.
gssstress /gpfs/gpfs0 gssio1 gssio2
The system displays output similar to this: [root@ems1 ~]# gssstress /gpfs/gpfs0 gssio1 gssio2
1 gssio1 create
1 gssio2 create
Waiting for 1 create to finish
create seq /gpfs/gpfs0/stressFile.1.gssio1 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 1728569.28 0.980
create seq /gpfs/gpfs0/stressFile.1.gssio2 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 1706918.52 0.981
1 gssio1 read
1 gssio2 read
Waiting for 1 read to finish
read seq /gpfs/gpfs0/stressFile.1.gssio1 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 2776149.11 0.997
read seq /gpfs/gpfs0/stressFile.1.gssio2 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 2776185.62 0.998
1 gssio1 write
1 gssio2 write
Waiting for 1 write to finish
write seq /gpfs/gpfs0/stressFile.1.gssio2 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 1735661.04 0.971
write seq /gpfs/gpfs0/stressFile.1.gssio1 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 1733622.96 0.971
1 gssio1 read
1 gssio2 read
Waiting for 1 read to finish
read seq /gpfs/gpfs0/stressFile.1.gssio1 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 2774776.83 0.997
read seq /gpfs/gpfs0/stressFile.1.gssio2 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 2770247.35 0.998
gpfsperf is run with the nolabels option, which produces one line of output for each test. The fields in each output line are: operation, I/O pattern, file name, record size, number of bytes, file size, number of processes, number of threads, stride records, inv, dio, shm, fsync, cycle, reltoken, aio, osync, rate, and util. Throughput (rate) is the second field from the end of each line. While gssstress is running, you can log on to each node and run dstat to view the disk and network load on that node.
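For example, running dstat with a 5-second refresh interval on an I/O server node shows CPU, disk, network, paging, and system activity (a minimal invocation; add dstat options to focus on particular devices or interfaces):
dstat 5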

Note: By default, each iteration reads and writes 800 GB. With 20 iterations, the test performs a total of 16 TB of I/O from each node and therefore can take some time to complete. For a shorter completion time, specify a lower iteration number, a shorter operation list, or both. The test can be interrupted by pressing <Ctrl-c>.
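For example, a shorter run might limit the iteration count. This sketch assumes the iteration option is -i; verify the option name in the gssstress command description for your release:
gssstress -i 4 /gpfs/gpfs0 gssio1 gssio2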
If a disk encounters errors during the stress test, messages similar to the following appear in the system logs on the affected I/O server node:
Dec 28 18:38:16 gssio5 kernel: sd 4:0:74:0: [sdin] CDB:
Dec 28 18:38:16 gssio5 kernel: Read(32): 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00 10 24 b4 90 10 24 b4 90 00 00 00 00 00 00 04 10
Dec 28 18:38:16 gssio5 kernel: end_request: critical medium error, dev sdin, sector 270840976
Dec 28 18:38:16 gssio5 mmfs: [E] Pdisk e1d2s03 of RG gssio5-hs path /dev/sdin: I/O error on read: sector 270840976 length 4112 err 5.
At the end of the stress test, check the enclosures and disks for any errors.
Check the enclosures
mmlsenclosure all
The system displays output similar to this: [root@gssio1 gpfs0]# mmlsenclosure all
needs
serial number service nodes
------------- ------- ------
SV24819545 no gssio1-ib0.data.net.gpfs.net
SV32300072 no gssio1-ib0.data.net.gpfs.net
mmlsenclosure SV24819545 -L -N all
The system displays output similar to this: [root@gssio1 gpfs0]# mmlsenclosure SV24819545 -L -N all
needs
serial number service nodes
------------- ------- ------
SV24819545 no gssio1-ib0.data.net.gpfs.net,gssio2-ib0.data.net.gpfs.net
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
dcm SV24819545 DCM_0A no
dcm SV24819545 DCM_0B no
dcm SV24819545 DCM_1A no
dcm SV24819545 DCM_1B no
dcm SV24819545 DCM_2A no
dcm SV24819545 DCM_2B no
dcm SV24819545 DCM_3A no
dcm SV24819545 DCM_3B no
dcm SV24819545 DCM_4A no
dcm SV24819545 DCM_4B no
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
enclosure SV24819545 ONLY no
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
esm SV24819545 ESM_A no REPORTER
esm SV24819545 ESM_B no NOT_REPORTER
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
fan SV24819545 0_TOP_LEFT no 4890 RPM
fan SV24819545 1_BOT_LEFT no 4940 RPM
fan SV24819545 2_BOT_RGHT no 4890 RPM
fan SV24819545 3_TOP_RGHT no 5040 RPM
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
powerSupply SV24819545 0_TOP no
powerSupply SV24819545 1_BOT no
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
tempSensor SV24819545 DCM_0A no 46 C
tempSensor SV24819545 DCM_0B no 38 C
tempSensor SV24819545 DCM_1A no 47 C
tempSensor SV24819545 DCM_1B no 40 C
tempSensor SV24819545 DCM_2A no 45 C
tempSensor SV24819545 DCM_2B no 40 C
tempSensor SV24819545 DCM_3A no 45 C
tempSensor SV24819545 DCM_3B no 37 C
tempSensor SV24819545 DCM_4A no 45 C
tempSensor SV24819545 DCM_4B no 40 C
tempSensor SV24819545 ESM_A no 39 C
tempSensor SV24819545 ESM_B no 41 C
tempSensor SV24819545 POWERSUPPLY_BOT no 39 C
tempSensor SV24819545 POWERSUPPLY_TOP no 36 C
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
voltageSensor SV24819545 12v no 12 V
voltageSensor SV24819545 ESM_A_1_0v no 0.98 V
voltageSensor SV24819545 ESM_A_1_2v no 1.19 V
voltageSensor SV24819545 ESM_A_3_3v no 3.31 V
voltageSensor SV24819545 ESM_A_5v no 5.04 V
voltageSensor SV24819545 ESM_B_1_0v no 1 V
voltageSensor SV24819545 ESM_B_1_2v no 1.19 V
voltageSensor SV24819545 ESM_B_3_3v no 3.31 V
voltageSensor SV24819545 ESM_B_5v no 5.07 V
Check for failed disks
mmlspdisk all --not-ok
The system displays output similar to this: [root@gssio1]# mmlspdisk all --not-ok
pdisk:
replacementPriority = 7.34
name = "e1d2s01"
device = ""
recoveryGroup = "gssio1"
declusteredArray = "DA1"
state = "failing/noPath/systemDrain/noRGD/noVCD/noData"
capacity = 2000381018112
freeSpace = 1999307276288
fru = "42D0768"
location = "SV12616682-2-1"
WWN = "naa.5000C500262630DF"
server = "gssio1.gpfs.net"
reads = 295
writes = 915
bytesReadInGiB = 0.576
bytesWrittenInGiB = 1.157
IOErrors = 0
IOTimeouts = 0
mediaErrors = 0
checksumErrors = 0
pathErrors = 0
relativePerformance = 1.003
dataBadness = 0.000
rgIndex = 9
userLocation = "Enclosure SV12616682 Drawer 2 Slot 1"
userCondition = "replaceable"
hardware = "IBM-ESXS ST32000444SS BC2B 9WM40AQ10000C1295TH8"
hardwareType = Rotating 7200
nPaths = 0 active 0 total
mmlspdisk displays the details of the failed or failing disk, including the pdisk name, the enclosure serial number, and the location of the disk.
Replacing a disk
If a disk fails and needs to be replaced, follow the proper disk replacement procedure. Improper disk replacement greatly increases the possibility of data loss. Use the mmchcarrier command to replace a failed pdisk. This command updates the firmware automatically when a disk is replaced. For more information about mmchcarrier, see IBM Spectrum Scale RAID: Administration.
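For the failing pdisk shown in the previous example, the replacement might look like the following sketch (it assumes pdisk e1d2s01 in recovery group gssio1; always confirm the recovery group, pdisk name, and physical location reported by mmlspdisk before touching any hardware):
mmchcarrier gssio1 --release --pdisk e1d2s01
Replace the physical drive after the carrier is released, then run:
mmchcarrier gssio1 --replace --pdisk e1d2s01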
Run gnrhealthcheck
gnrhealthcheck
The system displays output similar to this: [root@gssio1 gpfs0]# gnrhealthcheck
################################################################
# Beginning topology checks.
################################################################
Topology checks successful.
################################################################
# Beginning enclosure checks.
################################################################
Enclosure checks successful.
################################################################
# Beginning recovery group checks.
################################################################
Recovery group checks successful.
################################################################
# Beginning pdisk checks.
################################################################
Pdisk checks successful.
See IBM Spectrum Scale RAID: Administration for more information about this script.
Collecting data
gsssnap
The configuration and service data collected at the end of the installation can be very valuable during future problem determination and troubleshooting. Send the collected service data to your IBM representative. See gsssnap script for more information about this command.
Cleaning up the system
- Use ssh to log on to any I/O server node.
- To delete the file system and the associated NSDs and vdisks, run:
/opt/ibm/gss/tools/samples/gssdelvdisks
- To shut down IBM Spectrum Scale and delete the cluster, run:
mmshutdown -a
mmdelnode -N all