Installing the ESS software
This topic includes information about installing and configuring the ESS software.
It covers the installation and configuration procedure for an ESS 4.0 system with one or more building blocks. To complete this procedure, you need a working knowledge of Power Systems™ servers, IBM Spectrum Scale™, and xCAT.
For information about known issues, mitigation, and workarounds, see ESS 4.0.0 issues. Depending on which fix level you are installing, these might or might not apply to you.
For information about upgrading to ESS 4.0, see Upgrading the Elastic Storage Server.
Networking requirements
- Service network
This network connects the flexible service processor (FSP) on the management server and I/O server nodes with the HMC, as shown in yellow in Figure 1. The HMC runs the Dynamic Host Configuration Protocol (DHCP) server on this network. If the HMC is not included in the solution order, a customer-supplied HMC is used.
- Management and provisioning network
This network connects the management server to the I/O server nodes and HMCs, as shown in blue in Figure 1. The management server runs DHCP on the management and provisioning network. If a management server is not included in the solution order, a customer-supplied management server is used.
- Clustering network
This high-speed network is used for clustering and client node access. It can be a 10 Gigabit Ethernet (GbE), 40 GbE, or InfiniBand network. It might not be included in the solution order.
- External and campus management network
This public network is used for external and campus management of the management server, the HMC, or both.
The management and provisioning network and the service network must run as two non-overlapping networks implemented as two separate physical networks or two separate virtual local-area networks (VLANs).
The HMC, the management server, and the switches (1 GbE switches and high-speed switches) might not be included in a solution order in which an existing or customer-supplied HMC or management server is used. Perform any advance planning tasks that might be needed to access and use these solution components.

Installing the ESS 4.0 software
Preparing for the installation
- Obtain the current ESS 4.0 installation code from the Fix Central website.
To download from Fix Central, you must have entitlement for the given installation package. Check with your IBM® representative if you have questions.
- Obtain a Red Hat Enterprise Linux 7.1 ISO image (RHEL 7.1 Binary DVD) file or DVD for 64-bit IBM Power Systems architecture, for example:
rhel-server-7.1-ppc64-dvd.iso
For more information, see the Red Hat Enterprise Linux website.
Perform the following tasks and gather all required information before starting the installation process. Table 1 includes information about components that must be set up before you start installing the ESS 4.0 software.
For tips about how to name nodes, see Node name considerations.
| ESS component | Description | Required actions | System settings |
|---|---|---|---|
| 1. Service network | This private network connects the HMC with the management server's FSP and the I/O server nodes. The service network must not be visible to the OS running on the node being managed (that is, the management server or the I/O server node). The HMC uses this network to discover the management server and the I/O server nodes and to perform hardware management tasks such as creating and managing logical partitions, allocating resources, controlling power, and rebooting. | Perform any advance planning tasks that might be needed to access and use the HMC if it is not part of the solution order and a customer-supplied HMC will be used. Set up this network if it has not been set up already. | Set the HMC to be the DHCP server for the service network. |
| 2. Management and provisioning network | This network connects the management server node with the HMC and the I/O server nodes. It typically runs over 1 GbE. | Perform any advance planning tasks that might be needed to access and use the management server if it is not part of the solution order and a customer-supplied management server will be used. Set up this network if it has not been set up already. | |
| 3. Clustering network | This network is for high-performance data access and, in most cases, client node access. It is typically composed of 10GbE, 40GbE, or InfiniBand networking components. | Set up this network if it has not been set up already. | |
| 4. Management network domain | The management server uses this domain for the proper resolution of hostnames. | Set the domain name using lowercase characters. Do not use any uppercase characters. | Example: |
| 5. HMC node (IP address and hostname) | The IP address of the HMC node on the management network has a console name, which consists of a hostname and a domain name. | Set the fully qualified domain name (FQDN) and the hostname using lowercase characters. Do not use any uppercase characters. Do not use a suffix of -enx, where x is any character. Do not use an _ (underscore) in the hostname. | Example: |
| 6. Management server node (IP address) | The IP address of the management server node has an FQDN and a hostname. | Set the FQDN and hostname using lowercase characters. Do not use any uppercase characters. Do not use a suffix of -enx, where x is any character. Do not use an _ (underscore) in the hostname. | Example: |
| 7. I/O server nodes (IP addresses) | The IP addresses of the I/O server nodes have FQDNs and hostnames. | Set the FQDNs and hostnames using lowercase characters. These names must match the names of the partitions created for these nodes using the HMC. Do not use any uppercase characters. Do not use a suffix of -enx, where x is any character. Do not use an _ (underscore) in the hostname. | Example: I/O server 1: I/O server 2: |
| 8. Management server node (management network interface) | The management network interface of the management server node must have the IP address that you set in item 6 assigned to it. This interface must have only one IP address assigned. | To obtain this address, run: | Example: |
| 9. HMC (hscroot password) | Set the password for the hscroot user ID. | Example: This is the default password. | |
| 10. I/O servers (user IDs and passwords) | The user IDs and passwords of the I/O servers are assigned during deployment. | Example: User ID: root Password: cluster (this is the default password) | |
| 11. Clustering network (hostname prefix or suffix) | This high-speed network is implemented on a 10Gb Ethernet, 40Gb Ethernet, or InfiniBand network. | Set a hostname for this network. It is customary to use hostnames for the high-speed network that use the prefix and suffix of the actual hostname. Do not use a suffix of -enx, where x is any character. | Examples: Suffixes: -bond0, -ib, -10G, -40G Hostnames with a suffix: gssio1-ib, gssio2-ib |
| 12. High-speed cluster network (IP address) | The IP addresses of the management server nodes and I/O server nodes on the high-speed cluster network have FQDNs and hostnames. In the example, 172.10.0.11 is the IP address that the GPFS™ daemon uses for clustering. The corresponding hostname and FQDN are gssio1-ib and gssio1-ib.data.net, respectively. | Set the FQDNs and hostnames. Do not make changes in the /etc/hosts file for the high-speed network until the deployment is complete. Do not create or enable the high-speed network interface until the deployment is complete. | Example: Management server: I/O server 1: I/O server 2: |
| 13. Red Hat Enterprise Linux 7.1 | The Red Hat Enterprise Linux 7.1 DVD or ISO file is used to create a temporary repository for the xCAT installation. xCAT uses it to create a Red Hat Enterprise Linux repository on the management server node. | Obtain and download this DVD or ISO file. For more information, see the Red Hat Enterprise Linux website. | Example: |
| 14. Management network switch | The switch that implements the management network must allow the Bootstrap Protocol (BOOTP) to go through. | Obtain the IP address and access credentials (user ID and password) of this switch. Some switches generate many Spanning Tree Protocol (STP) messages, which interfere with the network boot process. Disable STP to mitigate this. | |
| 15. Target file system | You need to provide information about the target file system that is created using storage in the ESS building blocks. | Set the target file system name, the mount point, the block size, the number of data NSDs, and the number of metadata NSDs. | Example: |
The following is an example of a typical /etc/hosts file.
[root@ems1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.45.131 hmc1.gpfs.net hmc1
192.168.45.20 ems1.gpfs.net ems1
192.168.45.21 gssio1.gpfs.net gssio1
192.168.45.22 gssio2.gpfs.net gssio2
172.16.45.20 ems1-hs.gpfs.net ems1-hs
172.16.45.21 gssio1-hs.gpfs.net gssio1-hs
172.16.45.22 gssio2-hs.gpfs.net gssio2-hs
Set up the HMC and the management server (MS)
For information about setting up the HMC network for use by xCAT, see the xCAT website.
- Make sure the POWER8® servers are powered on in standby mode.
- Connect the ESS I/O server nodes and the management server (if it is part of the order) to the HMC. If the HMC is not part of the order, you will need to provide it.
- Verify that the partitions of the I/O servers and the management server (if it is part of the order) are visible on the HMC. (The HMC might prompt you for the FSP password. The default password is abc123.) The HMC discovers the I/O server and management server nodes automatically when the nodes are powered on. If this does not happen, power cycle the nodes.
- Typically, server names, or central processor complex (CPC) names, are derived from the serial number. It is recommended that you do not change the server name. Make sure the server name and the logical partition (LPAR) name are not identical.
- The default partition names follow.
- Management server: ems1
- I/O server 1: gssio1
- I/O server 2: gssio2
- If there are more building blocks in the same order, the additional I/O server node partition names are: gssio3, gssio4, gssio5, ... gssion, where n is the total number of I/O servers.
- The management server nodes and I/O server nodes are shipped from IBM with Red Hat Enterprise Linux 7.1 installed in an R10 disk
array. The I/O server nodes are redeployed (including reinstallation
of Red Hat Enterprise Linux 7.1)
at the customer location from the management server. Typically, this
process takes approximately 30 minutes to complete. Completion of
this process ensures that the installation is consistent with various
site-specific parameters. It also minimizes configuration
mismatches and incompatibilities between the management server
nodes and I/O server nodes.
There is no need to reinstall the management server. It is reinstalled only if the OS cannot boot any more due to hardware damage or failure. See Installing Red Hat Enterprise Linux on the management server to reinstall the management server if needed.
- Verify that you can access the management server console using the HMC. After network connectivity is established to the management server node (see the next section), it is recommended that you access the management server over the network using an available secure shell (SSH) client such as PuTTY.
Configure an IP address for the xCAT network on the management server using the HMC console
- Log in to the system as root. The default root password from IBM is cluster.
- List the available interfaces, which should begin with a prefix of enP7:
ip link show | egrep "P7.*state UP"
If you do not see any interfaces with a state of UP, check your network connections before proceeding. Also, verify that the correct interface is UP.
- Select the interface that ends with a suffix of f0, for example:
enP7p128s0f0
By default, enP7p128s0f0 is C10-port 0 and is configured at IBM with an IP address of 192.168.45.10, 192.168.45.11, or 192.168.45.20. If enP7p128s0f0 is not up and another link is up, move the cable.
- Edit the network configuration for this interface and change it as needed. The file name is:
/etc/sysconfig/network-scripts/ifcfg-enP7p128s0f0
In this file, change the value of BOOTPROTO from dhcp to static and set the value of ONBOOT to yes if it is not set already:
BOOTPROTO=static
ONBOOT=yes
- Add or change the management server's IP address and netmask as needed (a complete example of this file follows this list). For example:
IPADDR=192.168.45.20
NETMASK=255.255.255.0
- Restart network services if the address is changed:
systemctl restart network
- Verify that the management server's management network interface is up. For example, run:
ping 192.168.45.20
- After the interface is configured, you can log in to the management server node using an SSH client.
Command sequence overview
If you are familiar with the ESS, review the Elastic Storage Server: Quick Deployment Guide for instructions on how to deploy and upgrade. This document, Deploying the Elastic Storage Server, provides detailed instructions and information about the steps involved.
- Obtain the packed, compressed ESS 4.0 software. Unpack and uncompress the software. For example, run:
tar zxvf gss_install-4.0.0_ppc64_advanced_20160126T001311Z.tgz
The name of your ESS 4.0 software tar (.tgz) file could differ based on the IBM Spectrum Scale edition you are using and the fix levels of the ESS release you are installing.
- Check the MD5 checksum:
md5sum -c gss_install-4.0.0_ppc64_advanced_20160126T001311Z.md5
- To make sure the /opt/ibm/gss/install directory is clean, run:
/bin/sh gss_install-4.0.0_ppc64_advanced_20160126T001311Z --remove
- Obtain the ESS 4.0 license, accept the license, and run this command to extract the software:
/bin/sh gss_install-4.0.0_ppc64_advanced_20160126T001311Z --text-only
- Clean up the current xCAT installation and associated configuration:
gssdeploy -c
- Install the ESS 4.0 packages on the management server node:
gssinstall -m manifest -u
- Customize the gssdeploy script and run it to configure xCAT:
gssdeploy -x
In this case, the gssdeploy script runs one step at a time, which is recommended, and waits for user responses.
- Update the management server node:
updatenode ems1 -P gss_updatenode
- If indicated by the previous step, reboot the management server node to apply the changes from the management server update. After rebooting, run updatenode again if instructed to do so.
- Update OFED on the management server node:
updatenode ems1 -P gss_ofed
- Reboot the management server node to apply the changes from the OFED update.
- Deploy on the I/O server nodes:
gssdeploy -d
- Reboot the I/O server nodes after the deployment is complete before proceeding with the hardware check.
Detailed installation steps follow.
Obtain the ESS 4.0 installation software and install it on the management server node
- Obtain the software from the IBM Spectrum Scale Fix Central website. The name of your ESS 4.0 software tar (.tgz) file could differ based on the edition you are using and the fix levels of the ESS release you are installing.
- Unpack and uncompress the file to create the installation software and the MD5 checksum of the installation software file. To unpack and uncompress the file, run this command:
tar zxvf gss_install-4.0.0_ppc64_advanced_20160126T001311Z.tgz
The system displays output similar to this:
[root@gems5 deploy]# tar zxvf gss_install-4.0.0_ppc64_advanced_20160126T001311Z.tgz
gss_install-4.0.0_ppc64_advanced_20160126T001311Z
gss_install-4.0.0_ppc64_advanced_20160126T001311Z.md5
- To verify the MD5 checksum of the software, run:
md5sum -c gss_install-4.0.0_ppc64_advanced_20160126T001311Z.md5
The system displays output similar to this:
[root@gems5 deploy]# md5sum -c gss_install-4.0.0_ppc64_advanced_20160126T001311Z.md5
gss_install-4.0.0_ppc64_advanced_20160126T001311Z: OK
- To make sure the /opt/ibm/gss/install directory is clean, run:
/bin/sh gss_install-4.0.0_ppc64_advanced_20160126T001311Z --remove
- Use the gss_install* command to accept the ESS 4.0 product license and install the ESS 4.0 software package. The ESS 4.0 installation software is integrated with the product license acceptance tool. To install the ESS 4.0 software, you must accept the product license. To accept the license and install the package, run the gss_install* command (for example, /bin/sh gss_install-4.0.0_ppc64_advanced_20160126T001311Z) with the appropriate options. The gss_install* command you run could differ based on the IBM Spectrum Scale edition you are using and the fix levels of the ESS release you are installing. For example, run:
/bin/sh gss_install-4.0.0_ppc64_advanced_20160126T001311Z --text-only
See gss_install* command for more information about this command.
- Clean the current xCAT installation and associated configuration:
gssdeploy -c
- By default, the product license acceptance tool places the code in the following directory:
/opt/ibm/gss/install
You can use the -dir option to specify a different directory.
- Run the change directory command:
cd /opt/ibm/gss/install
- Use the gssinstall script to install the ESS 4.0 packages on the management server node. This script is in the /opt/ibm/gss/install/installer directory. For example, run:
/opt/ibm/gss/install/installer/gssinstall -m /opt/ibm/gss/install/manifest -u
The system displays output similar to this:
# /opt/ibm/gss/install/installer/gssinstall -m /opt/ibm/gss/install/manifest -u
[INFO]: GSS package installer
[INFO]: Using LOG: /var/log/gss/gssinstall.log
[INFO]: [EMS] Audit Summary:
[INFO]: [EMS] Manifest Ver: 4.0.0-20160126T001311Z_ppc64_advanced
[INFO]: [EMS] Group gpfs RPMs: Not Inst: 10, Current: 0, New: 0, Old: 0
[INFO]: [EMS] Group gss RPMs: Not Inst: 2, Current: 0, New: 0, Old: 0
[INFO]: [EMS] Group gui RPMs: Not Inst: 3, Current: 0, New: 0, Old: 0
[INFO]: [EMS] Group ofed RPMs: Not Inst: 1, Current: 0, New: 0, Old: 0
[INFO]: [EMS] Group xcat-core RPMs: Not Inst: 5, Current: 0, New: 0, Old: 0
[INFO]: [EMS] Group xcat-dfm RPMs: Not Inst: 2, Current: 0, New: 0, Old: 0
[RESP]: Install EMS software repositories? [y/n]: y
[INFO]: Installing EMS software repository to (/install/gss)
[INFO]: Creating yum repo data for gss pkgs (Please wait...)
[INFO]: GSS package installer - Update complete.
See gssinstall script for more information about this script.
Configure the installed packages on the management server node and prepare for deployment
- Copy the gssdeploy script from the /opt/ibm/gss/install/samples directory to another directory and then customize the copy to match your environment. You need to make changes to several lines at the top of your copy of this script for the target configuration, as shown in the following example. For ESS 4.0, DEPLOY_OSIMAGE must be set to rhels7.1-ppc64-install-gss. You might see other OSIMAGE values that correspond to earlier releases (xCAT command lsdef -l osimage, for example).
#########################################################################
#
# Customize/change following to your environment
#
#########################################################################
#[RHEL]
# Set to Y if RHEL DVD is used otherwise iso is assumed.
RHEL_USE_DVD="N"
# Device location of RHEL DVD used instead of iso
RHEL_DVD="/dev/cdrom"
# Mount point to use for RHEL media.
RHEL_MNT="/opt/ibm/gss/mnt"
# Directory containing ISO.
RHEL_ISODIR=/opt/ibm/gss/iso
# Name of ISO file.
RHEL_ISO="RHEL-7.1-20150219.1-Server-ppc64-dvd1.iso"
#[EMS]
# Hostname of EMS
EMS_HOSTNAME="ems1"
# Network interface for xCAT management network
EMS_MGTNETINTERFACE="enP7p128s0f0"
#[HMC]
# Hostname of HMC
HMC_HOSTNAME="hmc1"
# Default userid of HMC
HMC_ROOTUID="hscroot"
# Default password of HMC
HMC_PASSWD="Passw0rd"
#[IOSERVERS]
# Default userid of IO Server.
IOSERVERS_UID="root"
# Default password of IO Server.
IOSERVERS_PASSWD="cluster"
# Array of IO servers to provision and deploy.
IOSERVERS_NODES=(gssio1 gssio2)
#[DEPLOY]
# OSIMAGE stanza to deploy to IO servers.
DEPLOY_OSIMAGE="rhels7.1-ppc64-install-gss"
########################################################################
#
# End of customization
#
########################################################################
The gssdeploy script can be run in interactive mode or non-interactive ("silent") mode. Running gssdeploy in interactive mode is recommended.
The gssdeploy script is run in two phases. In the first phase, it is run with the -x option to set up the management server and xCAT. In the second phase, it is run with the -d option to deploy on the I/O server node.
See gssdeploy script for more information about this script.
Every step of the gssdeploy script shows the current step to be run and a brief description of the step. The command to be run appears on the [CMD] line and the response of the command on the [CMD_RESP] lines, for example:
[STEP]: Deploy 4 of 7, Set osimage attributes for the nodes so current values will be used for rnetboot or updatenode
[CMD]: => nodeset gss_ppc64 osimage=rhels7.1-ppc64-install-gss
Enter 'r' to run [CMD], 's' to skip this step, or 'e' to exit this script
Enter response: r
[CMD_RESP]: gssio1: install rhels7.1-ppc64-gss
[CMD_RESP]: gssio2: install rhels7.1-ppc64-gss
[CMD_RESP]: RC: 0
To configure xCAT and the management server node, you will run gssdeploy -x. If xCAT is installed on the node already, the script will fail. If it fails, clean the previous xCAT installation by running gssdeploy -c.
Suppose your modified gssdeploy script is in the /home/deploy directory. Run:
/home/deploy/gssdeploy -x
The script goes through several steps and configures xCAT on the management server node. Some of the steps (those in which copycds or getmacs is run, for example) take some time to complete.
- Run the updatenode ManagementServerNodeName -P gss_updatenode command. For example, run:
updatenode ems1 -P gss_updatenode
The system displays output similar to this:
[root@ems1 deploy]# updatenode ems1 -P gss_updatenode
ems1: Mon Jun 15 18:02:50 CDT 2015 Running postscript: gss_updatenode
ems1: gss_updatenode [INFO]: Using LOG: /var/log/xcat/xcat.log
ems1: gss_updatenode [INFO]: Performing update on ems1
ems1: gss_updatenode [INFO]: Erasing gpfs rpms
ems1: gss_updatenode [INFO]: Erase complete
ems1: gss_updatenode [INFO]: Updating ospkgs on ems1 (Please wait...)
ems1: gss_updatenode [INFO]: Version unlocking kernel for the update
ems1: gss_updatenode [INFO]: Disabling repos:
ems1: gss_updatenode [INFO]: Updating otherpkgs on ems1 (Please wait...)
ems1: gss_updatenode [INFO]: Enabling repos:
ems1: gss_updatenode [INFO]: Version locking kernel
ems1: gss_updatenode [INFO]: Checking that GPFS GPL layer matches running kernel
ems1: gss_updatenode [INFO]: GPFS GPL layer matches running kernel
ems1: gss_updatenode [INFO]: Checking that OFED ISO supports running kernel
ems1: gss_updatenode [INFO]: Upgrade complete
ems1: Postscript: gss_updatenode exited with code 0
ems1: Running of postscripts has completed.
This step could take some time to complete if vpdupdate is run before the actual update. To determine whether you are waiting for vpdupdate, run this command:
ps -ef | grep vpd
The system displays output similar to this:
[root@ems1 ~]# ps -ef | grep vpd
root 75272 75271 0 17:05 ? 00:00:00 /usr/sbin/lsvpd
root 75274 75272 0 17:05 ? 00:00:00 sh -c /sbin/vpdupdate >/dev/null 2>&1
root 75275 75274 2 17:05 ? 00:00:03 /sbin/vpdupdate
root 76106 73144 0 17:08 pts/0 00:00:00 grep --color=auto vpd
After the updatenode command completes, you should see an exit code of 0.
- Reboot the node only if you are instructed to do so. Also, run the script again if you rebooted.
- Run the OFED update using updatenode ManagementServerNodeName -P gss_ofed. Your version of OFED may be different than what is shown here. For the OFED update, run:
updatenode ems1 -P gss_ofed
The system displays output similar to this:
[root@ems1 deploy]# updatenode ems1 -P gss_ofed
ems1: Mon Jun 15 18:20:54 CDT 2015 Running postscript: gss_ofed
ems1: Starting to install OFED.....
ems1: Mellanox controller found, install Mellanox OFED
ems1: Unloading HCA driver:[ OK ]
ems1: Mounting OFED ISO...
ems1: /tmp //xcatpost
ems1: mount: /dev/loop0 is write-protected, mounting read-only
ems1: Loaded plugins: product-id, subscription-manager, versionlock
ems1: This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
ems1: Error: Error: versionlock delete: no matches
ems1: Installing OFED stack...
ems1: TERM environment variable not set.
ems1: Logs dir: /tmp/MLNX_OFED_LINUX-3.1-1.0.0.2.logs
ems1:
ems1: Log File: /tmp/MLNX_OFED_LINUX-3.1-1.0.0.2.logs/fw_update.log
ems1: Unloading HCA driver:[ OK ]
ems1: Loading HCA driver and Access Layer:[ OK ]
ems1: Loaded plugins: product-id, subscription-manager, versionlock
ems1: This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
ems1: Adding versionlock on: 0:dapl-devel-2.1.3mlnx-OFED.2.4.37.gb00992f
ems1: Adding versionlock on: 0:srptools-1.0.1-OFED.2.4.40.g68b353c-OFED.2.3.47.gc8011c5
.
.
.
ems1: Adding versionlock on: 0:opensm-devel-4.3.0.MLNX20141222.713c9d5-0.1
ems1: versionlock added: 60
ems1: //xcatpost
ems1: Postscript: gss_ofed exited with code 0
ems1: Running of postscripts has completed.
- Reboot the node after the OFED update is complete.
- To make sure the OFED is updated and reflects the installed kernel, run this command:
ofed_info | grep -e kernel | grep ppc64
The system displays output similar to this:
[root@ems1 deploy]# ofed_info | grep -e kernel | grep ppc64
kernel-mft-3.8.0-3.10.0_229.el7.ppc64.ppc64
kernel-ib-devel-2.4-3.10.0_229.el7.ppc64_OFED.2.4.1.0.2.1.ge234f2b.ppc64
kernel-ib-2.4-3.10.0_229.el7.ppc64_OFED.2.4.1.0.2.1.ge234f2b.ppc64
- If you rebooted the management server node, run the updatenode ManagementServerNodeName -P gss_updatenode command again. For example, run:
updatenode ems1 -P gss_updatenode
Deploy the nodes
- Close all console (rcons) sessions on the management server and on the HMC.
- If the switch is supplied by the customer (that is, not shipped from IBM), make sure all nodes can communicate using BOOTP and there are no excessive STP messages. BOOTP could fail in the presence of excessive STP messages. You might consider enabling PortFast on the ports that are connected to the I/O server node.
- Make sure no other DHCP server is acting on the network.
- Make sure the external JBOD storage is powered off or disconnected.
When these prerequisites are met, start the deployment of the I/O server nodes by running:
gssdeploy -d
At this point, the I/O server nodes are restarted and the OS and other software packages are installed on them.
Monitoring the I/O server node installation process
Use the remote console feature of xCAT to monitor the installation process. The preferred method for monitoring the progress is to watch the console logs using the Linux tailf command.
tailf /var/log/consoles/gssio1
You can also open a remote console to a node by running:
rcons NodeName
If you connect to the console when the Red Hat installer, called Anaconda, is running, you are sent to a menu system. To display various menus, press <Ctrl-b> n, where n is the number of the menu you want to view. For example, if you press <Ctrl-b> 2, you are placed in the Anaconda shell. It is recommended that you not perform any actions using the Anaconda menu unless instructed to do so.
If the remote console is not available for a node, regenerate the console server configuration by running:
makeconservercf
To check the deployment status of the I/O server nodes, run:
nodestat gss_ppc64
The system displays output similar to this: [root@ems1 ~]# nodestat gss_ppc64
gssio1: installing post
gssio2: installing post
[root@ems1 ~]# nodestat gss_ppc64
gssio1: sshd
gssio2: sshd
xdsh gss_ppc64 "ps -eaf | grep -v grep | grep xcatpost"
If there are any processes still running, wait for them
to complete. It is possible that the installation could fail due to network boot issues. If the installation fails, run makeconservercf before trying it again. Retry the installation at least three times and see if that fixes the issue.
For example, to restart the installation on gssio2, run:
nodeset gssio2 osimage=rhels7.1-ppc64-install-gss
rnetboot gssio2 -V
This command sequence restarts the installation process on gssio2. Monitor the console using tailf or rcons. Check the messages that are displayed during the initial phase of the boot process. Most issues occur during this phase.
Check for synchronization files
As part of the operating system and I/O server code installation, xCAT runs post-installation scripts. These scripts install the required RPMs, upgrade and configure the networks (10 GbE, 40GbE, and InfiniBand), and configure the SAS adapters.
xdsh gss_ppc64 "ls /install/gss/sync"
The system displays output similar to this:
gssio1: mofed
gssio2: mofed
updatenode gss_ppc64 -F
Check for post-installation scripts
updatenode gss_ppc64 -V -P gss_ofed,gss_sashba
The updatenode command could take some time to complete. This is because updatenode calls vpdupdate on the node. You can check by running ps -ef | grep vpd on each node. If you see vpdupdate running, the updatenode command is waiting for it to complete.
Apply Red Hat updates
After deployment is complete, you can apply Red Hat updates as needed. Note that kernel and OFED components are matched with the ESS software stack and are therefore locked during deployment to prevent unintended changes during update.
See Red Hat Enterprise Linux update considerations for additional considerations.
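If you want to confirm which packages are locked before applying updates, the following is a minimal sketch. It assumes that the yum versionlock plugin set up during deployment is still active and that the node has access to a Red Hat repository; a plain yum update then leaves the locked kernel and OFED packages untouched.
yum versionlock list
yum update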
Check the system hardware
- gssstoragequickcheck checks the server, adapter, and storage configuration quickly.
- gssfindmissingdisks checks the disk paths and connectivity.
- gsscheckdisks checks for disk errors under various I/O operations.
Power on JBODs
After the I/O server nodes have been installed successfully, power on the JBODs. Wait approximately 5 to 10 minutes after powering on for the disks to be discovered before moving on to the next step.
System check 1: run gssstoragequickcheck
gssstoragequickcheck -G gss_ppc64
The system displays output similar to this: [root@ems1 deploy]# gssstoragequickcheck -G gss_ppc64
2015-06-15T20:17:07.036867 Start of storage quick configuration check
2015-06-15T20:17:08.745084 nodelist: gssio1 gssio2
gssio1: Machine Type: 8247-22L
gssio2: Machine Type: 8247-22L
gssio1: Valid SAS Adapter Configuration. Number of Adapter(s) found 3
gssio1: Valid Network Adapter Configuration. Number of Adapter(s) found: 3
gssio2: Valid SAS Adapter Configuration. Number of Adapter(s) found 3
gssio2: Valid Network Adapter Configuration. Number of Adapter(s) found: 3
gssio1: Enclosure DCS3700 found 2
gssio1: Disk ST2000NM0023 found 116
gssio1: SSD PX02SMF040 found 2
gssio1: Total disk found 116, expected 116
gssio1: Total SSD found 2, expected 2
gssio2: Enclosure DCS3700 found 2
gssio2: Disk ST2000NM0023 found 116
gssio2: SSD PX02SMF040 found 2
gssio2: Total disk found 116, expected 116
gssio2: Total SSD found 2, expected 2
2015-06-15T20:17:25.670645 End of storage quick configuration check
xdsh gss_ppc64 "modprobe mpt2sas"
After running modprobe, run gssstoragequickcheck again. See gssstoragequickcheck command for more information about this command.
System check 1a: run lsifixnv
xdsh gss_ppc64 "/xcatpost/gss_sashba"
System check 1b: Check the RAID firmware
xdsh ems1,gss_ppc64 "for IOA in \$(lsscsi -g | grep SISIOA | awk '{print \$NF}');
do iprconfig -c query-ucode-level \$IOA; done"
The system displays output similar to this: [root@ems1 deploy]# xdsh ems1,gss_ppc64 "for IOA in \$(lsscsi -g | grep SISIOA |
awk '{print \$NF}'); do iprconfig -c query-ucode-level \$IOA; done"
ems1: 12511700
gssio2: 12511700
gssio1: 12511700
If this system is upgraded from a previous version, you
might see a RAID firmware level of 12511400. If the RAID adapter firmware is not at the correct level, contact the IBM Support Center for update instructions.
System check 1c: Make sure 64-bit DMA is enabled for InfiniBand slots
xdsh gss_ppc64,bgqess-mgt1 journalctl -b | grep 64-bit | grep -v dma_rw | grep mlx
The system displays output similar to this: [root@ems1 gss]# xdsh gss_ppc64,bgqess-mgt1 journalctl -b | grep 64-bit | grep -v dma_rw | grep mlx
gssio1: Feb 13 09:28:34 bgqess-gpfs02.scinet.local kernel: mlx5_core 0000:01:00.0: Using 64-bit direct DMA at offset 800000000000000
gssio1: Feb 13 09:29:02 bgqess-gpfs02.scinet.local kernel: mlx5_core 0004:01:00.0: Using 64-bit direct DMA at offset 800000000000000
gssio1: Feb 13 09:29:30 bgqess-gpfs02.scinet.local kernel: mlx5_core 0009:01:00.0: Using 64-bit direct DMA at offset 800000000000000
gssio2: Jan 30 16:46:55 bgqess-gpfs01.scinet.local kernel: mlx5_core 0000:01:00.0: Using 64-bit direct DMA at offset 800000000000000
gssio2: Jan 30 16:47:23 bgqess-gpfs01.scinet.local kernel: mlx5_core 0004:01:00.0: Using 64-bit direct DMA at offset 800000000000000
gssio2: Jan 30 16:47:50 bgqess-gpfs01.scinet.local kernel: mlx5_core 0009:01:00.0: Using 64-bit direct DMA at offset 800000000000000
mgt1: Jan 26 16:55:41 bgqess-mgt1 kernel: mlx5_core 0004:01:00.0: Using 64-bit direct DMA at offset 800000000000000
Make sure you see all of the InfiniBand devices in this list. This sample output includes the following device numbers: 0000:01:00.0, 0004:01:00.0, and 0009:01:00.0. The slot-to-device assignments for the Connect-IB adapter follow:
| Slot | Device |
|---|---|
| C5 | 0009:01:00.0 |
| C6 | 0004:01:00.0 |
| C7 | 0000:01:00.0 |
If 64-bit DMA is not enabled for these slots, enable I/O Adapter Enlarged Capacity as follows:
- Make sure the OS or partition is shut down.
- On the HMC GUI, select the server and then click Operations -> Launch ASM.
- On the Welcome pane, specify your user ID and password. The default user ID is admin. The default password is abc123.
- In the navigation area, expand System Configuration -> System -> I/O Adapter Enlarged Capacity.
- Select Enable and set I/O Adapter Enlarged Capacity to 11. This covers all slots, because the I/O server nodes have 11 slots.
- Save your settings.
- Restart the server so the changes will take effect.
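After the restart, you can repeat the earlier journalctl check to confirm that 64-bit direct DMA is now reported for all of the InfiniBand devices. This reuses the command shown above, without the extra management node from that example:
xdsh gss_ppc64 journalctl -b | grep 64-bit | grep -v dma_rw | grep mlx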
System check 2: run gssfindmissingdisks
Run the gssfindmissingdisks command to verify that the I/O server nodes are cabled properly. This command reports the status of the disk paths. See gssfindmissingdisks command for more information about this command.
gssfindmissingdisks -G gss_ppc64
The
system displays output similar to this: [root@ems1 deploy]# gssfindmissingdisks -G gss_ppc64
2015-06-15T20:27:18.793026 Start find missing disk paths
2015-06-15T20:27:20.556384 nodelist: gssio1 gssio2
2015-06-15T20:27:20.556460 May take long time to complete search of all drive paths
2015-06-15T20:27:20.556501 Checking missing disk paths from node gssio1
gssio1 Enclosure SV45221140 (number 1):
gssio1 Enclosure SV45222733 (number 2):
gssio1: GSS configuration: 2 enclosures, 2 SSDs, 2 empty slots, 118 disks total, 6 NVRAM partitions
2015-06-15T20:27:37.698284 Checking missing disk paths from node gssio2
gssio2 Enclosure SV45221140 (number 1):
gssio2 Enclosure SV45222733 (number 2):
gssio2: GSS configuration: 2 enclosures, 2 SSDs, 2 empty slots, 118 disks total, 6 NVRAM partitions
2015-06-15T20:27:54.827175 Finish search for missing disk paths. Number of missing disk paths: 0
When there are missing drive paths, the command reports
possible configuration or hardware errors: [root@ems1 setuptools]# ./gssfindmissingdisks -G gss_ppc64
2014-10-28T04:23:45.714124 Start finding missing disks
2014-10-28T04:23:46.984946 nodelist: gssio1 gssio2
2014-10-28T04:23:46.985026 Checking missing disks from node gssio1
gssio1: Enclosure SV24819545 (number undetermined): 4-7
gssio1: Enclosure SV24819545 (number undetermined): 4-9
gssio1: Enclosure SV32300072 (number undetermined): 5-5
2014-10-28T04:25:10.587857 Checking missing disks from node gssio2
gssio2: Enclosure SV24819545 (number undetermined): 2-9
gssio2: Enclosure SV24819545 (number undetermined): 3-4
gssio2: Enclosure SV24819545 (number undetermined): 4-6
2014-10-28T04:26:33.253075 Finish search for missing disks. Number of missing disks: 6
In this example, the missing disk paths differ between the two I/O server nodes, so the missing drives are shown from each node's view. This is most likely not a physical drive issue, but rather a cable or other subsystem issue.
scsi3[19.00.00.00] U78CB.001.WZS0043-P1-C2-T1
scsi4[19.00.00.00] U78CB.001.WZS0043-P1-C2-T2 [P1 SV32300072 ESM A (sg67)] [P2 SV24819545 ESM B (sg126)]
scsi5[19.00.00.00] U78CB.001.WZS0043-P1-C3-T1
scsi6[19.00.00.00] U78CB.001.WZS0043-P1-C3-T2 [P2 SV24819545 ESM A (sg187)]
scsi1[19.00.00.00] U78CB.001.WZS0043-P1-C11-T1
scsi2[19.00.00.00] U78CB.001.WZS0043-P1-C11-T2 [P2 SV32300072 ESM B (sg8)]
For information about hardware ports, cabling, PCIe adapter installation, and SSD placement, see Cabling the Elastic Storage Server.
System check 2a: run mmgetpdisktopology
Use the gssfindmissingdisks command to verify the I/O server JBOD disk topology. If gssfindmissingdisks shows one or more errors, run the mmgetpdisktopology and topsummary commands to obtain more detailed information about the storage topology for further analysis. These commands are run from the I/O server nodes. It is a best-practice recommendation to run these commands once on each I/O server node.
For more information about mmgetpdisktopology and topsummary, see IBM Spectrum Scale RAID: Administration.
mmgetpdisktopology | topsummary
The
system displays output similar to this: [root@gssio1 ~]# mmgetpdisktopology | topsummary
/usr/lpp/mmfs/bin/topsummary: reading topology from standard input
GSS enclosures found: SV45221140 SV45222733
Enclosure SV45221140 (number 1):
Enclosure SV45221140 ESM A sg188[039A][scsi6 port 2] ESM B sg127[039A][scsi4 port 2]
Enclosure SV45221140 Drawer 1 ESM sg188 12 disks diskset "10026" ESM sg127 12 disks diskset "10026"
Enclosure SV45221140 Drawer 2 ESM sg188 12 disks diskset "51918" ESM sg127 12 disks diskset "51918"
Enclosure SV45221140 Drawer 3 ESM sg188 12 disks diskset "64171" ESM sg127 12 disks diskset "64171"
Enclosure SV45221140 Drawer 4 ESM sg188 12 disks diskset "02764" ESM sg127 12 disks diskset "02764"
Enclosure SV45221140 Drawer 5 ESM sg188 12 disks diskset "34712" ESM sg127 12 disks diskset "34712"
Enclosure SV45221140 sees 60 disks
Enclosure SV45222733 (number 2):
Enclosure SV45222733 ESM A sg68[039A][scsi4 port 1] ESM B sg9[039A][scsi2 port 2]
Enclosure SV45222733 Drawer 1 ESM sg68 11 disks diskset "28567" ESM sg9 11 disks diskset "28567"
Enclosure SV45222733 Drawer 2 ESM sg68 12 disks diskset "04142" ESM sg9 12 disks diskset "04142"
Enclosure SV45222733 Drawer 3 ESM sg68 12 disks diskset "29724" ESM sg9 12 disks diskset "29724"
Enclosure SV45222733 Drawer 4 ESM sg68 12 disks diskset "31554" ESM sg9 12 disks diskset "31554"
Enclosure SV45222733 Drawer 5 ESM sg68 11 disks diskset "13898" ESM sg9 11 disks diskset "13898"
Enclosure SV45222733 sees 58 disks
GSS configuration: 2 enclosures, 2 SSDs, 2 empty slots, 118 disks total, 6 NVRAM partitions
scsi3[20.00.02.00] U78CB.001.WZS06M2-P1-C2-T1
scsi4[20.00.02.00] U78CB.001.WZS06M2-P1-C2-T2 [P1 SV45222733 ESM A (sg68)] [P2 SV45221140 ESM B (sg127)]
scsi5[20.00.02.00] U78CB.001.WZS06M2-P1-C3-T1
scsi6[20.00.02.00] U78CB.001.WZS06M2-P1-C3-T2 [P2 SV45221140 ESM A (sg188)]
scsi0[20.00.02.00] U78CB.001.WZS06M2-P1-C11-T1
scsi2[20.00.02.00] U78CB.001.WZS06M2-P1-C11-T2 [P2 SV45222733 ESM B (sg9)]
Depending on the model and configuration, you may see references to enclosure numbers up to 6. This summary is produced by analyzing the SAS physical topology.
- The first line is a list of the enclosure mid-plane serial numbers for the enclosure type (DCS3700, for example). This serial number does not appear anywhere on the enclosure itself. The second line shows the enclosure ordering based on the cabling. A system with incorrect cabling will show that the enclosure number is undetermined. The third line shows the enclosure's serial number, then ESM A and ESM B, each followed by a SCSI generic device number that is assigned by the host:
Enclosure SV45221140 ESM A sg188[039A][scsi6 port 2] ESM B sg127[039A][scsi4 port 2]
The number in the first set of brackets is the code level of the ESM. The ports of the SCSI device are enclosed in the second set of brackets. The SCSI generic device number (sg188 or sg127, for example) also appears in the gsscheckdisks path output of drive performance and error counters.
- Enclosures are numbered physically from bottom to top within a building block. Enclosure 1 is the bottom enclosure; enclosure 6 is the top enclosure.
- Analyze the output:
Enclosure SV45221140 (number 1):
Enclosure SV45221140 ESM A sg188[039A][scsi6 port 2] ESM B sg127[039A][scsi4 port 2]
Enclosure SV45221140 Drawer 1 ESM sg188 12 disks diskset "10026" ESM sg127 12 disks diskset "10026"
Each line shows two disk-set numbers, one from ESM A and the other from ESM B. The disk-set number is the checksum of the serial numbers of the drives seen on that path. Checksums that do not match indicate an issue with that path involving an adapter, SAS cable, enclosure ESM, or expanders in the enclosures. If only one disk set is shown, this indicates a complete lack of a path, such as a missing cable or ESM.
scsi3[20.00.02.00] U78CB.001.WZS06M2-P1-C2-T1
scsi4[20.00.02.00] U78CB.001.WZS06M2-P1-C2-T2 [P1 SV45222733 ESM A (sg68)] [P2 SV45221140 ESM B (sg127)]
scsi5[20.00.02.00] U78CB.001.WZS06M2-P1-C3-T1
scsi6[20.00.02.00] U78CB.001.WZS06M2-P1-C3-T2 [P2 SV45221140 ESM A (sg188)]
scsi0[20.00.02.00] U78CB.001.WZS06M2-P1-C11-T1
scsi2[20.00.02.00] U78CB.001.WZS06M2-P1-C11-T2 [P2 SV45222733 ESM B (sg9)]
The first two lines represent the SAS adapter in slot C2. There are two SAS 2300 SCSI controllers in each adapter card, indicated by T1 and T2.
T1 P1 = Port 0
T1 P2 = Port 1
T2 P1 = Port 2
T2 P2 = Port 3
This shows that Port 2 of the adapter in slot C2 is connected to ESM A of enclosure SV45222733. Similarly, Port 2 of the adapter in slot C11 is connected to ESM B of enclosure SV45222733. See Figure 1 and Figure 2 for the physical location of ports and ESMs.
System check 3: run gsscheckdisks
The gsscheckdisks command initiates I/O to the drives and can be used to identify marginal drives. This command must be run on a system where there is no GPFS cluster configured. If it is run with a write test on a system where a GPFS cluster is already configured, it will overwrite the cluster configuration data stored in the disk, resulting in cluster and data loss. This command can be run from the management server node or from an I/O server node. The default duration is to run for 30 seconds for each I/O test for each path. For a more thorough test, set the duration to run for 5 minutes (300 seconds) or more.
For example, if gsscheckdisks is run with the write test enabled outside of the install or manufacturing environment (that is, without setting GSSENV), the command does not run the test:
[root@ems1 deploy]# gsscheckdisks -G gss_ppc64 --disk-list sdx,sdc --iotest a --write-enable
2015-06-15T20:35:53.408621 Start running check disks
gsscheckdisks must run in INSTALL or MFG environment. It may result in data loss
if run in a configured system.
Please rerun with environment GSSENV=INSTALL or GSSENV=MFG to indicate that it is
run in install or manufacturing environment.
Example:
GSSENV=INSTALL gsscheckdisks -N gss_ppc64 --show-enclosure-list
Run gsscheckdisks to verify that disks are in a good state.
GSSENV=INSTALL gsscheckdisks -G gss_ppc64 --encl all --iotest a --write-enable
The system displays output similar to this: [root@gssio1 ~]# GSSENV=INSTALL gsscheckdisks -G gss_ppc64 --encl all --iotest a --write-enable
2014-11-26T05:30:42.401514 Start running check disks
List of Enclosures found
SV32300072
SV24819545
Taking inventory of disks in enclosure SV32300072.
Taking inventory of disks in enclosure SV24819545.
2014-11-26T05:34:48.317358 Starting r test for 118 of 118 disks. Path: 0, duration 30 secs
2014-11-26T05:35:25.216815 Check disk analysis for r test Complete
2014-11-26T05:35:25.218802 Starting w test for 118 of 118 disks. Path: 0, duration 30 secs
2014-11-26T05:36:02.247192 Check disk analysis for w test Complete
2014-11-26T05:36:02.249225 Starting R test for 118 of 118 disks. Path: 0, duration 30 secs
2014-11-26T05:36:39.384888 Check disk analysis for R test Complete
2014-11-26T05:36:39.386868 Starting W test for 118 of 118 disks. Path: 0, duration 30 secs
2014-11-26T05:37:16.515254 Check disk analysis for W test Complete
2014-11-26T05:37:16.517218 Starting r test for 118 of 118 disks. Path: 1, duration 30 secs
2014-11-26T05:37:53.407486 Check disk analysis for r test Complete
2014-11-26T05:37:53.409601 Starting w test for 118 of 118 disks. Path: 1, duration 30 secs
2014-11-26T05:38:30.421883 Check disk analysis for w test Complete
2014-11-26T05:38:30.423763 Starting R test for 118 of 118 disks. Path: 1, duration 30 secs
2014-11-26T05:39:07.548179 Check disk analysis for R test Complete
2014-11-26T05:39:07.550328 Starting W test for 118 of 118 disks. Path: 1, duration 30 secs
2014-11-26T05:39:44.675574 Check disk analysis for W test Complete
gsscheckdisks displays an error count if any of the drives under test (and path) experience I/O errors. If there are errors on any disks, the output identifies the failing disks. The output details the performance and errors seen by the drives and is saved in the /tmp/checkdisk directory of the management server node (or I/O server node, if it is called from there) for further analysis. There are three files in this directory.
- hostdiskana[0-1].csv contains summary results of disk I/O throughput of each device every second and a one-line summary of each device showing throughput and error count.
- diskiostat.csv contains details of the /proc/iostat data for every second for offline detailed analysis of disk performance. The format of the data is: column 1: time epoch, column 2: node where run, column 3: device. Columns 4 through 11 are a dump of /proc/iostat.
- deviceerr.csv contains the drive error count. The format of the data is: column 1: time epoch, column 2: node where run, column 3: device, column 4: I/O issued, column 5: I/O completed, column 6: I/O error.
Note: With a default test duration of 30 seconds for each test case and a batch size of 60 drives, it can take up to 20 minutes per node for a GL4 system.
See gsscheckdisks command for more information about this command.
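As a quick way to scan the saved results for problem drives, you could filter deviceerr.csv for nonzero error counts. This is a minimal sketch that assumes the file is comma-separated with the column layout described above:
awk -F, '$6 != 0 {print $2, $3, $6}' /tmp/checkdisk/deviceerr.csv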
Set up high-speed networking
Set up the high-speed network that will be used for cluster data communication. See Networking: creating a bonded interface for more information.
- Choose the hostname that will be associated with the high-speed network IP address. Typically, the hostname associated with the high-speed network is derived from the xCAT hostname using the prefix and suffix. Before you create the GPFS cluster, high-speed networking must be configured with the proper IP address and hostname. See Node name considerations for more information.
- Update your /etc/hosts with high-speed network entries showing the high-speed IP address and corresponding hostname. Copy the modified /etc/hosts to the I/O server nodes of the cluster.
- Add the high-speed network to the xCAT networks table (see the sketch after this list), and then run:
makedns
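The following is a minimal sketch of how the high-speed network might be defined in the xCAT networks table using mkdef before running makedns. The network object name, subnet, and netmask are illustrative values based on the example /etc/hosts shown earlier, so adjust them to your environment; tabdump lets you confirm the result.
mkdef -t network -o highspeed net=172.16.45.0 mask=255.255.255.0
tabdump networks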
Set up the high-speed network
With an Ethernet high-speed network, you can use the gssgennetworks script to create a bonded Ethernet interface over the active (up) high-speed network interfaces. You cannot use gssgennetworks for IPoIB configurations. See Appendix A: Installation: reference for creating a bonded network interface with IP over InfiniBand.
- To see the current set of active (up) interfaces on all nodes, run:
gssgennetworks -G ems1,gss_ppc64 --suffix=-hs
- To create a bonded Ethernet interface on all nodes, run:
gssgennetworks -G ems1,gss_ppc64 --suffix=-hs --create-bond
The script sets miimon to 100, the bonding mode to 802.3ad (LACP), and xmit_hash_policy to layer3+4. The other bond options keep their default values, including lacp_rate (the default is slow). For proper network operation, the Ethernet switch settings in the networking infrastructure must match the I/O server node interface bond settings. A sketch for verifying the bond settings follows.
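To confirm that the bond came up with the expected settings, you can inspect the kernel bonding status on each node. This sketch assumes the bonded interface is named bond0; the name your configuration uses might differ.
xdsh gss_ppc64 cat /proc/net/bonding/bond0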
Check the installed software and firmware
Run the gssinstallcheck command to check the installed software and firmware.
See gssinstallcheck command for more information about this command.
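No example invocation is shown here, but to check all of the I/O server nodes at once, something like the following should work, assuming gssinstallcheck accepts the same -G node group syntax as the other gss commands used in this topic:
gssinstallcheck -G gss_ppc64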
Create the GPFS cluster
Run the gssgencluster command on the management server to create the cluster. This command creates a GPFS cluster using all of the nodes in the node group if you specify the -G option. You can also provide a list of names using the -N option. The command assigns server licenses to each I/O server node, so it prompts for license acceptance (or use the --accept-license option). It applies the best-practice IBM Spectrum Scale configuration attributes for an NSD server based on IBM Spectrum Scale RAID. At the end of cluster creation, the SAS adapter firmware, storage enclosure firmware, and drive firmware are upgraded if needed. To bypass the firmware update, specify the --no-fw-update option.
Note: This command could take some time to run.
See gssgencluster command for more information about this command.
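No example invocation of gssgencluster is shown in this topic, so the following is only a sketch of what the command might look like for the node group and suffix used throughout these examples. The -C cluster-name option is an assumption, so check the gssgencluster command reference before running it:
gssgencluster -C test01 -G gss_ppc64 --suffix=-hs --accept-license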
To verify the cluster configuration, run:
mmlscluster
The system displays output similar to this: [root@gssio1 ~]# mmlscluster
GPFS cluster information
========================
GPFS cluster name: test01.gpfs.net
GPFS cluster id: 14599547031220361759
GPFS UID domain: test01.gpfs.net
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
-------------------------------------------------------------------------
1 gssio1-hs.gpfs.net 172.45.45.23 gssio1-hs.gpfs.net quorum-manager
2 gssio2-hs.gpfs.net 172.45.45.24 gssio2-hs.gpfs.net quorum-manager
Verify that the GPFS cluster is active
mmgetstate -a
The system displays output similar to this: [root@gssio1 ~]# mmgetstate -a
Node number Node name GPFS state
------------------------------------------
1 gssio1-hs active
2 gssio2-hs active
After the /etc/hosts file is properly set with high-speed IP addresses and corresponding hostnames, you can use the gssgennetworks script to create a bonded Ethernet network. Note that this script cannot be used to create a bond with IP over an InfiniBand network.
To see the current set of active (up) interfaces, run:
gssgennetworks -G gss_ppc64
To create a bonded interface, run:
gssgennetworks -G gss_ppc64 --create-bond
The script sets miimon to 100, the bonding mode to 802.3ad (LACP), and xmit_hash_policy to layer3+4. The other bond options keep their default values, including lacp_rate (the default is slow). For proper network operation, the Ethernet switch settings in the networking infrastructure must match the I/O server node interface bond settings.
Create the recovery groups
The gssgenclusterrgs command creates the recovery groups (RGs) and declustered arrays (DAs), as well as the associated log tip vdisk, log backup vdisk, and log home vdisk. For each RG, three arrays are created: NVRAM, SSD, and DAn. By default for ESS 3.5, only one DA is created, in which all HDDs (and SSDs for SSD models) belong to this single DA (DA1, for example). If you want to use multiple DAs (assuming there are enough disks), specify the --multi-da option.
The gssgenclusterrgs command can create NSDs and file systems for simple configurations that require one file system. More flexibility can be achieved using gssgenclusterrgs to create the recovery groups only and using gssgenvdisks (the preferred method) to create data vdisks, metadata vdisks, NSDs, and file systems. For backward compatibility, the gssgenclusterrgs command continues to support vdisk, NSD, and file system creation.
The gssgenclusterrgs command creates and saves the stanza files for the data and metadata vdisks and NSD. The stanza files are located in the /tmp directory of the first node of the first building block with names node1_node2_vdisk.cfg.save and node1_node2_nsd.cfg.save. These files can be edited for further customization.
If a customized recovery stanza file is available, it can be used to create the recovery group. The files must be located on the first node (in the node list) of each building block in /tmp. Their names must be in the format xxxxL.stanza and yyyyR.stanza, where L is for the left recovery group and R is for the right recovery group. The name of the recovery group is derived from the I/O server node's short name (with prefix and suffix) by adding a prefix of rg_. When the --create-nsds option is specified, by default, 1% of the space is left as reserved and the remaining space is used to create the NSDs. The amount of reserved space is user-selectable and the default is 1% of the total raw space. Note that the percentage of reserved space is based on the total raw space (not on the available space) before any redundancy overhead is applied.
If the system already contains recovery groups and log vdisks (created in the previous steps), their creation can be skipped using the appropriate options. This can be useful when NSDs are recreated (for a change in the number of NSDs or block size, for example).
Note 1: This command could take some time to complete.
Note 2: NSDs in a building block are assigned to the same failure group by default. If you have multiple building blocks, the NSDs defined in each building block will have a different failure group for each building block. Carefully consider this information and change the failure group assignment when you are configuring the system for metadata and data replication.
gssgenclusterrgs -G gss_ppc64 --suffix=-hs
The system displays output similar to this: [root@ems1 ~]# gssgenclusterrgs -G gss_ppc64 --suffix=-hs
2015-06-16T00:12:22.176357 Determining peer nodes
2015-06-16T00:12:23.786661 nodelist: gssio1 gssio2
2015-06-16T00:12:23.786749 Getting pdisk topology from node to create partner list gssio1
2015-06-16T00:12:38.933425 Getting pdisk topology from node to create partner list gssio2
2015-06-16T00:12:54.049202 Getting pdisk topology from node for recoverygroup creation. gssio1
2015-06-16T00:13:06.466809 Getting pdisk topology from node for recoverygroup creation. gssio2
2015-06-16T00:13:25.289541 Stanza files for node pairs gssio1 gssio2
/tmp/SV45221140L.stanza /tmp/SV45221140R.stanza
2015-06-16T00:13:25.289604 Creating recovery group rg_gssio1-hs
2015-06-16T00:13:48.556966 Creating recovery group rg_gssio2-hs
2015-06-16T00:14:17.627686 Creating log vdisks in recoverygroup rg_gssio1-hs
2015-06-16T00:15:14.117554 Creating log vdisks in recoverygroup rg_gssio2-hs
2015-06-16T00:16:30.267607 Task complete.
See gssgenclusterrgs command for more information about this command.
Verify the recovery group configuration
mmlsrecoverygroup
The system displays output similar to this: [root@gssio1 ~]# mmlsrecoverygroup
declustered
arrays with
recovery group vdisks vdisks servers
------------------ ----------- ------ -------
rg_gssio1-hs 3 3 gssio1-hs.gpfs.net,gssio2-hs.gpfs.net
rg_gssio2-hs 3 3 gssio2-hs.gpfs.net,gssio1-hs.gpfs.net
- NVR contains the NVRAM devices used for the log tip vdisk.
- SSD contains the SSD devices used for the log backup vdisk.
- DA1 contains the SSD or HDD devices used for the log home vdisk and file system data.
- If you used the --multi-da option
with the gssgenclusterrgs command, you might
see one or more additional DAs:
DAn, where n > 1 (depending on the ESS model), contains the SSD or HDD devices used for file system data.
mmlsrecoverygroup rg_gssio1-hs -L
The system displays output similar to this: [root@gssio1 ~]# mmlsrecoverygroup rg_gssio1-hs -L
declustered
recovery group arrays vdisks pdisks format version
----------------- ----------- ------ ------ --------------
rg_gssio1-hs 3 3 61 4.1.0.1
declustered needs replace scrub background activity
array service vdisks pdisks spares threshold free space duration task progress priority
----------- ------- ------ ------ ------ --------- ---------- -------- -------------------------
SSD no 1 1 0,0 1 372 GiB 14 days scrub 4% low
NVR no 1 2 0,0 1 3648 MiB 14 days scrub 4% low
DA1 no 1 58 2,31 2 101 TiB 14 days scrub 0% low
declustered checksum
vdisk RAID code array vdisk size block size granularity state remarks
------------------ ------------------ ----------- ---------- ---------- ----------- ----- -------
rg_gssio1_hs_logtip 2WayReplication NVR 48 MiB 2 MiB 4096 ok logTip
rg_gssio1_hs_logtipbackup Unreplicated SSD 48 MiB 2 MiB 4096 ok logTipBackup
rg_gssio1_hs_loghome 4WayReplication DA1 20 GiB 2 MiB 4096 ok log
config data declustered array VCD spares actual rebuild spare space remarks
------------------ ------------------ ------------- --------------------------------- ----------------
rebuild space DA1 31 35 pdisk
config data max disk group fault tolerance actual disk group fault tolerance remarks
------------------ --------------------------------- --------------------------------- ----------------
rg descriptor 4 drawer 4 drawer limiting fault tolerance
system index 1 enclosure + 1 drawer 4 drawer limited by rg descriptor
vdisk max disk group fault tolerance actual disk group fault tolerance remarks
------------------ --------------------------------- --------------------------------- ----------------
rg_gssio1_hs_logtip 1 pdisk 1 pdisk
rg_gssio1_hs_logtipbackup 0 pdisk 0 pdisk
rg_gssio1_hs_loghome 1 enclosure + 1 drawer 3 drawer limited by rg descriptor
active recovery group server servers
----------------------------------------------- -------
gssio1-hs.gpfs.net gssio1-hs.gpfs.net,gssio2-hs.gpfs.net
Create the vdisk stanza
Use gssgenvdisks to create the vdisk stanza file. By default, the vdisk stanza is stored in /tmp/vdisk1.cfg. Optionally, gssgenvdisks can be used to create vdisks, NSDs, and the file system on existing recovery groups. If no recovery groups are specified, all available recovery groups are used. If the command is run on the management server node (or any other node) that is not part of the cluster, a contact node that is part of the cluster must be specified. The contact node must be reachable from the node (the management server node, for example) where the command is run.
You can use this command to add a suffix to vdisk names, which is useful when creating multiple file systems: a unique suffix associates a set of vdisks with a particular file system (examples follow). The default reserved space is 1%. If the data vdisk block size is less than 8 MB, increase the reserved space as the block size decreases (see Reserved space considerations).
See the gssgenvdisks command for more information.
This command can be used to create a shared-root file system for IBM Spectrum Scale protocol nodes. See Adding IBM Spectrum Scale nodes to an ESS cluster for more information.
Note: NSDs that are in the same building block are given the same failure group by default. If file system replication is set to 2 (m=2 or r=2), there must be more than one building block, or the failure groups of the NSDs must be adjusted accordingly.
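If you do need to adjust failure groups after the file system is created, one way is the mmchdisk command with the change option and an NSD stanza. This is a sketch only; the file system name, NSD name, and failure group value shown are illustrative:
echo "%nsd: nsd=rg_gssio2_hs_Data_8M_2p_1_fs1 failureGroup=2" > /tmp/fg.stanza
mmchdisk fs1 change -F /tmp/fg.stanza
mmlsdisk fs1 -L
The mmlsdisk command confirms the new failure group assignment.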
In ESS 3.0 and later, the gssgenvdisks command includes options for specifying the data vdisk size and the metadata vdisk size in GiB. Because NSDs map one-to-one to vdisks, the metadata vdisk size determines the metadata NSD size. If both the metadata vdisk size and the metadata percentage are specified, the metadata vdisk size takes precedence.
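As a sketch of how explicit sizes might be given (the --data-vdisk-size option appears in Example 1 below; the name of the metadata counterpart, shown here as --metadata-vdisk-size, is an assumption, so confirm it in the gssgenvdisks command description for your release):
gssgenvdisks --contact-node gssio1 --create-vdisk --create-nsds --create-filesystem --data-vdisk-size 10240 --metadata-vdisk-size 512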
Reserved space considerations
When all available space is allocated, the reserved space should be increased as the data vdisk block size decreases. The default reserved space of 1% works well for the default 8 MB block size. For a 4 MB block size, reserve 2%; for a 1 MB block size, increase the reserved space to 3%. Examples 2a and 2b illustrate these settings.
Example 1:
Create two file systems: fs1 with 20 TB (two vdisks, 10 TB each) using the default RAID code, and fs2 with 40 TB (two vdisks, 20 TB each) with a RAID code of 8+3p.
gssgenvdisks --contact-node gssio1 --create-vdisk --create-nsds --create-filesystem
--vdisk-suffix=_fs1 --filesystem-name fs1 --data-vdisk-size 10240
The system displays output similar to this: [root@ems1 ~]# gssgenvdisks --contact-node gssio1 --create-vdisk --create-nsds --create-filesystem
--vdisk-suffix=_fs1 --filesystem-name fs1 --data-vdisk-size 10240
2015-06-16T00:50:37.254906 Start creating vdisk stanza
vdisk stanza saved in gssio1:/tmp/vdisk1.cfg
2015-06-16T00:50:51.809024 Generating vdisks for nsd creation
2015-06-16T00:51:27.409034 Creating nsds
2015-06-16T00:51:35.266776 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-06-16T00:51:46.688937 Applying data placement policy
2015-06-16T00:51:51.637243 Task complete.
The df command output then shows the new file system:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 246G 2.9G 244G 2% /
devtmpfs 60G 0 60G 0% /dev
tmpfs 60G 0 60G 0% /dev/shm
tmpfs 60G 43M 60G 1% /run
tmpfs 60G 0 60G 0% /sys/fs/cgroup
/dev/sda2 497M 161M 336M 33% /boot
/dev/fs1 21T 160M 21T 1% /gpfs/fs1
The last line shows that file system fs1 was created.
Now create the second file system with a RAID code of 8+3p:
gssgenvdisks --contact-node gssio1 --create-vdisk --create-nsds --create-filesystem
--vdisk-suffix=_fs2 --filesystem-name fs2 --data-vdisk-size 20480 --raid-code 8+3p
The system displays output similar to this: [root@ems1 ~]# gssgenvdisks --contact-node gssio1 --create-vdisk --create-nsds --create-filesystem
--vdisk-suffix=_fs2 --filesystem-name fs2 --data-vdisk-size 20480 --raid-code 8+3p
2015-06-16T01:06:59.929580 Start creating vdisk stanza
vdisk stanza saved in gssio1:/tmp/vdisk1.cfg
2015-06-16T01:07:13.019100 Generating vdisks for nsd creation
2015-06-16T01:07:56.688530 Creating nsds
2015-06-16T01:08:04.516814 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-06-16T01:08:16.613198 Applying data placement policy
2015-06-16T01:08:21.637298 Task complete.
The df command output then shows both file systems:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 246G 2.9G 244G 2% /
devtmpfs 60G 0 60G 0% /dev
tmpfs 60G 0 60G 0% /dev/shm
tmpfs 60G 43M 60G 1% /run
tmpfs 60G 0 60G 0% /sys/fs/cgroup
/dev/sda2 497M 161M 336M 33% /boot
/dev/fs1 21T 160M 21T 1% /gpfs/fs1
/dev/fs2 41T 160M 41T 1% /gpfs/fs2
The last line shows that file system fs2 was created.
mmlsvdisk
The system displays output similar to this: [root@gssio1 ~]# mmlsvdisk
declustered block size
vdisk name RAID code recovery group array in KiB remarks
------------------ --------------- ------------------ ----------- ---------- -------
rg_gssio1_hs_Data_8M_2p_1_fs1 8+2p rg_gssio1-hs DA1 8192
rg_gssio1_hs_Data_8M_3p_1_fs2 8+3p rg_gssio1-hs DA1 8192
rg_gssio1_hs_MetaData_8M_2p_1_fs1 3WayReplication rg_gssio1-hs DA1 1024
rg_gssio1_hs_MetaData_8M_3p_1_fs2 4WayReplication rg_gssio1-hs DA1 1024
rg_gssio1_hs_loghome 4WayReplication rg_gssio1-hs DA1 2048 log
rg_gssio1_hs_logtip 2WayReplication rg_gssio1-hs NVR 2048 logTip
rg_gssio1_hs_logtipbackup Unreplicated rg_gssio1-hs SSD 2048 logTipBackup
rg_gssio2_hs_Data_8M_2p_1_fs1 8+2p rg_gssio2-hs DA1 8192
rg_gssio2_hs_Data_8M_3p_1_fs2 8+3p rg_gssio2-hs DA1 8192
rg_gssio2_hs_MetaData_8M_2p_1_fs1 3WayReplication rg_gssio2-hs DA1 1024
rg_gssio2_hs_MetaData_8M_3p_1_fs2 4WayReplication rg_gssio2-hs DA1 1024
rg_gssio2_hs_loghome 4WayReplication rg_gssio2-hs DA1 2048 log
rg_gssio2_hs_logtip 2WayReplication rg_gssio2-hs NVR 2048 logTip
rg_gssio2_hs_logtipbackup Unreplicated rg_gssio2-hs SSD 2048 logTipBackup
Example 2a:
Create a file system using a 1 MB data block size with 3% reserved space:
gssgenvdisks --contact-node gssio1 --create-vdisk --create-filesystem --data-blocksize 1M
--reserved-space 3
The system displays output similar to this:
[root@ems1 ~]# gssgenvdisks --contact-node gssio1 --create-vdisk --create-filesystem --data-blocksize 1M
--reserved-space 3
2015-06-16T01:49:07.963323 Start creating vdisk stanza
vdisk stanza saved in gssio1:/tmp/vdisk1.cfg
2015-06-16T01:49:21.210383 Generating vdisks for nsd creation
2015-06-16T01:52:19.688953 Creating nsds
2015-06-16T01:52:27.766494 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-06-16T01:52:47.249103 Applying data placement policy
2015-06-16T01:52:51.896720 Task complete.
Example 2b:
Create a file system using a 4 MB data block size with 2% reserved space:
gssgenvdisks --contact-node gssio1 --create-vdisk --create-filesystem --data-blocksize 4M --reserved-space 2
The system displays output similar to this:
[root@ems1 ~]# gssgenvdisks --contact-node gssio1 --create-vdisk --create-filesystem --data-blocksize 4M
--reserved-space 2
2015-06-16T01:25:54.455588 Start creating vdisk stanza
vdisk stanza saved in gssio1:/tmp/vdisk1.cfg
2015-06-16T01:26:07.443263 Generating vdisks for nsd creation
2015-06-16T01:27:46.671050 Creating nsds
2015-06-16T01:27:54.296765 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-06-16T01:28:07.279192 Applying data placement policy
2015-06-16T01:28:11.836822 Task complete.
Example 3:
Suppose you want to create three file systems. The first file system is called fsystem0. Keep 66% of the space reserved for future file system creation. For the second file system, fsystem1, keep 33% reserved. For the third file system, fsystem2, keep 1% reserved. Because you are going to create multiple file systems, you must specify a unique suffix for vdisk creation. Specify _fs0 as the suffix of the vdisk name for the first file system. Specify a RAID code of 8+3p for data vdisks.
gssgenvdisks --create-vdisk --vdisk-suffix _fs0 --raid-code 8+3p --create-filesystem
--filesystem-name fsystem0 --reserved-space-percent 66
The system displays output similar to this: [root@ems1 ~]# gssgenvdisks --create-vdisk --vdisk-suffix _fs0 --raid-code 8+3p --create-filesystem
--filesystem-name fsystem0 --reserved-space-percent 66
2015-03-13T07:04:12.703294 Start creating vdisk stanza
2015-03-13T07:04:12.703364 No contact node provided. Using current node. ems1
vdisk stanza saved in ems1:/tmp/vdisk1.cfg
2015-03-13T07:04:33.088067 Generating vdisks for nsd creation
2015-03-13T07:05:44.648360 Creating nsds
2015-03-13T07:05:53.517659 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-03-13T07:06:07.416392 Applying data placement policy
2015-03-13T07:06:12.748168 Task complete.
gssgenvdisks --create-vdisk --vdisk-suffix _fs1 --raid-code 8+3p --create-filesystem
--filesystem-name fsystem1 --reserved-space-percent 33
The system displays output similar to this: [root@ems1 ~]# gssgenvdisks --create-vdisk --vdisk-suffix _fs1 --raid-code 8+3p --create-filesystem
--filesystem-name fsystem1 --reserved-space-percent 33
2015-03-13T07:11:14.649102 Start creating vdisk stanza
2015-03-13T07:11:14.649189 No contact node provided. Using current node. ems1
vdisk stanza saved in ems1:/tmp/vdisk1.cfg
2015-03-13T07:11:34.998352 Generating vdisks for nsd creation
2015-03-13T07:12:46.858365 Creating nsds
2015-03-13T07:12:55.416322 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-03-13T07:13:09.488075 Applying data placement policy
2015-03-13T07:13:14.756651 Task complete.
gssgenvdisks --create-vdisk --vdisk-suffix _fs2 --raid-code 8+3p --create-filesystem --filesystem-name
fsystem2 --reserved-space-percent 1
The system displays output similar to this: [root@ems1 ~]# gssgenvdisks --create-vdisk --vdisk-suffix _fs2 --raid-code 8+3p --create-filesystem
--filesystem-name fsystem2 --reserved-space-percent 1
2015-03-13T07:13:37.191809 Start creating vdisk stanza
2015-03-13T07:13:37.191886 No contact node provided. Using current node. ems1
vdisk stanza saved in ems1:/tmp/vdisk1.cfg
2015-03-13T07:13:57.548238 Generating vdisks for nsd creation
2015-03-13T07:15:08.838311 Creating nsds
2015-03-13T07:15:16.666115 Creating filesystem
Filesystem successfully created. Verify failure group of nsds and change as needed.
2015-03-13T07:15:30.532905 Applying data placement policy
2015-03-13T07:15:35.876333 Task complete.
mmlsvdisk
The system displays output similar to this:
[root@ems1 ~]# mmlsvdisk
declustered block size
vdisk name RAID code recovery group array in KiB remarks
------------------ --------------- ------------------ ----------- ---------- -------
rg_gssio1_hs_Data_8M_3p_1_fs0 8+3p rg_gssio1-hs DA1 8192
rg_gssio1_hs_Data_8M_3p_1_fs1 8+3p rg_gssio1-hs DA1 8192
rg_gssio1_hs_Data_8M_3p_1_fs2 8+3p rg_gssio1-hs DA1 8192
rg_gssio1_hs_MetaData_8M_3p_1_fs0 4WayReplication rg_gssio1-hs DA1 1024
rg_gssio1_hs_MetaData_8M_3p_1_fs1 4WayReplication rg_gssio1-hs DA1 1024
rg_gssio1_hs_MetaData_8M_3p_1_fs2 4WayReplication rg_gssio1-hs DA1 1024
rg_gssio1_hs_loghome 4WayReplication rg_gssio1-hs DA1 2048 log
rg_gssio1_hs_logtip 2WayReplication rg_gssio1-hs NVR 2048 logTip
rg_gssio1_hs_logtipbackup Unreplicated rg_gssio1-hs SSD 2048 logTipBackup
rg_gssio2_hs_Data_8M_3p_1_fs0 8+3p rg_gssio2-hs DA1 8192
rg_gssio2_hs_Data_8M_3p_1_fs1 8+3p rg_gssio2-hs DA1 8192
rg_gssio2_hs_Data_8M_3p_1_fs2 8+3p rg_gssio2-hs DA1 8192
rg_gssio2_hs_MetaData_8M_3p_1_fs0 4WayReplication rg_gssio2-hs DA1 1024
rg_gssio2_hs_MetaData_8M_3p_1_fs1 4WayReplication rg_gssio2-hs DA1 1024
rg_gssio2_hs_MetaData_8M_3p_1_fs2 4WayReplication rg_gssio2-hs DA1 1024
rg_gssio2_hs_loghome 4WayReplication rg_gssio2-hs DA1 2048 log
rg_gssio2_hs_logtip 2WayReplication rg_gssio2-hs NVR 2048 logTip
rg_gssio2_hs_logtipbackup Unreplicated rg_gssio2-hs SSD 2048 logTipBackup
mmlsnsd
The system displays output similar to this: [root@ems1 ~]# mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
fsystem0 rg_gssio1_hs_Data_8M_3p_1_fs0 gssio1-hs,gssio2-hs
fsystem0 rg_gssio1_hs_MetaData_8M_3p_1_fs0 gssio1-hs,gssio2-hs
fsystem0 rg_gssio2_hs_Data_8M_3p_1_fs0 gssio2-hs,gssio1-hs
fsystem0 rg_gssio2_hs_MetaData_8M_3p_1_fs0 gssio2-hs,gssio1-hs
fsystem1 rg_gssio1_hs_Data_8M_3p_1_fs1 gssio1-hs,gssio2-hs
fsystem1 rg_gssio1_hs_MetaData_8M_3p_1_fs1 gssio1-hs,gssio2-hs
fsystem1 rg_gssio2_hs_Data_8M_3p_1_fs1 gssio2-hs,gssio1-hs
fsystem1 rg_gssio2_hs_MetaData_8M_3p_1_fs1 gssio2-hs,gssio1-hs
fsystem2 rg_gssio1_hs_Data_8M_3p_1_fs2 gssio1-hs,gssio2-hs
fsystem2 rg_gssio1_hs_MetaData_8M_3p_1_fs2 gssio1-hs,gssio2-hs
fsystem2 rg_gssio2_hs_Data_8M_3p_1_fs2 gssio2-hs,gssio1-hs
fsystem2 rg_gssio2_hs_MetaData_8M_3p_1_fs2 gssio2-hs,gssio1-hs
Check the file system configuration
mmlsfs all
The system displays output similar to this: [root@gssio1 ~]# mmlsfs all
File system attributes for /dev/gpfs0:
======================================
flag value description
------------------- ------------------------ -----------------------------------
-f 32768 Minimum fragment size in bytes (system pool)
262144 Minimum fragment size in bytes (other pools)
-i 4096 Inode size in bytes
-I 32768 Indirect block size in bytes
-m 1 Default number of metadata replicas
-M 2 Maximum number of metadata replicas
-r 1 Default number of data replicas
-R 2 Maximum number of data replicas
-j scatter Block allocation type
-D nfs4 File locking semantics in effect
-k all ACL semantics in effect
-n 32 Estimated number of nodes that will mount file system
-B 1048576 Block size (system pool)
8388608 Block size (other pools)
-Q none Quotas accounting enabled
none Quotas enforced
none Default quotas enabled
--perfileset-quota No Per-fileset quota enforcement
--filesetdf No Fileset df enabled?
-V 14.10 (4.1.0.4) File system version
--create-time Tue Jun 16 02:49:45 2015 File system creation time
-z No Is DMAPI enabled?
-L 4194304 Logfile size
-E Yes Exact mtime mount option
-S No Suppress atime mount option
-K whenpossible Strict replica allocation option
--fastea Yes Fast external attributes enabled?
--encryption No Encryption enabled?
--inode-limit 134217728 Maximum number of inodes
--log-replicas 0 Number of log replicas
--is4KAligned Yes is4KAligned?
--rapid-repair Yes rapidRepair enabled?
--write-cache-threshold 0 HAWC Threshold (max 65536)
-P system;data Disk storage pools in file system
-d rg_gssio1_hs_Data_8M_2p_1; Disks in file system
rg_gssio1_hs_MetaData_8M_2p_1;
rg_gssio2_hs_Data_8M_2p_1;
rg_gssio2_hs_MetaData_8M_2p_1
-A yes Automatic mount option
-o none Additional mount options
-T /gpfs/gpfs0 Default mount point
--mount-priority 0 Mount priority
Mount the file system
mmmount device -a
where device is the name of the file system. The default file system name is gpfs0. For example, run:
mmmount gpfs0 -a
To check whether the file system is mounted properly, run:
mmlsmount gpfs0 -L
The system displays output similar to this: [root@gssio1 ~]# mmlsmount gpfs0 -L
File system gpfs0 is mounted on 2 nodes:
172.45.45.23 gssio1-hs
172.45.45.24 gssio2-hs
To check file system space usage, run: df
The system displays output similar to this: [root@gssio1 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda3 257922000 2943152 254978848 2% /
devtmpfs 62265728 0 62265728 0% /dev
tmpfs 62302080 0 62302080 0% /dev/shm
tmpfs 62302080 43584 62258496 1% /run
tmpfs 62302080 0 62302080 0% /sys/fs/cgroup
/dev/sda2 508588 164580 344008 33% /boot
/dev/gpfs0 154148405248 163840 154148241408 1% /gpfs/gpfs0
Initially after creation, the file system usage might temporarily show 99%.
Test the file system using gpfsperf
/usr/lpp/mmfs/samples/perf/gpfsperf create seq /gpfs/gpfs0/testfile1 -n 200G -r 16M -th 32
The system displays output similar to this: [root@gssio1 ~]# /usr/lpp/mmfs/samples/perf/gpfsperf create seq /gpfs/gpfs0/testfile1 -n 200G -r 16M -th 32
/usr/lpp/mmfs/samples/perf/gpfsperf create seq /gpfs/gpfs0/testfile1
recSize 16M nBytes 200G fileSize 16G
nProcesses 1 nThreadsPerProcess 32
file cache flushed before test
not using direct I/O
offsets accessed will cycle through the same file segment
not using shared memory buffer
not releasing byte-range token after open
no fsync at end of test
Data rate was 4689394.83 Kbytes/sec, thread utilization 0.925
The block size must match the data vdisk block size. To verify that the ESS is operating as expected, you can use gpfsperf (/usr/lpp/mmfs/samples/perf/gpfsperf) to run other I/O tests, such as read and write.
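For example, a sequential read test of the file created above might look like this sketch; adjust the record size and thread count to match your configuration:
/usr/lpp/mmfs/samples/perf/gpfsperf read seq /gpfs/gpfs0/testfile1 -r 16M -th 32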
Add nodes to the cluster
The management server node and additional I/O server nodes can be added to the ESS cluster using the gssaddnode command. The management server node is updated with the required RPMs during deployment and prepared to join the cluster if needed.
The I/O server nodes must be deployed properly and the high-speed network must be configured before gssaddnode can be used to add these nodes to the ESS cluster. gssaddnode adds the nodes to the cluster, runs the product license acceptance tool, configures the nodes (using gssServerConfig.sh or gssClientConfig.sh), and updates the host adapter, enclosure, and drive firmware. Do not use gssaddnode to add nodes other than ESS I/O server and management server nodes to the cluster; use mmaddnode instead.
On the gssaddnode command, the -N ADD-NODE-LIST option specifies the list of nodes that are being added. For the management server node, this is that node's hostname. The --nodetype option specifies the type of node that is being added; for the management server node, the value is ems. This command must be run on the management server node when that node is being added. It can also be used to add I/O server nodes to an existing cluster.
See gssaddnode command for more information about this command, including an example.
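As a minimal sketch that uses only the options described above (it assumes the management server hostname is ems1; your environment might require additional options, so check the gssaddnode command description):
gssaddnode -N ems1 --nodetype ems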
mmlscluster
The system displays output similar to this: [root@ems1 ~]# mmlscluster
GPFS cluster information
========================
GPFS cluster name: test01.gpfs.net
GPFS cluster id: 14599547031220361759
GPFS UID domain: test01.gpfs.net
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
-------------------------------------------------------------------------
1 gssio1-hs.gpfs.net 172.45.45.23 gssio1-hs.gpfs.net quorum-manager
2 gssio2-hs.gpfs.net 172.45.45.24 gssio2-hs.gpfs.net quorum-manager
5 ems1-hs.gpfs.net 172.45.45.22 ems1-hs.gpfs.net quorum
Check the installed software
Run the gssinstallcheck command to verify that the key components are installed correctly. See gssinstallcheck command for more information about this command.
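For example, to check the I/O server nodes in the gss_ppc64 node group (a sketch; the -G node-group option is assumed here, consistent with the other gss commands used in this procedure):
gssinstallcheck -G gss_ppc64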
Run a stress test
After the system is configured correctly and all marginal components are out of the system, run a stress test to stress the disk and network elements. Use the gssstress command to run a stress test on the system.
Note: gssstress is not a performance tool; the throughput numbers it reports should not be interpreted as a measure of system performance.
gssstress /gpfs/gpfs0 gssio1 gssio2
The system displays output similar to this: [root@ems1 ~]# gssstress /gpfs/gpfs0 gssio1 gssio2
1 gssio1 create
1 gssio2 create
Waiting for 1 create to finish
create seq /gpfs/gpfs0/stressFile.1.gssio1 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 1728569.28 0.980
create seq /gpfs/gpfs0/stressFile.1.gssio2 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 1706918.52 0.981
1 gssio1 read
1 gssio2 read
Waiting for 1 read to finish
read seq /gpfs/gpfs0/stressFile.1.gssio1 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 2776149.11 0.997
read seq /gpfs/gpfs0/stressFile.1.gssio2 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 2776185.62 0.998
1 gssio1 write
1 gssio2 write
Waiting for 1 write to finish
write seq /gpfs/gpfs0/stressFile.1.gssio2 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 1735661.04 0.971
write seq /gpfs/gpfs0/stressFile.1.gssio1 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 1733622.96 0.971
1 gssio1 read
1 gssio2 read
Waiting for 1 read to finish
read seq /gpfs/gpfs0/stressFile.1.gssio1 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 2774776.83 0.997
read seq /gpfs/gpfs0/stressFile.1.gssio2 16777216 214748364800 214748364800 1 16 0 1 0 0 1 1 0 0 0 2770247.35 0.998
gpfsperf is run with the nolabels option, which produces one line of output for each test. The fields in each output line are: operation, I/O pattern, file name, record size, number of bytes, file size, number of processes, number of threads, stride records, inv, dio, shm, fsync, cycle, reltoken, aio, osync, rate, and util. Throughput (rate) is the second field from the end of each line. While gssstress is running, you can log on to each node and run dstat to view the disk and network load on that node.
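For example, running dstat with a 5-second refresh interval on an I/O server node shows CPU, disk, network, paging, and system activity (a minimal invocation; add dstat options to focus on particular devices or interfaces):
dstat 5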

Note: By default, each iteration reads and writes 800 GB. With 20 iterations, the test performs a total of 16 TB of I/O from each node and therefore can take some time to complete. For a shorter completion time, specify a lower iteration number, a shorter operation list, or both. The test can be interrupted by pressing <Ctrl-c>.
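For example, a shorter run might limit the iteration count. This sketch assumes the iteration option is -i; verify the option name in the gssstress command description for your release:
gssstress -i 4 /gpfs/gpfs0 gssio1 gssio2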
If a disk encounters errors during the stress test, messages similar to the following appear in the system logs on the affected I/O server node:
Dec 28 18:38:16 gssio5 kernel: sd 4:0:74:0: [sdin] CDB:
Dec 28 18:38:16 gssio5 kernel: Read(32): 7f 00 00 00 00 00 00 18 00 09 20 00 00 00 00 00 10 24 b4 90 10 24 b4 90 00 00 00 00 00 00 04 10
Dec 28 18:38:16 gssio5 kernel: end_request: critical medium error, dev sdin, sector 270840976
Dec 28 18:38:16 gssio5 mmfs: [E] Pdisk e1d2s03 of RG gssio5-hs path /dev/sdin: I/O error on read: sector 270840976 length 4112 err 5.
At the end of the stress test, check the enclosures and disks for any errors.
Check the enclosures
mmlsenclosure all
The system displays output similar to this: [root@gssio1 gpfs0]# mmlsenclosure all
needs
serial number service nodes
------------- ------- ------
SV24819545 no gssio1-ib0.data.net.gpfs.net
SV32300072 no gssio1-ib0.data.net.gpfs.net
mmlsenclosure SV24819545 -L -N all
The system displays output similar to this: [root@gssio1 gpfs0]# mmlsenclosure SV24819545 -L -N all
needs
serial number service nodes
------------- ------- ------
SV24819545 no gssio1-ib0.data.net.gpfs.net,gssio2-ib0.data.net.gpfs.net
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
dcm SV24819545 DCM_0A no
dcm SV24819545 DCM_0B no
dcm SV24819545 DCM_1A no
dcm SV24819545 DCM_1B no
dcm SV24819545 DCM_2A no
dcm SV24819545 DCM_2B no
dcm SV24819545 DCM_3A no
dcm SV24819545 DCM_3B no
dcm SV24819545 DCM_4A no
dcm SV24819545 DCM_4B no
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
enclosure SV24819545 ONLY no
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
esm SV24819545 ESM_A no REPORTER
esm SV24819545 ESM_B no NOT_REPORTER
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
fan SV24819545 0_TOP_LEFT no 4890 RPM
fan SV24819545 1_BOT_LEFT no 4940 RPM
fan SV24819545 2_BOT_RGHT no 4890 RPM
fan SV24819545 3_TOP_RGHT no 5040 RPM
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
powerSupply SV24819545 0_TOP no
powerSupply SV24819545 1_BOT no
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
tempSensor SV24819545 DCM_0A no 46 C
tempSensor SV24819545 DCM_0B no 38 C
tempSensor SV24819545 DCM_1A no 47 C
tempSensor SV24819545 DCM_1B no 40 C
tempSensor SV24819545 DCM_2A no 45 C
tempSensor SV24819545 DCM_2B no 40 C
tempSensor SV24819545 DCM_3A no 45 C
tempSensor SV24819545 DCM_3B no 37 C
tempSensor SV24819545 DCM_4A no 45 C
tempSensor SV24819545 DCM_4B no 40 C
tempSensor SV24819545 ESM_A no 39 C
tempSensor SV24819545 ESM_B no 41 C
tempSensor SV24819545 POWERSUPPLY_BOT no 39 C
tempSensor SV24819545 POWERSUPPLY_TOP no 36 C
component type serial number component id failed value unit properties
-------------- ------------- ------------ ------ ----- ---- ----------
voltageSensor SV24819545 12v no 12 V
voltageSensor SV24819545 ESM_A_1_0v no 0.98 V
voltageSensor SV24819545 ESM_A_1_2v no 1.19 V
voltageSensor SV24819545 ESM_A_3_3v no 3.31 V
voltageSensor SV24819545 ESM_A_5v no 5.04 V
voltageSensor SV24819545 ESM_B_1_0v no 1 V
voltageSensor SV24819545 ESM_B_1_2v no 1.19 V
voltageSensor SV24819545 ESM_B_3_3v no 3.31 V
voltageSensor SV24819545 ESM_B_5v no 5.07 V
Check for failed disks
mmlspdisk all --not-ok
The system displays output similar to this: [root@gssio1]# mmlspdisk all --not-ok
pdisk:
replacementPriority = 7.34
name = "e1d2s01"
device = ""
recoveryGroup = "gssio1"
declusteredArray = "DA1"
state = "failing/noPath/systemDrain/noRGD/noVCD/noData"
capacity = 2000381018112
freeSpace = 1999307276288
fru = "42D0768"
location = "SV12616682-2-1"
WWN = "naa.5000C500262630DF"
server = "gssio1.gpfs.net"
reads = 295
writes = 915
bytesReadInGiB = 0.576
bytesWrittenInGiB = 1.157
IOErrors = 0
IOTimeouts = 0
mediaErrors = 0
checksumErrors = 0
pathErrors = 0
relativePerformance = 1.003
dataBadness = 0.000
rgIndex = 9
userLocation = "Enclosure SV12616682 Drawer 2 Slot 1"
userCondition = "replaceable"
hardware = "IBM-ESXS ST32000444SS BC2B 9WM40AQ10000C1295TH8"
hardwareType = Rotating 7200
nPaths = 0 active 0 total
mmlspdisk displays the details of the failed or failing disk, including the pdisk name, the enclosure serial number, and the location of the disk.
Replacing a disk
If a disk fails and needs to be replaced, follow the proper disk replacement procedure. Improper disk replacement greatly increases the possibility of data loss. Use the mmchcarrier command to replace a failed pdisk. This command updates the firmware automatically when a disk is replaced. For more information about mmchcarrier, see IBM Spectrum Scale RAID: Administration.
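For the failing pdisk shown in the previous example, the replacement might look like the following sketch (it assumes pdisk e1d2s01 in recovery group gssio1; always confirm the recovery group, pdisk name, and physical location reported by mmlspdisk before touching any hardware):
mmchcarrier gssio1 --release --pdisk e1d2s01
Replace the physical drive after the carrier is released, then run:
mmchcarrier gssio1 --replace --pdisk e1d2s01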
Run gnrhealthcheck
gnrhealthcheck
The system displays output similar to this: [root@gssio1 gpfs0]# gnrhealthcheck
################################################################
# Beginning topology checks.
################################################################
Topology checks successful.
################################################################
# Beginning enclosure checks.
################################################################
Enclosure checks successful.
################################################################
# Beginning recovery group checks.
################################################################
Recovery group checks successful.
################################################################
# Beginning pdisk checks.
################################################################
Pdisk checks successful.
See IBM Spectrum Scale RAID: Administration for more information about this script.
Collecting data
gsssnap
The configuration and service data collected at the end of the installation can be very valuable during future problem determination and troubleshooting. Send the collected service data to your IBM representative. See gsssnap script for more information about this command.
Cleaning up the system
- Use ssh to log on to any I/O server node.
- To delete the file system and the associated NSDs and vdisks, run:
/opt/ibm/gss/tools/samples/gssdelvdisks
- To shut down IBM Spectrum Scale and delete the cluster, run:
mmshutdown -a
mmdelnode -N all