Establish an IBM Spectrum Scale cluster on the Hadoop cluster
Establish a local IBM Spectrum® Scale cluster on the Hadoop cluster. This local cluster accesses the ESS through a remote mount, which creates a multi-cluster IBM Spectrum Scale environment. In this mode, one IBM® ESS storage system can be shared among different groups, and storage management is isolated from the local IBM Spectrum Scale cluster.
- Ensure that the version of IBM Spectrum Scale on the local cluster is the same as or higher than the version on the cluster that owns the file system (ESS).
- The maxblocksize value must be the same on the local IBM Spectrum Scale cluster and the ESS cluster. You can set maxblocksize during the installation of the local IBM Spectrum Scale cluster to match the ESS cluster. If maxblocksize is not set, it defaults to 1 MB for releases earlier than IBM Spectrum Scale 5.0.0 and to 4 MB from release 5.0.0 onwards.
- Ensure that the gpfs.repo file is in the /etc/yum.repos.d directory on all the nodes in the Hadoop cluster. On each node, run:
yum clean all; yum makecache
For example, the /etc/yum.repos.d/gpfs.repo file contains:
[GPFS-5.0.1]
name=gpfs-5.0.1
baseurl=http://60.2.0.229/repos/rhel/5.0.1/GPFS_5.0.1
enabled=1
gpgcheck=0
Note: Ensure that the gpgcheck value is set to zero.
- Install IBM Spectrum Scale on your cluster.
See the Manually installing the IBM Spectrum Scale software packages on Linux® nodes topic in the IBM Storage Scale: Concepts, Planning, and Installation Guide.
For example, on each of the Hadoop nodes, run:
yum -y install gpfs.adv* gpfs.base* gpfs.crypto* gpfs.ext* gpfs.gpl* gpfs.gskit* gpfs.lice* gpfs.msg*
- Build the kernel portability layer on each node by issuing the following command:
/usr/lpp/mmfs/bin/mmbuildgpl
- Follow the Steps for establishing and starting your IBM Spectrum Scale cluster topic in the IBM Storage Scale: Concepts, Planning, and Installation Guide for your specific Scale version.
Note:
- Do not create an NSD (mmcrnsd) or a file system (mmcrfs) because this is a remote mount environment.
- Ensure that the Ambari server is set as an IBM Spectrum Scale quorum node. The IBM Spectrum Scale Master node resides on the Ambari server node and must be a quorum node.
- Check the maxblocksize value on the local cluster and on the ESS cluster by running the following command on each:
/usr/lpp/mmfs/bin/mmlsconfig | grep maxblocksize
If the maxblocksize value on the local cluster is not set or is not the same as on the ESS cluster, run the following command on the local cluster:
/usr/lpp/mmfs/bin/mmchconfig maxblocksize=<ESS maxblocksize value>
- Start IBM Spectrum Scale by issuing the mmstartup command:
/usr/lpp/mmfs/bin/mmstartup -a
- Ensure that all the IBM Spectrum Scale nodes are in the active state:
/usr/lpp/mmfs/bin/mmgetstate -a
- Tune the local cluster as an ESS client:
For remote mount mode for the Hadoop cluster (1st model: Remote mount with all Hadoop nodes as IBM Spectrum Scale nodes), run the following commands.
On the ESS, run:
scp /usr/lpp/mmfs/samples/gss/gssClientConfig.sh root@<Hadoop_local_scale_cluster_host>:</path-to-gssclient>
On the Hadoop local scale cluster host, run:
<path-to-gssclient>/gssClientConfig.sh all
However, if the IBM Spectrum Scale client nodes and the ESS nodes are in the same cluster (3rd model: Single cluster with all Hadoop nodes as IBM Spectrum Scale nodes), run the gssClientConfig.sh script from the ESS node:
<path-to-gssclient>/gssClientConfig.sh <gpfs-client-node1,gpfs-client-node2,gpfs-client-node3,...>
For additional information, see the Adding IBM Spectrum Scale nodes to the ESS cluster topic in the Elastic Storage Server: Quick Deployment Guide.
After running this script, restart GPFS™ on the affected nodes for the optimized configuration settings to take effect.
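
The maxblocksize comparison and the node-state check in the steps above can be scripted. The following is a minimal sketch, not part of the IBM procedure. The function names parse_maxblocksize, check_maxblocksize, and all_nodes_active are hypothetical helpers. The sketch assumes that mmlsconfig prints one "attribute value" pair per line and that mmgetstate -a prints rows of the form "node-number node-name state"; verify both against the output of your release before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers; none of these functions ship with IBM Spectrum Scale.

# Read mmlsconfig-style output on stdin and print the maxblocksize value.
# Assumes "attribute value" lines; prints nothing if the attribute is unset.
parse_maxblocksize() {
    awk '$1 == "maxblocksize" { print $2; exit }'
}

# Compare the local value ($1) with the ESS value ($2) as plain strings.
# Returns 0 on a match, 1 if the local value is unset or different.
# Note: "4M" and "4194304" are treated as different by this comparison.
check_maxblocksize() {
    local_value=$1
    ess_value=$2
    if [ -z "$local_value" ]; then
        echo "local maxblocksize is not set; ESS value is $ess_value"
        return 1
    fi
    if [ "$local_value" != "$ess_value" ]; then
        echo "mismatch: local=$local_value ESS=$ess_value"
        return 1
    fi
    echo "maxblocksize matches: $local_value"
}

# Read mmgetstate-style output on stdin. Rows whose first field is a node
# number are inspected; returns 0 only if at least one such row exists and
# every one reports "active" in the third field.
all_nodes_active() {
    awk '$1 ~ /^[0-9]+$/ { seen = 1; if ($3 != "active") bad = 1 }
         END { if (seen && !bad) exit 0; exit 1 }'
}
```

For example, on a node of the local cluster, pass the value reported on the ESS cluster as the second argument:
check_maxblocksize "$(/usr/lpp/mmfs/bin/mmlsconfig | parse_maxblocksize)" 4M
/usr/lpp/mmfs/bin/mmgetstate -a | all_nodes_active && echo "all nodes active"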