Adding CES HDFS nodes into the centralized file system
This topic lists the steps to add the CES HDFS nodes into the same GPFS cluster as the centralized file system.
- Ensure that the centralized file system is already installed, configured, and active; for example, the IBM Storage Scale System.
- Create the CES shared root file system that will be used by the CES installation. Note: The recommendation for the CES shared root is a dedicated file system, which can be created with the mmcrfs command. The CES shared root must reside on GPFS and must be available when it is configured through the mmchconfig command.
For more information, see the Setting up Cluster Export Services shared root file system topic in the IBM Storage Scale: Administration Guide.
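The following is a minimal sketch of creating a dedicated file system for the CES shared root and pointing CES at it; the stanza file name and mount point are assumptions for illustration:
# Create a dedicated file system (cesSharedRootDisks.stanza is a hypothetical NSD stanza file)
mmcrfs cesSharedRoot -F cesSharedRootDisks.stanza -T /gpfs/cesSharedRoot
# Mount the file system on all nodes
mmmount cesSharedRoot -a
# Configure CES to use it as the shared root
mmchconfig cesSharedRoot=/gpfs/cesSharedRoot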
- Change to the installer directory to run the spectrumscale commands.
For IBM Storage Scale 5.1.1 and later:
# cd /usr/lpp/mmfs/5.1.1.0/ansible-toolkit
For IBM Storage Scale 5.1.0 and earlier:
# cd /usr/lpp/mmfs/5.0.4.2/installer
- Instantiate the installer node (chef zero server). To configure the installer node, issue the following command:
./spectrumscale setup -s InstallNodeIP -i SSHIdentity
The -s argument identifies the IP address that the nodes will use to retrieve their configuration. This IP address is associated with a device on the installer node and is validated automatically during the setup phase.
Optionally, you can specify a private SSH key to be used to communicate with the nodes in the cluster definition file, using the -i argument.
In an IBM Storage Scale System cluster, if you want to use the installation toolkit to install GPFS and deploy protocols, you must specify the setup type as ess while setting up the installer node:
./spectrumscale setup -s InstallNodeIP -i SSHIdentity -st ess
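For example, a sketch with hypothetical values (the installer node IP address and SSH key path are assumptions):
./spectrumscale setup -s 203.0.113.10 -i /root/.ssh/id_rsa -st ess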
- Use the installation toolkit to populate the cluster definition file from the centralized storage.
Re-populate the cluster definition file with the current cluster state by issuing the ./spectrumscale config populate --node Node command.
In a cluster containing an IBM Storage Scale System, you must specify the EMS node with the config populate command. For example:
./spectrumscale config populate --node EMSNode
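For instance, assuming a hypothetical EMS node host name:
./spectrumscale config populate --node ems1.gpfs.net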
- Add the nodes that will be used for CES HDFS into the existing centralized file system. The additional nodes are added into the same GPFS cluster.
./spectrumscale node add FQDN
Deployment of protocol services is performed on a subset of the cluster nodes that have been designated as protocol nodes by using the ./spectrumscale node add FQDN -p command.
NameNodes are protocol nodes and require the -p option during the node add operation. DataNodes are not protocol nodes.
For example:
For non-HA:
# NameNodes (protocol node)
./spectrumscale node add c902f05x05.gpfs.net -p
For HA:
# NameNodes (protocol nodes)
./spectrumscale node add c902f05x05.gpfs.net -p
./spectrumscale node add c902f05x06.gpfs.net -p
# DataNodes
./spectrumscale node add c902f05x07.gpfs.net
./spectrumscale node add c902f05x08.gpfs.net
./spectrumscale node add c902f05x09.gpfs.net
./spectrumscale node add c902f05x10.gpfs.net
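To confirm the node designations before proceeding, you can list the contents of the cluster definition file:
./spectrumscale node list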
- If call home is enabled in the cluster definition file, specify the minimum call home configuration parameters.
./spectrumscale callhome config -n CustName -i CustID -e CustEmail -cn CustCountry
For more information, see the Enabling and configuring call home using the installation toolkit topic in the IBM Storage Scale: Concepts, Planning, and Installation Guide.
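For example, a sketch with hypothetical customer values:
./spectrumscale callhome config -n ExampleCorp -i 1234567 -e admin@example.com -cn US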
- Do environment checks before initiating the installation procedure.
./spectrumscale install -pr
- Start the IBM Storage Scale installation and add the nodes into the existing cluster.
./spectrumscale install
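After the installation completes, you can verify that GPFS is active on all nodes:
/usr/lpp/mmfs/bin/mmgetstate -a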
Enable and deploy CES HDFS
Before you deploy the protocols, there must be a GPFS cluster that has GPFS started, with at least one file system for the CES shared root file system. Protocol nodes require at least two GPFS file systems to be mounted: one for the CES shared root and one for data.
- Enable HDFS.
./spectrumscale enable hdfs
- Set the CES IPs. Data is served through these protocols from a pool of addresses designated as Export IP addresses or CES public IP addresses. This example uses 192.0.2.2 and 192.0.2.3.
./spectrumscale config protocols -e 192.0.2.2,192.0.2.3
Note: For IBM Storage Scale releases earlier than 5.0.5.1, a minimum of two CES IPs is required as input for configuring protocols when HDFS is enabled through the installation toolkit, even though the HDFS protocol requires only one IP address. From IBM Storage Scale 5.0.5.1 onward, only one CES IP is needed for one HDFS cluster during installation toolkit deployment.
- Configure the shared root directory. Use the CES shared root file system that was created in the step in Adding CES HDFS nodes into the centralized file system, and configure the protocols to point to the file system that will be used as the shared root by using the following command:
./spectrumscale config protocols -f FS_Name -m FS_Mountpoint
For example:
./spectrumscale config protocols -f cesSharedRoot -m /gpfs/cesSharedRoot
For more information, see the Defining a shared file system for protocols section in IBM Storage Scale: Concepts, Planning, and Installation Guide.
- Create the NameNodes and DataNodes for a new CES HDFS cluster.
./spectrumscale config hdfs new -n NAME -nn NAMENODES -dn DATANODES -f FILESYSTEM -d DATADIR
The -f FILESYSTEM value is the gpfs.mnt.dir value and the -d DATADIR value is the gpfs.data.dir value, as seen in the HDFS Transparency configuration files. Therefore, each new HDFS Transparency cluster requires its own -d DATADIR value.
For example:
For non-HA:
./spectrumscale config hdfs new -n myhdfscluster -nn c902f05x05 -dn c902f05x07,c902f05x08,c902f05x09,c902f05x10 -f gpfs -d gpfshdfs
For HA:
./spectrumscale config hdfs new -n myhdfscluster -nn c902f05x05,c902f05x06 -dn c902f05x07,c902f05x08,c902f05x09,c902f05x10 -f gpfs -d gpfshdfs
Where:
-n NAME, --name NAME HDFS cluster name.
-nn NAMENODES, --namenodes NAMENODES NameNode hostnames (comma separated).
-dn DATANODES, --datanodes DATANODES DataNode hostnames (comma separated).
-f FILESYSTEM, --filesystem FILESYSTEM Spectrum Scale file system name.
-d DATADIR, --datadir DATADIR Spectrum Scale data directory name.
Note: The -n NAME value is the HDFS cluster name. The CES group name is the HDFS cluster name prefixed with hdfs. The -d DATADIR value is a unique 32-character name required for each HDFS cluster to be created on the same centralized storage.
To configure multiple HDFS clusters, see Adding a new HDFS cluster into existing HDFS cluster on the same GPFS cluster (Multiple HDFS clusters) section.
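As an illustration, a second HDFS cluster on the same file system must use its own cluster name, NameNodes, and -d DATADIR value; the host names and names below are hypothetical:
./spectrumscale config hdfs new -n myhdfscluster2 -nn c902f05x11,c902f05x12 -dn c902f05x13,c902f05x14 -f gpfs -d gpfshdfs2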
- List the configured HDFS clusters by running the following command:
./spectrumscale config hdfs list
For example:
Single HDFS cluster list:
Cluster Name : mycluster
NameNodesList : [c902f09x11kvm1],[c902f09x11kvm2]
DataNodesList : [c902f09x11kvm3],[c902f09x11kvm4]
FileSystem : gpfs1
DataDir : datadir1
Multi-HDFS cluster list:
Cluster Name : mycluster1
NameNodesList : [c902f09x11kvm1],[c902f09x11kvm2]
DataNodesList : [c902f09x11kvm3],[c902f09x11kvm4]
FileSystem : gpfs1
DataDir : datadir1

Cluster Name : mycluster2
NameNodesList : [c902f09x11kvm5],[c902f09x11kvm6]
DataNodesList : [c902f09x11kvm7],[c902f09x11kvm8]
FileSystem : gpfs1
DataDir : datadir2
Note: Multi-HDFS cluster is not supported in IBM Storage Scale Big Data Analytics Integration Toolkit for HDFS Transparency (Toolkit for HDFS) version 1.0.3.0 under IBM Storage Scale 5.1.1.0.
- Do environment checks before initiating the deployment procedure.
./spectrumscale deploy -pr
- Start the IBM Storage Scale deployment and the creation of the CES HDFS nodes.
./spectrumscale deploy
- Verify the CES HDFS service after the deployment is completed.
/usr/lpp/mmfs/bin/mmces service list -a
- Check whether the CES HDFS protocol IP values are configured properly.
/usr/lpp/mmfs/bin/mmces address list
For more information, see Listing CES HDFS IPs.
- After the CES HDFS nodes are installed, create the HDFS client nodes manually. For more information, see Apache Hadoop.
For information on the spectrumscale, mmces, and mmhdfs commands, see the IBM Storage Scale: Command and Programming Reference Guide.
Note: If HDFS Transparency is a part of the protocols used in the cluster, ensure that the ACL setting for the GPFS file system is set to -k all after all the protocols are installed.
Use mmlsfs to check the -k value and mmchfs to change it.
Restart all services and IBM Storage Scale to pick up the -k changes.
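For example, assuming a file system named gpfs1 (the name is a placeholder):
# Check the current ACL semantics
/usr/lpp/mmfs/bin/mmlsfs gpfs1 -k
# Change the ACL semantics to all
/usr/lpp/mmfs/bin/mmchfs gpfs1 -k all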