Installing Cloudera Data Platform Private Cloud Base with IBM Storage Scale
This section describes the steps to create a new CDP Private Cloud Base cluster with the IBM Storage Scale file system specific configuration.
For Cloudera documentation and download information, see Support matrix.
Note: Before you begin, read this entire section, because the procedure deviates from the CDP Private Cloud Base installation documentation when IBM Storage Scale is integrated.

Note: Ensure that the CES HDFS cluster is configured and running. For more information, see CES HDFS. If NameNode HA, Kerberos, or TLS is enabled on the CES HDFS cluster, the CDP Private Cloud Base cluster must be set up with the same configuration.
- Install Cloudera Manager (CM). For more information on the CDP Private Cloud Base version, see the CDP Private Cloud Base Installation Guide. Cloudera has the following two types of installation methods:
- Trial Installation: Installs the trial version of CDP Private Cloud Base in a non-production environment for demonstration and proof-of-concept use cases. This method is recommended for trial deployments but is not supported for production deployments because it is not designed to scale.
- Production Installation: This topic describes how to install CDP Private Cloud Base by using the Production Installation method.
- Stop the HDFS Transparency services. On the CES HDFS cluster, stop the HDFS Transparency NameNodes and DataNodes by running the following commands:
# /usr/lpp/mmfs/hadoop/sbin/mmhdfs hdfs-dn stop
# /usr/lpp/mmfs/bin/mmces service stop HDFS -a
Note: You need to stop the HDFS Transparency nodes because the Cloudera Manager can only manage HDFS Transparency NameNodes and DataNodes if they are started using the Cloudera Manager.
- On the Cloudera Manager node, install Cloudera Manager and ensure that you can log in to the Cloudera Manager GUI. Perform the following steps to create a new CDP Private Cloud Base cluster:
- Log in to the Cloudera Manager GUI using the following credentials:
username: admin
password: admin
- Upload the CDP Private Cloud Base license.
- Optional: Enable Kerberos in Cloudera Manager.
It is recommended to enable Kerberos before you proceed to the next step.
To enable Kerberos, see Kerberos.
- Optional: Enable Auto-TLS in Cloudera Manager.
Ensure that you enable Kerberos in Cloudera Manager before you enable auto-TLS.
To enable TLS, see Enabling TLS.
When you enable Auto-TLS, you can leave the Trusted CA Certificates Location field blank so that Cloudera Manager auto-generates a new certificate. If you already have a certificate, enter its path.
- Deploy the IBM Storage Scale CSD.
- From the IBM Storage Scale cluster, get the
gpfs.hdfs.cloudera.cdp.csd-<version-number>.noarch.rpm package and copy
it to the Cloudera Manager node. To get the package from the self-extracting installed path, see
Downloads. For example,
/usr/lpp/mmfs/5.1.1.0/hdfs_rpms/rhel7/hdfs_3.1.1.x
- As root, log in to the Cloudera Manager node and install the IBM Storage Scale Cloudera Custom Service Descriptor (CDP CSD) package by running
the following
command:
# rpm -ivh /root/gpfs.hdfs.cloudera.cdp.csd-<version-number>.noarch.rpm
- Restart the Cloudera Manager server by running the following
command:
# systemctl restart cloudera-scm-server.service
- Check for any errors in the /var/log/cloudera-scm-server/cloudera-scm-server.log file.
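The log check in the previous step can be sketched as a pair of small shell helpers. This is a minimal sketch, not part of the product: the function names count_log_errors and show_recent_log_errors are hypothetical, and it assumes that error entries in the log contain the literal word ERROR.

```shell
# Sketch: summarize ERROR entries in the Cloudera Manager server log.
# Assumption: error entries contain the literal word "ERROR".
# The function names below are hypothetical helpers, not product commands.

# count_log_errors FILE -> prints the number of lines that contain the word ERROR
count_log_errors() {
    local log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 2
    fi
    # grep -c prints the count but exits 1 when the count is 0, so mask that status
    grep -c -w 'ERROR' "$log_file" || true
}

# show_recent_log_errors FILE [N] -> prints the last N (default 20) ERROR lines
show_recent_log_errors() {
    local log_file="$1" max_lines="${2:-20}"
    grep -w 'ERROR' "$log_file" | tail -n "$max_lines"
}
```

For example, after restarting the server, run count_log_errors /var/log/cloudera-scm-server/cloudera-scm-server.log and, if the count is not 0, inspect the entries with show_recent_log_errors on the same file.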
- Add a cluster in Cloudera Manager.
- Ensure the CES HDFS NameNodes and DataNodes are not running.
- Click Add cluster.
- Enter the CDP Private Cloud Base cluster name and click Continue.
- Under Register Hosts, click Search and then click Continue.
Note:
- Before registering the hosts, ensure that the DNS names across the cluster are resolvable. All the hostnames from CES HDFS and CDP Private Cloud Base nodes must return FQDN values.
- In addition to registering CDP Private Cloud Base hosts, you must register the HDFS Transparency
NameNode and DataNode hosts from your CES HDFS cluster. To find the HDFS Transparency nodes, run the
following command on the CES HDFS
cluster:
# /usr/lpp/mmfs/hadoop/sbin/mmhdfs hdfs status
- For registering the HDFS Transparency hosts to Cloudera Manager, the ssh private key for root or the root password of the CES HDFS nodes must be provided in the Cloudera Manager wizard. After the hosts are registered, you can change the password or remove the private key.
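The name resolution requirement in the note above can be checked before registering the hosts. The following is a minimal sketch under stated assumptions: is_fqdn and check_host are hypothetical helper names, the FQDN test is purely syntactic (at least two dot-separated labels, not a bare IP address), and getent is assumed to be available on the node where the check runs.

```shell
# Sketch: check that a host name resolves and that its canonical name is an FQDN.
# is_fqdn and check_host are hypothetical helper names.

# is_fqdn NAME -> succeeds when NAME has at least two dot-separated labels
is_fqdn() {
    case "$1" in
        *[!0-9.]*) ;;      # contains more than digits and dots: keep checking
        *) return 1 ;;     # empty, or digits and dots only (looks like an IP address)
    esac
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?)+$'
}

# check_host NAME -> resolves NAME with getent and reports the canonical name
check_host() {
    local host="$1" entry canonical
    if ! entry=$(getent hosts "$host"); then
        echo "UNRESOLVED $host"
        return 1
    fi
    # getent prints "address canonical-name [aliases...]"; take the canonical name
    canonical=$(printf '%s\n' "$entry" | awk 'NR == 1 { print $2 }')
    if is_fqdn "$canonical"; then
        echo "OK $host -> $canonical"
    else
        echo "NOT-FQDN $host -> $canonical"
        return 1
    fi
}
```

Running check_host for every CES HDFS and CDP Private Cloud Base host name, from each node, gives a quick view of which names do not resolve to an FQDN.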
- Select Repository location. It can be a public Cloudera repository or your own local repository. For local repository, select Custom Repository and enter the required details.
- Under CDH and other software, select Use Parcels (Recommended) and click Parcel Repositories and Network Settings.
- Add the relevant parcel information in the Remote Parcel Repository URLs tab and remove all the URLs that are not relevant. Click Close and then click Continue.
- Under Select JDK, select Install a Cloudera-provided version of OpenJDK and click Continue.
Note: Cloudera requires the same version of all the common software, including Java™, on all the managed hosts. Otherwise, the hosts might report bad health.
- Under Enter Login Credentials, enter a common userid/password or the ssh private key for the managed hosts and click Continue.
- Under Install Agents, wait for all the installations to complete successfully and for the Install Parcels window to show up.
- Under Install Parcels, wait for the packages to be downloaded, distributed, unpacked and activated. Click Continue.
- Under Inspect Cluster, click Inspect Network Performance and Inspect Hosts. Click Continue.
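The requirement noted under Select JDK, that all managed hosts run the same Java version, can be checked with a short script. This is a sketch only: distinct_versions and versions_consistent are hypothetical helper names, and the input is assumed to be one "host version" pair per line, collected by whatever means is available in your environment.

```shell
# Sketch: detect mismatched Java versions across managed hosts.
# Input on stdin: one "host version" pair per line (format is an assumption).
# distinct_versions and versions_consistent are hypothetical helper names.

# distinct_versions -> prints how many different versions appear in the input
distinct_versions() {
    awk 'NF >= 2 { print $2 }' | sort -u | wc -l | tr -d ' '
}

# versions_consistent -> succeeds only when exactly one version is reported
versions_consistent() {
    [ "$(distinct_versions)" -eq 1 ]
}

# One possible way to collect the input, assuming root ssh access and a
# hypothetical hosts.txt file that lists one managed host per line:
#
# for h in $(cat hosts.txt); do
#     printf '%s %s\n' "$h" "$(ssh "$h" 'java -version 2>&1' | awk -F'"' 'NR == 1 { print $2 }')"
# done | versions_consistent || echo "Java versions differ across hosts"
```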
- Install services for CDP Private Cloud Base cluster. During the installation of a new CDP Private Cloud Base cluster, you can specify the services that you want to install.
- Select the services that you want to install on your CDP Private Cloud Base cluster. The IBM Storage Scale service must be included as part of this initial cluster creation.
Note:
- CDP Private Cloud Base with IBM Storage Scale is supported only as a new installation with the IBM Storage Scale service as the file system. The minimum services that must be selected are ZooKeeper, YARN, and the IBM Storage Scale service. If you plan to use the Ranger, Solr, and Atlas services, it is recommended to include them at the time of initial cluster creation.
- Creating a CDP Private Cloud Base cluster without the IBM Storage Scale service and then adding the IBM Storage Scale service later is not supported.
- While creating the new CDP Private Cloud Base cluster, do not include the HDFS service. If you include the HDFS service, unrecoverable errors might occur.
- Do not place any Hadoop services, other than the IBM Storage Scale service, on the CES HDFS cluster hosts. However, it is permitted and recommended to add Gateway roles for other Hadoop services on the CES HDFS cluster hosts.
- If you need Ranger, add Solr and Ranger services together with the IBM Storage Scale service at the time of initial cluster creation. Otherwise, if you want to add Ranger later, a workaround is needed for Solr as mentioned in Solr does not start after adding Ranger.
- If you need Hive, Livy or Oozie, enable the proxyuser settings for HDFS Transparency for these
CDP Private Cloud Base services by following Enable proxyuser settings for HDFS
Transparency.
This step is not needed if the CES HDFS cluster was created using the IBM Storage Scale Install Toolkit.
- Hive on Tez should be installed for HiveServer2 (HS2) for all the Hive tables (managed and external tables). For more information, see Hive on Tez introduction in the Cloudera documentation.
- If you need Oozie, configure dfs.namenode.fs-limits.min-block-size = <dfs.blocksize> on the client side through the Cloudera Manager. For more information, see item 15 on CDP troubleshooting.
- In the IBM Storage Scale service installation
wizard, perform the following:
- Assign the NameNode and DataNode roles based on the actual CES HDFS NameNode and DataNode hosts.
- Assign the Gateway roles to one or more CDP Private Cloud Base nodes. Assigning these roles creates the HDFS client configuration XML files under /etc/hadoop. These files are required to run the HDFS client commands.
- Set the following IBM Storage Scale parameters:
- default_fs_name to hdfs://<myceshost>:8020
- webhdfs_url to http://<myceshost>:50070/webhdfs/v1
- transparency.namenode.http.port to 50070. This is the default NameNode JMX metrics port.
- transparency.datanode.http.port to 1006. This is the DataNode JMX metrics port. The default value is 9864 if Kerberos is disabled and 1006 if Kerberos is enabled.
Note:
- In this example, the hostname corresponding to CES IP configured on HDFS Transparency is <myceshost>.
- 8020/50070 are the default RPC and HTTP ports for NameNodes. If you are not using these default ports, update the parameters accordingly.
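The parameter values listed above follow directly from the CES host name and the NameNode ports. The following sketch derives them so that they can be copied into the wizard. The function names scale_service_params and datanode_http_port are hypothetical, and the sketch assumes plain HTTP for WebHDFS, as in the example values above.

```shell
# Sketch: derive the IBM Storage Scale service parameter values from the CES host name.
# scale_service_params and datanode_http_port are hypothetical helper names.
# Assumption: WebHDFS is served over plain HTTP, as in the example values.

# scale_service_params HOST [RPC_PORT] [HTTP_PORT] -> prints name=value lines
scale_service_params() {
    local host="$1" rpc_port="${2:-8020}" http_port="${3:-50070}"
    echo "default_fs_name=hdfs://${host}:${rpc_port}"
    echo "webhdfs_url=http://${host}:${http_port}/webhdfs/v1"
    echo "transparency.namenode.http.port=${http_port}"
}

# datanode_http_port yes|no -> prints the default DataNode HTTP port,
# where the argument says whether Kerberos is enabled
datanode_http_port() {
    case "$1" in
        yes) echo 1006 ;;
        no)  echo 9864 ;;
        *)   echo "usage: datanode_http_port yes|no" >&2; return 2 ;;
    esac
}
```

For example, scale_service_params myces.example.com prints the three NameNode-related values with the default ports, where myces.example.com is a placeholder for your own CES host name. Pass the RPC and HTTP ports as the second and third arguments if you do not use the defaults.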
- If you are adding Ranger, additional configurations are needed for HDFS Transparency. For information on configuring HDFS Transparency and the required configuration parameters needed for the Ranger service, see Enabling Ranger.
- Save the changes and then proceed to configure other services, followed by starting all the services. Ensure that all the services have started successfully and that there are no errors.
- After the CDP Private Cloud Base cluster is created, additional Kerberos-specific inputs might be required for the IBM Storage Scale service. Set the following IBM Storage Scale parameters:
- In the IBM Storage Scale service configuration in Cloudera Manager, add the following custom parameters:
- Add the dfs.namenode.kerberos.principal.pattern parameter and set its value to the NameNode principal regular expression. This could be as open as *.
- Add the hadoop.security.service.user.name.key.pattern parameter and set its value to *.
- spectrumscale_keytab is the actual path of the keytab file configured for the HDFS Transparency NameNode. The default value is /etc/security/keytabs/nn.service.keytab. Update this parameter if the default path is not used.
- scale_hdfs_principal_name is the actual Kerberos principal configured for the HDFS Transparency NameNode. The default value is nn. Update this parameter if the default principal is not used.
- Enable NameNode HA.
Enable HA if the CES HDFS cluster NameNode HA is enabled.
To enable NameNode HA, see Enabling NameNode HA.
- Verify the CDP Private Cloud Base with IBM
Storage Scale environment.
To verify the cluster, see Verifying installation.