Architecture

This topic describes the architecture of Cloudera Data Platform (CDP) Private Cloud Base with IBM Storage Scale.

As shown in Figure 1 and Figure 2, CDP Private Cloud Base can be deployed with IBM Storage Scale using either a remote mount or a single IBM Storage Scale cluster.

Separating the Hadoop cluster hosts (master hosts, utility hosts, gateway hosts, or worker hosts) from the storage hosts (HDFS Transparency NameNodes and DataNodes) has the following benefits:
  • The Hadoop layer and the storage layer can be managed separately and by different teams.
  • Because IBM Storage Scale is not installed on the Hadoop cluster hosts, those hosts do not need specific kernel levels.
  • Only the IBM Storage Scale hosts must have matching uid/gid values.
  • Only IBM Storage Scale requires password-less ssh, for either a root user or a non-root user with sudo privileges, on all the nodes.
Recommended configuration
  • Storage hosts:
    • NameNode HA (2 NameNodes)
    • DataNode resiliency (3 DataNodes)
    Note: The performance also depends on the network and the number of DataNodes that can drive the bandwidth of the storage (IBM Storage Scale System) and the number of Hadoop worker hosts.
  • Hadoop cluster hosts:
  • The NameNode cannot be colocated with the DataNode or with any other Hadoop services.
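The recommendations above can be expressed as a simple layout check. The following is a minimal sketch, not part of any Cloudera or IBM tooling: the `Host` class, the role names `"namenode"` and `"datanode"`, and the function `validate_storage_layout` are all hypothetical names chosen for illustration. It treats two NameNodes and three DataNodes as minimum counts.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical description of one host and the roles planned for it."""
    name: str
    roles: set = field(default_factory=set)


def validate_storage_layout(hosts):
    """Return a list of problems; an empty list means the plan follows
    the recommended configuration described above."""
    problems = []
    namenodes = [h for h in hosts if "namenode" in h.roles]
    datanodes = [h for h in hosts if "datanode" in h.roles]

    # NameNode HA: at least two NameNodes.
    if len(namenodes) < 2:
        problems.append(f"need at least 2 NameNodes, found {len(namenodes)}")

    # DataNode resiliency: at least three DataNodes.
    if len(datanodes) < 3:
        problems.append(f"need at least 3 DataNodes, found {len(datanodes)}")

    # A NameNode must not share its host with a DataNode or any other service.
    for h in namenodes:
        extra = h.roles - {"namenode"}
        if extra:
            problems.append(f"{h.name}: NameNode colocated with {sorted(extra)}")

    return problems


if __name__ == "__main__":
    plan = [
        Host("nn1", {"namenode"}),
        Host("nn2", {"namenode"}),
        Host("dn1", {"datanode"}),
        Host("dn2", {"datanode"}),
        Host("dn3", {"datanode"}),
    ]
    print(validate_storage_layout(plan))
```

A check like this only covers host counts and colocation. As the note above says, performance also depends on the network and on how many DataNodes and worker hosts drive the storage bandwidth, which this sketch does not model.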
Figure 1. Deploying Cloudera Data Platform (CDP) Private Cloud Base with IBM Storage Scale using Remote mount
Figure 2. Deploying Cloudera Data Platform (CDP) Private Cloud Base with IBM Storage Scale using single IBM Storage Scale cluster
Cloudera Data Platform (CDP) with IBM Storage Scale consists of the CDP Private Cloud Base cluster, the IBM Storage Scale CES HDFS Transparency cluster, and the shared storage layer.
CDP Private Cloud Base cluster
The CDP Private Cloud Base cluster consists of CDP nodes. One of these nodes hosts Cloudera Manager (CM), where the IBM Storage Scale CSD is placed in the CM directory for CSD JAR files.

For node role recommendations for CDP Private Cloud Base, see Runtime Cluster Hosts and Role Assignments in the Cloudera CDP Private Cloud documentation.

IBM Storage Scale CES HDFS Transparency cluster
The IBM Storage Scale CES HDFS Transparency cluster consists of NameNodes (CES protocol node and IBM Storage Scale client) and DataNodes (IBM Storage Scale client). The minimum requirement is two IBM Storage Scale HDFS Transparency NameNodes (HA) and three or more IBM Storage Scale HDFS Transparency DataNodes. The NameNodes are CES protocol nodes; the DataNodes are not. All CES HDFS Transparency nodes are also IBM Storage Scale native clients. The Cloudera Manager agent (CM agent) is also present in the IBM Storage Scale CES HDFS Transparency cluster. The CM agent allows the HDFS Transparency NameNodes and DataNodes to be managed from Cloudera Manager in the CDP Private Cloud Base cluster.

The following figure shows the Cloudera and IBM Storage Scale/HDFS Transparency components on the CES HDFS nodes:

Figure 3. Cloudera and IBM Storage Scale/HDFS Transparency components on CES HDFS nodes
Cloudera and IBM Storage Scale/HDFS Transparency components on the CES HDFS nodes are described in the following list:
  1. Cloudera Manager agent: The Cloudera Manager agent is a Python-based agent. It consists of the cloudera-manager-agent and cloudera-manager-daemons components. The Cloudera Manager agent can be installed through Cloudera Manager or manually. If you install the agent on the hosts through Cloudera Manager, you must provide the password or the SSH private key of the managed host. If you install it manually, you do not need the password or the SSH private key of the managed host.
  2. CDP Private Cloud Base parcels: CDP parcels contain the installables for the CDP Private Cloud Base services. Hosts download the parcels from Cloudera Manager using HTTP (wget).
  3. CDP Private Cloud Base Java™: Cloudera Manager requires the same version of Java on all the managed nodes. If the CM agent is installed through CM, the CDP Private Cloud Base version of Java is also installed through CM. For information on the supported Java levels, see Hardware and software requirements.
  4. Ranger plug-in for HDFS: The Ranger plug-in for HDFS is needed for the NameNode to cache the Ranger policies.
  5. Java for HDFS Transparency: A version of Java must already be installed on the HDFS Transparency node before the node is managed by Cloudera Manager.
  6. Kerberos client: For supported Kerberos distributions, see Kerberos.
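Some of the components listed above must be present on a node before Cloudera Manager manages it. The following is a minimal sketch of a pre-check, under stated assumptions: it assumes that Java is exposed as a `java` command and that the Kerberos client provides a `kinit` command on the `PATH`. The mapping `REQUIRED_COMMANDS` and the function `missing_prerequisites` are hypothetical names used only for illustration. Finding a command does not confirm that its version is supported.

```python
import shutil

# Hypothetical mapping from a component in the list above to the command
# that is assumed to indicate its presence on the node.
REQUIRED_COMMANDS = {
    "Java for HDFS Transparency": "java",   # assumption: java is on PATH
    "Kerberos client": "kinit",             # assumption: client ships kinit
}


def missing_prerequisites(required=REQUIRED_COMMANDS, which=shutil.which):
    """Return the components whose command cannot be found on PATH.

    The `which` parameter defaults to shutil.which and can be replaced
    with a stub for testing.
    """
    return [
        component
        for component, command in required.items()
        if which(command) is None
    ]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing before CM management:", ", ".join(missing))
    else:
        print("All checked commands were found on PATH.")
```

For the supported Java levels and Kerberos distributions, the referenced topics (Hardware and software requirements, and Kerberos) remain the authoritative source.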
IBM Storage Scale cluster
The IBM Storage Scale cluster, shown at the bottom of Figure 1, can be either an IBM Elastic Storage® system or any other shared storage system.

CES HDFS Transparency is remotely mounted to the IBM Storage Scale System, as shown in Figure 1.

For information on dual-network deployment, see Dual-network deployment.
Note: If you plan to use the object protocol, select the single IBM Storage Scale cluster architecture as shown in Figure 2. For more information, see the Limitations of protocols on remotely mounted file systems topic in the IBM Storage Scale: Administration Guide.