Alternative architectures

This topic describes the alternative architectures that you can use when you do not require high availability (HA) for the HDFS Transparency NameNode, or when you want Hadoop services installed on the HDFS Transparency DataNode. It also lists the limitations of these use cases.

  • Non-HA HDFS Transparency NameNode architecture: CDP Private Cloud Base with IBM Storage Scale supports both NameNode HA and non-HA modes. Use the non-HA NameNode option only for development, test, and other non-production use cases.
    Figure 1. Non-HA HDFS Transparency NameNode architecture
  • HDFS Transparency DataNode colocation architecture: A DataNode can have other Hadoop services colocated on the same node. Cloudera recommends that specific services be installed on the DataNode (Worker). For a list of services that must be installed, see the Worker Hosts column in the Cloudera Runtime Cluster Hosts and Role Assignments documentation.
    Note: The NameNode cannot be colocated with the DataNode.
    The HDFS Transparency DataNode colocation architecture has the following limitations:
    • Because IBM Storage Scale is installed on the Hadoop cluster hosts, it is not possible to manage the Hadoop cluster hosts and the storage hosts separately.
    • Specific kernel levels are required on the Hadoop cluster hosts.
    • Each user and group must have the same UID and GID on all the IBM Storage Scale hosts.
    • IBM Storage Scale requires passwordless SSH for either the root user or a non-root user with sudo privileges on all nodes.
    Figure 2. HA and DataNode colocation architecture
    Figure 3. Non-HA and DataNode colocation architecture
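
The UID/GID limitation listed above can be checked before you colocate services. The following Python code is a minimal sketch and not an IBM or Cloudera tool. It assumes that you have already collected `getent passwd` output (the standard `name:password:UID:GID:...` format) from each host. The function names and the host and account names in the sample are hypothetical.

```python
from collections import defaultdict


def parse_passwd(text):
    """Parse passwd-format text into a {user: (uid, gid)} mapping."""
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 4:
            continue  # not a valid passwd entry
        ids[fields[0]] = (int(fields[2]), int(fields[3]))
    return ids


def find_id_mismatches(passwd_by_host):
    """Return {user: {host: (uid, gid)}} for users whose IDs differ between hosts.

    passwd_by_host maps a host name to the passwd-format text collected from it.
    """
    seen = defaultdict(dict)
    for host, text in passwd_by_host.items():
        for user, pair in parse_passwd(text).items():
            seen[user][host] = pair
    return {
        user: hosts
        for user, hosts in seen.items()
        if len(set(hosts.values())) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample data for two hosts.
    sample = {
        "node1": "hdfs:x:1001:1001::/home/hdfs:/bin/bash\n"
                 "yarn:x:1002:1002::/home/yarn:/bin/bash",
        "node2": "hdfs:x:1001:1001::/home/hdfs:/bin/bash\n"
                 "yarn:x:1005:1002::/home/yarn:/bin/bash",
    }
    for user, hosts in find_id_mismatches(sample).items():
        print(user, hosts)
```

This sketch reports only accounts that exist on more than one host with different values. An account that is missing from a host is not reported, so you might want to check for that case separately.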