Open source CDH (Cloudera Distributed Hadoop) integration
CDH (Cloudera Distributed Hadoop) is Cloudera’s open source platform distribution, which includes Apache Hadoop and is built specifically to meet enterprise demands. You can integrate CDH with IBM® Spectrum Conductor by configuring an existing instance group in IBM Spectrum Conductor 2.5.0 so that it works with a CDH 6.3.2 cluster that has Kerberos authentication enabled. Use this information as a guide and an example for integrating the two products.
Prerequisites for integrating CDH with IBM Spectrum Conductor
In this CDH and IBM Spectrum Conductor integration example, the first part involves setting up the environment with the following prerequisites:
- Install a CDH 6.3.2 cluster by following the Cloudera installation documentation.
- Configure a Kerberos Key Distribution Center (KDC). You can refer to an example of configuring a Kerberos KDC on a Red Hat server.
- Enable Kerberos authentication for your CDH cluster by following Cloudera’s documentation for enabling Kerberos authentication for CDH.
- Enable Kerberos user authentication for your IBM Spectrum Conductor cluster.
- Create an instance group that uses Spark version 2.4.3 and includes a Jupyter notebook.
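Before you continue, it can help to confirm that your environment matches this example. The following Python sketch is one hypothetical way to run that check. It is not part of IBM Spectrum Conductor or CDH. `has_kerberos_ticket` relies only on the standard `klist -s` option, which exits with status 0 when a readable, unexpired credentials cache exists. `version_mismatches` compares versions that you supply by hand against the versions used in this example. The function names and the `REQUIRED_VERSIONS` table are assumptions for illustration, and the versions are only those of this example, not a support matrix.

```python
import shutil
import subprocess

# Hypothetical table: the component versions used in this integration example.
REQUIRED_VERSIONS = {
    "IBM Spectrum Conductor": "2.5.0",
    "CDH": "6.3.2",
    "Spark": "2.4.3",
}


def has_kerberos_ticket() -> bool:
    """Return True if the current user holds a readable, unexpired Kerberos ticket cache."""
    if shutil.which("klist") is None:
        # No Kerberos client tools on PATH, so no ticket can be confirmed.
        return False
    # `klist -s` produces no output and exits with status 0 only when the
    # credentials cache can be read and has not expired.
    result = subprocess.run(["klist", "-s"], check=False)
    return result.returncode == 0


def version_mismatches(found: dict) -> list:
    """Compare user-supplied versions against REQUIRED_VERSIONS.

    Returns a list of human-readable problems. An empty list means every
    component matches the versions used in this example.
    """
    problems = []
    for component, expected in REQUIRED_VERSIONS.items():
        actual = found.get(component)
        if actual is None:
            problems.append(f"{component}: version not supplied (expected {expected})")
        elif actual != expected:
            problems.append(f"{component}: found {actual}, expected {expected}")
    return problems


if __name__ == "__main__":
    # Hypothetical values: replace them with the versions you actually installed.
    installed = {"IBM Spectrum Conductor": "2.5.0", "CDH": "6.3.2", "Spark": "2.4.3"}
    for problem in version_mismatches(installed):
        print("Version mismatch:", problem)
    print("Kerberos ticket present:", has_kerberos_ticket())
```

Run `kinit` with your user principal first. Without a ticket, `has_kerberos_ticket()` returns `False` even on a correctly configured host.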
After the prerequisites are in place, configure the instance group that you created to work with the CDH cluster, and then verify the integration, either by using a notebook or by submitting a Spark batch application.