Installing Execution Engine for Apache Hadoop

A project administrator can install Execution Engine for Apache Hadoop on IBM® Cloud Pak for Data. Execution Engine for Apache Hadoop can only be installed on-premises.

Permissions you need for this task
You must be an administrator of the OpenShift® project (Kubernetes namespace) where you will deploy Execution Engine for Apache Hadoop.
Information you need to complete this task
  • Execution Engine for Apache Hadoop needs only the restricted security context constraint (SCC).
  • Execution Engine for Apache Hadoop must be installed in the same project as Cloud Pak for Data.
  • Execution Engine for Apache Hadoop uses the following storage classes. If you don't use these storage classes on your cluster, ensure that you have a storage class with an equivalent definition:
    • OpenShift Container Storage: ocs-storagecluster-cephfs
    • NFS: managed-nfs-storage
    • Portworx: portworx-shared-gp3
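To confirm that one of these storage classes (or an equivalent) exists on your cluster, you can list the available storage classes. A sketch, using the default class names from the list above as examples:

```shell
# List all storage classes available on the cluster.
oc get storageclass

# Optionally, inspect a specific class by name to confirm its provisioner
# (replace the class name with the one used in your environment).
oc get storageclass ocs-storagecluster-cephfs -o jsonpath='{.provisioner}{"\n"}'
```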

Before you begin

Ensure that the cluster meets the minimum requirements for installing Execution Engine for Apache Hadoop. For details, see System requirements.

Additionally, ensure that a cluster administrator completed the required Pre-installation tasks for your environment. Specifically, verify that a cluster administrator completed the following tasks:

  1. Cloud Pak for Data is installed. For details, see Installing Cloud Pak for Data.
  2. For environments that use a private container registry, such as air-gapped environments, the Execution Engine for Apache Hadoop software images are mirrored to the private container registry. For details, see Mirroring images to your container registry.
  3. The cluster is configured to pull the Execution Engine for Apache Hadoop software images. For details, see Configuring your cluster to pull images.
  4. The Execution Engine for Apache Hadoop operator subscription exists. For details, see Creating operator subscriptions.

If these tasks are not complete, the Execution Engine for Apache Hadoop installation will fail.

Prerequisite services

Before you install Execution Engine for Apache Hadoop, ensure that the following service is installed and running:

  • Watson Studio

Procedure

Complete the following tasks to install Execution Engine for Apache Hadoop:

  1. Installing the service
  2. Verifying the installation
  3. What to do next

Installing the service

To install Execution Engine for Apache Hadoop:

  1. Log in to Red Hat® OpenShift Container Platform as a user with sufficient permissions to complete the task:
    oc login OpenShift_URL:port
  2. Create a Hadoop custom resource to install Execution Engine for Apache Hadoop. Follow the appropriate guidance for your environment.
    • If you use OpenShift Container Storage, create a custom resource with the following format.

      cat <<EOF |oc apply -f -
      apiVersion: hadoop.cpd.ibm.com/v1
      kind: Hadoop
      metadata:
        name: hadoop-cr     # This is the recommended name, but you can change it
        namespace: project-name     # Replace with the project where you will install Execution Engine for Apache Hadoop
      spec:
        docker_registry_prefix: cp.icr.io/cp/cpd
        size: small
        license:
          accept: true
          license: Enterprise|Standard|WatsonStudioPremium     # Specify the license you purchased
        version: 4.0.0
        storageVendor: ocs
        storageClass: ocs-storagecluster-cephfs          #if you use a different storage class, replace it with the appropriate storage class                      
      EOF
    • If you use Portworx storage, create a custom resource with the following format.

      cat <<EOF |oc apply -f -
      apiVersion: hadoop.cpd.ibm.com/v1
      kind: Hadoop
      metadata:
        name: hadoop-cr     # This is the recommended name, but you can change it
        namespace: project-name     # Replace with the project where you will install Execution Engine for Apache Hadoop
      spec:
        docker_registry_prefix: cp.icr.io/cp/cpd
        size: small
        license:
          accept: true
          license: Enterprise|Standard|WatsonStudioPremium     # Specify the license you purchased
        version: 4.0.0
        storageVendor: portworx
        storageClass: portworx-shared-gp3          #if you use a different storage class, replace it with the appropriate storage class
      EOF
    • If you use NFS storage, create a custom resource with the following format.

      cat <<EOF |oc apply -f -
      apiVersion: hadoop.cpd.ibm.com/v1
      kind: Hadoop
      metadata:
        name: hadoop-cr     # This is the recommended name, but you can change it
        namespace: project-name     # Replace with the project where you will install Execution Engine for Apache Hadoop
      spec:
        docker_registry_prefix: cp.icr.io/cp/cpd
        size: small
        license:
          accept: true
          license: Enterprise|Standard|WatsonStudioPremium     # Specify the license you purchased
        version: 4.0.0
        storageVendor: ""
        storageClass: managed-nfs-storage          #if you use a different storage class, replace it with the appropriate storage class                   
      EOF
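Putting the two steps together, a typical installation session might look like the following sketch. The cluster URL, user name, and file name are placeholders for your own values:

```shell
# Log in to the cluster (replace the URL and user with your own).
oc login https://openshift.example.com:6443 -u kubeadmin

# Save the custom resource variant that matches your storage vendor to a
# file, for example hadoop-cr.yaml, and apply it.
oc apply -f hadoop-cr.yaml
```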

Verifying the installation

When you create the custom resource, the Execution Engine for Apache Hadoop operator processes its contents and starts the microservices that comprise Execution Engine for Apache Hadoop, including the Hadoop microservice, which is defined by the hadoop-cr custom resource. Execution Engine for Apache Hadoop is installed when the Hadoop status is Completed.

To check the status of the installation:

  1. Change to the project where you installed Execution Engine for Apache Hadoop:
    oc project project-name
  2. Get the status of Execution Engine for Apache Hadoop (hadoop-cr):
    oc get Hadoop hadoop-cr -o jsonpath='{.status.hadoopStatus} {"\n"}'

    Execution Engine for Apache Hadoop is ready when the command returns Completed.
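If you prefer to wait for the installation to finish from a script, a simple polling loop over the same command might look like the following sketch. The 60-second interval is an assumption; adjust it as needed:

```shell
#!/bin/bash
# Poll the Hadoop custom resource until its status is Completed.
# Assumes you are already logged in and switched to the correct project.
while true; do
  status=$(oc get Hadoop hadoop-cr -o jsonpath='{.status.hadoopStatus}')
  if [ "$status" = "Completed" ]; then
    echo "Execution Engine for Apache Hadoop is ready."
    break
  fi
  echo "Current status: ${status:-unknown}. Waiting..."
  sleep 60
done
```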

What to do next

Complete the following tasks before users can access the service:
  1. Install Execution Engine for Apache Hadoop on a Hadoop cluster, if your environment includes one.
  2. Install Execution Engine for Apache Hadoop on a Spectrum Conductor cluster, if your environment includes one.
  3. See Execution Engine for Apache Hadoop to get started.