Installing the Execution Engine for Apache Hadoop service
A project administrator can install Execution Engine for Apache Hadoop on IBM® Cloud Pak for Data. Execution Engine for Apache Hadoop can only be installed on-premises.
- Permissions you need for this task
  - You must be an administrator of the OpenShift® project (Kubernetes namespace) where you will deploy Execution Engine for Apache Hadoop.
- Information you need to complete this task
  - Execution Engine for Apache Hadoop needs only the restricted security context constraint (SCC).
  - Execution Engine for Apache Hadoop must be installed in the same project as Cloud Pak for Data.
  - Execution Engine for Apache Hadoop uses the following storage classes. If you don't use these storage classes on your cluster, ensure that you have a storage class with an equivalent definition:
    - OpenShift Container Storage: ocs-storagecluster-cephfs
    - IBM Spectrum® Scale: ibm-spectrum-scale-sc
    - NFS: managed-nfs-storage
    - Portworx: portworx-shared-gp3
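Before you install, you can confirm that one of these storage classes (or your equivalent) actually exists on the cluster. A minimal sketch, assuming you are already logged in with oc:

```shell
# Print which of the documented storage classes exist on this cluster.
# If none are found, create or identify an equivalent storage class
# before installing Execution Engine for Apache Hadoop.
check_storage_classes() {
  for sc in ocs-storagecluster-cephfs ibm-spectrum-scale-sc managed-nfs-storage portworx-shared-gp3; do
    if oc get storageclass "$sc" >/dev/null 2>&1; then
      echo "found: $sc"
    else
      echo "not found: $sc"
    fi
  done
}
```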
Before you begin
Ensure that the cluster meets the minimum requirements for installing Execution Engine for Apache Hadoop. For details, see System requirements.
Additionally, ensure that a cluster administrator completed the required Pre-installation tasks for your environment. Specifically, verify that a cluster administrator completed the following tasks:
- Cloud Pak for Data is installed. For details, see Installing Cloud Pak for Data.
- For environments that use a private container registry, such as air-gapped environments, the Execution Engine for Apache Hadoop software images are mirrored to the private container registry. For details, see Mirroring images to your container registry.
- The cluster is configured to pull the Execution Engine for Apache Hadoop software images. For details, see Configuring your cluster to pull images.
- The Execution Engine for Apache Hadoop catalog source exists. For details, see Creating catalog sources.
- The Execution Engine for Apache Hadoop operator subscription exists. For details, see Creating operator subscriptions.
If these tasks are not complete, the Execution Engine for Apache Hadoop installation will fail.
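If you want to spot-check these pre-installation tasks yourself, a sketch like the following can help. The default resource names (ibm-cpd-hadoop-catalog, ibm-cpd-hadoop-operator, ibm-common-services) are illustrative placeholders, not documented names; substitute the catalog source, subscription, and namespace that your cluster administrator actually created:

```shell
# Verify that the catalog source and operator subscription exist.
# All three default names below are hypothetical; pass the real names
# from your cluster administrator as arguments.
verify_preinstall() {
  local catsrc=${1:-ibm-cpd-hadoop-catalog}
  local sub=${2:-ibm-cpd-hadoop-operator}
  local ns=${3:-ibm-common-services}
  if oc get catalogsource "$catsrc" -n openshift-marketplace >/dev/null 2>&1; then
    echo "catalog source present"
  else
    echo "catalog source missing"
  fi
  if oc get subscription "$sub" -n "$ns" >/dev/null 2>&1; then
    echo "operator subscription present"
  else
    echo "operator subscription missing"
  fi
}
```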
Prerequisite services
Before you install Execution Engine for Apache Hadoop, ensure that the following services are installed and running:
Procedure
Complete the following tasks to install Execution Engine for Apache Hadoop:
Installing the service
To install Execution Engine for Apache Hadoop:
- Log in to Red Hat® OpenShift Container Platform as a user with sufficient permissions to complete the task:
  oc login OpenShift_URL:port
- Create a Hadoop custom resource to install Execution Engine for Apache Hadoop. Follow the appropriate guidance for your environment.
  Important: By creating a Hadoop custom resource with spec.license.accept: true, you are accepting the license terms for Execution Engine for Apache Hadoop. You can find links to the relevant licenses in IBM Cloud Pak for Data License Information.
The cluster uses Red Hat OpenShift Container Storage
Create a custom resource with the following format.
cat <<EOF | oc apply -f -
apiVersion: hadoop.cpd.ibm.com/v1
kind: Hadoop
metadata:
  name: hadoop-cr           # This is the recommended name, but you can change it
  namespace: project-name   # Replace with the project where you will install Execution Engine for Apache Hadoop
spec:
  docker_registry_prefix: cp.icr.io/cp/cpd
  size: small
  license:
    accept: true
    license: Enterprise | Standard | WatsonStudioPremium   # Specify the license you purchased
  version: 4.0.9
  storageVendor: ocs
  storageClass: ocs-storagecluster-cephfs   # If you use a different storage class, replace it with the appropriate storage class
EOF
The cluster uses Portworx
Create a custom resource with the following format.
cat <<EOF | oc apply -f -
apiVersion: hadoop.cpd.ibm.com/v1
kind: Hadoop
metadata:
  name: hadoop-cr           # This is the recommended name, but you can change it
  namespace: project-name   # Replace with the project where you will install Execution Engine for Apache Hadoop
spec:
  docker_registry_prefix: cp.icr.io/cp/cpd
  size: small
  license:
    accept: true
    license: Enterprise | Standard | WatsonStudioPremium   # Specify the license you purchased
  version: 4.0.9
  storageVendor: portworx
  storageClass: portworx-shared-gp3   # If you use a different storage class, replace it with the appropriate storage class
EOF
The cluster uses NFS
Create a custom resource with the following format.
cat <<EOF | oc apply -f -
apiVersion: hadoop.cpd.ibm.com/v1
kind: Hadoop
metadata:
  name: hadoop-cr           # This is the recommended name, but you can change it
  namespace: project-name   # Replace with the project where you will install Execution Engine for Apache Hadoop
spec:
  docker_registry_prefix: cp.icr.io/cp/cpd
  size: small
  license:
    accept: true
    license: Enterprise | Standard | WatsonStudioPremium   # Specify the license you purchased
  version: 4.0.9
  storageVendor: ""
  storageClass: managed-nfs-storage   # If you use a different storage class, replace it with the appropriate storage class
EOF
Verifying the installation
When you create the custom resource, the Execution Engine for Apache Hadoop operator processes the contents of the custom resource and starts the microservices that comprise Execution Engine for Apache Hadoop, including the Hadoop microservice, which is defined by the hadoop-cr custom resource. Execution Engine for Apache Hadoop is installed when the Hadoop status is Completed.
To check the status of the installation:
- Change to the project where you installed Execution Engine for Apache Hadoop:
  oc project project-name
- Get the status of Execution Engine for Apache Hadoop (hadoop-cr):
  oc get Hadoop hadoop-cr -o jsonpath='{.status.hadoopStatus} {"\n"}'
  Execution Engine for Apache Hadoop is ready when the command returns Completed.
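The status check can be wrapped in a small polling loop so you don't have to rerun it by hand. A minimal sketch, assuming the custom resource is named hadoop-cr and you are already in the correct project (adjust the timeout and interval to suit your environment):

```shell
# Poll the Hadoop custom resource until its status reports Completed.
# Assumes the recommended CR name hadoop-cr; change it if you used another name.
wait_for_hadoop() {
  local timeout_secs=${1:-3600}
  local interval=${2:-30}
  local elapsed=0
  local status=""
  while [ "$elapsed" -le "$timeout_secs" ]; do
    status=$(oc get Hadoop hadoop-cr -o jsonpath='{.status.hadoopStatus}')
    if [ "$status" = "Completed" ]; then
      echo "Execution Engine for Apache Hadoop is ready."
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Timed out waiting for status Completed (last status: ${status:-unknown})." >&2
  return 1
}
```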
What to do next
Complete the following tasks in order before users can access the service:
- Install Execution Engine for Apache Hadoop on a Hadoop cluster.
- Install Execution Engine for Apache Hadoop on a Spectrum Conductor cluster.
- See Execution Engine for Apache Hadoop to get started.