Installing the Execution Engine for Apache Hadoop service
A project administrator can install Execution Engine for Apache Hadoop on IBM® Cloud Pak for Data. Execution Engine for Apache Hadoop can only be installed on-premises.
- Permissions you need for this task
  - You must be an administrator of the OpenShift® project (Kubernetes namespace) where you will deploy Execution Engine for Apache Hadoop.
- Information you need to complete this task
  - Execution Engine for Apache Hadoop needs only the restricted security context constraint (SCC).
  - Execution Engine for Apache Hadoop must be installed in the same project as Cloud Pak for Data.
  - Execution Engine for Apache Hadoop uses the following storage classes. If you don't use these storage classes on your cluster, ensure that you have a storage class with an equivalent definition:
    - OpenShift Container Storage: ocs-storagecluster-cephfs
    - IBM Spectrum® Scale: ibm-spectrum-scale-sc
    - NFS: managed-nfs-storage
    - Portworx: portworx-shared-gp3
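Before you install, you can confirm that one of these storage classes (or your equivalent) actually exists on the cluster. A minimal sketch, assuming you are already logged in with oc:

```shell
# Print which of the documented storage classes exist on this cluster.
# If none are found, create or identify an equivalent storage class
# before installing Execution Engine for Apache Hadoop.
check_storage_classes() {
  for sc in ocs-storagecluster-cephfs ibm-spectrum-scale-sc managed-nfs-storage portworx-shared-gp3; do
    if oc get storageclass "$sc" >/dev/null 2>&1; then
      echo "found: $sc"
    else
      echo "not found: $sc"
    fi
  done
}
```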
Before you begin
Ensure that the cluster meets the minimum requirements for installing Execution Engine for Apache Hadoop. For details, see System requirements.
Additionally, ensure that a cluster administrator completed the required Pre-installation tasks for your environment. Specifically, verify that a cluster administrator completed the following tasks:
- Cloud Pak for Data is installed. For details, see Installing Cloud Pak for Data.
- For environments that use a private container registry, such as air-gapped environments, the Execution Engine for Apache Hadoop software images are mirrored to the private container registry. For details, see Mirroring images to your container registry.
- The cluster is configured to pull the Execution Engine for Apache Hadoop software images. For details, see Configuring your cluster to pull images.
- The Execution Engine for Apache Hadoop catalog source exists. For details, see Creating catalog sources.
- The Execution Engine for Apache Hadoop operator subscription exists. For details, see Creating operator subscriptions.
If these tasks are not complete, the Execution Engine for Apache Hadoop installation will fail.
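If you want to spot-check these pre-installation tasks yourself, a sketch like the following can help. The default resource names (ibm-cpd-hadoop-catalog, ibm-cpd-hadoop-operator, ibm-common-services) are illustrative placeholders, not documented names; substitute the catalog source, subscription, and namespace that your cluster administrator actually created:

```shell
# Verify that the catalog source and operator subscription exist.
# All three default names below are hypothetical; pass the real names
# from your cluster administrator as arguments.
verify_preinstall() {
  local catsrc=${1:-ibm-cpd-hadoop-catalog}
  local sub=${2:-ibm-cpd-hadoop-operator}
  local ns=${3:-ibm-common-services}
  if oc get catalogsource "$catsrc" -n openshift-marketplace >/dev/null 2>&1; then
    echo "catalog source present"
  else
    echo "catalog source missing"
  fi
  if oc get subscription "$sub" -n "$ns" >/dev/null 2>&1; then
    echo "operator subscription present"
  else
    echo "operator subscription missing"
  fi
}
```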
Prerequisite services
Before you install Execution Engine for Apache Hadoop, ensure that the following services are installed and running:
Procedure
Complete the following tasks to install Execution Engine for Apache Hadoop:
Installing the service
To install Execution Engine for Apache Hadoop:
- Log in to Red Hat® OpenShift Container Platform as a user with sufficient permissions to complete the task:
  oc login OpenShift_URL:port
- Create a Hadoop custom resource to install Execution Engine for Apache Hadoop. Follow the appropriate guidance for your environment.
  Important: By creating a Hadoop custom resource with spec.license.accept: true, you are accepting the license terms for Execution Engine for Apache Hadoop. You can find links to the relevant licenses in IBM Cloud Pak for Data License Information.
The cluster uses Red Hat OpenShift Container Storage
Create a custom resource with the following format.
cat <<EOF | oc apply -f -
apiVersion: hadoop.cpd.ibm.com/v1
kind: Hadoop
metadata:
  name: hadoop-cr           # This is the recommended name, but you can change it
  namespace: project-name   # Replace with the project where you will install Execution Engine for Apache Hadoop
spec:
  docker_registry_prefix: cp.icr.io/cp/cpd
  size: small
  license:
    accept: true
    license: Enterprise | Standard | WatsonStudioPremium   # Specify the license you purchased
  version: 4.0.9
  storageVendor: ocs
  storageClass: ocs-storagecluster-cephfs   # If you use a different storage class, replace it with the appropriate storage class
EOF
The cluster uses Portworx
Create a custom resource with the following format.
cat <<EOF | oc apply -f -
apiVersion: hadoop.cpd.ibm.com/v1
kind: Hadoop
metadata:
  name: hadoop-cr           # This is the recommended name, but you can change it
  namespace: project-name   # Replace with the project where you will install Execution Engine for Apache Hadoop
spec:
  docker_registry_prefix: cp.icr.io/cp/cpd
  size: small
  license:
    accept: true
    license: Enterprise | Standard | WatsonStudioPremium   # Specify the license you purchased
  version: 4.0.9
  storageVendor: portworx
  storageClass: portworx-shared-gp3   # If you use a different storage class, replace it with the appropriate storage class
EOF
The cluster uses NFS
Create a custom resource with the following format.
cat <<EOF | oc apply -f -
apiVersion: hadoop.cpd.ibm.com/v1
kind: Hadoop
metadata:
  name: hadoop-cr           # This is the recommended name, but you can change it
  namespace: project-name   # Replace with the project where you will install Execution Engine for Apache Hadoop
spec:
  docker_registry_prefix: cp.icr.io/cp/cpd
  size: small
  license:
    accept: true
    license: Enterprise | Standard | WatsonStudioPremium   # Specify the license you purchased
  version: 4.0.9
  storageVendor: ""
  storageClass: managed-nfs-storage   # If you use a different storage class, replace it with the appropriate storage class
EOF
Verifying the installation
When you create the custom resource, the Execution Engine for Apache Hadoop operator processes the contents of the custom resource and starts the microservices that comprise Execution Engine for Apache Hadoop, including the Hadoop microservice, which is defined by the hadoop-cr custom resource. Execution Engine for Apache Hadoop is installed when the Hadoop status is Completed.
To check the status of the installation:
- Change to the project where you installed Execution Engine for Apache Hadoop:
  oc project project-name
- Get the status of Execution Engine for Apache Hadoop (hadoop-cr):
  oc get Hadoop hadoop-cr -o jsonpath='{.status.hadoopStatus} {"\n"}'
  Execution Engine for Apache Hadoop is ready when the command returns Completed.
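The status check can be wrapped in a small polling loop so you don't have to rerun it by hand. A minimal sketch, assuming the custom resource is named hadoop-cr and you are already in the correct project (adjust the timeout and interval to suit your environment):

```shell
# Poll the Hadoop custom resource until its status reports Completed.
# Assumes the recommended CR name hadoop-cr; change it if you used another name.
wait_for_hadoop() {
  local timeout_secs=${1:-3600}
  local interval=${2:-30}
  local elapsed=0
  local status=""
  while [ "$elapsed" -le "$timeout_secs" ]; do
    status=$(oc get Hadoop hadoop-cr -o jsonpath='{.status.hadoopStatus}')
    if [ "$status" = "Completed" ]; then
      echo "Execution Engine for Apache Hadoop is ready."
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Timed out waiting for status Completed (last status: ${status:-unknown})." >&2
  return 1
}
```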
What to do next
Complete the following tasks in order before users can access the service:
- Install Execution Engine for Apache Hadoop on a Hadoop cluster.
- Install Execution Engine for Apache Hadoop on a Spectrum Conductor cluster.
- See Execution Engine for Apache Hadoop to get started.