Administering Cloud Pak for Data clusters

After the Execution Engine for Apache Hadoop service is installed, one of the administrative tasks that must be done is to register the remote clusters, which can be Hadoop or Spectrum Conductor clusters. Registering a remote cluster integrates it with the Cloud Pak for Data cluster. For Hadoop clusters, data scientists can then access data and submit jobs with high availability; for Spectrum Conductor clusters, data scientists can create notebooks and run notebook jobs.

For the Cloud Pak for Data cluster to communicate with the remote cluster, the OpenShift DNS operator of the Cloud Pak for Data cluster must be able to resolve the hostnames in the service URLs that the remote cluster admin provides.
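A quick way to confirm that a hostname resolves is a short Python check like the following, run from a pod on the Cloud Pak for Data cluster so that it exercises the cluster's DNS. The hostname shown is a placeholder for the one in your service URL.

    import socket

    # Placeholder: replace with the hostname from the service URL
    # that the remote cluster admin provided.
    hostname = "edgenode1.example.com"

    try:
        # getaddrinfo performs the same lookup the platform relies on;
        # a failure here points to a DNS configuration issue.
        addresses = {info[4][0] for info in socket.getaddrinfo(hostname, 443)}
        print(f"{hostname} resolves to: {', '.join(sorted(addresses))}")
    except socket.gaierror as err:
        print(f"DNS resolution failed for {hostname}: {err}")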

Registering remote clusters

The Execution Engine for Apache Hadoop service integrates Cloud Pak for Data clusters with remote clusters to securely access data and submit jobs on the Hadoop cluster with high availability.

The remote cluster admin must first install the Execution Engine for Apache Hadoop service on the edge nodes of the remote cluster, whether it is a Hadoop cluster or a Spectrum Conductor cluster. The admin then adds the Cloud Pak for Data cluster to the service that is installed on each of the edge nodes and provides the secure URLs and the service user for the service.

A Cloud Pak for Data administrator can then perform the following tasks:

  1. Register a remote cluster
  2. View details about the registered cluster
  3. Push runtime images to the registered remote cluster
  4. Handle high availability for the Execution Engine for Apache Hadoop service

Register a remote cluster

Sign in as the administrator, click the menu icon and click Administration > Platform configuration > Systems integration to register your remote clusters. Click New integration. Assign the registration a name and provide the Service URLs and Service User ID that you received from the remote cluster admin.

Tip

The following troubleshooting steps can be performed if the registration fails:

  • Ensure that the URL provided during the registration is correct; a connectivity check like the sketch after this list can help confirm that the URL is reachable. Refer to the Managing access for Watson Studio section in Administering Apache Hadoop clusters.
  • Contact the Hadoop admin who installed the service on the Hadoop cluster and ensure that the service user ID that was provided during the registration is correct.
  • Ensure that the OpenShift DNS operator is configured to successfully resolve the hostname in the URL provided during the registration.
  • Contact the OpenShift administrator to inspect the logs of the utils-api pod for further diagnostic information.
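As a sanity check, the service URL can be probed directly. The following is a minimal sketch that uses only the Python standard library; the URL is a placeholder, and a certificate verification failure suggests that the Cloud Pak for Data cluster does not trust the remote cluster's TLS certificate.

    import ssl
    import urllib.error
    import urllib.request

    # Placeholder: replace with the service URL from the registration.
    url = "https://edgenode1.example.com:8443"

    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            print(f"Reached {url}: HTTP {resp.status}")
    except urllib.error.HTTPError as err:
        # An HTTP status code still means the endpoint is reachable.
        print(f"Reached {url}: HTTP {err.code}")
    except urllib.error.URLError as err:
        if isinstance(err.reason, ssl.SSLCertVerificationError):
            print(f"TLS certificate problem: {err.reason}")
        else:
            print(f"Could not reach {url}: {err.reason}")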

Deleting a remote cluster registration

If you need to delete a registration, be aware that user assets that depend on this registration will no longer work properly. This includes connectors, environments, jobs, and notebooks that depend on the environments.

If a registration is later created with the same ID, users must still re-create the environments and update all jobs and notebooks to reference the newly created environments for the assets to work properly. If the registration is created with a different ID, users must also update connections to ensure that the referenced URL is correct, in addition to updating the jobs and notebooks.

If you need to refresh the registration, for example after you reinstall the Execution Engine for Apache Hadoop service on the remote cluster, select the registration. Refresh the certificates first, and then wait a few minutes to allow the dependent pod to be re-created. Then refresh the endpoints to ensure that all configurations are refreshed.
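Whether the dependent pod has been re-created can be confirmed in the OpenShift console or programmatically. The following is a minimal sketch that uses the official kubernetes Python client; the namespace and label selector are assumptions to replace with the values for your deployment.

    from kubernetes import client, config

    # Assumption: your kubeconfig grants access to the Cloud Pak for Data project.
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Assumptions: namespace and label selector for the utils-api pod;
    # replace both with the values that your deployment uses.
    namespace = "cpd-instance"
    selector = "app=utils-api"

    # A Running phase and a recent creation timestamp indicate that
    # the pod was re-created after the certificate refresh.
    for pod in v1.list_namespaced_pod(namespace, label_selector=selector).items:
        print(f"{pod.metadata.name}  phase={pod.status.phase}  "
              f"created={pod.metadata.creation_timestamp}")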

View details about the registered cluster

On the Details page of each registration, you can view the endpoints, the edge nodes in the high availability setup, and the runtimes.

Push runtime images to the registered remote cluster

Data scientists can leverage the Python packages and custom libraries that are installed in a Jupyter Python environment when they're working with models on the remote cluster. These remote images can be used to run notebooks, notebook jobs, and Python script jobs (for Hadoop clusters only); a sketch after the requirement below shows how to verify what an image provides.

Restriction

This feature is not supported for RStudio and Jupyter with GPU images.

Requirement

To work with the runtime images, the Watson Studio cluster and the Hadoop cluster must be on the same platform architecture.
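As a quick verification, a data scientist can list what the pushed environment provides from a notebook that runs on the remote image. This sketch uses only the Python standard library.

    import platform
    from importlib import metadata

    # Confirm the interpreter version and the architecture of the remote
    # runtime; the architecture must match the Watson Studio cluster.
    print(platform.python_version(), platform.machine())

    # List the installed packages, including any custom libraries that
    # were added to the image before it was pushed.
    for dist in sorted(metadata.distributions(),
                       key=lambda d: d.metadata["Name"].lower()):
        print(f"{dist.metadata['Name']}=={dist.version}")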

To push a runtime image to the remote cluster, an admin uses the push operation on the registration's Details page to initiate the process.

Note: The Conda version of the image being pushed must match one of the available Anaconda instances that have been defined.
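One way to check the Conda version that an image carries before pushing it is to run conda in a throwaway container started from the image. This sketch assumes a local container engine is available (docker is shown; podman accepts the same arguments) and uses a placeholder image name.

    import subprocess

    # Placeholder: replace with the runtime image that you plan to push.
    image = "registry.example.com/runtimes/jupyter-py:latest"

    # Run `conda --version` inside a temporary container from the image.
    result = subprocess.run(
        ["docker", "run", "--rm", image, "conda", "--version"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())  # for example: "conda 4.10.3"

Compare the reported version with the Anaconda instances that are defined for the remote cluster.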

Pushing the image can take a long time. If the node goes down during the push, retry the operation to push the image again.

If you modified a runtime image locally, you can update it on the remote cluster by clicking Replace Image next to the image.

Runtimes can have the following statuses:

Handle high availability for the Execution Engine for Apache Hadoop service

Edge node failure

If there’s an edge node failure in the remote environment, the following activities occur:

Load balancing with multiple Execution Engine for Apache Hadoop edge nodes

Hadoop clusters only:

Livy sessions are allocated with sticky sessions and follow an active and passive approach. All Livy sessions run on the same Execution Engine edge node until a failure is detected, at which point all new sessions are allocated on the next available Execution Engine edge node.

Spectrum and Hadoop clusters:

Similar to Livy, Jupyter Enterprise Gateway (JEG) sessions are allocated with sticky sessions and follow an active and passive approach. All JEG sessions run on the same Execution Engine edge node until a failure is detected, at which point all new sessions are allocated on the next available Execution Engine edge node.
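To make the active and passive allocation concrete, here is a minimal Python sketch of the policy described above. The edge node names and the health check are stand-ins, not the service's actual implementation.

    # Sketch of active/passive session allocation across edge nodes.
    EDGE_NODES = ["edgenode1.example.com", "edgenode2.example.com"]

    def is_healthy(node: str) -> bool:
        """Stand-in health check; the real service performs its own
        failure detection."""
        return True

    def allocate_session(nodes: list[str]) -> str:
        # Every session goes to the first healthy node (the active node);
        # only when it fails do new sessions move to the next node.
        for node in nodes:
            if is_healthy(node):
                return node
        raise RuntimeError("no healthy Execution Engine edge node available")

    print(allocate_session(EDGE_NODES))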