Troubleshooting Hadoop environments

Use these solutions to resolve issues you might experience when using Hadoop environments.

Restarting Execution Engine for Apache Hadoop services from Cloudera Manager or Ambari

If you must restart the Execution Engine for Apache Hadoop services from Cloudera Manager (CDH) or Ambari (HDP), restart all of the services by running the following commands:

  cd /opt/ibm/dsxhi/bin
  ./stop.py
  ./start.py
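
After the services restart, you can confirm that they are running before you retry your workloads. The following is a minimal check, assuming that your installation includes the status script in the same dsxhi bin directory:

  cd /opt/ibm/dsxhi/bin
  # Report the current state of the Execution Engine for Apache Hadoop services
  ./status.py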

Exporting and importing projects

If you export a project that contains a reference to a Hadoop-integrated system and then import the project to a different Cloud Pak for Data cluster, notebooks, connections, and Data Refinery jobs in the imported project might fail.

The reason is that the Hadoop registration is defined globally on the Cloud Pak for Data cluster, so it is not included in the project export. Perform the following steps to ensure that the imported project works properly.

This scenario can also occur when the admin deletes an entry on the Hadoop Integration page that is referenced by a user's environment. Even if the admin adds back the same entry, you must still complete the following steps to correct the issues.

Issues and workaround

The project export and import issue affects environments, notebooks, jobs, and connectors.

To resolve the issue, see Workaround. After you complete the workaround, you might need to complete additional steps, which are described in the following sections.

Environments

Environments don't show the complete Hadoop details.

Notebooks

Although you can still view the content of the notebook, launching it in Edit mode fails. In addition, you cannot delete the active runtime; attempts to delete the runtime environment also fail.

When the invalid environment is deleted, the Assets page shows an icon to indicate that the environment was removed from the notebook.

  1. From the Action button, select Change Environment, and then select the environment that you created as part of the Workaround.
  2. Click Associate, and then run the notebook to validate that the workaround is successful.

Jobs

A job fails with a Failed to find remote host for id error.

This error applies to Data Refinery and notebook jobs. When the invalid environment is deleted, the jobs UI indicates that the job has a missing Environment template.

  1. Click Edit next to Environment template, and in the Environment template tab, select the new environment, and click Submit.
  2. Run the job.

Connectors

A connected data asset fails with an unexpected error. Do one of the following tasks:

  • If the Cloud Pak for Data admin created the Hadoop integration registration entry by using the same name as previously defined, no changes to the connection are needed.
  • If the Cloud Pak for Data admin changed the Hadoop integration registration name, navigate to the connection, open the Edit connection page, and update the HDFS and Hive URLs based on the renamed registration entry.

Workaround

Use the following workaround to resolve each issue for environments, notebooks, jobs, and connectors:

  1. The Cloud Pak for Data admin must register the same system again on the Hadoop Integration page. Using the same name for this registration is recommended.
  2. Users must create a new environment template that references the new Hadoop registration entry.
  3. Users must delete the invalid environment.
  4. Users must update their job or notebook to reference the new environment.

Error when importing dist-keras in a remote Execution Engine for Apache Hadoop session

The dist-keras package is not supported with Python 3.7 on IBM Power Hadoop clusters.

Important: The dist-keras library will no longer be supported as of Cloud Pak for Data version 4.0.

If you push the Python 3.7 Jupyter image to a registered Execution Engine for Apache Hadoop system through Platform configuration, the installation of dist-keras into the image fails on Power machines. The failure produces a warning similar to the following in the image push logs:

  Attempting to install HI addon libs to active environment ...
  ==> Target env: /opt/conda/envs/Python-3.7-main ...
  ====> Installing conda packages ...
  ====> Installing pip packages ...
  ==> WARNING: HI addons could not be installed:

  ----------------------------------------------
  Collecting package metadata: ...working... done
  Solving environment: ...working... done
  .
  .
  .
    File "/opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages/typing.py", line 1003, in __new__
      self._abc_registry = extra._abc_registry
  AttributeError: type object 'Callable' has no attribute '_abc_registry'
  .
  .
  .
  ----------------------------------------------

  A Hadoop admin may need to manually install some libraries
  into the remote image after it is pushed ...

Although the image push operation continues and eventually succeeds, attempts to import distkeras in a remote Execution Engine for Apache Hadoop session (Livy or JEG) fail with an error such as No module named 'distkeras'.

Aside from the lack of support for dist-keras, the pushed Python 3.7 image for Power can be used in remote Execution Engine for Apache Hadoop sessions just like any other pushed image.

Setting up Hadoop when a remote system is reinstalled

This scenario applies when you need to reinstall the Execution Engine for Apache Hadoop (dsxhi) RPM on your Hadoop system. After the system is reinstalled, additional steps are needed to ensure that Hadoop connections continue to work properly.

Hadoop system

If you added a new exposed endpoint for Hadoop, it is recommended that you refresh the registration of each Cloud Pak for Data cluster that is registered with this Hadoop system. The Cloud Pak for Data registrations are maintained if you used ./uninstall.py and ./install.py to reinstall the application. This step is not needed if you reinstalled with yum erase dsxhi and yum install dsxhi-*rpm and then ran the installation, because that method clears the registration data.
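
For reference, a reinstallation that preserves the existing registrations might look like the following sketch. The script names come from this section, the /opt/ibm/dsxhi/bin path is assumed from the restart commands earlier, and the options that ./install.py needs depend on your environment:

  # Reinstall by using the dsxhi scripts so that Cloud Pak for Data registrations are kept
  cd /opt/ibm/dsxhi/bin
  ./uninstall.py
  # ... update the dsxhi rpm as needed for your environment ...
  ./install.py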

Use ./manage_known_dsx.py -l to list the registered Cloud Pak for Data clusters, and then use ./manage_known_dsx.py -r <host> to refresh the registration.
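
For example, a minimal refresh sequence, assuming the default /opt/ibm/dsxhi/bin location shown earlier, looks like the following; replace <host> with a host from the -l output:

  cd /opt/ibm/dsxhi/bin
  # List the Cloud Pak for Data clusters that are registered with this Hadoop system
  ./manage_known_dsx.py -l
  # Refresh the registration for the cluster that needs to be updated
  ./manage_known_dsx.py -r <host>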

Cloud Pak for Data

The Cloud Pak for Data admin also needs to refresh the Hadoop registration.

Important: Do not delete the existing registration.

If the registration is deleted, users must complete several additional tasks. For more information, see Exporting and importing projects.

  1. Navigate to the Hadoop Integration panel and select the system that was updated.
  2. On the details page, click Update Certificate. If the update fails on the first attempt, the certificate was likely not refreshed; try again.

Errors when refining data on a Hadoop cluster

Use the following information to troubleshoot errors when you refine data on the Hadoop cluster.

Error: "Verify that the connection URL and the Hadoop environment URL for Livy spark2 reference the same path"
This error might occur after you upgrade Cloud Pak for Data.

To fix this error, verify that the connection URL and the Hadoop environment URL for Livy spark2 reference the same path. The administrator can confirm the URLs from Administration > Platform configuration > Systems integration. If the URLs are not the same, update the connection URL and the certificate information.

Error: "Connection type: '<type>' is not supported for data shaping on Hadoop environment"
Only Hadoop Execution Engine connections are supported for running Data Refinery jobs in a Hadoop environment. See the list of Hadoop Execution Engine connections in Refining data on the Hadoop cluster.

Error: "Format '<format>' is not supported for HDFS read/write"
You selected an unsupported data format for refining HDFS data. See HDFS via Execution Engine for Hadoop connection for the list of supported data formats.