Getting started with Execution Engine for Apache Hadoop

Use the Execution Engine for Apache Hadoop, which integrates with the Watson Studio service to build and train models on a Hadoop cluster. If you have data in a Hive or HDFS storage system on a Hadoop cluster, you can work with that data directly on the Hadoop cluster.

With Execution Engine for Apache Hadoop, you can do the following tasks: run data refinery and jobs, browse remote Hadoop data through connections and run a notebook session on the remote Hadoop system.

Checking whether the service is installed

An administrator must install Execution Engine for Apache Hadoop.

To check whether the service is installed:

  1. From the navigation menu, select Services > Services catalog.
  2. Search for Execution Engine for Apache Hadoop.

If the service is installed and ready to use, the tile in the catalog shows Ready to use.

Accessing the service

Cloud Pak for Data watsonx™ Depending on the other services that are installed, Execution Engine for Apache Hadoop is available from either of the IBM Cloud Pak® for Data experience or the IBM watsonx experience.

Follow these steps to access the Execution Engine for Apache Hadoop service:

  1. From the navigation menu, select Services>Services catalog.
  2. Select Execution Engine for Apache Hadoop.
  3. Select a region and a pricing plan.
  4. Enter the details for configuring your resource:
    1. Enter a service name
    2. Select a resource group.
    3. Optional: Enter the tags for your service.
  5. Review the summary and click Create.

Learn more

To learn more about Execution Engine for Apache Hadoop, see the following topics based on the experience that is available in your environment:

IBM Cloud Pak for Data documentation
Machine learning models on a remote Apache Hadoop cluster in Jupiter Python
Using Hadoop utilities
Execution Engine for Apache Hadoop environments
IBM watsonx documentation
Machine learning models on a remote Apache Hadoop cluster in Jupiter Python
Using Hadoop utilities
Execution Engine for Apache Hadoop environments