Execution Engine for Apache Hadoop

The Execution Engine for Apache Hadoop service integrates the Watson Studio service with your remote Apache Hadoop cluster. The Watson Studio service must be installed before the Execution Engine for Apache Hadoop service is installed.

The service can be configured for high availability. After it is installed, data scientists can use tools such as Data Refinery, Jupyter notebooks, and RStudio to build models. They can also use the distributed computing power of the Hadoop cluster with secure access to the data, without moving the data out of the cluster.

Architecture

The Execution Engine for Apache Hadoop service includes services that:

  • Establish the integration between Watson Studio and Hadoop.
  • Authenticate requests.
  • Provide remote access to Spark.

The service requires a service user that has the necessary privileges to submit requests on behalf of Watson Studio users to WebHDFS, WebHCat, Spark, and YARN. The service also generates a secure URL for each Watson Studio cluster that needs to be integrated with the Hadoop cluster.
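To illustrate what submitting a request "on behalf of" a user can look like, the sketch below builds a WebHDFS REST URL that uses Hadoop's general proxy-user mechanism, in which the `doas` query parameter names the user being impersonated. This is not the service's actual implementation. The host, port, path, and user names (`namenode.example.com`, `svc_hee`, `alice`) and the helper `build_webhdfs_url` are hypothetical. The `user.name` parameter applies only to clusters without Kerberos security; on a secured cluster the service user would authenticate through Kerberos instead.

```python
from urllib.parse import quote, urlencode


def build_webhdfs_url(base_url, hdfs_path, op, service_user, end_user):
    """Build a WebHDFS URL in which service_user acts on behalf of end_user.

    Hypothetical helper for illustration only. It relies on the standard
    WebHDFS query parameters `op`, `user.name`, and `doas`.
    """
    query = urlencode({
        "op": op,                  # WebHDFS operation, e.g. LISTSTATUS
        "user.name": service_user, # authenticated caller (non-secure clusters only)
        "doas": end_user,          # user on whose behalf the request runs
    })
    # WebHDFS paths live under /webhdfs/v1/; keep slashes, escape other characters.
    path = quote(hdfs_path.lstrip("/"))
    return f"{base_url.rstrip('/')}/webhdfs/v1/{path}?{query}"


# Assumed example values: the service user lists a directory as the user "alice".
url = build_webhdfs_url(
    "http://namenode.example.com:9870",
    "/user/alice/data",
    "LISTSTATUS",
    service_user="svc_hee",
    end_user="alice",
)
print(url)
```

For a request like this to succeed, the Hadoop administrator would need to configure the service user as a proxy user on the cluster, which is the kind of privilege the paragraph above refers to.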