Execution Engine for Apache Hadoop
Version: 5.1.3
Experience: Cloud Pak for Data watsonx™
Description
The Execution Engine for Apache Hadoop service integrates the Watson Studio service with your remote Apache Hadoop cluster.
Data scientists can use this service for the following tasks:
- Browse remote Hadoop data through connections.
- Cleanse and transform remote Hadoop data with Data Refinery.
- Run Data Refinery jobs on the Hadoop Spark cluster.
- Run a notebook session on the remote Hadoop system.
- Access Hadoop systems with basic utilities from RStudio® and Jupyter Notebooks.
The Execution Engine for Apache Hadoop service includes the following capabilities:
- Services that establish secure connections between Watson Studio and Hadoop.
- Integration with Hadoop for Data Refinery and Notebooks.
- A high-availability configuration for the remote Hadoop system.
- Utilities that connect Watson Studio and Hadoop.
The service requires a service user who has the necessary privileges to submit requests to WebHDFS, Spark, and YARN on behalf of Watson Studio users. The service generates a secure URL for each Watson Studio cluster that is integrated with the Hadoop cluster.
Hadoop ecosystem services
| Service | Purpose |
|---|---|
| WebHDFS | Browse and preview HDFS data. |
| Jupyter Enterprise Gateway | Submit jobs to JEG on the Hadoop cluster. |
| Livy for Spark | Submit jobs to Spark on the Hadoop cluster. |
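As an illustration of the kind of requests that reach WebHDFS and Livy, the following minimal sketch builds a WebHDFS directory-listing URL and a Livy batch submission body. It relies only on the public WebHDFS REST operation `LISTSTATUS` and the Livy `batches` endpoint. The base URL, the `/livy` path prefix, and the function names are hypothetical placeholders, not values that the service generates; substitute the secure URL and paths from your own environment.

```python
import json
from typing import List, Tuple
from urllib.parse import quote, urlencode

# Hypothetical placeholder; the real secure URL is generated by the service.
BASE_URL = "https://hadoop-gateway.example.com:8443/gateway/example-cluster"


def webhdfs_list_url(base_url: str, hdfs_path: str) -> str:
    """Build a WebHDFS LISTSTATUS URL for browsing an HDFS directory."""
    # Percent-encode the path but keep "/" separators intact.
    path = quote(hdfs_path.lstrip("/"))
    query = urlencode({"op": "LISTSTATUS"})
    return f"{base_url}/webhdfs/v1/{path}?{query}"


def livy_batch_request(base_url: str, app_file: str, args: List[str]) -> Tuple[str, str]:
    """Return the endpoint and JSON body for a Livy batch job submission."""
    # "file" and "args" are fields of the Livy batch request body.
    body = json.dumps({"file": app_file, "args": args})
    # The "/livy" prefix is an assumption about how the endpoint is exposed.
    return f"{base_url}/livy/batches", body


if __name__ == "__main__":
    print(webhdfs_list_url(BASE_URL, "/user/demo"))
    endpoint, payload = livy_batch_request(BASE_URL, "hdfs:///apps/job.py", ["10"])
    print(endpoint)
    print(payload)
```

The sketch only constructs the requests. Sending them would also require the authentication that your cluster is configured to use, which is outside the scope of this example.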
Licensing information
This service is included in the following licenses:
- IBM Cloud Pak® for Data Enterprise Edition
- IBM Cloud Pak for Data Standard Edition
- IBM® watsonx.ai™
For more information, see Licenses and entitlements.
Quick links
- Install: Install the service
- Use: Work with the service
- What's new: See a list of new features
- Known issues: View limitations
- Administer: Manage and maintain the service
Integrated services
| Service | Capability |
|---|---|
| Watson Studio | Prepare, analyze, and model data in a collaborative environment with tools for data scientists, developers, and domain experts. |
| Watson Machine Learning | Build, train, and deploy machine learning models with a full range of tools. |
| RStudio Server Runtimes | Access the RStudio IDE. |