Execution Engine for Apache Hadoop on Cloud Pak for Data
Important: IBM® Cloud Pak for Data Version 4.5 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.
Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.5 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.
Version: 4.5.3 Included IBM
Description
The Execution Engine for Apache Hadoop service integrates the Watson Studio service with your remote Apache Hadoop cluster.
Data scientists can use this service for the following tasks:
- Browse remote Hadoop data through connections
- Cleanse and transform remote Hadoop data with Data Refinery
- Run Data Refinery jobs on the Hadoop Spark cluster
- Run a notebook session on the remote Hadoop system
- Access Hadoop systems with basic utilities from RStudio and Jupyter notebooks
The Execution Engine for Apache Hadoop includes:
- Services that establish secure connections between Watson Studio and Hadoop
- Integration with Hadoop for Data Refinery and notebooks
- A high availability configuration for connections to the remote Hadoop system
- Utilities that connect Watson Studio and Hadoop
The service requires a service user who has the necessary privileges to submit requests to WebHDFS, WebHCat, Spark, and YARN on behalf of Watson Studio users. The service generates a secure URL for each Watson Studio cluster that is integrated with the Hadoop cluster.
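To illustrate what submitting a request on behalf of another user can look like, the following minimal sketch builds a WebHDFS REST URL that uses the standard Hadoop `doas` proxy-user parameter. It is an illustration only and does not describe how the Execution Engine is implemented. The host name, port, service user name (`svc_user`), end user name (`alice`), and the helper function `webhdfs_list_url` are all hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_list_url(base_url: str, hdfs_path: str,
                     service_user: str, end_user: str) -> str:
    """Build a WebHDFS LISTSTATUS URL submitted by service_user on behalf of end_user.

    Assumption: the cluster uses pseudo (simple) authentication, where the
    caller is identified by `user.name`. On a Kerberos-secured cluster the
    caller authenticates through SPNEGO instead and `user.name` is not used.
    """
    query = urlencode({
        "op": "LISTSTATUS",         # list the contents of a directory
        "user.name": service_user,  # identity of the calling service user
        "doas": end_user,           # user that the request is run as
    })
    return f"{base_url.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Hypothetical host, port, path, and user names.
url = webhdfs_list_url("https://hadoop.example.com:9871",
                       "/user/alice/data", "svc_user", "alice")
print(url)
```

For a request like this to succeed, the Hadoop administrator must allow the service user to act as a proxy user, which is one reason the service user needs the privileges described above.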
Quick links
- Install: Install the service
- Use: Work with the service
- What's new: See a list of new features
- Known issues: View limitations
- Administer: Manage and maintain the service
Integrated services
| Service | Capability |
|---|---|
| Watson Studio | Prepare, analyze, and model data in a collaborative environment with tools for data scientists, developers, and domain experts. |
| Watson Machine Learning | Build, train, and deploy machine learning models with a full range of tools. |
| RStudio® Server with R 3.6 | Access the RStudio IDE. |