Optional: Preparing to use HDFS

You prepare for HDFS only if you want processing jobs to store data in a Hadoop Distributed File System (HDFS) data lake.

About this task

Note: (Deprecated) Support for the Hadoop Distributed File System (HDFS) is removed for Operational Decision Manager events and deprecated for events from other components.

HDFS is required if you enable processing jobs to ingest raw events, typically so that you can reuse your business data later.

You can install IBM Business Automation Insights without HDFS storage and enable HDFS storage later, as described in Advanced updates.

For reference information about system requirements and other software compatibilities, generate the software compatibility report from the IBM Support site.
  1. Enter IBM Cloud Pak for Automation as the product name.
  2. Select a version number.
  3. Select an operating system.
  4. From the Product components list, make sure that Business Automation Insights is selected.
You can also use the IBM Cloud Pak for Automation detailed system requirements page.
Storage bucket
IBM Business Automation Insights requires a dedicated storage bucket for processing jobs to store data in HDFS.
Permissions
Processing jobs access HDFS as a user that is named bai by default; you can customize this name. When Kerberos is enabled for HDFS, processing jobs instead access HDFS as the Kerberos principal. Therefore, depending on your use case, make sure that the following prerequisites are met. A sketch that verifies them follows this list.
  • The default bai user, your custom user, or the Kerberos user name exists on your Hadoop Distributed File System (HDFS) system.
  • A /user/<user_name> directory exists, where <user_name> is bai, your custom name, or the Kerberos user name.
  • The user has write access to that directory.
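
The following Java program is a minimal sketch that checks these prerequisites through the Hadoop FileSystem API. It is not part of IBM Business Automation Insights; the namenode URI, Kerberos principal, and keytab path are hypothetical placeholders that you must adapt to your environment.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsAction;

    public class CheckBaiHdfsPrerequisites {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Hypothetical namenode URI; adapt to your cluster.
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            // When Kerberos is enabled, log in as the principal that the
            // processing jobs use (hypothetical principal and keytab path):
            // conf.set("hadoop.security.authentication", "kerberos");
            // UserGroupInformation.setConfiguration(conf);
            // UserGroupInformation.loginUserFromKeytab(
            //         "bai@EXAMPLE.COM", "/etc/security/keytabs/bai.keytab");

            String userName = "bai"; // or your custom name, or the Kerberos user name
            Path home = new Path("/user/" + userName);

            try (FileSystem fs = FileSystem.get(conf)) {
                if (!fs.exists(home)) {
                    System.out.println(home + " does not exist");
                    return;
                }
                // Throws AccessControlException if the user cannot write here.
                fs.access(home, FsAction.WRITE);
                System.out.println(home + " exists and is writable");
            }
        }
    }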

Configuration parameters are described in Business Automation Insights parameters, in particular Flink parameters.

Procedure

  1. Create a user identifier, for example bai.

    IBM Business Automation Insights uses this user identifier to write events to HDFS.

  2. Create a directory.

    The directory path must end with /user/<user_name>, for example /user/bai.

  3. Give the bai user, or your custom user, write permission to that directory. A sketch that automates steps 2 and 3 follows this procedure.
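
An HDFS administrator typically completes steps 2 and 3 from the command line, for example with hdfs dfs -mkdir, hdfs dfs -chown, and hdfs dfs -chmod. The following Java program is a minimal sketch of the same steps through the Hadoop FileSystem API. The namenode URI and the group name are assumptions, and the program must run as an HDFS superuser because setOwner requires superuser privileges.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class CreateBaiUserDirectory {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // Hypothetical namenode URI; adapt to your cluster.
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            String userName = "bai"; // or your custom user name
            Path home = new Path("/user/" + userName);

            try (FileSystem fs = FileSystem.get(conf)) {
                // Step 2: create the /user/<user_name> directory.
                fs.mkdirs(home);
                // Step 3: make the user the owner so that it has write access.
                // setOwner requires HDFS superuser privileges; the group name
                // here is an assumption.
                fs.setOwner(home, userName, userName);
                fs.setPermission(home, new FsPermission((short) 0755)); // rwxr-xr-x
            }
        }
    }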