Configuring flat file service for watsonx BI

Flat file service is an optional feature that you can use to upload files and use them as a source of data in Conversations.

Before you begin

You must meet the following requirements to configure flat file service:
  • Install watsonx.data™
  • Install watsonx BI. During installation, make sure to set the enableFlatFile parameter to True.
Important: To configure flat file service, you must have a Presto engine. Before you proceed, review the scaling guidance for watsonx.data Presto to make sure that you have the necessary sizing.

About this task

By configuring flat file service, you can upload data to watsonx.data and use it in watsonx BI.

To complete this task, you must:
  • Have permission to edit secrets
  • Have the Admin role in watsonx.data

Procedure

Setting a namespace variable

Make sure that you are logged in to your Red Hat® OpenShift® CLI and that you have permission to edit secrets, then define the namespace variable for your instance:
PROJECT_CPD_INST_OPERANDS=<operands project>
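
Before you continue, it can help to confirm that the session is logged in and allowed to edit secrets in that project. The following is a minimal sketch and not part of the documented procedure. The function name check_cli_access is hypothetical, and the sketch assumes that the patch verb on secrets is the permission that matters for the later steps. It relies only on the standard oc whoami and oc auth can-i subcommands.

```shell
# Hypothetical helper: verify login and secret-edit permission in a project.
check_cli_access() {
  ns="$1"
  if [ -z "$ns" ]; then
    echo "Namespace is empty; set PROJECT_CPD_INST_OPERANDS first." >&2
    return 1
  fi
  # oc whoami fails when the CLI session is not logged in.
  if ! oc whoami >/dev/null 2>&1; then
    echo "Not logged in to the OpenShift CLI." >&2
    return 1
  fi
  # oc auth can-i prints "yes" or "no" for the requested verb and resource.
  if [ "$(oc auth can-i patch secrets -n "$ns" 2>/dev/null)" != "yes" ]; then
    echo "Current user cannot patch secrets in $ns." >&2
    return 1
  fi
  echo "CLI access to $ns looks sufficient."
}

# Usage: check_cli_access "${PROJECT_CPD_INST_OPERANDS}"
```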

Creating a MinIO storage bucket

Use the following command to create the wxbiflatfiles MinIO bucket:
MINIOPOD=$(oc get pod -o name -n ${PROJECT_CPD_INST_OPERANDS} | grep ibm-lh-lakehouse-minio)
oc exec $MINIOPOD -n ${PROJECT_CPD_INST_OPERANDS} -- mc mb ibm-lh/data/wxbiflatfiles
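
The grep in the first command returns every pod whose name matches, so MINIOPOD can end up empty or hold several names. As a hedged sketch, a small filter can fail early in those cases. The function name select_single_pod is hypothetical and is not part of the product tooling.

```shell
# Hypothetical helper: read pod names on stdin, print the only one that
# matches the pattern, and fail if zero or several names match.
select_single_pod() {
  pattern="$1"
  matches=$(grep -- "$pattern" || true)
  # Count non-empty lines; grep -c exits non-zero on a count of 0.
  count=$(printf '%s\n' "$matches" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "Expected exactly one pod matching '$pattern', found $count." >&2
    return 1
  fi
  printf '%s\n' "$matches"
}

# Usage:
# MINIOPOD=$(oc get pod -o name -n "${PROJECT_CPD_INST_OPERANDS}" \
#   | select_single_pod ibm-lh-lakehouse-minio)
```

If the helper reports more than one match, inspect the pod list and choose the intended pod manually before you run the mc mb command.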

Creating the wxbia-wxdata-readers group

To easily grant access to multiple users, create a user group for watsonx BI users in watsonx.data.
  1. In the IBM® Software Hub navigation menu, click Access control > User groups > New user group.
  2. Name the group wxbia-wxdata-readers, and provide a description of watsonx.data readers group for watsonx BI. Continue to the next screen, and then grant the group the Data Product Consumer role.
  3. Navigate to the IBM Cloud Pak for Data experience. In the navigation menu, click Services > Instances, then click the name of your watsonx.data instance.
  4. Expand the Actions dropdown menu, then click Manage access > Add users.
  5. Add the wxbia-wxdata-readers group and assign it the User role.

Creating a Presto engine

Complete the following steps to create and configure a Presto engine:
  1. Go to IBM Cloud Pak for Data, then click Services > Instances > lakehouse > Open.
  2. In the navigation menu, select Infrastructure manager > Add component > IBM Presto.
  3. Select Presto (Java) as the engine type.
  4. Provide a display name of your choosing, then select Starter as the Engine configuration.
  5. Clear all checkboxes, then click Create.

The Presto engine might take up to 15 minutes to fully provision. While the engine is provisioning, proceed to the next section.

Associating the Apache Hive catalog and MinIO storage mapping

Complete the following steps to associate your watsonx BI catalog with your MinIO storage mapping.
  1. In the watsonx.data Infrastructure Manager console, click Add component > MinIO.
  2. In the Configuration step, provide the following values:
    Display name
    A name to help you identify this component.
    Bucket name
    wxbiflatfiles
    Endpoint
    http://ibm-lh-lakehouse-minio-svc.<PROJECT_CPD_INST_OPERANDS>.svc.cluster.local:9000
    Be sure to replace <PROJECT_CPD_INST_OPERANDS> with the namespace for your instance.
    Access key
    Your MinIO access key ID.
    Use the following command to get your MinIO access key:
    oc get secret/ibm-lh-config-secret \
    -n ${PROJECT_CPD_INST_OPERANDS} \
    -o jsonpath='{.data.LH_S3_ACCESS_KEY}'|base64 -d
    Secret access key
    Your MinIO Secret access key.
    Use the following command to get your MinIO secret access key:
    oc get secret/ibm-lh-config-secret \
    -n ${PROJECT_CPD_INST_OPERANDS} \
    -o jsonpath='{.data.LH_S3_SECRET_KEY}'|base64 -d
  3. Click Test connection to confirm that the connection is successful before you continue.
  4. Toggle the Associate catalog switch to the on position, and then select Apache Hive as the Catalog type.
  5. Name the catalog wxbiflatfiles, and then associate the catalog.
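
The values for the Configuration step can be collected in one place before you open the console. The following sketch wraps the endpoint pattern and the two oc get secret commands from the steps above. The function names minio_endpoint and minio_secret_value are hypothetical, while the secret name and key names are taken from the commands in this section.

```shell
# Print the in-cluster MinIO endpoint for a given namespace.
minio_endpoint() {
  printf 'http://ibm-lh-lakehouse-minio-svc.%s.svc.cluster.local:9000\n' "$1"
}

# Hypothetical helper: decode one key of ibm-lh-config-secret.
# $1 is the key name (for example LH_S3_ACCESS_KEY), $2 is the namespace.
minio_secret_value() {
  key="$1"
  ns="$2"
  oc get secret/ibm-lh-config-secret -n "$ns" \
    -o jsonpath="{.data.${key}}" | base64 -d
}

# Usage:
# minio_endpoint "${PROJECT_CPD_INST_OPERANDS}"
# minio_secret_value LH_S3_ACCESS_KEY "${PROJECT_CPD_INST_OPERANDS}"
# minio_secret_value LH_S3_SECRET_KEY "${PROJECT_CPD_INST_OPERANDS}"
```

The decoded keys are printed to the terminal, so avoid running these commands in a shared or logged session.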

Associating the Apache Hive catalog and Presto engine

Complete the following steps in the watsonx.data Infrastructure Manager console to associate your Apache Hive catalog with the Presto engine:
  1. Verify that your Presto engine is provisioned and ready to use.
  2. Hover over the wxbiflatfiles catalog and click Manage associations.
  3. Select the Presto engine that you previously created, then click Save and restart engine.

Granting user access

You must add both an Admin user and the wxbia-wxdata-readers group to the Presto engine, Apache Hive catalog, and MinIO storage object. The Admin user can be a watsonx.data Admin user, or any other user who has access to your watsonx.data lakehouse instance.

When you open the Presto engine, Apache Hive catalog, or MinIO storage object, check whether your desired Admin user has access. If your desired Admin user already has the Admin role, grant access only to the wxbia-wxdata-readers group.

Grant access to the Presto engine:
  1. In the watsonx.data Infrastructure Manager console, click the Presto engine.
  2. Click Access control > Add access +.
  3. Select the wxbia-wxdata-readers group and assign the role of User, then click Add.
  4. Check whether your desired Admin user has the Admin role. If it doesn't, select the desired user and assign the role of Admin.
Grant access to the Apache Hive catalog:
  1. In the watsonx.data Infrastructure Manager console, click the wxbiflatfiles Apache Hive catalog.
  2. Click Access control > Add access +.
  3. Select the wxbia-wxdata-readers group and assign the role of User, then click Add.
  4. Check whether your desired Admin user has the Admin role. If it doesn't, select the desired user and assign the role of Admin.
Grant access to the MinIO storage object:
  1. In the watsonx.data Infrastructure Manager console, click the MinIO storage object that is associated with your wxbiflatfiles Apache Hive catalog.
  2. Click Access control > Add access +.
  3. Select the wxbia-wxdata-readers group and assign the role of Reader, then click Add.
  4. Check whether your desired Admin user has the Admin role. If it doesn't, select the desired user and assign the role of Admin.

Updating the watsonx.data configuration

After you have all the necessary information, update the secrets on your cluster.
  1. Define the following variables by using the username and password for the Admin user that you assigned in the previous section:
    WATSONX_DATA_USER=<YOUR_ADMIN_USERNAME> 
    WATSONX_DATA_PASS=<YOUR_ADMIN_PASSWORD>
  2. Use the following commands to update the secret:
    
    SERVICENAME=$(oc get svc -o name -n ${PROJECT_CPD_INST_OPERANDS} | grep ibm-lh-lakehouse-presto -m 1) 
    
    NAME=$(oc get ${SERVICENAME} -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.metadata.name}') 
    PORT=$(oc get ${SERVICENAME} -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.spec.ports[?(@.name=="tls")].port}')  
    
    WATSONX_DATA_URL=jdbc:presto://${NAME}.${PROJECT_CPD_INST_OPERANDS}.svc.cluster.local:${PORT}?SSL=true 
    # Quote the values so that spaces or special characters survive encoding.
    URL_ENC=$(printf '%s' "${WATSONX_DATA_URL}" | base64 -w0)
    PASS_ENC=$(printf '%s' "${WATSONX_DATA_PASS}" | base64 -w0)
    USER_ENC=$(printf '%s' "${WATSONX_DATA_USER}" | base64 -w0)
    
    oc patch secret wxbia-cpd-watsonx-data-creds -n ${PROJECT_CPD_INST_OPERANDS} -p '{"data":{"WATSONX_DATA_PASS": "'$PASS_ENC'", "WATSONX_DATA_URL": "'$URL_ENC'", "WATSONX_DATA_USER": "'$USER_ENC'"}}'
    The secret is successfully updated when the oc patch secret command returns:
    secret/wxbia-cpd-watsonx-data-creds patched
  3. Restart the flat file deployment to pick up the new secret values by running:
    oc rollout restart -n ${PROJECT_CPD_INST_OPERANDS} deployments/wxbia-cpd-flatfile
The command returns the following output, and flat file service is ready to use after approximately two minutes:
deployment.apps/wxbia-cpd-flatfile restarted
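
To confirm the result instead of waiting a fixed time, you can read the stored values back and watch the rollout. This is a sketch under stated assumptions: the function names show_wxdata_secret_value and wait_for_flatfile are hypothetical, and the five-minute timeout is an arbitrary choice. The secret, key, and deployment names come from the commands above, and oc rollout status is a standard subcommand.

```shell
# Hypothetical helper: print the decoded value of one key in the patched secret.
# $1 is the key name (for example WATSONX_DATA_URL), $2 is the namespace.
show_wxdata_secret_value() {
  key="$1"
  ns="$2"
  oc get secret wxbia-cpd-watsonx-data-creds -n "$ns" \
    -o jsonpath="{.data.${key}}" | base64 -d
}

# Hypothetical helper: block until the flat file deployment finishes rolling
# out, or fail after the timeout.
wait_for_flatfile() {
  ns="$1"
  oc rollout status deployments/wxbia-cpd-flatfile -n "$ns" --timeout=5m
}

# Usage:
# show_wxdata_secret_value WATSONX_DATA_URL "${PROJECT_CPD_INST_OPERANDS}"
# wait_for_flatfile "${PROJECT_CPD_INST_OPERANDS}"
```

Checking WATSONX_DATA_URL and WATSONX_DATA_USER this way is usually enough. Decode WATSONX_DATA_PASS only if you need to, because the value is printed in clear text.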

What to do next

After you configure flat file service, you are ready to upload files. For more information, see Uploading a file in the watsonx BI product documentation.