Configuring the batch processor in Watson OpenScale

After you prepare the Spark engine, you can configure Watson OpenScale to work in batch mode. You must configure a custom machine learning engine that connects to an Apache Hive or JDBC database and an Apache Spark analytics engine.

Prerequisites

Ensure that you have completed the steps in either Preparing the batch processing environment in IBM Analytics Engine or Preparing the batch processing environment on the Hadoop Ecosystem. For the Hadoop Ecosystem implementation, also ensure that the Watson OpenScale Spark Manager Application is running.

Step 1: Create the machine learning provider for a custom environment.

  1. Click the configure icon to go to the system setup and create a new dedicated batch provider.
  2. Because batch processing supports only a custom environment, you must create a new machine learning provider and select the Custom Environment type. Provide credentials and, for the environment type, select Production.

Step 2: Use the cluster details to create and save the database connection and Spark engine.

Because Watson OpenScale batch processing requires both a database and a Spark engine, you must configure connections to these resources. After you create the custom machine learning provider, set up batch support.

  1. From the Batch support tab, click Add batch support.
  2. Enter your Hive or JDBC connection information.
  3. After you configure the connection, you must create a connection to a Spark engine. You’ll need a Spark endpoint and connection credentials.
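For reference, the following sketch shows the kind of details that you typically gather before you complete the Batch support forms. The key names are illustrative only and are not a Watson OpenScale API; substitute the hosts, URLs, and credentials for your own cluster.

```python
# Illustrative connection details for batch support (hypothetical key names).

hive_connection = {
    "connection_type": "hive",        # or "jdbc" for a JDBC data source
    "metastore_url": "thrift://hive-host.example.com:9083",
    "database": "payload_db",
    "username": "hive_user",
    "password": "********",
}

jdbc_connection = {
    "connection_type": "jdbc",
    "jdbc_url": "jdbc:db2://db2-host.example.com:50000/BLUDB",
    "driver_class": "com.ibm.db2.jcc.DB2Driver",
    "username": "db2_user",
    "password": "********",
}

spark_engine = {
    "endpoint": "https://spark-host.example.com/spark-manager/v1",  # Spark Manager or IBM Analytics Engine endpoint (assumed path)
    "username": "spark_user",
    "password": "********",
}
```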

Step 3: Create the subscription for the batch deployment.

  1. Return to the Watson OpenScale Insights Dashboard to add a new deployment. Click Add to dashboard.
  2. Add the custom endpoint from the batch processing provider that you created and type a deployment name. Select the deployment type Batch and click Configure.

  3. Now you’ll need to configure monitors. These configuration steps are the same as for other types of deployments, with the following exceptions:

    • In the Model input section, the data type defaults to Numeric/categorical.
    • You must select a Spark engine.
    • Payload logging is required for drift monitoring and fairness monitoring. If you only want to monitor quality, you can skip this information.
  4. Click Configure monitors.
  5. From the Model details tab, from the Algorithm type box, choose the algorithm type, and click Save and continue.
  6. In the Analytics engine section, click the Edit icon, choose the Spark engine, and click Save and continue.
  7. In the Payload data section, select the Connection type and type the Payload database, Payload schema, and Payload table information, and then click Save and continue.
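Before you save the Payload data section, it can help to confirm that the payload table is reachable from the Spark engine. The following PySpark sketch assumes a Hive-backed table named payload_db.scoring_payload; both names are placeholders for your own database, schema, and table.

```python
# Minimal sanity check of the payload table details (placeholder names).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("verify-payload-table")
    .enableHiveSupport()              # assumes the cluster exposes a Hive metastore
    .getOrCreate()
)

payload_df = spark.table("payload_db.scoring_payload")
payload_df.printSchema()              # confirm the columns match your model schema
print(payload_df.count(), "payload records found")
```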

Step 4: Run a notebook that uses sample scoring records to generate a configuration JSON file.

Because no scoring is involved with batch processing, you must run a custom notebook, which is available for download, with sample scored test data. The notebook infers the input and output schemas, creates test records, and produces the following artifacts:

  • Configuration JSON file that you need to upload in the Model details section of the user interface
  • DDLs to create the various tables
  • Perturbations scoring response .tar file for your explainability configuration (if explainability is enabled)
  • Drift archive (if drift is enabled)
  1. Run the custom configuration notebook, which produces the JSON file that you need for the following step.
  2. In the Connect to the training data section, load the JSON file to the drop zone and click Save and continue.
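The downloadable configuration notebook does this work for you. The following sketch only illustrates the underlying idea of inferring input and output schemas from sample scored records and writing them to a JSON file; the file and field names are hypothetical, not the notebook's actual output format.

```python
# Conceptual sketch: infer a schema from sample scored records (hypothetical names).
import json
import pandas as pd

# Sample scored test data: model input columns plus the scoring output columns.
scored_sample = pd.read_csv("sample_scored_records.csv")

def column_type(dtype):
    return "double" if pd.api.types.is_numeric_dtype(dtype) else "string"

schema = {
    "fields": [
        {"name": col, "type": column_type(dtype)}
        for col, dtype in scored_sample.dtypes.items()
    ]
}

# Write a configuration file to upload in the user interface.
with open("batch_configuration.json", "w") as f:
    json.dump({"common_configuration": {"schema": schema}}, f, indent=2)
```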

Step 5: Enable the quality monitor by providing the feedback table details.

In addition to other required monitor information, you must supply information such as the Connection type name, the Feedback database, the Feedback schema, and the Feedback table. The quality monitor requires this information to function properly. If no sample size is given, all the records are evaluated. For more information, see Prepare models for monitoring.

  1. In the Evaluations section, click Quality.
  2. In the Feedback data section, click the Edit icon.
  3. In the Feedback data section, select the Connection type and type the Feedback database, Feedback schema, and Feedback table information, and then click Next.
  4. Review the Label column and the Prediction column and click Next.
  5. Enter the Quality threshold and Sample size information and click Save.
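Conceptually, the quality evaluation compares the label column with the prediction column in the feedback table, which is why both must be identified. The following PySpark sketch shows a simple accuracy calculation over a feedback sample; the table and column names (feedback_db.feedback, label, prediction) are placeholders.

```python
# Illustrative accuracy calculation over the feedback table (placeholder names).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
feedback_df = spark.table("feedback_db.feedback")

# Limit to a sample size; with no sample size, all records are evaluated.
sample_size = 10000
feedback_sample = feedback_df.limit(sample_size)

accuracy = (
    feedback_sample
    .withColumn("correct", (F.col("label") == F.col("prediction")).cast("int"))
    .agg(F.avg("correct").alias("accuracy"))
    .collect()[0]["accuracy"]
)
print(f"Accuracy on feedback sample: {accuracy:.3f}")
```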

Step 6: Enable the drift monitor by providing the payload table and drifted data table details.

In addition to other required monitor information, you must supply information such as the Data warehouse connection name, the Payload database, the Payload schema, the Payload transaction table, the Drift database, the Drift schema, and the Drift transaction table. The drift monitor requires this information to function properly. If no sample size is given, all the records are evaluated. For more information, see Prepare models for monitoring.

  1. In the Evaluations section, click Drift.
  2. In the Drift model section, click the Edit icon.
  3. The Training option defaults to Train in a data science notebook. Click Next.
  4. Run the custom configuration notebook, which produces the drift archive that you need for the following step.
  5. In the Upload the drift detection model section, load the drift archive to the drop zone, and click Next.
  6. In the Drifted data section, select the Connection type and type the Drift database, Drift schema, and Drift table information, and then click Next.
  7. Enter the Drift threshold and click Next.
  8. Enter the Sample size and click Save.
  9. Click Go to model summary.
  10. From the Actions menu, click Evaluate now.
  11. From the Evaluate now panel, click Evaluate now.
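The drift detection model that the configuration notebook trains is more sophisticated than a single statistic, but the following simplified PySpark sketch illustrates the basic idea: compare a property of the payload data against a baseline that was captured at training time. The table name, feature name, baseline value, and threshold are all placeholders.

```python
# Simplified drift check against a training-time baseline (placeholder values).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
payload_df = spark.table("payload_db.scoring_payload")

training_mean = 42.0        # baseline statistic captured at training time (assumed)
drift_threshold = 0.10      # 10% relative change, mirroring the Drift threshold field

payload_mean = payload_df.agg(F.avg("feature_1")).collect()[0][0]
relative_change = abs(payload_mean - training_mean) / training_mean

if relative_change > drift_threshold:
    print(f"Possible drift on feature_1: {relative_change:.1%} change from baseline")
```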

Step 7: Enable the explainability monitor by providing the explanation table details.

In addition to other required monitor information, you must supply information such as the Data warehouse connection name, the Explanation database, the Explanation results table, and the Explanation queue table. For more information, see Prepare models for monitoring.

  1. In the Evaluations section, click Explainability.
  2. In the Explanation data section, click the Edit icon.
  3. In the Explanation data section, select the Data warehouse connection name, the Explanation database, the Explanation results table, and the Explanation queue table information. If you use a JDBC connection, also select the Explanation schema.
  4. In Sample records, upload your perturbations scoring response .tar file, then click Save.
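Explainability for batch deployments relies on scoring many perturbed copies of a record, which is why the configuration notebook produces a perturbations scoring response .tar file. The following sketch only illustrates the idea of generating perturbations; the column names, ranges, and file name are hypothetical, and the real archive format is produced by the notebook.

```python
# Conceptual sketch: generate perturbed copies of a record for scoring (hypothetical names).
import numpy as np
import pandas as pd

record = pd.DataFrame([{"age": 35, "income": 52000.0, "gender": "female"}])

# Perturb numeric features with small random changes; keep categorical features fixed.
rng = np.random.default_rng(seed=0)
perturbations = pd.concat([record] * 100, ignore_index=True)
perturbations["age"] = perturbations["age"] + rng.integers(-5, 6, size=100)
perturbations["income"] = perturbations["income"] * rng.normal(1.0, 0.05, size=100)

# These perturbed records would then be scored by the model, and the responses
# archived for the explainability configuration.
perturbations.to_csv("perturbations_to_score.csv", index=False)
```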

Step 8: Enable the fairness monitor by providing the fairness feature information.

If you enable fairness and add fairness configuration details in your notebook, fairness is configured automatically when you upload the JSON file in Step 4. You can also enable fairness from the Watson OpenScale dashboard. In addition to other required monitor information, you must supply information such as the Fairness feature, the Monitored group, and the Fairness threshold. For more information, see Prepare models for monitoring. If you choose to enable fairness from the Watson OpenScale dashboard rather than in your notebook, complete the following steps:

  1. In the Evaluations section, click Fairness.
  2. In the Fairness data section, click the Edit icon.
  3. In the Fairness data section, click Add feature.
  4. Select the Fairness feature, the Monitored group, the Fairness threshold, the Favorable outcome, and the Unfavorable outcome information. You can also optionally provide Minimum number of records. Click Save.
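The fairness evaluation is based on comparing the rate of favorable outcomes for the monitored group with the rate for the reference group. The following PySpark sketch shows a disparate impact calculation of that kind; the table, column, and group values are placeholders for your own fairness configuration.

```python
# Illustrative disparate impact calculation (placeholder table, columns, and values).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
payload_df = spark.table("payload_db.scoring_payload")

fairness_feature = "gender"            # fairness feature
monitored, reference = "female", "male"
favorable = "No Risk"                  # favorable outcome value

def favorable_rate(df, group):
    group_df = df.filter(F.col(fairness_feature) == group)
    total = group_df.count()
    favorable_count = group_df.filter(F.col("prediction") == favorable).count()
    return favorable_count / total if total else 0.0

disparate_impact = favorable_rate(payload_df, monitored) / favorable_rate(payload_df, reference)
print(f"Disparate impact: {disparate_impact:.2f} (example threshold: 0.80)")
```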

Step 9: Run a notebook that provides drift analysis.

To view drift analysis, you must process transactions by using a custom notebook that analyzes the payload transactions that cause drift. You can download the notebook, and the code snippet that you need to populate it, from the Drift monitor visualization window.
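If you want a quick look at the flagged records before you run the analysis notebook, you can preview the drifted transactions table directly. The table name in the following sketch is a placeholder for the Drift database, schema, and table that you configured in Step 6.

```python
# Preview the drifted transactions table (placeholder name).
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
drifted_df = spark.table("drift_db.drifted_transactions")
drifted_df.show(10, truncate=False)   # inspect the records that were flagged as drifted
```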

Next steps

You are ready to view insights. For more information, see the following topics: