Configuring the batch processor in Watson OpenScale

After you prepare the Spark engine, you can configure model evaluations to run in batch mode. You must configure a custom machine learning engine that connects to an Apache Spark analytics engine and to either an Apache Hive database or a Db2 database over JDBC.

Prerequisites

Ensure that the steps are completed for either Preparing the batch processing environment in IBM Analytics Engine Powered by Apache Spark or Preparing the batch processing environment on the Hadoop Ecosystem. For the Hadoop Ecosystem implementation, ensure that the Watson OpenScale Spark Manager Application is running.

Step 1: Run a notebook to generate required artifacts for monitoring.

You must run a custom notebook that generates the configuration artifacts that you need to configure monitors in Watson OpenScale. You can run a notebook that generates the individual artifacts, with detailed configuration steps, for Hive or JDBC. You can also run a notebook that generates a configuration package that you can download to use the artifacts for Hive or JDBC. These notebooks generate the following artifacts, which you use in later steps:

  - A configuration JSON file that you upload when you create the subscription (Step 4)
  - A drift detection model archive that you upload when you enable the drift monitor (Step 6)
  - A perturbations scoring response archive that you upload when you enable the explainability monitor (Step 7)
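
The notebooks themselves are provided by IBM and contain the authoritative steps. Purely for orientation, the following is a minimal sketch, assuming a pandas-readable training export named training_data.csv and a label column named RISK, of the kind of common-configuration artifact such a notebook writes. The file name and JSON fields here are illustrative assumptions, not the notebook's actual output schema.

```python
# Minimal, illustrative sketch only -- the real artifacts are produced by the
# IBM-provided configuration notebook. File names and JSON fields are assumptions.
import json
import pandas as pd

# Hypothetical training data export used to derive column-level details.
training_df = pd.read_csv("training_data.csv")

common_configuration = {
    "problem_type": "binary",          # assumption: binary classification model
    "label_column": "RISK",            # assumption: name of the target column
    "feature_columns": [c for c in training_df.columns if c != "RISK"],
    "categorical_columns": [
        c for c in training_df.columns
        if training_df[c].dtype == "object" and c != "RISK"
    ],
}

# Write the artifact that is later uploaded during subscription configuration (Step 4).
with open("configuration.json", "w") as f:
    json.dump(common_configuration, f, indent=2)
```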

Step 2: Create the machine learning provider for a custom environment.

  1. Click the configure icon to go to the system setup and create a new dedicated batch provider.
  2. Because batch processing supports only a custom environment, you must create a new machine learning provider and select the Custom Environment type. Provide your credentials and, for the environment type, select Production.

Step 3: Use the cluster details to create and save the database connection and Spark engine.

Because Watson OpenScale batch requires both a database and a Spark engine, you must configure connections to these resources. After you create a new custom machine learning provider, you must set up batch support.

  1. From the Batch support tab, click Add batch support.
  2. Enter your Hive or JDBC connection information.
    Watson OpenScale supports only JDBC connections to Db2 databases.
  3. After you configure the database connection, create a connection to a Spark engine. This step requires a Spark endpoint and connection credentials. Placeholder examples of both the database and Spark connection details follow this list.
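
As a rough illustration of the details that you gather for this step, the following sketch shows the general shape of Hive, Db2 JDBC, and Spark connection information. Host names, ports, and key names are placeholders, not the exact fields that the Watson OpenScale interface requests.

```python
# Placeholder connection details -- gather the real values from your cluster
# administrator. Key names here are illustrative, not the exact UI field names.
hive_connection = {
    "metastore_url": "thrift://hive-metastore.example.com:9083",  # assumption: Hive metastore URI
    "kerberos_principal": "hive/_HOST@EXAMPLE.COM",               # only if the cluster is Kerberized
}

db2_jdbc_connection = {
    # Standard Db2 JDBC URL format: jdbc:db2://<host>:<port>/<database>
    "jdbc_url": "jdbc:db2://db2.example.com:50000/BATCHDB",
    "username": "db2inst1",
    "password": "********",
}

spark_engine = {
    "endpoint": "https://spark-manager.example.com/v2/jobs",  # assumption: Spark manager endpoint
    "username": "spark_user",
    "password": "********",
}
```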

Step 4: Create the subscription for the batch deployment.

  1. Return to the Watson OpenScale Insights Dashboard to add a new deployment. Click Add to dashboard.
  2. Select the machine learning provider that you created.
  3. Select the Self-managed deployment type.
  4. Enter a deployment name and click Configure.

  5. Click Configure monitors.

  6. Click the Edit icon on the Model input pane.
    You must use the Numeric/categorical data type, which is the default option.
  7. Select an algorithm type and click Save and continue.
  8. Click the Edit icon on the Analytics engine pane and select a Spark engine.
    Specify your settings and click Save and continue.
  9. In the Payload data section, select Create new table, Use existing table, or Do not use.
    If you select Do not use, click Next. You can't configure fairness, drift, or explainability evaluations if you don't use a payload table.
    If you want to use a payload table, specify the payload data details and click Next.

  10. In the Feedback data section, select Create new table, Use existing table, or Do not use.
    If you select Do not use, click Next. You can't configure quality evaluations if you don't use a feedback table.
    If you want to use a feedback table, specify the feedback data details and click Next.

  11. In the Connect to the training data section, upload the configuration JSON file that you generated in Step 1, and click Save. A quick validation sketch follows this list.
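
Before you upload the configuration JSON file, it can help to confirm that the artifact from Step 1 parses correctly. This is a minimal sketch, assuming the file name configuration.json used in the earlier illustration.

```python
# Quick sanity check of the configuration artifact before uploading it in the UI.
# Assumes the file name used in the Step 1 sketch; adjust to match your notebook's output.
import json

with open("configuration.json") as f:
    config = json.load(f)  # raises a JSONDecodeError if the file is not valid JSON

print("Top-level keys:", sorted(config.keys()))
```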

Step 5: Enable the quality monitor.

  1. In the Evaluations section, click Quality.
  2. Click the Edit icon on the Quality thresholds pane and specify thresholds for your quality metrics.
  3. Optional: Click the Edit icon on the Sample size pane and specify the Minimum sample size.
    If you don't specify a sample size, all of the model records are evaluated.

Step 6: Enable the drift monitor.

In addition to other required monitor information, you must supply information such as the Data warehouse connection name, the Payload database, the Payload schema, the Payload transaction table, the Drift database, the Drift schema, and the Drift transaction table. The drift monitor requires this information to function properly. For more information, see Prepare models for monitoring.

  1. In the Evaluations section, click Drift.
  2. In the Drift model section, click the Edit icon.
  3. The Training option defaults to Train in a data science notebook. Click Next.
  4. Run the custom configuration notebook, which produces the drift archive that you need in the next step. An illustrative packaging sketch follows this list.
  5. In the Upload the drift detection model section, add the drift archive to the drop zone, and click Next.
  6. In the Drifted data section, select the Connection type, enter the Drift database, Drift schema, and Drift table information, and then click Next.
  7. Enter the Drift threshold and click Next.
  8. Enter the Sample size and click Save.
  9. Click Go to model summary.
  10. From the Actions menu, click Evaluate now.
  11. From the Evaluate now panel, click Evaluate now.
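
The drift detection model and its archive are produced by the configuration notebook that you run in step 4 of this list. Purely to illustrate the packaging step, the following sketch bundles a generic model artifact into an archive; the file names and archive layout are assumptions, not the notebook's actual format.

```python
# Illustrative packaging only -- the real drift archive is produced by the
# IBM-provided configuration notebook. File names and layout here are assumptions.
import tarfile

files_to_bundle = ["drift_detection_model.pkl", "ddm_properties.json"]  # hypothetical artifacts

with tarfile.open("drift_detection_model.tar.gz", "w:gz") as archive:
    for name in files_to_bundle:
        archive.add(name)

print("Created drift_detection_model.tar.gz")
```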

Step 7: Enable the explainability monitor.

In addition to other required monitor information, you must supply information such as the Data warehouse connection name, the Explanation database, the Explanation results table, and the Explanation queue table. For more information, see Prepare models for monitoring.

  1. In the Evaluations section, click Explainability.
  2. In the Explanation data section, click the Edit icon.
  3. In the Explanation data section, select the Data warehouse connection name, the Explanation database, the Explanation results table, and the Explanation queue table information. If you use a JDBC connection, also select the Explanation schema.
  4. In Sample records, upload your perturbations scoring response .tar file, then click Save.
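
The perturbations scoring response comes from scoring the perturbation records that the configuration notebook generates against your batch deployment. As a rough sketch only, assuming the scored output is already available as a CSV file named perturbations_response.csv, the .tar file might be assembled as follows; the actual file names and expected contents are defined by the notebook, not by this example.

```python
# Illustrative only -- the actual perturbations scoring response and its expected
# format come from the IBM-provided configuration notebook and your deployment.
import tarfile

scored_response_file = "perturbations_response.csv"  # assumption: scored perturbation records

with tarfile.open("perturbations_response.tar", "w") as archive:
    archive.add(scored_response_file)
```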

Step 8: Enable the fairness monitor.

If you enable fairness and add fairness configuration details in your notebook, then fairness is automatically configured when you upload the configuration JSON file in Step 4. You can also enable fairness from the Watson OpenScale dashboard. In addition to other required monitor information, you must supply information such as the Fairness feature, the Monitored group, and the Fairness threshold. For more information, see Prepare models for monitoring. If you choose to enable fairness by using the Watson OpenScale dashboard rather than in your notebook, complete the following steps. An illustrative example of notebook-style fairness settings follows the steps.

  1. In the Evaluations section, click Fairness.
  2. In the Fairness data section, click the Edit icon.
  3. In the Fairness data section, click Add feature.
  4. Select the Fairness feature, the Monitored group, the Fairness threshold, the Favorable outcome, and the Unfavorable outcome information. You can also optionally provide Minimum number of records. Click Save.
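
If you supply the fairness details in your configuration notebook instead, the settings are structured roughly as follows. This is an illustrative sketch only; the key names and values are assumptions modeled on typical fairness settings, not the notebook's exact schema.

```python
# Illustrative fairness settings only -- key names are assumptions, not the
# configuration notebook's exact schema. Adjust to match the notebook you run in Step 1.
fairness_configuration = {
    "features": [
        {
            "feature": "Sex",               # fairness feature to evaluate
            "monitored_group": ["female"],  # group checked for unfavorable bias
            "reference_group": ["male"],    # group used as the comparison baseline
            "threshold": 0.95,              # fairness threshold
        }
    ],
    "favorable_outcomes": ["No Risk"],
    "unfavorable_outcomes": ["Risk"],
    "min_records": 100,                     # optional minimum number of records to evaluate
}
```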

Step 9: Run a notebook that provides drift analysis.

To view drift analysis, you must process transactions by using a custom notebook that analyzes the payload transactions that cause drift. You can download the notebook, and the code snippet that you need to populate it, from the Drift monitor visualization window.
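
The notebook and its code snippet come from the Drift monitor visualization window. The following is only a generic sketch, assuming PySpark with Hive support and a hypothetical drifted-transactions table named BATCHDB.DRIFT_TXN with an is_drifted flag, of how flagged payload transactions might be read and summarized.

```python
# Generic PySpark sketch only -- the actual analysis notebook and code snippet are
# downloaded from the Drift monitor visualization window. Table and column names
# here (BATCHDB.DRIFT_TXN, is_drifted) are assumptions for illustration.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("drift-analysis-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

drift_txn = spark.table("BATCHDB.DRIFT_TXN")

# Count and preview the transactions flagged as contributing to drift.
drifted = drift_txn.filter(drift_txn["is_drifted"] == True)
print("Drifted transactions:", drifted.count())
drifted.show(10, truncate=False)
```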

Next steps

You are ready to view insights. For more information, see the topics about viewing insights for your deployments.

Parent topic: Batch processing overview