Metrics computation using Python SDK

The Watson OpenScale Python SDK is a library that you can use to work directly with the Watson OpenScale service on Cloud Pak for Data. You can use the Python SDK to configure a logging database, bind your machine learning engine, and select and monitor deployments.

For Cloud Pak for Data version 4.0.7 and later, Watson OpenScale supports the computation of the following fairness metrics and explanation algorithms for model predictions with the Watson OpenScale Python SDK. The following metrics and algorithms can be computed either in a notebook runtime environment or offloaded as Spark jobs to IBM Analytics Engine:

FairScore transformer

You can use the FairScore transformer as a post-processing bias mitigation technique that transforms the probability estimates, or scores, of probabilistic binary classification models with respect to fairness goals such as statistical parity or equalized odds. To use the FairScore transformer in Watson OpenScale, you must train a FairScore transformer and then use it to transform model scores.
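
The idea of post-processing scores toward a parity goal can be illustrated with a deliberately simplified sketch: rather than the FairScore algorithm itself (or the SDK's API), the hypothetical helper below picks a per-group decision threshold so that each group receives positive predictions at the same rate, which enforces statistical parity on the transformed decisions.

```python
import numpy as np

def fit_group_thresholds(scores, groups, target_rate):
    """Pick a per-group threshold so each group's positive-prediction
    rate matches target_rate (a crude statistical-parity post-process)."""
    thresholds = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])[::-1]        # descending scores
        k = max(1, int(round(target_rate * len(s))))  # positives to grant
        thresholds[g] = s[k - 1]                      # lowest score still positive
    return thresholds

def transform(scores, groups, thresholds):
    """Apply the group-specific thresholds to produce binary decisions."""
    return np.array([int(s >= thresholds[g]) for s, g in zip(scores, groups)])

rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=200)
scores = rng.random(200) + 0.3 * groups  # group 1 systematically scores higher
thr = fit_group_thresholds(scores, groups, target_rate=0.5)
y_hat = transform(scores, groups, thr)
rate0 = y_hat[groups == 0].mean()  # both rates land near 0.5
rate1 = y_hat[groups == 1].mean()
```

After the transformation, both groups receive favorable decisions at approximately the target rate, even though the raw scores were biased toward one group. The trained FairScore transformer plays an analogous role on probability estimates, with formal guarantees this sketch does not provide.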

Likelihood compensation

Likelihood compensation (LC) is a framework for explaining deviations of a black box model's predictions from the ground truth. Given test data and the predict function of a black box model, LC can identify anomalies in the test data and explain what caused each sample to become an anomaly. The LC explanation is provided as deltas, which, when added to the original test data point, or anomaly, move the model's prediction to the ground truth. LC provides local explanations and is supported only for regression models.
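
The "delta" idea can be made concrete with a toy sketch. This is not the LC algorithm, only an illustration under a strong assumption: for a linear stand-in model, the minimum-norm delta that makes the prediction match the ground truth has a closed form.

```python
import numpy as np

# Stand-in "black box" regression model (linear, so the delta is closed-form).
w = np.array([2.0, -1.0, 0.5])
b = 1.0
def predict(x):
    return w @ x + b

x = np.array([1.0, 2.0, 4.0])  # anomalous test point
y_true = 7.0                   # ground truth the model misses

# Minimum-norm delta with predict(x + delta) == y_true:
# delta = w * (y_true - predict(x)) / ||w||^2
residual = y_true - predict(x)
delta = w * residual / (w @ w)

print(predict(x))          # 3.0 — original, wrong prediction
print(predict(x + delta))  # 7.0 — prediction moved to the ground truth
```

LC produces deltas in this spirit for genuinely black box regression models, where no closed form is available and the deltas themselves constitute the local explanation.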

Protodash explainer

The Protodash explainer identifies instances from a reference set, typically the training data, that are representative of the data points that you want to explain. The reference instances must belong to the same feature space as those data points. The method selects a predetermined number of reference instances that minimize the maximum mean discrepancy (MMD) between them and the data points that you want to explain. By selecting training instances that follow the same distribution as the data points being explained, Protodash helps you better understand your model predictions. The Protodash explainer is supported only for structured classification models.
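
To make the MMD objective concrete, here is a simplified greedy prototype selection in plain NumPy: it is not the Protodash algorithm (which also learns non-uniform prototype weights) or the SDK's API, but it shows the core idea of picking reference rows whose kernel mean best matches that of the points being explained.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """RBF kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_prototypes(X_explain, X_ref, m):
    """Greedily pick m reference rows whose uniformly weighted kernel
    mean best matches X_explain, approximately minimizing MMD^2."""
    K_er = rbf(X_explain, X_ref)   # cross kernel: explain vs. reference
    K_rr = rbf(X_ref, X_ref)       # reference-vs-reference kernel
    mean_er = K_er.mean(axis=0)    # attraction of each ref row to X_explain
    chosen = []
    for _ in range(m):
        best, best_obj = None, np.inf
        for j in range(len(X_ref)):
            if j in chosen:
                continue
            S = chosen + [j]
            # MMD^2 terms that depend on S (the constant target term is dropped)
            obj = K_rr[np.ix_(S, S)].mean() - 2 * mean_er[S].mean()
            if obj < best_obj:
                best, best_obj = j, obj
        chosen.append(best)
    return chosen

rng = np.random.default_rng(1)
# Reference set: two clusters; rows 50-99 sit near (5, 5).
X_ref = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
X_explain = rng.normal(5, 1, (5, 2))  # points near the second cluster
protos = greedy_prototypes(X_explain, X_ref, m=3)
```

Because the points to explain lie in the second cluster, the selected prototypes come from that cluster, which is exactly the behavior that makes prototypes useful as explanations.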

SHAP explainer

SHapley Additive exPlanations (SHAP) is a game theoretic approach that explains the output of any machine learning model. It connects optimal credit allocation with local explanations by using Shapley values and their related extensions.

SHAP assigns each model feature an importance value for a particular prediction, called a Shapley value. The Shapley value of a feature is its average marginal contribution across all possible groups of features. The SHAP values of all the input features always sum to the difference between the current model output for the prediction that is being explained and the baseline, or expected, model output. The baseline model output can be based on a summary of the training data or on any subset of data for which explanations must be generated.

The Shapley values of a set of transactions can be combined to get global explanations that provide an overview of which features of a model are most important.
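
The additivity property can be verified directly with a brute-force computation. The sketch below (a toy model and a single baseline vector, not the SDK's SHAP integration) computes exact Shapley values by averaging each feature's marginal contribution over all feature orderings; their sum equals the model output minus the baseline output.

```python
from itertools import permutations
from math import factorial
import numpy as np

def model(x):
    # Toy model: an interaction term plus a linear term.
    return 2.0 * x[0] * x[1] + x[2]

def shapley_values(model, x, baseline):
    """Exact Shapley values against a single baseline: average each
    feature's marginal contribution over all orderings of the features."""
    n = len(x)
    phi = np.zeros(n)
    for order in permutations(range(n)):
        z = baseline.copy()
        prev = model(z)
        for i in order:
            z[i] = x[i]   # reveal feature i
            cur = model(z)
            phi[i] += cur - prev
            prev = cur
    return phi / factorial(n)

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = shapley_values(model, x, baseline)
# phi.sum() equals model(x) - model(baseline), the additivity property.
```

Averaging the absolute Shapley values over many such transactions yields the global feature-importance view described above. This enumeration is exponential in the number of features; practical SHAP implementations use model-specific or sampling-based approximations instead.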

Smoothed empirical differential

Smoothed empirical differential (SED) is a fairness metric that you can use to describe fairness for your model predictions. SED quantifies the differential in the probability of favorable and unfavorable outcomes between intersecting groups that are defined by combinations of features. All intersecting groups are treated as equal, so there are no unprivileged or privileged groups. The calculation produces a SED value that is the minimum ratio of Dirichlet-smoothed probabilities of favorable and unfavorable outcomes between intersecting groups in the data set. The value is in the range 0-1, excluding 0 and 1, and a larger value indicates a better outcome.
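
One illustrative reading of this definition, sketched below with a hypothetical helper rather than the SDK's implementation, is to Dirichlet-smooth the favorable-outcome probability of every intersecting group and take the minimum pairwise ratio. The column names and the smoothing parameter `alpha` are assumptions for the example.

```python
import numpy as np
import pandas as pd

def smoothed_empirical_differential(df, group_cols, label_col, favorable, alpha=1.0):
    """Minimum ratio of Dirichlet-smoothed favorable-outcome probabilities
    across all intersecting groups (an illustrative reading of SED)."""
    probs = []
    for _, g in df.groupby(group_cols):
        fav = (g[label_col] == favorable).sum()
        # Additive (Dirichlet) smoothing over the two outcomes keeps
        # every probability strictly inside (0, 1).
        probs.append((fav + alpha) / (len(g) + 2 * alpha))
    probs = np.array(probs)
    # The minimum pairwise ratio is just min over max.
    return probs.min() / probs.max()

df = pd.DataFrame({
    "sex":   ["F", "F", "M", "M", "F", "F", "M", "M"],
    "age":   ["young", "young", "young", "young", "old", "old", "old", "old"],
    "label": [1, 1, 1, 0, 0, 1, 0, 0],
})
sed = smoothed_empirical_differential(df, ["sex", "age"], "label", favorable=1)
```

In this example the four intersecting groups (sex × age) have smoothed favorable probabilities 0.75, 0.5, 0.5, and 0.25, so the SED value is 0.25 / 0.75, or one third; a value closer to 1 would indicate more similar outcomes across the intersecting groups.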

Statistical parity difference

Statistical parity difference is a fairness metric that you can use to describe fairness for your model predictions. It is the difference between the rate of favorable outcomes in the unprivileged group and the rate in the privileged group. This metric can be computed from either the input data set or the output of a classifier (the predicted data set). A value of 0 implies that both groups receive equal benefit, a value less than 0 implies higher benefit for the privileged group, and a value greater than 0 implies higher benefit for the unprivileged group.
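
The definition translates directly into a few lines of NumPy. This is a standalone sketch of the formula, not the SDK's API; the variable names are illustrative.

```python
import numpy as np

def statistical_parity_difference(y, privileged):
    """P(favorable | unprivileged) - P(favorable | privileged).
    y is 1 for the favorable outcome; privileged is a boolean mask."""
    y = np.asarray(y, dtype=float)
    privileged = np.asarray(privileged, dtype=bool)
    return y[~privileged].mean() - y[privileged].mean()

y_pred     = [1, 0, 1, 1, 0, 1, 0, 0]  # favorable outcome = 1
privileged = [1, 1, 1, 1, 0, 0, 0, 0]
spd = statistical_parity_difference(y_pred, privileged)
# Privileged rate 0.75, unprivileged rate 0.25, so spd = -0.5:
# the negative value signals higher benefit for the privileged group.
```

Passing ground-truth labels instead of `y_pred` computes the same metric on the input data set, as the text describes.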

You can compute these metrics and algorithms with Watson OpenScale Python SDK version 3.0.14 or later. For more information, see the Watson OpenScale Python SDK documentation.

You can also use sample notebooks to compute fairness metrics and explainability.

Parent topic: Watson OpenScale