Fairness metrics overview

Use IBM Watson OpenScale fairness monitoring to determine whether the outcomes that your model produces are fair for the monitored group. When fairness monitoring is enabled, it generates a set of metrics every hour by default. You can also generate these metrics on demand by clicking the Check fairness now button or by using the Python client.
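For example, the following is a minimal sketch of triggering an on-demand evaluation with the `ibm_watson_openscale` Python client; the API key and monitor instance ID are placeholders, and the exact call details can differ depending on your client version.

```python
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient

# Authenticate against the Watson OpenScale service (the API key is a placeholder).
wos_client = APIClient(authenticator=IAMAuthenticator(apikey="<YOUR_API_KEY>"))

# Trigger the fairness monitor on demand instead of waiting for the hourly run.
# <FAIRNESS_MONITOR_INSTANCE_ID> is the ID of your configured fairness monitor instance.
run = wos_client.monitor_instances.run(
    monitor_instance_id="<FAIRNESS_MONITOR_INSTANCE_ID>",
    background_mode=False,  # wait until the evaluation finishes
)
print(run.result)
```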

Watson OpenScale automatically identifies whether any known protected attributes are present in a model. When Watson OpenScale detects these attributes, it automatically recommends configuring bias monitors for each attribute present, to ensure that bias against these potentially sensitive attributes is tracked in production.

Currently, Watson OpenScale detects and recommends monitors for known protected attributes, such as Sex and Age.

In addition to detecting protected attributes, Watson OpenScale recommends which values within each attribute should be set as the monitored and the reference values. For example, Watson OpenScale recommends that within the Sex attribute, the bias monitor be configured such that Female and Non-Binary are the monitored values, and Male is the reference value. If you want to change any of the recommendations, you can edit them via the bias configuration panel.

Recommended bias monitors help to speed up configuration and ensure that you are checking your AI models for fairness against sensitive attributes. As regulators begin to turn a sharper eye on algorithmic bias, it is becoming more critical that organizations have a clear understanding of how their models are performing, and whether they are producing unfair outcomes for certain groups.

Understanding fairness

Watson OpenScale checks your deployed model for bias at runtime. To detect bias for a deployed model, you must define fairness attributes, such as Age or Sex, as detailed in the Configuring the Fairness monitor section.

You must specify the output schema for a model or function in IBM Watson Machine Learning for bias checking to be enabled in Watson OpenScale. Specify the output schema by using the client.repository.ModelMetaNames.OUTPUT_DATA_SCHEMA property in the metadata part of the store_model API. For more information, see the IBM Watson Machine Learning client documentation.
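For example, a minimal sketch of passing the output schema when storing a model with the Watson Machine Learning Python client might look like the following; the schema fields are illustrative and must match what your model actually returns, and other required metadata properties are omitted.

```python
from watson_machine_learning_client import WatsonMachineLearningAPIClient

# wml_credentials and trained_model are assumed to be defined elsewhere.
client = WatsonMachineLearningAPIClient(wml_credentials)

# Illustrative output schema: one prediction column and one probability column.
output_data_schema = {
    "id": "output_schema",
    "fields": [
        {"name": "prediction", "type": "double",
         "metadata": {"modeling_role": "prediction"}},
        {"name": "probability", "type": "array",
         "metadata": {"modeling_role": "probability"}},
    ],
}

# Other required metadata (for example, framework details) is omitted for brevity.
metadata = {
    client.repository.ModelMetaNames.NAME: "credit risk model",
    client.repository.ModelMetaNames.OUTPUT_DATA_SCHEMA: output_data_schema,
}

stored_model = client.repository.store_model(model=trained_model, meta_props=metadata)
```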

How it works

Before you configure the Fairness monitor, there are a few key concepts that are critical to understand:

The Watson OpenScale algorithm computes bias on an hourly basis, using the last N records present in the payload logging table; the value of N is specified when configuring Fairness. The algorithm perturbs these last N records to generate additional data.

The perturbation is done by changing the value of the fairness attribute from Reference to Monitored, or vice-versa. The perturbed data is then sent to the model to evaluate its behavior. The algorithm looks at the last N records in the payload table, and the behavior of the model on the perturbed data, to decide if the model is acting in a biased manner.

A model is deemed to be biased if, across this combined dataset, the percentage of Favorable outcomes for the Monitored class is less than the percentage of Favorable outcomes for the Reference class by some threshold value. You specify this threshold value when you configure the Fairness monitor.

Fairness values can be more than 100%, which means that the Monitored group received more favorable outcomes than the Reference group. In addition, if no new scoring requests are sent, the Fairness value remains constant.
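The following simplified sketch (not the actual Watson OpenScale implementation) illustrates the idea: it perturbs the fairness attribute in the last N payload records, re-scores them with a hypothetical score_model function, and compares favorable-outcome rates against the configured threshold.

```python
import pandas as pd

def check_bias(records, score_model, attribute="Sex",
               reference="Male", monitored="Female",
               favorable="No Risk", threshold=0.80):
    """Simplified illustration of the hourly fairness check."""
    # `records` holds the last N payload rows, including the model's
    # "prediction" column that was logged with each scoring request.
    # Perturb: swap reference and monitored values; all other features unchanged.
    perturbed = records.copy()
    perturbed[attribute] = perturbed[attribute].map(
        {reference: monitored, monitored: reference})
    perturbed["prediction"] = score_model(perturbed)  # hypothetical scoring call

    # Combine the original payload records with the perturbed records.
    combined = pd.concat([records, perturbed], ignore_index=True)

    def favorable_rate(group_value):
        group = combined[combined[attribute] == group_value]
        return (group["prediction"] == favorable).mean()

    fairness = favorable_rate(monitored) / favorable_rate(reference)
    return fairness, fairness < threshold  # fairness score, biased or not
```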

Balanced data and perfect equality

For balanced data sets, the following concepts apply:

If the monitored feature is SEX and the monitored group is FEMALE, all MALE transactions are duplicated as FEMALE transactions. Other feature values remain unchanged. These new synthesized FEMALE transactions are added to the set of original FEMALE monitored group transactions.
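As a sketch of that balancing step (assuming the transactions are in a pandas DataFrame), the duplication could look like this:

```python
import pandas as pd

def balance_monitored_group(transactions: pd.DataFrame,
                            attribute="SEX",
                            reference="MALE",
                            monitored="FEMALE") -> pd.DataFrame:
    """Illustrative sketch: duplicate reference rows as synthesized monitored rows."""
    # Copy every MALE (reference) transaction and relabel it FEMALE (monitored);
    # all other feature values stay exactly the same.
    synthesized = transactions[transactions[attribute] == reference].copy()
    synthesized[attribute] = monitored
    # The synthesized FEMALE rows are added alongside the original transactions,
    # so the monitored group now contains original plus synthesized records.
    return pd.concat([transactions, synthesized], ignore_index=True)
```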

Do the math

The fairness metric used in Watson OpenScale is disparate impact, which is a measure of how the rate at which an unprivileged group receives a certain outcome or result compares with the rate at which a privileged group receives that same outcome or result.

The following mathematical formula is used for calculating disparate impact:

                     (num_positives(privileged=False) / num_instances(privileged=False))
Disparate impact =   ______________________________________________________________________
                     (num_positives(privileged=True) / num_instances(privileged=True))

where num_positives is the number of individuals in the group (either privileged=False, i.e. unprivileged, or privileged=True, i.e. privileged) who received a positive outcome, and num_instances is the total number of individuals in the group.

The result is a percentage: the rate at which the unprivileged group receives the positive outcome, expressed as a percentage of the rate at which the privileged group receives that same outcome. For instance, if a credit risk model assigns the “no risk” prediction to 80% of unprivileged applicants and to 100% of privileged applicants, that model has a disparate impact (presented as the fairness score in Watson OpenScale) of 80%.
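Restated in code with illustrative counts, the credit risk example works out as follows:

```python
# Worked example with illustrative counts: 80% of unprivileged (monitored)
# applicants and 100% of privileged (reference) applicants receive "no risk".
num_positives_unprivileged = 80    # monitored group members with the favorable outcome
num_instances_unprivileged = 100   # total monitored group members
num_positives_privileged = 100     # reference group members with the favorable outcome
num_instances_privileged = 100     # total reference group members

disparate_impact = (
    (num_positives_unprivileged / num_instances_unprivileged)
    / (num_positives_privileged / num_instances_privileged)
)
print(f"Fairness score: {disparate_impact:.0%}")  # Fairness score: 80%
```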

In Watson OpenScale, the positive outcomes are designated as the favorable outcomes, and the negative outcomes are designated as the unfavorable outcomes. The privileged group is designated as the reference group, and the unprivileged group is designated as the monitored group.

The following mathematical formula is used for calculating perfect equality:

Perfect equality =   Percentage of favorable outcomes for all reference transactions, 
                     including the synthesized transactions from the monitored group

For example, if the monitored feature is SEX and the monitored group is FEMALE, the following formula shows the equation for perfect equality:

Perfect equality for `SEX` =  Percentage of favorable outcomes for `MALE` transactions, 
                                 including the synthesized transactions that were initially `FEMALE` but changed to `MALE`
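Continuing the same example, a minimal sketch of the perfect-equality calculation might look like this, assuming the reference and synthesized-reference transactions are pandas DataFrames with a prediction column:

```python
import pandas as pd

def perfect_equality(reference_df: pd.DataFrame,
                     synthesized_reference_df: pd.DataFrame,
                     favorable="No Risk") -> float:
    """Illustrative sketch: favorable-outcome percentage over all reference rows."""
    # reference_df: original MALE transactions with the model's prediction column.
    # synthesized_reference_df: transactions that were originally FEMALE but were
    # changed to MALE and re-scored.
    combined = pd.concat([reference_df, synthesized_reference_df], ignore_index=True)
    return (combined["prediction"] == favorable).mean() * 100  # percentage
```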

Bias visualization

When potential bias is detected, Watson OpenScale performs several functions to confirm whether the bias is real. Watson OpenScale perturbs the data by flipping the monitored value to the reference value and then running this new record through the model. It then surfaces the resulting output as the debiased output. Watson OpenScale also trains a shadow debiased model that it then uses to detect when a model is going to make a biased prediction.

Two different datasets are used for computing fairness and accuracy. Fairness is computed by using the payload plus the perturbed data. Accuracy is computed by using the feedback data. To compute accuracy, Watson OpenScale needs manually labeled data, which is only present in the feedback table.

The results of these determinations are available in the bias visualization, which includes several views. (You see a view only if there is data to support it.)

Example

Consider a data point where, for Sex=Male (the Reference value), the model predicts a Favorable outcome, but when the record is perturbed by changing Sex to Female (the Monitored value) while keeping all other feature values the same, the model predicts an Unfavorable outcome. Overall, a model is said to exhibit bias if there are sufficient data points (across the last N records in the payload table, plus the perturbed data) where the model acts in a biased manner.

Supported models

Watson OpenScale supports bias detection only for models and Python functions that expect structured data in their feature vectors.

Fairness metrics are calculated based on the scoring payload data.

For proper monitoring, every scoring request should also be logged in Watson OpenScale. Payload data logging is automated for IBM Watson Machine Learning engines.

For other machine learning engines, the payload data can be provided by using either the Python client or the REST API.
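For example, a minimal sketch of logging one scoring request and response with the `ibm_watson_openscale` Python client might look like the following; the API key, data set ID, field names, and values are placeholders.

```python
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

wos_client = APIClient(authenticator=IAMAuthenticator(apikey="<YOUR_API_KEY>"))

# One scoring request/response pair, matching the deployment's scoring schema.
scoring_request = {"fields": ["Age", "Sex"], "values": [[35, "Female"]]}
scoring_response = {"fields": ["prediction", "probability"],
                    "values": [["No Risk", [0.9, 0.1]]]}

# Store the record in the payload logging data set for the monitored deployment.
wos_client.data_sets.store_records(
    data_set_id="<PAYLOAD_LOGGING_DATA_SET_ID>",
    request_body=[
        PayloadRecord(request=scoring_request,
                      response=scoring_response,
                      response_time=120),  # response time in milliseconds
    ],
)
```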

For machine learning engines other than IBM Watson Machine Learning, fairness monitoring creates additional scoring requests on the monitored deployment.

You can review the following information:

Supported fairness metrics

The following fairness metrics are supported by Watson OpenScale:

The following protected attributes are supported by Watson OpenScale:

Supported fairness details

The following details for fairness metrics are supported by Watson OpenScale:

How is model bias mitigated by using Watson OpenScale?

The debiasing capability in Watson OpenScale is enterprise grade: it is robust, scalable, and can handle a wide variety of models. Debiasing in Watson OpenScale consists of a two-step process:

Learning phase: Learning the customer model's behavior to understand when it acts in a biased manner.

Application phase: Identifying whether the customer's model acts in a biased manner on a specific data point and, if needed, fixing the bias. For more information, see Understanding how debiasing works and Debiasing options.

Is it possible to check for model bias on sensitive attributes, such as race and sex, even when the model is not trained on them?

Yes. Recently, Watson OpenScale delivered a ground-breaking feature called “Indirect Bias detection.” Use it to detect whether the model is exhibiting bias indirectly for sensitive attributes, even though the model is not trained on these attributes. For more information, see Understanding how debiasing works.

Is it possible to mitigate bias for regression-based models?

Yes. You can use Watson OpenScale to mitigate bias on regression-based models. No additional configuration is needed from you to use this feature. Bias mitigation for regression models is done out of the box when the model exhibits bias.

What are the different methods of debiasing in Watson OpenScale?

You can use both Active Debiasing and Passive Debiasing. For more information, see Debiasing options.