Creating custom evaluations and metrics

To create custom evaluations, select a set of custom metrics to quantitatively track your model deployment and business application. You can define these custom metrics and use them alongside metrics that are generated by other types of evaluations.

You can use one of the following methods to manage custom evaluations and metrics:

Managing custom metrics with the Python SDK

To manage custom metrics with the Python SDK, you must perform the following tasks:

  1. Register the custom monitor with metrics definition.
  2. Enable the custom monitor.
  3. Store metric values.
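The examples in this section use sensitivity and specificity as the custom metrics. As a reminder of what those values measure, the following plain-Python sketch (a hypothetical helper, not part of the SDK) computes both from binary labels and predictions:

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity (true positive rate) and specificity
    (true negative rate) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Four positives (three predicted correctly), four negatives (three predicted correctly)
sens, spec = sensitivity_specificity(
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 1])
print(sens, spec)  # 0.75 0.75
```

Values like these are what you store as custom metric measurements later in this section.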

The sections that follow show each task with code from a working sample notebook.

You can disable and re-enable custom monitoring at any time, and you can remove the custom monitor when you no longer need it.

For more information, see the Python SDK documentation.

Register the custom monitor with metrics definition

Before you can start using custom metrics, you must register the custom monitor, which is the processor that tracks the metrics, and you must define the metrics themselves.

  1. Import the MonitorMetricRequest, MetricThreshold, and MonitorTagRequest classes, and check whether a monitor definition with the same name already exists.
  2. Define the metrics, each of which requires name and thresholds values; each threshold has a type and a default value.
  3. Define tags to attach metadata, such as a customer region, to the metrics.

The following code is from a working sample notebook:

# Imports used by this example; the class locations follow the
# ibm-watson-openscale SDK sample notebooks
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import *

def get_definition(monitor_name):
    # Return the existing monitor definition with this name, if any
    monitor_definitions = wos_client.monitor_definitions.list().result.monitor_definitions

    for definition in monitor_definitions:
        if monitor_name == definition.entity.name:
            return definition

    return None


monitor_name = 'my model performance'
metrics = [MonitorMetricRequest(name='sensitivity',
                                thresholds=[MetricThreshold(type=MetricThresholdTypes.LOWER_LIMIT, default=0.8)]),
           MonitorMetricRequest(name='specificity',
                                thresholds=[MetricThreshold(type=MetricThresholdTypes.LOWER_LIMIT, default=0.75)])]
tags = [MonitorTagRequest(name='region', description='customer geographical region')]

existing_definition = get_definition(monitor_name)

# Register the monitor only if it does not already exist
if existing_definition is None:
    custom_monitor_details = wos_client.monitor_definitions.add(name=monitor_name, metrics=metrics, tags=tags, background_mode=False).result
else:
    custom_monitor_details = existing_definition
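The lower_limit thresholds in this definition mark the point below which a metric is flagged. That comparison logic can be sketched in plain Python (a hypothetical helper for illustration, not an SDK call):

```python
def breached(metric_values, lower_limits):
    """Return the names of metrics that fall below their lower limit."""
    return [name for name, value in metric_values.items()
            if name in lower_limits and value < lower_limits[name]]

# sensitivity 0.67 is below its 0.8 lower limit; specificity 0.78 passes
violations = breached({"sensitivity": 0.67, "specificity": 0.78},
                      {"sensitivity": 0.8, "specificity": 0.75})
print(violations)  # ['sensitivity']
```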

To verify the configuration, run the wos_client.monitor_definitions.list() command to see whether your newly created monitor and metrics are configured properly.

You can also get the monitor ID by running the following command:

custom_monitor_id = custom_monitor_details.metadata.id

print(custom_monitor_id)

For a more detailed look, run the following command:

custom_monitor_details = wos_client.monitor_definitions.get(monitor_definition_id=custom_monitor_id).result
print('Monitor definition details:', custom_monitor_details)

Enable the custom monitor

Next, you must enable the custom monitor for the subscription. Enabling the monitor creates a monitor instance and sets the thresholds.

  1. Create a Target object that identifies the subscription to monitor.
  2. Use a MetricThresholdOverride object to override a metric's default lower_limit value. Supply the metric_id value as one of the parameters. If you don't remember the metric IDs, you can always use the custom_monitor_details object to get the details, as shown in the previous example.

The following code is from a working sample notebook:

target = Target(
        target_type=TargetTypes.SUBSCRIPTION,
        target_id=subscription_id
    )

thresholds = [MetricThresholdOverride(metric_id='sensitivity', type = MetricThresholdTypes.LOWER_LIMIT, value=0.9)]

custom_monitor_instance_details = wos_client.monitor_instances.create(
            data_mart_id=data_mart_id,
            background_mode=False,
            monitor_definition_id=custom_monitor_id,
            target=target,
            thresholds=thresholds
).result

To check on your configuration details, use the wos_client.monitor_instances.get(monitor_instance_id=custom_monitor_instance_details.metadata.id) command.

Store metric values

You must store, or save, your custom metric values in the same region as your service instance.

  1. Build a MonitorMeasurementRequest object that contains the metric values that you are storing.
  2. Use the wos_client.monitor_instances.measurements.add method to commit the metrics.

The following code is from a working sample notebook:

from datetime import datetime, timezone
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import MonitorMeasurementRequest

# ID of the monitor instance that was created in the previous step
custom_monitor_instance_id = custom_monitor_instance_details.metadata.id
custom_monitoring_run_id = "11122223333111abc"
measurement_request = [MonitorMeasurementRequest(timestamp=datetime.now(timezone.utc),
                                                 metrics=[{"specificity": 0.78, "sensitivity": 0.67, "region": "us-south"}], run_id=custom_monitoring_run_id)]
print(measurement_request[0])

published_measurement_response = wos_client.monitor_instances.measurements.add(
    monitor_instance_id=custom_monitor_instance_id,
    monitor_measurement_request=measurement_request).result
published_measurement_id = published_measurement_response[0]["measurement_id"]
print(published_measurement_response)

To retrieve the measurement that you stored, run the following command:

published_measurement = wos_client.monitor_instances.measurements.get(monitor_instance_id=custom_monitor_instance_id, measurement_id=published_measurement_id).result
print(published_measurement)

Managing custom metrics with the custom metrics provider and Python SDK

To manage custom metrics with the custom metrics provider and the Python SDK, you must perform the following tasks:

  1. Register the custom monitor with metrics definition.

  2. Create the custom metrics provider.

  3. Enable the custom monitor.

Create the custom metrics provider

Define the custom metrics provider endpoint details with the authentication information. OpenScale generates a token and invokes the REST endpoint with that token at run time.

The following code is from a working sample notebook:

wos_client.integrated_systems.add(name="Custom Metrics Provider",
    description="Custom Metrics Provider", type="custom_metrics_provider",
    credentials={
        "auth_type": "bearer",
        "token_info": {
           "url": IAM_URL,
           "headers": {"Content-type": "application/x-www-form-urlencoded"},
           "payload": "grant_type=urn:ibm:params:oauth:grant-type:apikey&response_type=cloud_iam&apikey=<api_key>",
           "method": "POST"
        }
    },
    connection={
        "display_name": "Custom Metrics Provider",
        "endpoint": rest_endpoint_url
    }
).result

Tip: The simplified flow from the SDK automatically registers the custom monitor, enables the custom monitor, and creates the integrated system provider. All you need to do is implement the metric endpoint.
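The metric endpoint itself is code that you own. The exact request and response schema is defined by the custom metrics provider contract; as an illustration only, the following sketch assumes the endpoint receives scored records in a JSON payload and replies with a flat dictionary keyed by the metric names registered in the monitor definition. All field names here are assumptions, not the documented contract.

```python
def handle_metrics_request(payload):
    """Hypothetical handler body for a custom metrics provider endpoint.

    Assumes (illustration only) that the incoming payload carries scored
    records under a "records" key and that the reply is a flat dict of
    metric values keyed by the registered metric names.
    """
    records = payload.get("records", [])  # assumed field name
    tp = fn = tn = fp = 0
    for r in records:
        truth, pred = r["label"], r["prediction"]  # assumed field names
        if truth == 1:
            tp, fn = (tp + 1, fn) if pred == 1 else (tp, fn + 1)
        else:
            tn, fp = (tn + 1, fp) if pred == 0 else (tn, fp + 1)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }

reply = handle_metrics_request({"records": [
    {"label": 1, "prediction": 1},
    {"label": 1, "prediction": 0},
    {"label": 0, "prediction": 0},
    {"label": 0, "prediction": 0},
]})
print(reply)  # {'sensitivity': 0.5, 'specificity': 1.0}
```

In a real deployment, this function would run behind the REST endpoint that you register as rest_endpoint_url, with the bearer token validated before the metrics are computed.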

Managing custom metrics with the custom metrics provider and user interface

Complete the following steps:

  1. Add metric groups.
  2. Add the metric endpoint.
  3. Configure custom monitors.

Add metric groups

  1. On the home page, click Configure, and then click Metric groups.
  2. Click Add metric group.
  3. To configure a metric group by using a JSON file, click Import from file. Upload a JSON file and click Import.
  4. To configure a metric group manually, click Configure new group.
    1. Type a name for the metric group, and click Apply. The name must be less than or equal to 48 characters.
    2. Click the Edit icon on the Model types to support tile and select one or more model types that your evaluation supports. Click Next.
    3. To specify an evaluation schedule, click the toggle and specify the interval. Click Next.
    4. Specify the details for the input parameters. For each input parameter, enter the details and then click Add. The parameter name that you specify must match the parameter name that is specified in the metric API. If a parameter is required to configure your custom monitor, select the Required parameter checkbox.
    5. Click Save.

Add metric endpoints

  1. On the home page, click Configure, and then click Metric endpoints.

  2. Click Add metric endpoint.

  3. Specify a name and a description for the metric endpoint.

  4. Click the Edit icon on the Connection tile and specify the connection details. Click Next.

  5. Select the metric groups that you want to associate with the metric endpoint and click Save.

    If you enable batch support by specifying a custom watsonx.ai Runtime endpoint URL, you can add the following input parameters to a metric group:

Input parameters

    Parameter                     Datatype    Value
    custom_metrics_provider_type  String      wml_batch
    space_id                      String      Your space ID
    deployment_id                 String      watsonx.ai Runtime deployment ID of your custom metric endpoint
    hardware_spec_id              String      watsonx.ai Runtime hardware specification ID of your custom metric endpoint
    custom_metrics_wait_time      Int         Wait time in seconds (for example, 60)
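For reference, the batch input parameters from the table above can be assembled as a Python dictionary like the following; the ID values are placeholders that you must replace with your own:

```python
# Placeholder values; substitute your own space, deployment, and
# hardware specification IDs before use.
batch_input_parameters = {
    "custom_metrics_provider_type": "wml_batch",
    "space_id": "<space_id>",
    "deployment_id": "<deployment_id>",
    "hardware_spec_id": "<hardware_spec_id>",
    "custom_metrics_wait_time": 60,  # wait time in seconds
}
print(batch_input_parameters["custom_metrics_provider_type"])  # wml_batch
```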

Configure custom monitors

  1. On the home page, click Insights Dashboard.
  2. On a model deployment tile, select Configure monitors.
  3. In the Evaluations section, select the name of the metric group that you added.
  4. Select the Edit icon on the Metric endpoint tile.
  5. Select a metric endpoint and click Next. If you don't want to use a metric endpoint, select None.
  6. Use the toggles to specify the metrics that you want to use to evaluate the model and provide threshold values. Click Next.
  7. Specify values for the input parameters. If you selected JSON as the data type for the metric group, add the JSON data. Click Next.

You can now evaluate models with a custom monitor.

Accessing and visualizing custom metrics

Visualization of your custom metrics appears on the Insights Dashboard.

For example, the RAG Quality monitor page shows a time series graph that displays metrics such as HAP and PII.

Learn more

Reviewing evaluation results