Creating custom evaluations and metrics
To create custom evaluations, select a set of custom metrics to quantitatively track your model deployment and business application. You can define these custom metrics and use them alongside metrics that are generated by other types of evaluations.
You can use one of the following methods to manage custom evaluations and metrics:
- Managing custom metrics with the Python SDK
- Managing custom metrics with the custom metrics provider and Python SDK
- Managing custom metrics with the custom metrics provider and user interface
Managing custom metrics with the Python SDK
To manage custom metrics with the Python SDK, you must perform the following tasks:
- Register the custom monitor with metrics definition.
- Enable the custom monitor.
- Store metric values.
A working sample notebook (advanced tutorial) demonstrates these tasks.
You can disable and re-enable custom monitoring at any time, and you can remove the custom monitor when you no longer need it.
For more information, see the Python SDK documentation.
Register the custom monitor with metrics definition
Before you can start using custom metrics, you must register the custom monitor, which is the processor that tracks the metrics. You must also define the metrics themselves.
- Use the `get_definition(monitor_name)` method to import the `Metric` and `Tag` objects.
- Use the `metrics` method to define the metrics, which require `name`, `thresholds`, and `type` values.
- Use the `tags` method to define metadata.
The following code is from the working sample notebook that was previously mentioned:
```python
# Imports assumed from the ibm-watson-openscale v2 SDK, as in the sample notebook
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import (
    MonitorMetricRequest, MonitorTagRequest, MetricThreshold)
from ibm_watson_openscale.supporting_classes.enums import MetricThresholdTypes

def get_definition(monitor_name):
    # Return the existing monitor definition with this name, or None
    monitor_definitions = wos_client.monitor_definitions.list().result.monitor_definitions
    for definition in monitor_definitions:
        if monitor_name == definition.entity.name:
            return definition
    return None

monitor_name = 'my model performance'
metrics = [MonitorMetricRequest(name='sensitivity',
                                thresholds=[MetricThreshold(type=MetricThresholdTypes.LOWER_LIMIT, default=0.8)]),
           MonitorMetricRequest(name='specificity',
                                thresholds=[MetricThreshold(type=MetricThresholdTypes.LOWER_LIMIT, default=0.75)])]
tags = [MonitorTagRequest(name='region', description='customer geographical region')]

existing_definition = get_definition(monitor_name)
if existing_definition is None:
    custom_monitor_details = wos_client.monitor_definitions.add(
        name=monitor_name, metrics=metrics, tags=tags, background_mode=False).result
else:
    custom_monitor_details = existing_definition
```
To verify your configuration, run the `wos_client.monitor_definitions.show()` command to confirm that your newly created monitor and metrics are configured properly.
You can also get the monitor ID by running the following command:
```python
custom_monitor_id = custom_monitor_details.metadata.id
print(custom_monitor_id)
```
For a more detailed look, run the following command:
```python
custom_monitor_details = wos_client.monitor_definitions.get(
    monitor_definition_id=custom_monitor_id).result
print('Monitor definition details:', custom_monitor_details)
```
Enable the custom monitor
Next, you must enable the custom monitor for the subscription. Enabling the monitor activates it and sets the thresholds.
- Use the `target` method to import the `Threshold` object.
- Use the `thresholds` method to set the metric `lower_limit` value. Supply the `metric_id` value as one of the parameters. If you don't remember the ID, you can use `custom_monitor_details` to get the details, as shown in the previous example.
The following code is from the working sample notebook that was previously mentioned:
```python
target = Target(
    target_type=TargetTypes.SUBSCRIPTION,
    target_id=subscription_id
)
thresholds = [MetricThresholdOverride(metric_id='sensitivity',
                                      type=MetricThresholdTypes.LOWER_LIMIT, value=0.9)]

custom_monitor_instance_details = wos_client.monitor_instances.create(
    data_mart_id=data_mart_id,
    background_mode=False,
    monitor_definition_id=custom_monitor_id,
    target=target,
    thresholds=thresholds
).result
```
To check your configuration details, run the `wos_client.monitor_instances.show()` command.
Store metric values
You must store, or save, your custom metrics to the region where your service instance exists.
- Use the `metrics` field to set which metric values you are storing.
- Use the `wos_client.monitor_instances.measurements.add` method to commit the metrics.
The following code is from the working sample notebook that was previously mentioned:
```python
from datetime import datetime, timezone
from ibm_watson_openscale.base_classes.watson_open_scale_v2 import MonitorMeasurementRequest

custom_monitor_instance_id = custom_monitor_instance_details.metadata.id
custom_monitoring_run_id = "11122223333111abc"

measurement_request = [MonitorMeasurementRequest(
    timestamp=datetime.now(timezone.utc),
    metrics=[{"specificity": 0.78, "sensitivity": 0.67, "region": "us-south"}],
    run_id=custom_monitoring_run_id)]
print(measurement_request[0])

published_measurement_response = wos_client.monitor_instances.measurements.add(
    monitor_instance_id=custom_monitor_instance_id,
    monitor_measurement_request=measurement_request).result
published_measurement_id = published_measurement_response[0]["measurement_id"]
print(published_measurement_response)
```
To retrieve a stored measurement, run the following command:
```python
published_measurement = wos_client.monitor_instances.measurements.get(
    monitor_instance_id=custom_monitor_instance_id,
    measurement_id=published_measurement_id).result
print(published_measurement)
```
Managing custom metrics with the custom metrics provider and Python SDK
To manage custom metrics with the custom metrics provider and Python SDK, you must perform the following tasks:
- Implement the metric endpoint.
- Create the custom metrics provider.
A working sample notebook (advanced tutorial) demonstrates these tasks. For more information, see the Python SDK documentation.
Implement the metric endpoint
To generate a custom metric that includes record-level metrics, implement the custom metric and record-level metrics logic in the Custom Metrics Provider (Python Function or REST API).
Record-level metrics are metrics that are computed per row on the feedback or payload logging dataset, or on any other dataset that is used in a custom monitor. You can include record-level metrics in custom monitor implementations by computing the metrics in the custom metrics provider (Python function or REST API), saving them to a custom dataset, and viewing them in the Custom Monitors page in the UI.
For example, suppose that you want to compute the HAP metric on user inputs and outputs for each record (for security or compliance tracking). You can do this by implementing a record-level metric. When users view the evaluation results, they can click a timestamp to see the user input and user output records.
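As a plain-Python illustration of this idea, the sketch below computes a record-level score per row and then aggregates it. Everything here is hypothetical: the `compute_hap_score` helper is a trivial stand-in for a real HAP scorer, and in a real provider the rows would come from the payload or feedback dataset rather than an in-memory list.

```python
# Hypothetical sketch: compute a record-level metric per row, then aggregate it.
# compute_hap_score is an illustrative stand-in, not a real HAP scorer.

def compute_hap_score(text):
    # Stand-in scorer: flags records that contain words from a small blocklist
    blocklist = {"hateful", "abusive"}
    words = set(text.lower().split())
    return 1.0 if words & blocklist else 0.0

# In a real provider, these rows come from the payload or feedback dataset
records = [
    {"scoring_id": "r1", "input": "what is my balance", "output": "your balance is 42"},
    {"scoring_id": "r2", "input": "you are hateful", "output": "i cannot help with that"},
]

# Record-level metrics: one value per row, saved to the custom dataset
record_level_metrics = [
    {"scoring_id": r["scoring_id"],
     "hap_score": max(compute_hap_score(r["input"]), compute_hap_score(r["output"]))}
    for r in records
]

# Aggregated metric: published to the measurements API for the custom monitor
aggregated = {"hap_rate": sum(m["hap_score"] for m in record_level_metrics) / len(record_level_metrics)}
print(aggregated)  # {'hap_rate': 0.5}
```

The record-level values are what a user would see when clicking a timestamp in the evaluation results; the aggregated value is what the monitor charts over time.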
To implement the custom metric and record-level metrics logic in the Custom Metrics Provider (Python Function or REST API), do the following steps:
- Create a custom dataset.

  If you want to store record-level metrics, create a custom dataset after you define the custom monitor and its instance. You can create the dataset by using the OpenScale SDK.

  Tip: In the simplified flow from the SDK, this step is not required; the flow automatically creates the custom dataset.

  ```python
  custom_dataset_info = wos_client.custom_monitor.create_custom_dataset(
      data_mart_id='data_mart_id_here',
      subscription_id='subscription_id_here',
      custom_monitor_id='custom_monitor_id_here'
  )
  ```
- Implement the logic in the Custom Metrics Provider (Python Function or REST API):
  - OpenScale sends inputs such as `data_mart_id`, `subscription_id`, and `custom_monitor_id` when it invokes the custom metrics provider.
  - Read the data from the feedback table, the payload logging table, or any other dataset by using the input payload variables.
  - Compute the required record-level metrics for each row and save them to the custom dataset.
  - Compute the custom metrics and publish the aggregated metrics to the measurements API.
  - Update the monitor run status to Finished.

  A full working example is provided in the simplified and detailed notebooks.
- Create the custom metrics provider.

  Define the custom metrics provider endpoint details with the authentication information. OpenScale generates the token and invokes the REST endpoint with the token at run time. The following code is from the working sample notebook:
```python
wos_client.integrated_systems.add(
    name="Custom Metrics Provider",
    description="Custom Metrics Provider",
    type="custom_metrics_provider",
    credentials={
        "auth_type": "bearer",
        "token_info": {
            "url": IAM_URL,
            "headers": {"Content-type": "application/x-www-form-urlencoded"},
            "payload": "grant_type=urn:ibm:params:oauth:grant-type:apikey&response_type=cloud_iam&apikey=<api_key>",
            "method": "POST"
        }
    },
    connection={
        "display_name": "Custom Metrics Provider",
        "endpoint": rest_endpoint_url
    }
).result
```
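Putting the implementation steps together, a provider endpoint might follow the shape sketched below. This is a hedged illustration, not the SDK's API: the input fields mirror the values OpenScale sends when it invokes the provider, but the `score` handler, the row shapes, the `error` metric, and the return structure are all hypothetical stand-ins for your own logic.

```python
# Hypothetical sketch of a custom metrics provider handler. The input fields
# mirror the values OpenScale sends when invoking the provider; the helpers,
# metric, and return shape are illustrative, not a documented API.

def score(input_payload):
    # Inputs sent by OpenScale when it invokes the provider
    data_mart_id = input_payload["data_mart_id"]
    subscription_id = input_payload["subscription_id"]
    custom_monitor_id = input_payload["custom_monitor_id"]

    # 1. Read rows from the feedback or payload logging table (stubbed here)
    rows = input_payload.get("rows", [])

    # 2. Compute record-level metrics per row (to be saved to the custom dataset)
    record_metrics = [{"scoring_id": r["scoring_id"],
                       "error": abs(r["label"] - r["prediction"])}
                      for r in rows]

    # 3. Aggregate into the custom metrics (to be published to the measurements API)
    n = max(len(record_metrics), 1)
    aggregated = {"mean_error": sum(m["error"] for m in record_metrics) / n}

    # 4. Report the run as finished so the monitor run completes
    return {"metrics": aggregated, "record_metrics": record_metrics, "status": "finished"}

result = score({
    "data_mart_id": "dm-1", "subscription_id": "sub-1", "custom_monitor_id": "mon-1",
    "rows": [{"scoring_id": "r1", "label": 1, "prediction": 1},
             {"scoring_id": "r2", "label": 0, "prediction": 1}],
})
print(result["metrics"])  # {'mean_error': 0.5}
```

In a deployed provider, steps 2-4 would call the OpenScale dataset, measurements, and monitor-run APIs instead of returning the values directly.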
Managing custom metrics with the custom metrics provider and user interface
Do the following steps:
- Add metric groups.
- Implement the metric endpoint.
- Add the metric endpoint.
- Configure custom monitors.
Add metric groups
- On the home page, click Configure, and then click Metric groups.
- Click Add metric group.
- To configure a metric group by using a JSON file, click Import from file. Upload a JSON file and click Import.
- To configure a metric group manually, click Configure new group.
- Type a name for the metric group, and click Apply. The name must be less than or equal to 48 characters.
- Click the Edit icon on the Model types to support tile and select one or more model types that your evaluation supports. Click Next.
- To specify an evaluation schedule, click the toggle and specify the interval. Click Next.
- Specify the details for the input parameters. For each input parameter, enter the details and then click Add. The parameter name that you specify must match the parameter name that is specified in the metric API. If a parameter is required to configure your custom monitor, select the Required parameter checkbox.
- Click Save.
Add metric endpoints
- On the home page, click Configure, and then click Metric endpoints.
- Click Add metric endpoint.
- Specify a name and a description for the metric endpoint.
- Click the Edit icon on the Connection tile and specify the connection details. Click Next.
- Select the metric groups that you want to associate with the metric endpoint and click Save.
If you enable batch support, when you specify a custom watsonx.ai Runtime endpoint URL, you can add the following input parameters to a metric group:
| Parameter | Datatype | Value |
|---|---|---|
| custom_metrics_provider_type | String | wml_batch |
| space_id | String | Your space ID |
| deployment_id | String | watsonx.ai Runtime deployment ID of your custom metric endpoint |
| hardware_spec_id | String | watsonx.ai Runtime hardware spec ID of your custom metric endpoint |
| custom_metrics_wait_time | Integer | Wait time in seconds (for example, 60) |
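As a sketch of how those batch parameters fit together, the values might look like the following (the IDs are placeholders you replace with your own):

```python
# Illustrative values for the batch-support input parameters; the IDs are
# placeholders, not real space or deployment identifiers.
batch_params = {
    "custom_metrics_provider_type": "wml_batch",
    "space_id": "<your_space_id>",
    "deployment_id": "<your_deployment_id>",
    "hardware_spec_id": "<your_hardware_spec_id>",
    "custom_metrics_wait_time": 60,  # seconds to wait for the batch job
}
print(batch_params["custom_metrics_provider_type"])  # wml_batch
```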
Configure custom monitors
- On the home page, click Insights Dashboard.
- On a model deployment tile, select Configure monitors.
- In the Evaluations section, select the name of the metric group that you added.
- Select the Edit icon on the Metric endpoint tile.
- Select a metric endpoint and click Next. If you don't want to use a metric endpoint, select None.
- Use the toggles to specify the metrics that you want to use to evaluate the model and provide threshold values. Click Next.
- Specify values for the input parameters. If you selected JSON as the data type for the metric group, add the JSON data. Click Next.
You can now evaluate models with a custom monitor.
Accessing and visualizing custom metrics
Visualization of your custom metrics appears on the Insights Dashboard.

If you configured record-level metrics, click a time stamp in the chart to see the records.

If you're monitoring an LLM, you can also click a row to see the transaction details.
