Terms for AI asset evaluations

Learn the terms and concepts used for evaluating AI assets, including traditional machine learning models and generative AI assets.

Acceptable fairness
The percentage of favorable outcomes that a monitored group must receive to meet the fairness threshold. It is calculated by multiplying perfect equality by the fairness threshold.
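A minimal sketch of this calculation, with illustrative numbers (the percentages below are assumptions, not values from a real evaluation):

```python
# Acceptable fairness = perfect equality * fairness threshold (illustrative values).
perfect_equality = 0.70    # favorable-outcome rate delivered to the reference groups
fairness_threshold = 0.80  # configured fairness threshold (80%)

acceptable_fairness = perfect_equality * fairness_threshold
print(f"The monitored group must receive at least {acceptable_fairness:.0%} favorable outcomes")
# -> The monitored group must receive at least 56% favorable outcomes
```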

Alert
A notification that a performance metric is outside of the acceptable range specified by configured monitors.

API Key
A unique identifier issued by IBM Cloud for connecting to resources. To obtain, open https://cloud.ibm.com/resources, find and expand the resource, such as a storage service, and copy the value for the API key without the quotation marks.
See also: Resource ID, Service credentials.

Balanced data set
A data set that includes the scoring requests received by the model for the selected hour and the perturbed records.

Baseline data
Data that is collected before an intervention or modification. This data serves as the baseline against which future data is compared.

Batch deployment
A method of deploying models that processes input data from a file, data connection, or connected data in a storage bucket, and writes the output to a selected destination.

Batch processing
A processing method that is suggested for monitoring deployments that involve large volumes of payload or feedback data.

Bias
When a machine learning model produces a result for a monitored person, group, or thing that is considered unfair when compared to a reference result. Bias can be caused by a problem with the training data for a model. The Fairness monitor can detect bias when results fall below a threshold that you set. Related term: Debiasing.

Cloud Object Storage
A service offered by IBM for storing and accessing data. If Cloud Object Storage is the repository for AI assets, the associated service credentials must be used to connect to the assets for evaluations.
See also: Resource ID, API key.

Confidence score
The probability that a machine learning model's prediction is correct. A higher score indicates a higher probability that the predicted outcome matches the actual outcome.
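As an illustration, the sketch below reads a confidence score from a scikit-learn classifier; the model and data are assumptions for demonstration only:

```python
from sklearn.linear_model import LogisticRegression

X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

# predict_proba returns the probability of each class; the probability of the
# predicted class serves as the confidence score.
probabilities = model.predict_proba([[2.5]])[0]
confidence_score = max(probabilities)
print(f"Prediction: {model.predict([[2.5]])[0]}, confidence: {confidence_score:.2f}")
```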

Contrastive explanation
An explanation that indicates the minimal set of feature-value changes that would change the model prediction. It is computed for a single data point.

Data mart
The workspace where all the metadata for AI asset evaluations is saved.

Debiased transactions
The transactions for which a debiased outcome is generated.

Debiasing
Mitigating bias in model predictions using fairness algorithms, automatically or manually.

Deployment
The act of making an AI asset available through an endpoint so that you can input new data to the asset and get a score or generated output.

Drift
When AI asset performance declines over time because of changes in input data or prompt patterns.
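One much-simplified way to surface drift is to compare current accuracy against the accuracy recorded on baseline data; the numbers and the tolerance below are assumptions, and real drift monitors use more sophisticated techniques:

```python
baseline_accuracy = 0.91  # accuracy measured on baseline data (illustrative)
current_accuracy = 0.84   # accuracy measured on recent runtime data (illustrative)

drop = baseline_accuracy - current_accuracy
if drop > 0.05:  # assumed tolerance for acceptable degradation
    print(f"Possible drift: accuracy dropped by {drop:.2f}")
```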

Evaluation
The process of using metrics to assess an AI asset and measure how well it performs across dimensions such as fairness, accuracy, or reliability.

Explanation
An insight into the evaluation of a particular measurement of an AI asset, helping users interpret predictive or generative behavior.

Fairness
Determines whether an AI asset produces biased or unbalanced outcomes that favor a monitored group over a reference group.
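One common way to quantify such an imbalance is a disparate impact ratio: the favorable-outcome rate of the monitored group divided by that of the reference group. The sketch below uses made-up loan decisions:

```python
def favorable_rate(outcomes):
    """Fraction of outcomes that are favorable (here, the value 1)."""
    return sum(outcomes) / len(outcomes)

monitored_outcomes = [1, 0, 0, 1, 0]  # decisions for the monitored group (illustrative)
reference_outcomes = [1, 1, 0, 1, 1]  # decisions for the reference group (illustrative)

ratio = favorable_rate(monitored_outcomes) / favorable_rate(reference_outcomes)
print(f"Disparate impact ratio: {ratio:.2f}")  # values well below 1.0 suggest bias
```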

Features
The data set columns (feature columns) that are used to train a machine learning model.
Example: In a model that predicts whether a person qualifies for a loan, the features for employment status and credit history might be given greater weight than zip code.

Feedback data
Labeled data that matches the schema and structure of the data used to train or fine-tune an AI asset.

Global explanation
An explanation of a model's predictions across a sample of data.

Headless subscription
A subscription that is backed by a real-time deployment behind the scenes. With a headless subscription, users can monitor the deployment by using the payload and feedback data that is supplied to it, without providing a scoring URL.

Labeled data
Data that is labeled in a uniform manner for AI algorithms to recognize during training or fine-tuning.
Example: A table of data with labeled columns is typical for supervised machine learning. Images can also be labeled for use in a machine learning problem.

Local explanation
Explains a model's prediction by using specific, individual examples.

Meta-fields
Specialized data fields that vary from product to product.

Monitor
Tracks performance results for different AI asset evaluations.
Example: Fairness, drift, quality, and explainability monitors.

Monitored group
When evaluating fairness, the monitored group represents the values that are most at risk for biased outcomes.
Example: In the sex feature, Female and Nonbinary can be set as monitored groups.

Online deployment
Method of accessing a deployment through an API endpoint that provides a real-time score or solution on new data.

Payload data
Any real-time data supplied to a model. Consists of requests to a model (input) and responses from a model (output).
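For illustration, a payload record might look like the following; the exact schema varies by platform, and the field names here are assumptions:

```python
payload_record = {
    "request": {   # input sent to the model
        "fields": ["age", "income", "credit_history"],
        "values": [[42, 55000, "good"]],
    },
    "response": {  # output returned by the model
        "fields": ["prediction", "probability"],
        "values": [["approved", 0.87]],
    },
}
```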

Payload logging
Persisting payload data.

Perfect equality
The percentage of favorable outcomes delivered to all reference groups. For the balanced and debiased data sets, the calculation includes monitored group transactions that were altered to become reference group transactions.

Perturbations
Data points that are simulated around real data points when computing the metrics that are associated with monitors, such as fairness and explainability.
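A minimal sketch of the idea, jittering the numeric features of a record; real monitors use more principled perturbation strategies, and this function is purely illustrative:

```python
import random

def perturb(record, numeric_scale=0.1, n=5):
    """Simulate n data points around a real record by jittering numeric features."""
    perturbed = []
    for _ in range(n):
        copy = dict(record)
        for key, value in record.items():
            if isinstance(value, (int, float)):
                # nudge numeric features by up to +/- numeric_scale (relative)
                copy[key] = value * (1 + random.uniform(-numeric_scale, numeric_scale))
        perturbed.append(copy)
    return perturbed

print(perturb({"age": 42, "income": 55000.0}))
```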

Pre-production space
An environment used to test data and behavior for AI asset validations before deployment.

Prediction column
The variable that a supervised machine learning model (trained with labeled data) predicts when presented with new data.
See also: Target.

Probability
The confidence with which a model predicts the output. Applicable to classification models.

Production space
A deployment space used for operationalizing AI assets in production environments.

Quality
A monitor that evaluates how well an AI asset generates or predicts accurate and relevant outcomes.

Records
Transactions on which monitors are evaluated.

Reference group
When evaluating fairness, the reference group represents the values that are least at risk for biased outcomes.
Example: For the Age feature, you can set 30-55 as the reference group and compare results for other cohorts to that group.

Relative weight
The weight that a feature has in predicting the target variable, relative to the other features. A higher weight indicates more importance. Knowing the relative weight helps explain the model results.
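As an illustration, the sketch below reads relative weights from a scikit-learn tree ensemble's feature importances; the data and feature names are assumptions:

```python
from sklearn.ensemble import RandomForestClassifier

X = [[25, 1, 0], [40, 0, 1], [35, 1, 1], [50, 0, 0], [30, 1, 1], [45, 0, 0]]
y = [0, 1, 1, 1, 0, 0]
feature_names = ["age", "employment_status", "credit_history"]

model = RandomForestClassifier(random_state=0).fit(X, y)
for name, weight in zip(feature_names, model.feature_importances_):
    print(f"{name}: {weight:.2f}")  # higher weight = more influence on the target
```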

Resource ID
The unique identifier for a resource stored in Cloud Object Storage. To obtain:

  1. Open https://cloud.ibm.com/resources
  2. Find and expand the resource (such as a storage service)
  3. Copy the value for Resource ID without the quotation marks

Response time
The time taken to process a scoring or generation request by the AI asset deployment.

Runtime data
Data obtained from running an AI asset’s lifecycle, including inference or generation workloads.

Scoring endpoint
The HTTPS endpoint that users can call to receive the scoring or generated output of a deployed AI asset.
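A hedged sketch of calling a scoring endpoint over HTTPS with the requests library; the URL, token, and request schema below are placeholders, not a real deployment:

```python
import requests

SCORING_URL = "https://example.com/deployments/<deployment-id>/score"  # placeholder
headers = {
    "Authorization": "Bearer <access-token>",  # placeholder credential
    "Content-Type": "application/json",
}
scoring_request = {"fields": ["age", "income"], "values": [[42, 55000]]}  # assumed schema

response = requests.post(SCORING_URL, json=scoring_request, headers=headers)
print(response.json())  # the score or generated output for the new data
```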

Scoring request
The input to a deployment.
See also: Payload data.

Scoring
In model inferencing, the action of sending a request to a model and receiving a response.

Self-managed
Model transactions stored in your own data warehouse and evaluated by your own Spark analytics engine.

Service credentials
The access IDs required to connect to IBM Cloud resources.

Service Provider
A machine learning provider (typically a model engine, such as Watson Machine Learning, Amazon Web Services, Azure, or a custom engine) that hosts the deployments.

Subscription
A deployment that is being monitored. There is a one-to-one mapping between a deployment and a subscription.

System-managed
Model transactions that are stored in a database and evaluated by using computing resources that are managed by the system.

Target
The feature, label, or goal that the trained AI asset predicts or generates.
See also: Prediction column.

Threshold
A benchmark for evaluating an AI asset; when a performance metric falls outside the configured acceptable range, an alert is triggered.
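A minimal sketch of a threshold check that triggers an alert (the metric name, value, and bound are illustrative):

```python
def check_threshold(metric_name, value, lower_bound):
    """Emit an alert when a metric falls below its configured lower bound."""
    if value < lower_bound:
        print(f"ALERT: {metric_name} = {value:.2f} fell below threshold {lower_bound:.2f}")

check_threshold("accuracy", 0.72, 0.80)
```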

Training data
Data used to teach, train, or fine-tune an AI asset's learning algorithm.

Transactions
The records for AI asset evaluations that are stored in the payload logging table.

Unlabeled data
Data that is not associated with labels that identify characteristics, classifications, and properties. Unstructured data that is not labeled in a uniform manner.
Example: Email or unlabeled images are typical of unlabeled data. Unlabeled data can be used in unsupervised machine learning.

User ID
The ID of the user associated with the scoring request.