Understanding metrics
IBM® Maximo® Visual Inspection provides several metrics to help you measure how effectively your model was trained.
- True positive
- A true positive result is when IBM Maximo Visual Inspection correctly labels or categorizes an image, for example, categorizing an image of a cat as a cat.
- False positive
- A false positive result is when IBM Maximo Visual Inspection labels or categorizes an image when it should not have, for example, categorizing an image of a cat as a dog.
- True negative
- A true negative result is when IBM Maximo Visual Inspection correctly does not label or categorize an image, for example, not categorizing an image of a cat as a dog.
- False negative
- A false negative result is when IBM Maximo Visual Inspection does not label or categorize an image, but should have, for example, not categorizing an image of a cat as a cat.
For a model in production, the exact counts of true positives, true negatives, false positives, and false negatives cannot be known. The reported values are the expected values for these measurements.
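The four outcome types above can be counted from pairs of actual and predicted labels. The following is a minimal sketch with invented labels, not code from IBM Maximo Visual Inspection, showing how the four counts are tallied for a single target class:

```python
# Invented toy example: count the four outcome types for the class "cat"
# from (actual, predicted) label pairs.
pairs = [
    ("cat", "cat"),   # true positive for "cat"
    ("cat", "dog"),   # false negative for "cat"
    ("dog", "dog"),   # true negative for "cat"
    ("dog", "cat"),   # false positive for "cat"
]

def outcome_counts(pairs, target):
    """Return (tp, fp, tn, fn) for one target class."""
    tp = sum(1 for a, p in pairs if a == target and p == target)
    fp = sum(1 for a, p in pairs if a != target and p == target)
    tn = sum(1 for a, p in pairs if a != target and p != target)
    fn = sum(1 for a, p in pairs if a == target and p != target)
    return tp, fp, tn, fn

print(outcome_counts(pairs, "cat"))  # (1, 1, 1, 1)
```

Note that the same pair can be a different outcome type depending on the class you ask about: categorizing a cat as a dog is a false negative for "cat" and a false positive for "dog".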
- Metrics for image classification
- Metrics for object detection
- Metrics for object detection using the Tiny YOLO model
- Metrics for object detection using segmentation
- Metrics for custom models
- Metrics for action detection models
Metrics for image classification
The following metrics display for models that were trained for image classification and optimized for accuracy:
- Accuracy
- Measures the percentage of correctly classified images. It is calculated by the following formula: (true positives + true negatives) / (true positives + true negatives + false positives + false negatives).
- PR curve
- This metric displays only when Advanced metrics is toggled on. The precision-recall (PR)
curve plots precision vs. recall (sensitivity). Because precision and recall are typically inversely
related, it can help you decide whether the model is appropriate for your needs. That is, do you
need a system with high precision, which has fewer results but the results are more likely to be
accurate, or high recall, which has more results but the results are more likely to contain false positives?
- Precision
- Precision describes how "clean" the population of hits is. It measures the percentage of images that are correctly classified. That is, when the model classifies an image into a category, how often is it correct? It is calculated by the following formula: true positives / (true positives + false positives).
- Recall
- The percentage of the images that were classified into a category, compared to all images that should have been classified into that category. That is, when an image belongs in a category, how often is it identified? It is calculated by the following formula: true positives / (true positives + false negatives).
- Confusion matrix
- This metric displays only when Advanced metrics is toggled on. The confusion matrix is
used to calculate the other metrics, such as precision and recall. Each column of the matrix
represents the instances in a predicted class, such as those that IBM Maximo Visual Inspection
marked as belonging to a category. Each row represents the instances in an actual class. Therefore,
each cell measures how many times an image was correctly and incorrectly classified.
You can view the confusion matrix as a table of values or a heat map. A heat map is a way of visualizing the data, so that the higher values appear more hot, or closer to red, and lower values appear more cool, or closer to blue. Higher values show more confidence in the model.
This matrix makes it easy to see if the model is confusing classes or not identifying certain classes.
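The classification metrics above can be illustrated with a short sketch. The labels below are invented for illustration; this is not how the product computes its metrics internally, only the formulas stated above applied to toy data:

```python
from collections import Counter

# Invented example labels for a three-class classifier.
actual    = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "cat", "bird"]

classes = sorted(set(actual) | set(predicted))

# Confusion matrix: rows are actual classes, columns are predicted classes.
counts = Counter(zip(actual, predicted))
matrix = [[counts[(a, p)] for p in classes] for a in classes]

# Accuracy: correct classifications over all cases (4 of 6 here).
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def precision(cls):
    """True positives over everything predicted as cls (a matrix column)."""
    predicted_as_cls = sum(1 for p in predicted if p == cls)
    return counts[(cls, cls)] / predicted_as_cls if predicted_as_cls else 0.0

def recall(cls):
    """True positives over everything actually cls (a matrix row)."""
    actually_cls = sum(1 for a in actual if a == cls)
    return counts[(cls, cls)] / actually_cls if actually_cls else 0.0

print(accuracy)                         # 4 of 6 correct
print(precision("dog"), recall("dog"))  # both 2 of 3
```

Reading precision down a column and recall across a row is exactly the sense in which the confusion matrix "is used to calculate the other metrics."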
Metrics for object detection
- Accuracy: Measures the percentage of correct image classifications. It is calculated by the following formula: (true positives + true negatives) / all cases.
- Loss vs. Iteration: The Loss vs. Iteration graph presents information about the
training loss over the range of iterations used during training. For all models other than High
resolution and Anomaly optimized models, the following measurements of the training loss appear on
this graph:
- Classification, Localization, Segmentation (CLS): Combined error measurement of how accurately the trained model can split the original image into smaller regions, select or localize the most interesting regions, and classify any objects in the region.
- Bounding box (BBox): Measures how precisely the trained model can locate bounding box coordinates for any recognized object, compared to the test subset.
- mAP: Mean average precision (mAP) is the average over all classes of the maximum precision at each recall value. Precision measures how accurate the model is, that is, the percentage of the classified objects that are correct. Recall measures how well the model returns the correct objects. For example, out of 100 images of dogs, how many of them were classified as dogs? To calculate mAP, first, the PR curve is found. Then, the maximum precision for each recall value is determined. This value is the maximum precision for any recall value greater than or equal to the current recall value. For example, if the precision values range from .35 to .55 and then never reach .55 again for recall values in the interval .3 - .6, then the maximum precision for every recall value in the interval .3 - .6 is set to .55. The mAP is then calculated as the average of the maximum precision values.
- IoU: Intersection over union (IoU) is the accuracy of the location and size of the image label boxes. It is calculated by the intersection between a ground truth bounding box and a predicted bounding box, divided by the union of both bounding boxes, where the intersection is the area of overlap, a ground truth bounding box is the hand drawn box, and the predicted bounding box is drawn by IBM Maximo Visual Inspection.
- Confusion matrix: This metric displays only when Advanced metrics is toggled on. The confusion matrix is used to calculate the other metrics, such as precision and recall. Each column of the matrix represents the instances in a predicted class, such as those that IBM Maximo Visual Inspection marked as belonging to a category. Each row represents the instances in an actual class. Therefore, each cell measures how many times an image was correctly and incorrectly classified. You can view the confusion matrix as a table of values or a heat map. A heat map is a way of visualizing the data, so that the higher values appear more "hot", or closer to red, and lower values appear more "cool", or closer to blue. Higher values show more confidence in the model. This matrix makes it easy to see if the model is confusing classes or not identifying certain classes.
- PR curve: This metric displays only when Advanced metrics is toggled on. The
precision-recall (PR) curve plots precision vs. recall or sensitivity. Because precision and recall
are typically inversely related, it can help you decide whether the model is appropriate for your
needs. That is, do you need a system with high precision, which has fewer results but the results
are more likely to be accurate, or high recall, which has more results but the results are more
likely to contain false positives?
- Precision: Precision describes how "clean" the population of hits is. It measures the percentage of objects that are correctly identified. That is, when the model identifies an object, how often is it correct? It is calculated by the following formula: true positives / (true positives + false positives).
- Recall: The percentage of the images that were labeled as an object, compared to all images that contain that object. That is, how often is an object correctly identified? It is calculated by the following formula: true positives / (true positives + false negatives).
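The IoU definition above reduces to a few lines of arithmetic. The following is a hedged sketch assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the coordinates are invented for illustration and the product may represent boxes differently:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned bounding boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width/height clamp to zero when boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

ground_truth = (0, 0, 10, 10)   # hand-drawn box, area 100
prediction   = (5, 0, 15, 10)   # predicted box, area 100, overlapping by 50
print(iou(ground_truth, prediction))  # 50 / 150, i.e. one third
```

An IoU of 1.0 means the predicted box matches the hand-drawn box exactly; disjoint boxes score 0.0.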
Metrics for object detection using segmentation
The following metrics display for models that were trained for object detection and optimized for Detectron2 and High resolution model types:
- Confusion matrix: This metric displays only when Advanced metrics is toggled on. The confusion matrix is used to calculate the other metrics, such as precision and recall. Each column of the matrix represents the instances in a predicted class, such as those that IBM Maximo Visual Inspection marked as belonging to a category. Each row represents the instances in an actual class. Therefore, each cell measures how many times an image was correctly and incorrectly classified. You can view the confusion matrix as a table of values or a heat map. A heat map is a way of visualizing the data, so that the higher values appear more hot, or closer to red, and lower values appear more cool, or closer to blue. Higher values show more confidence in the model. This matrix makes it easy to see if the model is confusing classes or not identifying certain classes.
- Loss vs. Iteration: The Loss vs. Iteration graph presents information about the
training loss over the range of iterations used during training. For Detectron2 models, the
following measurements of the "Train Loss" appear on this graph:
- Classification, Localization, Segmentation (CLS): A combined error measurement of how accurately the trained model can split the original image into smaller regions, select or localize the most interesting regions, and classify any objects in the region.
- Bounding box (BBox): A measurement of how precisely the trained model can locate bounding box coordinates for any recognized object, compared to the test subset.
- PR curve: This metric displays only when Advanced metrics is toggled on. The
precision-recall (PR) curve plots precision vs. recall or sensitivity. Because precision and recall
are typically inversely related, it can help you decide whether the model is appropriate for your
needs. That is, do you need a system with high precision, which has fewer results but the results
are more likely to be accurate, or high recall, which has more results but the results are more
likely to contain false positives?
- Precision: Precision describes how "clean" the population of hits is. It measures the percentage of objects that are correctly identified. That is, when the model identifies an object, how often is it correct? It is calculated by true positives / (true positives + false positives).
- Recall: The percentage of the images that were labeled as an object, compared to all images that contain that object. That is, how often is an object correctly identified? It is calculated by the following formula: true positives / (true positives + false negatives).
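The PR curve and the max-precision interpolation used for mAP can be sketched together. The detection scores below are invented; the sketch sweeps a confidence threshold over ranked detections and then applies the interpolation described earlier, under the assumption that detections are already sorted by confidence:

```python
# Invented ranked detections: (confidence, is_true_positive).
detections = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
total_positives = 4  # number of ground-truth objects

tp = fp = 0
precisions, recalls = [], []
for _, is_tp in detections:          # sorted by descending confidence
    tp += is_tp
    fp += not is_tp
    precisions.append(tp / (tp + fp))        # "clean"-ness of hits so far
    recalls.append(tp / total_positives)     # coverage of ground truth so far

# Interpolation for mAP: each precision becomes the maximum precision at
# any recall value greater than or equal to the current one.
interpolated = [max(precisions[i:]) for i in range(len(precisions))]

print(list(zip(recalls, precisions)))
print(interpolated)
```

The raw curve zig-zags whenever a false positive appears; interpolation flattens those dips so the average (the AP for one class) rewards only the best precision achievable at each recall level.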
Metrics for custom models
When a custom model is imported and deployed, the following metric is shown:
- Accuracy: Measures the percentage of correct categorizations. It is calculated by the following formula: (true positives + true negatives) / (true positives + true negatives + false positives + false negatives).
Metrics for action detection models
- Accuracy: Measures the percentage of correctly detected actions. It is calculated by (true positives + true negatives) / (true positives + true negatives + false positives + false negatives).
- Precision: Precision describes how "clean" the population of hits is. It measures the percentage of actions that are correctly identified. That is, when the model identifies an action, how often is it correct? It is calculated by true positives / (true positives + false positives).
- Recall: The percentage of the video segments that were labeled as an action, compared to all segments in the video that contain that action. That is, how often is an action correctly identified? It is calculated as true positives / (true positives + false negatives).
- Confusion matrix: This metric displays only when Advanced metrics is toggled on. The confusion matrix is used to calculate the other metrics, such as precision and recall. Each column of the matrix represents the instances in a predicted class, such as those that IBM Maximo Visual Inspection marked as belonging to a category. Each row represents the instances in an actual class. Therefore, each cell measures how many times an action was correctly and incorrectly classified. You can view the confusion matrix as a table of values or a heat map. A heat map is a way of visualizing the data, so that the higher values appear more "hot", or closer to red, and lower values appear more "cool", or closer to blue. Higher values show more confidence in the model. This matrix makes it easy to see if the model is confusing classes or not identifying certain classes.
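As noted above, the confusion matrix is enough to recover precision and recall. The following sketch uses invented action classes and counts (not product output) to show how a column sum yields precision and a row sum yields recall:

```python
# Invented confusion matrix: rows are actual classes, columns are predicted.
classes = ["walk", "run", "wave"]
matrix = [
    [8, 2, 0],   # actual "walk"
    [1, 9, 0],   # actual "run"
    [0, 1, 9],   # actual "wave"
]

def precision(i):
    """Diagonal cell over its column sum: of everything predicted as class i,
    how much really was class i."""
    col = sum(row[i] for row in matrix)
    return matrix[i][i] / col if col else 0.0

def recall(i):
    """Diagonal cell over its row sum: of everything actually class i,
    how much was found."""
    return matrix[i][i] / sum(matrix[i])

for i, name in enumerate(classes):
    print(name, round(precision(i), 2), round(recall(i), 2))
```

Large off-diagonal cells point at exactly the confusions the heat map makes visible, for example the two "walk" segments predicted as "run" in this toy matrix.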