Image classification applies a class label to an entire image. For example, a simple image classification model might be trained to categorize vehicle images as “car” or “truck”. Conventional image classification systems are inherently limited: because they assign a single label to the whole image, they cannot locate or distinguish the individual elements within it.
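To make the idea concrete, here is a minimal sketch of whole-image classification with a pretrained torchvision model; the ResNet-50 backbone and the “vehicle.jpg” input path are illustrative assumptions, not part of any particular pipeline.

```python
# Minimal sketch: one class label for the entire image.
# Assumes torchvision >= 0.13; "vehicle.jpg" is a placeholder path.
import torch
from PIL import Image
from torchvision import models
from torchvision.models import ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()

preprocess = weights.transforms()        # resize, center-crop, normalize
img = Image.open("vehicle.jpg")          # hypothetical input image
batch = preprocess(img).unsqueeze(0)     # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)

# A single prediction for the whole image: no localization information.
class_idx = logits.argmax(dim=1).item()
print(weights.meta["categories"][class_idx])
```

The output is a single label; the model says nothing about where in the image the vehicle appears, or whether there is more than one.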
Object detection combines image classification with object localization, generating rectangular regions, called “bounding boxes”, in which objects are located: rather than merely labeling a vehicle image as “car” or “truck”, an object detection model could indicate where in the image the car(s) or truck(s) can be found. While object detection can classify multiple elements within an image and approximate each element’s width and height, it cannot discern precise boundaries or shapes. This limits the ability of conventional object detection models to delineate closely bunched objects, whose bounding boxes overlap.
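As a rough illustration, the sketch below runs a pretrained Faster R-CNN detector from torchvision; the model choice, the 0.5 score cutoff, and the “vehicle.jpg” path are assumptions for demonstration only.

```python
# Minimal sketch: class labels plus bounding boxes per detected object.
# Assumes torchvision >= 0.13; "vehicle.jpg" is a placeholder path.
import torch
from PIL import Image
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights,
    fasterrcnn_resnet50_fpn,
)
from torchvision.transforms.functional import to_tensor

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights)
model.eval()

img = to_tensor(Image.open("vehicle.jpg"))   # (3, H, W), values in [0, 1]

with torch.no_grad():
    detections = model([img])[0]             # one dict per input image

# Each detection is a label and an axis-aligned box (x1, y1, x2, y2):
# the box approximates extent, but not the object's pixel-level shape.
for box, label, score in zip(
    detections["boxes"], detections["labels"], detections["scores"]
):
    if score > 0.5:                          # assumed confidence cutoff
        name = weights.meta["categories"][int(label)]
        print(name, [round(v, 1) for v in box.tolist()])
```

The boxes reveal where each object is, but two closely bunched cars can still yield heavily overlapping rectangles that say nothing about where one vehicle ends and the other begins.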
Image segmentation processes visual data at the pixel level, using various techniques to annotate individual pixels as belonging to a specific class or instance. “Classic” image segmentation techniques determine annotations through heuristics: hand-crafted rules based on inherent pixel qualities like color and intensity. Deep learning models instead employ complex neural networks for sophisticated pattern recognition. The output of this annotation is a set of segmentation masks, each representing the specific pixel-by-pixel boundary and shape of a class in the image, typically corresponding to different objects, features or regions.
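As a small, self-contained example of the “classic”, heuristic approach, the sketch below segments a grayscale image by thresholding pixel intensity with Otsu’s method in scikit-image; the bundled coins sample image is a stand-in for any input, and a deep learning model (for example, DeepLabv3) would instead learn such masks from data rather than from a fixed rule.

```python
# Minimal sketch of heuristic segmentation: Otsu thresholding on
# pixel intensity. scikit-image's bundled coins() image is a stand-in.
from skimage import data, filters

image = data.coins()                       # grayscale sample image (H, W)
threshold = filters.threshold_otsu(image)  # intensity cutoff from histogram

# The mask annotates every pixel: True = foreground, False = background.
# Unlike a bounding box, it traces the exact pixel-level shape.
mask = image > threshold
print(mask.shape, int(mask.sum()), "foreground pixels")
```

The resulting mask has the same height and width as the input, with each pixel assigned to a class; this per-pixel annotation is exactly what bounding boxes cannot provide.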
Broadly speaking, image segmentation is used for three types of tasks: semantic segmentation, instance segmentation and panoptic segmentation.