Most state-of-the-art deep learning models for classification or regression are trained through supervised learning, which requires many labeled examples of relevant data classes. Models “learn” by making predictions on a labeled training dataset; data labels provide both the range of possible answers and the correct answers (or ground truth) for each training example. “Learning,” here, means adjusting model weights to minimize the difference between the model’s predictions and that ground truth. This process requires enough labeled samples for many rounds of training and updates.
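To make that process concrete, here is a minimal sketch of a single supervised training step in PyTorch. The model, data and hyperparameters are placeholder assumptions chosen for illustration, not part of any specific system described in this article.

```python
import torch
import torch.nn as nn

# Placeholder setup (assumed for illustration): a small linear classifier for 10 classes
model = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()              # measures the gap between predictions and ground truth
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One labeled mini-batch: inputs plus their ground-truth class labels
inputs = torch.randn(32, 784)                  # 32 examples, 784 features each
labels = torch.randint(0, 10, (32,))           # the "correct answers" for each example

# One round of supervised learning: predict, measure the error, adjust the weights
logits = model(inputs)                         # model's predictions
loss = criterion(logits, labels)               # difference from ground truth
optimizer.zero_grad()
loss.backward()                                # compute gradients of the loss
optimizer.step()                               # update weights to reduce the loss
```

Training repeats steps like this over many labeled mini-batches, which is why supervised learning depends on having a sufficiently large annotated dataset.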
While powerful, supervised learning is impractical in some real-world scenarios. Annotating large numbers of data samples is costly and time-consuming, and in cases like rare diseases or newly discovered species, examples may be scarce or nonexistent. Consider image recognition tasks: according to one study, humans can recognize approximately 30,000 individually distinguishable object categories.1 It’s not feasible, in terms of time, cost and computational resources, for artificial intelligence models to come anywhere close to human capabilities if they must be explicitly trained on labeled data for each class.
The need for machine learning models to be able to generalize quickly to a large number of semantic categories with minimal training overhead has given rise to n-shot learning: a subset of machine learning that also includes few-shot learning (FSL) and one-shot learning. Few-shot learning typically uses transfer learning and meta-learning methods to train models to quickly recognize new classes with only a few labeled training examples, or, in one-shot learning, a single labeled example.
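As a rough illustration of the few-shot setting, the sketch below classifies a query example by comparing its embedding to class "prototypes" averaged from a handful of labeled support examples, in the spirit of prototypical networks. The embedding values and labels here are made up for illustration; a real few-shot system would meta-train the embedding function itself.

```python
import numpy as np

def prototype_classify(support_embeddings, support_labels, query_embedding):
    """Assign the query to the class whose mean support embedding (prototype) is nearest.

    support_embeddings: (n_support, dim) embeddings of the few labeled examples
    support_labels:     (n_support,) integer class labels
    query_embedding:    (dim,) embedding of the example to classify
    """
    classes = np.unique(support_labels)
    # One prototype per class: the mean embedding of its labeled support examples
    prototypes = np.stack(
        [support_embeddings[support_labels == c].mean(axis=0) for c in classes]
    )
    # The nearest prototype (Euclidean distance) determines the predicted class
    distances = np.linalg.norm(prototypes - query_embedding, axis=1)
    return classes[np.argmin(distances)]

# Toy usage with made-up 2-D "embeddings": two classes, three labeled examples each
support = np.array([[0.9, 0.1], [1.0, 0.0], [0.8, 0.2],
                    [0.1, 0.9], [0.0, 1.0], [0.2, 0.8]])
labels = np.array([0, 0, 0, 1, 1, 1])
query = np.array([0.85, 0.15])
print(prototype_classify(support, labels, query))  # predicts class 0
```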
Zero-shot learning (ZSL), like all n-shot learning, refers not to any specific algorithm or neural network architecture, but to the nature of the learning problem itself: in ZSL, the model is not trained on any labeled examples of the unseen classes it is asked to make predictions on after training.
This problem setup doesn’t account for whether an unseen class was present (albeit unlabeled) in the training data. For example, some large language models (LLMs) are well-suited for ZSL tasks, as they are pre-trained through self-supervised learning on a massive corpus of text that may contain incidental references to or knowledge about unseen data classes. Without labeled examples to draw upon, ZSL methods all rely on such auxiliary knowledge to make predictions.
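For instance, a pre-trained language model can assign text to labels it never saw labeled examples of. The sketch below uses the Hugging Face transformers zero-shot-classification pipeline, which reframes classification as natural language inference; the model choice, input text and candidate labels are illustrative assumptions.

```python
from transformers import pipeline

# An NLI model pre-trained on large text corpora supplies the "auxiliary knowledge";
# it was never trained on labeled examples of the candidate labels below.
# (Model choice and labels are illustrative assumptions.)
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The patient presents with a persistent cough and mild fever.",
    candidate_labels=["medical", "finance", "sports"],
)
print(result["labels"][0])  # the highest-scoring label, e.g. "medical"
```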
Given its versatility and wide range of use cases, zero-shot learning has become an increasingly notable area of research in data science, particularly in the fields of computer vision and natural language processing (NLP).