Zero-shot classification uses zero-shot prompting, a prompt engineering technique that allows a model to perform a task without any task-specific training or examples. It is an application of zero-shot learning (ZSL), a machine learning method that relies on a pretrained model’s ability to recognize and categorize objects or concepts it was not explicitly trained on. ZSL is similar to few-shot learning (FSL), in which a model learns to make accurate predictions from a small number of labeled examples. Both techniques enable models to perform tasks that they haven’t been explicitly trained on.
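
As a rough illustration of the difference at the prompt level, the sketch below builds a classification prompt that is zero-shot when no examples are supplied and becomes a few-shot prompt when labeled demonstrations are added. The function name `build_prompt`, the prompt wording, and the sample texts are assumptions made for illustration; they do not come from any particular library or model.

```python
from typing import Optional, Sequence, Tuple


def build_prompt(
    text: str,
    labels: Sequence[str],
    examples: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Build a classification prompt; with no examples it is zero-shot."""
    # The instruction names the candidate labels (hypothetical wording).
    lines = [f"Classify the text into one of: {', '.join(labels)}."]
    # Few-shot prompting prepends labeled demonstrations; zero-shot skips this.
    for example_text, example_label in examples or []:
        lines.append(f"Text: {example_text}\nLabel: {example_label}")
    # The final entry leaves the label blank for the model to complete.
    lines.append(f"Text: {text}\nLabel:")
    return "\n\n".join(lines)


zero_shot = build_prompt("The battery died after a day.", ["positive", "negative"])
few_shot = build_prompt(
    "The battery died after a day.",
    ["positive", "negative"],
    examples=[("I love this phone.", "positive")],
)
```

The zero-shot prompt contains only the instruction and the text to classify, so the model has to rely entirely on what it learned during pretraining.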

Researchers have been experimenting with machine learning models for classification tasks since the 1950s. The Perceptron is an early classification model that uses a decision boundary to separate data into different groups. Many believe that the concept behind the model sparked interest in artificial intelligence, influencing deep learning algorithms that can classify objects or translate languages. However, most machine learning (ML) and deep learning (DL) methods rely on supervised learning and therefore need to be trained on a large amount of task-specific labeled data. This presents a challenge because the large, annotated datasets required to train these models simply do not exist for every domain. Motivated by these constraints, some researchers see large language models (LLMs) as a way around these data limitations.

LLMs are designed to perform natural language processing (NLP) tasks, including natural language inference (NLI), which gives them a natural ability to perform zero-shot text classification. Because the model is trained on a large corpus of data, it can generate predictions from semantic descriptions of the candidate classes. Like LLMs, many foundation models use a transformer architecture that enables them to classify data without any labeled training data for the specific classification task. This is possible because self-supervised learning and transfer learning allow the models to classify data into unseen classes. To the advantage of data science, this approach reduces the need for large datasets with human-annotated labels because it automates the labeling that would otherwise be part of the classification pipeline.
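
One common way to frame zero-shot text classification as an NLI problem is to turn each candidate label into a hypothesis, score how strongly the input text entails it, and normalize the scores across labels. The sketch below shows only that control flow. The scorer `toy_entailment_score` is a hypothetical word-overlap stand-in, not a real NLI model, and the hypothesis template and sample labels are likewise assumptions; in practice the scorer would be replaced by a pretrained entailment model.

```python
import math
from typing import Callable, Dict, Sequence


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: counts shared words."""
    premise_words = set(premise.lower().replace(".", "").split())
    hypothesis_words = set(hypothesis.lower().replace(".", "").split())
    return float(len(premise_words & hypothesis_words))


def zero_shot_classify(
    text: str,
    labels: Sequence[str],
    score: Callable[[str, str], float] = toy_entailment_score,
    template: str = "This text is about {}.",
) -> Dict[str, float]:
    """Score each label as an entailment hypothesis, then softmax the scores."""
    # One hypothesis per candidate label; no labeled training data is used.
    logits = [score(text, template.format(label)) for label in labels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


probs = zero_shot_classify(
    "The team won the football match.", ["football", "cooking", "politics"]
)
best = max(probs, key=probs.get)
```

Because the labels are passed in at call time, new classes can be added without retraining; the quality of the result depends entirely on the entailment scorer that is plugged in.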