
Perform zero-shot classification with a foundation model

4 December 2024

Vanna Winland

AI Advocate & Technology Writer

What is zero-shot classification?

In this tutorial, we will use the IBM Granite-3.0-8B-Instruct model now available on watsonx.ai™ to perform zero-shot classification and apply it to improve a department's workflow.

Overview of zero-shot classification

Zero-shot classification uses zero-shot prompting, a prompt engineering technique that allows a model to perform a task without any task-specific training or examples. It is an application of zero-shot learning (ZSL), a machine learning method in which a pretrained model recognizes and categorizes objects or concepts it never saw during training. ZSL is related to few-shot learning (FSL), in which a model makes accurate predictions after seeing only a small number of labeled examples. Both techniques enable models to perform tasks they haven't been explicitly trained on.
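
To make the distinction concrete, here is a minimal sketch contrasting a zero-shot prompt with a few-shot prompt; the ticket text and labels are illustrative examples, not from the tutorial itself.

# Zero-shot: the model receives only the task description and the input.
zero_shot_prompt = (
    "Classify the priority of the following IT issue as 'High' or 'Low'.\n"
    "Issue: The VPN is down for the entire sales team."
)

# Few-shot: the same task, but with a handful of labeled examples in the prompt.
few_shot_prompt = (
    "Classify the priority of each IT issue as 'High' or 'Low'.\n"
    "Issue: A user wants a different desktop wallpaper. Priority: Low\n"
    "Issue: The payroll database is unreachable. Priority: High\n"
    "Issue: The VPN is down for the entire sales team. Priority:"
)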

Researchers have been experimenting with machine learning models for classification tasks since the 1950s. The Perceptron is an early classification model that uses a decision boundary to classify data into different groups. Many believe that the concept behind the model sparked interest in artificial intelligence, influencing deep learning algorithms that can classify objects or translate languages. However, most ML/DL methods rely on supervised learning techniques to classify labels and therefore need to be trained on a large amount of task-specific labeled training data. This presents a challenge because the large, annotated datasets required to train these models simply do not exist for every domain. Motivated by these constraints, some researchers argue that large language models (LLMs) offer a way around these data limitations.

LLMs are designed to perform natural language processing (NLP) and natural language inference (NLI) tasks, which gives them a natural ability to perform zero-shot text classification. Because the model is trained on a large corpus of data, it can classify inputs from semantic descriptions of the labels alone. Like LLMs, foundation models use a transformer architecture that enables them to classify labels without any task-specific training data. This is possible because of the models' ability to use self-supervised learning and transfer learning to classify data into unseen classes. For data science teams, this approach removes the need for large datasets of human-annotated labels because it automates the preprocessing portion of the classification pipeline.

How foundation models perform zero-shot classification

Foundation models are built on the transformer architecture, which ingests raw text at scale and, through its attention mechanism, learns how words relate to each other to form a statistical representation of language. The transformer is a type of neural network architecture designed to learn meaningful representations of sequences or collections of data points. This capability is the reason these models perform so well on NLP tasks.

The transformer architecture combines an encoder-decoder structure with a self-attention mechanism that allows the model to draw connections between input and output through autoregressive prediction. The encoder processes the tokenized input data into embeddings that represent the data in a format the model can work with. The decoder interprets the embeddings to generate an output. The self-attention mechanism computes a weight for each word, or token, in a sentence based on its relationship to every other word in the sentence, allowing the model to capture the semantic and syntactic relationships between words. Self-attention is integral to entailment, an NLI task, because it helps the model understand the context within text data.
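
As a rough illustration of how self-attention weights each token against every other token, here is a minimal NumPy sketch of scaled dot-product attention. The tiny matrices are made up for demonstration, and the sketch omits the multi-head projections and masking used in real transformer models.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute attention weights and the weighted sum of values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights  # contextualized representations, attention weights

# Toy example: 3 tokens, each represented by a 4-dimensional embedding
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V
print(weights)  # each row sums to 1: how much each token attends to the others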

What models are best for zero-shot classification?

Choosing the right model for zero-shot classification depends on your classification task. It's no surprise that there is an abundance to choose from, so let's consider three types of models:

  • A zero-shot classification model can classify data into categories without task-specific labeled examples at prediction time. These models rely on their training data, normally a large-scale, general dataset, to classify new, unseen classes. One of the most popular options for zero-shot text classification is Hugging Face's facebook/bart-large-mnli model, which is based on the BART-large transformer architecture (see the sketch after this list). Zero-shot classification models perform well on generalized tasks, but because there is no fine-tuning on specific tasks or datasets, accuracy might be limited. Because of this limitation, these models require well-formulated prompts.

  • Large language models (LLMs) such as GPT and BERT are designed to perform a variety of natural language processing tasks. These models are built to handle text data, often using deep learning architectures such as transformers, which can limit use cases involving multimodal datasets such as images and audio. LLMs are trained on a large corpus of text data, giving them extensive knowledge of language, syntax, semantics and some domain-specific knowledge. They generally perform well with little to no task-specific fine-tuning, making them suitable for zero-shot and few-shot classification scenarios. Due to their generalized training, LLMs might have limited accuracy on specialized tasks, especially ones that require domain-specific data. These models are the best fit when the dataset is text-based.

  • Foundation models are multimodal, meaning they are often trained on a myriad of modalities including text, images and speech. These models are generally versatile and after pretraining can undergo optimization for many different tasks. IBM® Granite™ models classify data by using a large language model (LLM) trained on a curated dataset of business-relevant information, including legal, financial and technical domains. This allows models to analyze text and identify patterns to categorize data into specific classes based on the context and semantic meaning within the text. Because of their multimodal capacity, these types of models can handle image and text classification. These models are ideal when you need a broad range of capabilities or want to handle multiple types of data, for instance, image or audio classification.
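
To illustrate the first model type, here is a minimal sketch using the Hugging Face transformers pipeline with facebook/bart-large-mnli. The example sentence and candidate labels are placeholders chosen to mirror the IT-triage scenario later in this tutorial.

# pip install transformers torch
from transformers import pipeline

# NLI-based zero-shot classification: each candidate label is tested as an
# entailment hypothesis against the input sequence.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "Users are reporting that they are unable to upload files.",
    candidate_labels=["high priority", "low priority"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score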

Use cases

  • Image classification: Computer vision is a type of artificial intelligence (AI) that uses machine learning models to analyze images and videos. A crucial task within computer vision is image classification, which involves labeling and categorizing groups of pixels within an image. Image classification is used in many domains, such as photo tagging on social media, self-driving cars and even healthcare (a minimal sketch follows this list).
  • Text classification: NLP uses text classification to enable a model's understanding of human language. Text classification supports many NLP tasks such as sentiment analysis, similarity scoring, key phrase detection and much more. A popular use case, and the one we'll explore in this tutorial, is customer service analysis.
  • Audio classification: The goal of audio classification is to recognize and distinguish between audio recordings so that sounds can be categorized. This form of classification is used in smart home and security systems and in technologies such as text-to-speech applications.
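
As a rough illustration of the image use case, the sketch below uses a CLIP-style model through the Hugging Face transformers pipeline for zero-shot image classification. The image path and candidate labels are placeholders; this is separate from the watsonx.ai workflow used in the rest of the tutorial.

from transformers import pipeline

# CLIP scores an image against arbitrary text labels without task-specific training
image_classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)

predictions = image_classifier(
    "photo.jpg",  # placeholder path to a local image
    candidate_labels=["a cat", "a dog", "a car"],
)
print(predictions[0])  # highest-scoring label with its score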

Prerequisites 

To follow this tutorial, you need an IBM Cloud® account to create a watsonx.ai project.

Steps

Step 1: Set up your environment

While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook. Jupyter Notebooks are widely used within data science to combine code, text, images and data visualizations to formulate a well-formed analysis.

  1. Log in to watsonx.ai using your IBM Cloud account.
  2. Create a watsonx.ai project.

     Take note of the project ID in project > Manage > General > Project ID. You'll need this ID for this tutorial.

  3. Create a Jupyter Notebook.

This step will open a notebook environment where you can copy the code from this tutorial to perform zero-shot classification on your own. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. This Jupyter Notebook is available on GitHub.

Step 2: Set up watsonx.ai Runtime service instance and API key

In this step, you associate your project with the watsonx.ai Runtime service.

  1. Create a watsonx.ai Runtime service instance (choose the Lite plan, which is a free instance).
  2. Generate an API Key.
  3. Associate the watsonx.ai Runtime service to the project you created in watsonx.ai.

Step 3: Install and import relevant libraries and set up your credentials

We'll need some libraries and modules for this tutorial. Make sure to import the following ones, and if they're not installed, you can resolve this with a quick pip installation.

# Install the required packages (run once per environment)
!pip install -U langchain_ibm
!pip install "ibm-watson-machine-learning>=1.0.327"

# WatsonxLLM provides access to watsonx.ai foundation models;
# GenParams holds the names of the text-generation parameters
from langchain_ibm import WatsonxLLM
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams

Step 4: Set up your watsonx™ credentials

Run the following to input and save your watsonx.ai runtime API key and project ID:

import getpass
from langchain_ibm import WatsonxLLM

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": getpass.getpass("Please enter your watsonx.ai Runtime API key (hit enter): "),
    "project_id": getpass.getpass("Please enter your project ID (hit enter): "),
}

Step 5: Set up the model for zero-shot classification

Next, we'll set up the IBM Granite-3.0-8B-Instruct model to perform zero-shot classification.

model = WatsonxLLM(
    model_id="ibm/granite-3-8b-instruct",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=credentials.get("project_id"),
    params={
        # GenParams.DECODING_METHOD: 'greedy',
        GenParams.MAX_NEW_TOKENS: 500,
        GenParams.MIN_NEW_TOKENS: 1,
        GenParams.REPETITION_PENALTY: 1.1,
        GenParams.STOP_SEQUENCES: [],  # Leave this empty if not needed
        GenParams.TEMPERATURE: 0.7,  # Adjust for variable responses
        GenParams.TOP_K: 100,
        GenParams.TOP_P: 0,
    },
)
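
As an optional sanity check, you can send a short test prompt to confirm that the model and credentials are wired up correctly before continuing; the prompt here is arbitrary.

# Optional: quick check that the model responds before running the tutorial prompts
print(model.invoke("Reply with the single word 'ready' if you can read this."))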

Step 6: Define the prompt

Now that the model is prepared to perform zero-shot classification, let's define a prompt. Imagine a scenario where it's imperative to triage certain data, perhaps an IT department's inbox flooded with user-described technical issues. In this example, the model is asked to classify an IT issue as either "High" or "Low" priority. The prompt showcases the model's ability to classify the priority of IT issues out of the box.

The code block below sets up and defines the prompt that the model will respond to. The prompt can be any input, but let's try out the example first. Run the code block to define your user prompt along with some example input text.

def generate_text(prompt):
    response = None  # Ensure the variable is defined before the try block
    try:
        response = model.generate([prompt])
        # Return only the generated text from the result
        return response.generations[0][0].text
    except Exception as e:
        print(f"Error: {e}")
        if response:
            print(f"Response: {response}")
        return None

# Define the prompt here
defined_prompt = "Set the class name for the issue described to either: high or low. Issue: Users are reporting that they are unable to upload files."

Step 7: Perform zero-shot classification

Once the prompt is defined, we can run the next block to allow the model to predict and print its output.

# Generate and print the text based on the defined prompt
generated_text = generate_text(defined_prompt)
print("Generated text:", generated_text)

In this example, the model correctly infers the classification label "high" based on its ability to understand the critical impact of the inability to upload files for users.
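
To extend this example toward a real workflow, a short loop can triage a small batch of issues with the same prompt template. The ticket descriptions below are made up for illustration.

# Hypothetical backlog of user-reported issues to triage
issues = [
    "Users are reporting that they are unable to upload files.",
    "A user would like their desktop wallpaper changed.",
    "The email server is rejecting all incoming messages.",
]

for issue in issues:
    prompt = f"Set the class name for the issue described to either: high or low. Issue: {issue}"
    print(issue, "->", generate_text(prompt))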

Classify service reviews based on sentiment classification

Let's apply zero-shot classification to a different aspect of a department's everyday workflow. The same IT department from the preceding example has a backlog of customer support reviews that need to be organized and analyzed. The organization decides the best way to accomplish this is to classify the reviews by sentiment: "Positive," "Negative" or "Neutral."

Run the following code block with the defined prompt and customer review to classify the sentiment of the text.

# Define the prompt here
defined_prompt = "Classify the following customer review as 'Positive', 'Negative', 'Neutral': Customer review: 'My IT issue was not resolved.'"

# Generate and print the text based on the defined prompt
generated_text = generate_text(defined_prompt)
print("Generated text:", generated_text)

The model is able to perform sentiment analysis and classify the review correctly as "Negative." This capability can be useful for a variety of domains, not just IT. Try out your own prompts to explore how you might use zero-shot classification to automate time-consuming tasks.
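
For example, the same pattern works for routing tickets into more than two classes; the categories and issue text below are illustrative.

# Classify a ticket into one of several hypothetical support categories
defined_prompt = (
    "Classify the following IT issue into one of these categories: "
    "'Hardware', 'Software', 'Network', 'Account access'. "
    "Issue: 'I forgot my password and cannot log in to my laptop.'"
)
print("Generated text:", generate_text(defined_prompt))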

Summary

In this tutorial, we set up the IBM Granite-3.0-8B-Instruct model to perform zero-shot classification. We then defined user prompts and scenarios for zero-shot classification and tested two examples: one priority classification and one sentiment analysis.

Related Solutions

IBM watsonx.ai

Train, validate, tune and deploy generative AI, foundation models and machine learning capabilities with ease and build AI applications in a fraction of the time with a fraction of the data.

IBM Consulting™ services

Redefine how you work with AI for business. IBM Consulting™ is working with global clients and partners to co-create what’s next in AI. Our diverse, global team of more than 20,000 AI experts can help you quickly and confidently design and scale cutting edge AI solutions and automation across your business.

IBM's AI solutions

IBM’s artificial intelligence solutions help you build the future of your business. These include: IBM watsonx, our AI and data platform and portfolio of AI-powered assistants; IBM Granite, our family of open-sourced, high-performing and cost-efficient models trained on trusted enterprise data; IBM Consulting, our AI services to redesign workflows; and our hybrid cloud offerings that enable AI-ready infrastructure to better scale AI.
