In this tutorial, we will use the IBM Granite-3.0-8B-Instruct model now available on watsonx.ai™ to perform zero-shot classification and apply it to improve a department's workflow.
Zero-shot classification uses zero-shot prompting, a prompt engineering technique that allows a model to perform a task without any task-specific training or examples. It is an application of zero-shot learning (ZSL), a machine learning method that relies on a pretrained model's ability to recognize and categorize objects or concepts it never encountered during training. ZSL is related to few-shot learning (FSL), in which a model makes accurate predictions after training on only a small number of labeled examples. Both techniques enable models to perform tasks they haven't been explicitly trained on.
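In the zero-shot setting, the prompt itself carries all the task information: the candidate labels and the text to classify, with no labeled examples. As a minimal sketch (the function name and wording below are illustrative, not from this tutorial), such a prompt can be built like this:

```python
# A minimal sketch of a zero-shot classification prompt (illustrative only):
# the model is given the candidate labels and the text, but no examples.
def build_zero_shot_prompt(text, labels):
    """Ask a model to pick exactly one label, with no examples provided."""
    return (
        "Classify the following text into exactly one of these categories: "
        + ", ".join(labels)
        + ".\n\nText: " + text + "\nCategory:"
    )

prompt = build_zero_shot_prompt(
    "The server is down and no one can log in.", ["High", "Low"]
)
print(prompt)
```

A few-shot prompt would differ only by including a handful of labeled examples before the final text.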
Researchers have been experimenting with machine learning models for classification tasks since the 1950s. The Perceptron is an early classification model that uses a decision boundary to classify data into different groups. Many believe that the concept behind the model sparked interest in artificial intelligence, influencing deep learning algorithms that can classify objects or translate languages. However, most ML/DL methods rely on supervised learning techniques to classify labels and therefore need to be trained on a large amount of task-specific labeled training data. This presents a challenge because the large, annotated datasets required to train these models simply do not exist for every domain. Motivated by these constraints, some researchers argue that large language models (LLMs) offer a way around these data limitations.
LLMs are designed to perform natural language processing (NLP) and natural language inference (NLI) tasks, which gives them a natural ability to perform zero-shot text classification. Because an LLM is trained on a large corpus of data, it can classify text from semantic descriptions of the candidate labels alone. Like LLMs, foundation models use a transformer architecture that enables them to assign labels without any task-specific training data. This is possible because the models combine self-supervised learning and transfer learning to classify data into unseen classes. For data science teams, this approach eliminates the need for large datasets with human-annotated labels because it automates the labeling portion of the classification pipeline.
Foundation models are built on the transformer architecture, which can ingest raw text at scale and, through its attention mechanism, learn how words relate to one another to form a statistical representation of language. The transformer is a type of neural network architecture designed to build meaningful representations of sequences or collections of data points. This capability is why these models perform so well on NLP tasks.
The transformer model architecture combines an encoder-decoder structure with a self-attention mechanism, allowing the model to draw connections between input and output through autoregressive prediction. The encoder processes the tokenized input data into embeddings that represent the data in a format the model can read. The decoder interprets the embeddings to generate an output. The self-attention mechanism computes a weight for each word, or token, in a sentence based on its relationship to every other word in the sentence, allowing the model to capture the semantic and syntactic relationships between words. Self-attention is integral to entailment, an NLI task, because it helps the model understand the context within text data.
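The weight computation described above can be sketched in a few lines. This toy example (using NumPy, with the token embeddings standing in directly for the learned query/key/value projections of a real transformer) shows how each token's output becomes a weighted mix of every token in the sequence:

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over token embeddings X (toy version).

    Real transformers first map X through learned query/key/value projections;
    here the embeddings are used directly to keep the arithmetic visible.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # token-to-token similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ X, weights                     # each output mixes all tokens

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 tokens, 2 dimensions
output, weights = self_attention(X)
print(output.shape)  # one contextualized vector per token
```

Because every row of `weights` spans the whole sequence, each output vector encodes how strongly its token relates to all the others, which is what lets the model use sentence-wide context.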
Choosing the right model for zero-shot classification depends on your classification task. It's no surprise that there is an abundance of models to choose from; let's consider three types:
To follow this tutorial, you need an IBM Cloud® account to create a watsonx.ai project.
While you can choose from several tools, this tutorial walks you through how to set up an IBM account to use a Jupyter Notebook. Jupyter Notebooks are widely used in data science because they combine code, text, images and data visualizations in a single, well-documented analysis.
Take note of the project ID in project > Manage > General > Project ID.
You’ll need this ID for this tutorial.
3. Create a Jupyter Notebook.
This step will open a notebook environment where you can copy the code from this tutorial to perform zero-shot classification on your own. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. This Jupyter Notebook is available on GitHub.
In this step, you associate your project with the watsonx.ai Runtime service.
We'll need some libraries and modules for this tutorial. Make sure to import the following ones, and if they're not installed, you can resolve this with a quick pip installation.
```python
!pip install -U langchain_ibm
```

Run the following to input and save your watsonx.ai Runtime API key and project ID:
```python
import getpass

# Enter credentials without echoing them in the notebook;
# the variable names here are illustrative
watsonx_api_key = getpass.getpass("Enter your watsonx.ai Runtime API key: ")
watsonx_project_id = getpass.getpass("Enter your watsonx.ai project ID: ")
```

Next, we'll set up the IBM Granite-3.0-8B-Instruct model to perform zero-shot classification.
```python
from langchain_ibm import WatsonxLLM

# Parameter values below are illustrative; substitute your own
# credentials, region URL and decoding parameters as needed
model = WatsonxLLM(
    model_id="ibm/granite-3-8b-instruct",
    url="https://us-south.ml.cloud.ibm.com",
    apikey="YOUR_WATSONX_API_KEY",
    project_id="YOUR_PROJECT_ID",
    params={"decoding_method": "greedy", "max_new_tokens": 10},
)
```

Now that the model is prepared to perform zero-shot classification, let's define a prompt. Imagine a scenario where it's imperative to triage certain data, perhaps an IT department's inbox flooded with user-described technical issues. In this example, the model is asked to classify an IT issue as belonging to either the class "High" or "Low," indicating the priority of the issue. The prompt should showcase the model's ability to classify the priority of IT issues immediately upon use.
The code block below sets up and defines the prompt that the model will respond to. The prompt can be any input, but let's try out the example first. Run the code block to define your user prompt along with some example input text.
```python
def generate_text(prompt):
    # Send the prompt to the model and return its completion
    return model.invoke(prompt)

# Example input: an IT issue to be classified as "High" or "Low" priority
prompt = """Classify this IT issue as either High priority or Low priority. Answer with one word.
Issue: Users are unable to upload files, and work has stopped across the team.
Priority:"""
```

Once the prompt is defined, we can run the next block to let the model predict and print its output.
```python
# Generate and print the text based on the defined prompt
print(generate_text(prompt))
```

In this example, the model correctly infers the classification label "High" based on its ability to understand the critical impact on users who cannot upload files.
Let's apply zero-shot classification to a different aspect of a department's everyday workflow. The same IT department used in the preceding example has a backlog of customer support reviews that need to be organized and analyzed. The organization feels the best way to accomplish this is to classify them based on sentiment: "Positive," "Negative," "Neutral."
Run the following code block with the defined prompt and customer review to classify the sentiment of the text.
```python
# Define the prompt here; the review text below is illustrative
prompt = """Classify the sentiment of this customer review as Positive, Negative or Neutral. Answer with one word.
Review: I waited three weeks for a response and my issue is still not resolved.
Sentiment:"""

# Generate and print the model's classification
print(generate_text(prompt))
```

The model is able to perform sentiment analysis and correctly classifies the review as "Negative." This capability can be useful in a variety of domains, not just IT. Try out your own prompts to explore how you might use zero-shot classification to automate time-consuming tasks.
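When wiring classifications like these into a real workflow, keep in mind that the model's reply is free text, so it helps to normalize the reply back onto one of the expected labels. A small sketch (this helper is hypothetical, not part of the tutorial):

```python
def parse_label(reply, labels=("Positive", "Negative", "Neutral")):
    """Map a model's free-text reply onto one of the expected labels.

    Hypothetical post-processing helper; returns None when the reply
    doesn't mention any of the candidate labels.
    """
    reply_lower = reply.lower()
    for label in labels:
        if label.lower() in reply_lower:
            return label
    return None

print(parse_label("I would classify this review as Negative."))
```

Normalizing outputs this way makes downstream steps, such as routing tickets or aggregating sentiment counts, robust to small variations in how the model phrases its answer.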
In this tutorial, we set up the IBM Granite-3.0-8B-Instruct model to perform zero-shot classification. We then defined user prompts and scenarios for the model to classify, and tested two examples: one priority classification and one sentiment analysis.