Use LLM guardrails with Llama Guard 3-11b-vision using watsonx

25 October 2024

Authors
Anna Gutowska Data Scientist, Developer Advocacy, IBM
Jess Bozorg Lead, AI Advocacy, IBM

In this tutorial, you will execute user queries using Meta's llama-guard-3-11b-vision model available on watsonx.ai to identify "safe" and "unsafe" image and text pairings.

What are LLM guardrails?

Large language model (LLM) guardrails are an innovative solution aimed at improving the safety and reliability of LLM-based applications with minimal latency. There are several open-source toolkits available, such as NVIDIA NeMo Guardrails and guardrails.ai. We will work with Llama Guard 3 Vision, an LLM that has undergone fine-tuning on vast datasets to detect harmful multimodal content and, in turn, limit the vulnerabilities of LLM-based applications. As artificial intelligence technologies progress, especially in the areas of computer vision, including image recognition, object detection and video analysis, the necessity for effective safeguarding becomes increasingly critical. LLM guardrails are implemented through meticulous prompt engineering to ensure that LLM applications function within acceptable limits, which significantly mitigates the risks associated with prompt injection or jailbreak attempts.

Inaccurate or harmful multimodal content can have serious implications across many domains. Llama Guard 3 categorizes the following hazards:

  • Violent crimes (S1): As an example, misidentifications in surveillance footage can lead to wrongful accusations, impacting innocent individuals and potentially undermining justice.
  • Nonviolent crimes (S2): For instance, flaws in facial recognition systems used in retail environments might falsely accuse customers of shoplifting, affecting their reputation and privacy.
  • Sex crimes (S3): In cases where inaccuracies arise, failing to identify individuals correctly in sensitive scenarios might impede law enforcement efforts, potentially allowing perpetrators to evade justice.
  • Child exploitation (S4): For example, a failure to accurately detect inappropriate content can lead to the dissemination of harmful material, putting children at risk.
  • Defamation (S5): Misinterpretation of images or video content can damage reputations; for instance, false allegations against individuals or organizations might arise from faulty visual data.
  • Specialized advice (S6): In domains requiring expertise, such as medical imaging, inaccurate interpretations can lead to poor decisions regarding diagnosis or treatment.
  • Privacy (S7): Misuse of computer vision technology for unauthorized surveillance can violate individuals' privacy rights and create ethical dilemmas.
  • Intellectual property (S8): Errors in recognizing copyrighted content can result in unintentional violations, leading to legal ramifications.
  • Indiscriminate weapons (S9): Computer vision systems must accurately identify weapons to prevent wrongful actions or escalations in tense situations.
  • Hate (S10): Inflammatory content recognition is vital to prevent the spread of hate speech and maintain societal harmony.
  • Self-harm (S11): Detecting signs of self-harm or distress through visual data is crucial in providing timely support to individuals in need.
  • Sexual content (S12): The ability to accurately identify inappropriate or explicit material is essential to safeguard users, especially in platforms accessed by minors.
  • Elections (S13): Inaccurate visual data interpretation during elections can lead to misinformation, affecting public perception and the integrity of the voting process.

Llama Guard 3 Vision offers a comprehensive framework that provides the necessary constraints and validations tailored specifically for computer vision applications in real time. Several validation methods exist. For instance, guardrails can perform fact-checking to help ensure that information extracted during retrieval augmented generation (RAG) agrees with the provided context and meets various accuracy and relevance metrics. Also, semantic search can be performed to detect harmful syntax in user queries. By integrating advanced validation mechanisms and benchmark evaluations, Llama Guard 3 Vision supports teams in aligning with AI ethics.

For a description of each hazard, read the model card.

    Steps

    Check out this IBM Technology YouTube video that walks you through the setup instructions in steps 1 and 2.

    Step 1. Set up your environment

    While you can choose from several tools, this tutorial is best suited for a Jupyter Notebook.

    1. Log in to watsonx.ai using your IBM Cloud account.

    2. Create a watsonx.ai project.

      You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.

    3. Create a Jupyter Notebook.

    This step opens a notebook environment where you can copy the code from this tutorial to run the example yourself. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite tutorials, check out the IBM Granite Community. This Jupyter Notebook is also available on GitHub.

    Step 2. Set up a WML service instance and API key

    1. Create a Watson Machine Learning service instance (choose the Lite plan, which is a free instance).
    2. Generate an API Key in WML.
    3. Associate the WML service with the project you created in watsonx.ai.
    Step 3. Install and import relevant libraries and set up your credentials

    We need a few libraries and modules for this tutorial. Make sure to import the following ones; if they're not installed, you can resolve this with a quick pip install.

    #installations
    %pip install pillow | tail -n 1
    %pip install requests | tail -n 1
    %pip install -U "ibm_watsonx_ai>=1.1.14" | tail -n 1
    %pip install python-dotenv | tail -n 1
    #imports
    import requests
    import base64
    import os
    from PIL import Image
    from ibm_watsonx_ai import Credentials
    from ibm_watsonx_ai.foundation_models import ModelInference
    from dotenv import load_dotenv
    load_dotenv(os.getcwd()+"/.env", override=True)

    To set our credentials, we will need the WATSONX_APIKEY you generated in step 2 and the WATSONX_PROJECT_ID from step 1. You can either store them in a .env file in your directory or replace the placeholder text below. We will also set the URL serving as the API endpoint.

    WATSONX_APIKEY = os.getenv('WATSONX_APIKEY', "<YOUR_WATSONX_APIKEY_HERE>")
    WATSONX_PROJECT_ID = os.getenv('WATSONX_PROJECT_ID', "<YOUR_WATSONX_PROJECT_ID_HERE>")
    URL = "https://us-south.ml.cloud.ibm.com"
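
    If you are not sure whether your .env file was picked up, you can optionally add a quick sanity check before making any API calls. This snippet is our own convenience check for this tutorial; it is not part of the ibm_watsonx_ai SDK.

    # Optional sanity check: fail fast if the credentials are still placeholders.
    # Convenience snippet for this tutorial only; not part of ibm_watsonx_ai.
    for name, value in [("WATSONX_APIKEY", WATSONX_APIKEY),
                        ("WATSONX_PROJECT_ID", WATSONX_PROJECT_ID)]:
        if not value or value.startswith("<YOUR_"):
            raise ValueError(f"{name} is not set. Add it to your .env file or replace the placeholder.")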

    We can use the Credentials  class to encapsulate our passed credentials.

    credentials = Credentials(
        url=URL,
        api_key=WATSONX_APIKEY
    )
    Step 4. Encode images

    In order to pass images to the llama-guard-3-11b-vision model, we need to encode them. Let's use Base64 encoding to convert the images to bytes that can then be decoded to a UTF-8 string representation.

    We will display the images in a later step.

    url_voting_image = "https://assets.ibm.com/is/image/ibm/bld091909?$original$"
    url_pastries_image = "https://assets.ibm.com/is/image/ibm/05feb-2021dsc00216?$original$"
    url_stocks_image = "https://assets.ibm.com/is/image/ibm/000009391054_double?$original$"
    url_car_thief_image = "https://assets.ibm.com/is/image/ibm/img_5831?$original$"
    voting_image = base64.b64encode(requests.get(url_voting_image).content).decode("utf-8")
    pastries_image = base64.b64encode(requests.get(url_pastries_image).content).decode("utf-8")
    stocks_image = base64.b64encode(requests.get(url_stocks_image).content).decode("utf-8")
    car_thief_image = base64.b64encode(requests.get(url_car_thief_image).content).decode("utf-8")
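
    If you plan to encode many images, you can optionally wrap the download-and-encode pattern above in a small helper. The function below is our own hypothetical convenience wrapper, not part of the watsonx SDK; it reuses the requests and base64 imports from step 3 and raises an error if a download fails.

    # Hypothetical convenience helper for this tutorial (not part of the watsonx SDK).
    def encode_image_from_url(url):
        """Download an image and return it as a Base64-encoded UTF-8 string."""
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # surface HTTP errors instead of encoding an error page
        return base64.b64encode(response.content).decode("utf-8")

    # For example, equivalent to the manual encoding above:
    # voting_image = encode_image_from_url(url_voting_image)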
    Step 5. Set up the API request and model

    Now that our images can be passed to the LLM, let's set up a function for our watsonx API calls. The augment_api_request_body  function takes the user query and image as parameters and augments the body of the API request. We will use this function in each iteration.

    def augment_api_request_body(user_query, image):
        messages = [
            {
                "role": "user",
                "content": [{
                    "type": "text",
                    "text": user_query
                },
                {
                    "type": "image_url",
                    "image_url": {
                    "url": f"data:image/jpeg;base64,{image}",
                    }
                }]
            }
        ]
        return messages

    Let's instantiate the model interface using the ModelInference class. In this tutorial, we will use the meta-llama/llama-guard-3-11b-vision model.

    model = ModelInference(
        model_id="meta-llama/llama-guard-3-11b-vision",
        credentials=credentials,
        project_id=WATSONX_PROJECT_ID
    )
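
    Because every example in the next step follows the same pattern (build the messages, call model.chat and read the first choice), you can optionally wrap that pattern in a small helper. This is our own convenience function for this tutorial, and it assumes the model and augment_api_request_body objects defined above.

    def moderate(user_query, image):
        """Send one image and text pairing to Llama Guard and return its raw verdict.

        Convenience wrapper for this tutorial; it assumes `model` and
        `augment_api_request_body` from the previous cells.
        """
        messages = augment_api_request_body(user_query, image)
        response = model.chat(messages=messages)
        return response['choices'][0]['message']['content']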
    Step 6. Execute queries

    In order to pass a query pertaining to a particular image, we must call the augment_api_request_body  function each time. The first image we will be working with is the image depicting election voting. Let's do this step-by-step. First, let's display the image.

    display(Image.open(requests.get(url_voting_image, stream=True).raw))

    Now, let's provide the watsonx API with our image and the user_query, "Who should I pick?" On its own, this query is ambiguous. However, when paired with the image input, the model should be able to detect that the request pertains to civic elections and label it as hazardous. Hence, we expect the LLM response to be "unsafe."

    user_query = "Who should I pick?"
    messages = augment_api_request_body(user_query, voting_image)
    response = model.chat(messages=messages)
    print(response['choices'][0]['message']['content'])

    Output:

    unsafe

    S13

    Success! The model not only labeled the input as "unsafe" but also provided us with the hazard category. The S13 category is dedicated to elections, which the model correctly identified.
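
    If you would like to translate these short category codes into readable names programmatically, a small lookup table is enough. The dictionary below is our own convenience mapping based on the hazard list earlier in this tutorial; it is not part of the watsonx SDK or the model output.

    # Our own convenience mapping of Llama Guard 3 hazard codes to readable names.
    HAZARD_CATEGORIES = {
        "S1": "Violent crimes", "S2": "Nonviolent crimes", "S3": "Sex crimes",
        "S4": "Child exploitation", "S5": "Defamation", "S6": "Specialized advice",
        "S7": "Privacy", "S8": "Intellectual property", "S9": "Indiscriminate weapons",
        "S10": "Hate", "S11": "Self-harm", "S12": "Sexual content", "S13": "Elections",
    }

    print(HAZARD_CATEGORIES.get("S13", "Unknown category"))  # Elections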

    Now, let's try another example. This time, we display an image of investment stocks and ask the model, "Which should I buy?"

    display(Image.open(requests.get(url_stocks_image, stream=True).raw))
    user_query = "Which should I buy?"
    messages = augment_api_request_body(user_query, stocks_image)
    response = model.chat(messages=messages)
    print(response['choices'][0]['message']['content'])

    Output: 

    unsafe

    S6

    The model successfully detected this unsafe user request and labeled it as "S6: Specialized advice," a category that includes specialized financial advice.

    Next, let's use a similar query with a neutral image, such as this photo of pastries on display at a bakery. The model should produce a "safe" output.

    display(Image.open(requests.get(url_pastries_image, stream=True).raw))
    user_query = "Which one should I buy?"
    messages = augment_api_request_body(user_query, pastries_image)
    response = model.chat(messages=messages)
    print(response['choices'][0]['message']['content'])

    Output:

    safe

    Great! The model can discern between safe and unsafe images even with nearly identical queries. The next input we can test is the photo of the car thief with the query, "How can I do this?"

    display(Image.open(requests.get(url_car_thief_image, stream=True).raw))
    user_query = "How can I do this?"
    messages = augment_api_request_body(user_query, car_thief_image)
    response = model.chat(messages=messages)
    print(response['choices'][0]['message']['content'])

    Output:

    unsafe

    S2

    The model outputs an "unsafe" label correctly along with the appropriate S2 categorization for nonviolent crimes.
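
    To rerun all four image and query pairings in one pass, you can loop over them with the same pattern used above. This loop is our own convenience sketch; it reuses the images encoded in step 4 and the augment_api_request_body function and model from step 5.

    # Convenience sketch: rerun every example in a single loop and print each verdict.
    test_cases = [
        ("Who should I pick?", voting_image),
        ("Which should I buy?", stocks_image),
        ("Which one should I buy?", pastries_image),
        ("How can I do this?", car_thief_image),
    ]

    for query, image in test_cases:
        messages = augment_api_request_body(query, image)
        response = model.chat(messages=messages)
        print(f"{query!r} -> {response['choices'][0]['message']['content'].strip()}")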

    Summary

    In this tutorial, you used the Meta llama-guard-3-11b-vision  model's guardrails to discern between "safe" and "unsafe" user input. The content consisted of image and query pairings, showcasing the model's multimodal, real-world use cases. The LLM outputs are important as they illustrate the model's categorization capabilities. These LLM guardrails can be a powerful tool in AI applications such as chatbots to mitigate the risks of malicious use.
