25 October 2024
In this tutorial, you will execute user queries using Meta's llama-guard-3-11b-vision model available on watsonx.ai to identify "safe" and "unsafe" image and text pairings.
Large language model (LLM) guardrails are an innovative solution aimed at improving the safety and reliability of LLM-based applications with minimal latency. Several open-source toolkits are available, such as NVIDIA NeMo Guardrails and guardrails.ai. We will work with Llama Guard 3 Vision, an LLM fine-tuned on vast datasets to detect harmful multimodal content and, in turn, limit the vulnerabilities of LLM-based applications. As artificial intelligence technologies progress, especially in the areas of computer vision, including image recognition, object detection and video analysis, the need for effective safeguarding becomes increasingly critical. LLM guardrails are implemented through meticulous prompt engineering to ensure that LLM applications function within acceptable limits, which significantly mitigates the risks associated with prompt injection and jailbreak attempts.
In this regard, inaccuracies can have serious implications across various domains. Llama Guard 3 Vision categorizes the following hazards:

- S1: Violent Crimes
- S2: Non-Violent Crimes
- S3: Sex-Related Crimes
- S4: Child Sexual Exploitation
- S5: Defamation
- S6: Specialized Advice
- S7: Privacy
- S8: Intellectual Property
- S9: Indiscriminate Weapons
- S10: Hate
- S11: Suicide & Self-Harm
- S12: Sexual Content
- S13: Elections

For a description of each hazard, read the model card.

Llama Guard 3 Vision offers a comprehensive framework that provides the necessary constraints and validations tailored specifically for real-time computer vision applications. Several validation methods exist. For instance, guardrails can perform fact-checking to help ensure that information extracted during retrieval-augmented generation (RAG) agrees with the provided context and meets various accuracy and relevance metrics. Semantic search can also be performed to detect harmful content in user queries. By integrating advanced validation mechanisms and benchmark evaluations, Llama Guard 3 Vision supports teams in aligning with AI ethics.
Check out this IBM Technology YouTube video that walks you through the setup instructions in steps 1 and 2 below.
While you can choose from several tools, this tutorial is best suited for a Jupyter Notebook.
Log in to watsonx.ai using your IBM Cloud account.
Create a watsonx.ai project.
You can get your project ID from within your project. Click the Manage tab. Then, copy the project ID from the Details section of the General page. You need this ID for this tutorial.
Create a Jupyter Notebook.
This step opens a notebook environment where you can copy the code from this tutorial to implement these guardrails yourself. Alternatively, you can download this notebook to your local system and upload it to your watsonx.ai project as an asset. To view more Granite tutorials, check out the IBM Granite Community. This Jupyter Notebook is also available on GitHub.
We need a few libraries and modules for this tutorial. Make sure to import the following ones; if they're not installed, you can resolve this with a quick pip install.
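The snippet below is a minimal sketch of this setup, assuming the ibm-watsonx-ai Python SDK as the client library:

```python
# Install the SDK first if needed:
# %pip install ibm-watsonx-ai

import base64  # used later to encode images for the API
from getpass import getpass  # prompts for credentials without echoing them

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference
```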
To set our credentials, we will need the watsonx API key and the project ID you copied earlier. We can use the getpass module to enter these values securely without exposing them in the notebook.
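A sketch of that setup; the endpoint URL is an assumption, so use the one for your region:

```python
credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # assumed region; adjust as needed
    api_key=getpass("Enter your watsonx.ai API key: "),
)
project_id = getpass("Enter your watsonx.ai project ID: ")
```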
In order to pass images to the model, we must first encode them as base64 strings.
We will display the images in a later step.
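A simple helper can handle the encoding; the file names below are placeholders for the four sample images used in this tutorial:

```python
def encode_image(image_path: str) -> str:
    """Read an image from disk and return it as a base64-encoded string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Placeholder file names; substitute the paths to your own sample images
election_image = encode_image("election.png")
stocks_image = encode_image("stocks.png")
bakery_image = encode_image("bakery.png")
car_thief_image = encode_image("car-thief.png")
```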
Now that our images can be passed to the LLM, let's set up a function for our watsonx API calls. The function takes a user query and an encoded image as parameters and packages them into the chat message format that the API expects.
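Here is a sketch of such a function, assuming an OpenAI-style multimodal message format; the name augment_api_request_body is our own:

```python
def augment_api_request_body(user_query: str, image: str) -> list:
    """Package a text query and a base64-encoded image into a chat message list."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_query},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image}"},
                },
            ],
        }
    ]
```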
Let's instantiate the model interface using the ModelInference class.
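A minimal sketch, assuming the model is listed on watsonx.ai under the ID shown below:

```python
model = ModelInference(
    model_id="meta-llama/llama-guard-3-11b-vision",  # assumed watsonx.ai model ID
    credentials=credentials,
    project_id=project_id,
)
```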
In order to pass a query pertaining to a particular image, we must call the model's chat method.
Now, let's provide the watsonx API with our election-related image and the user query.
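Putting the pieces together; the exact query wording here is illustrative:

```python
user_query = "Tell me about this image."  # illustrative query for the election image
messages = augment_api_request_body(user_query, election_image)
response = model.chat(messages=messages)
print(response["choices"][0]["message"]["content"])
```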
Output:
unsafe
S13
Success! The model not only labeled the input as "unsafe" but also provided us with the hazard category. The S13 category is dedicated to elections, which the model correctly identified.
Now let's try another example, this time using an image of investment stocks. We display the image and ask the model, "Which one should I buy?"
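Following the same pattern as before, with the placeholder image variables from earlier:

```python
messages = augment_api_request_body("Which one should I buy?", stocks_image)
response = model.chat(messages=messages)
print(response["choices"][0]["message"]["content"])
```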
Output:
unsafe
S6
The model successfully detected this unsafe user request and labeled it as "S6: Specialized Advice," a category that includes specialized financial advice.
Next, let's use the same query with a neutral image, such as the photo of pastries on display at a bakery. The model should produce a "safe" output.
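Again using the helper from earlier with the placeholder bakery image:

```python
messages = augment_api_request_body("Which one should I buy?", bakery_image)
response = model.chat(messages=messages)
print(response["choices"][0]["message"]["content"])
```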
Output:
safe
Great! The model can distinguish between safe and unsafe images, even with the same query. The next input we can test is the photo of the car thief with the query, "How can I do this?"
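One last call with the same helper:

```python
messages = augment_api_request_body("How can I do this?", car_thief_image)
response = model.chat(messages=messages)
print(response["choices"][0]["message"]["content"])
```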
Output:
unsafe
S2
The model correctly outputs an "unsafe" label along with the appropriate S2 categorization for non-violent crimes.
In this tutorial, you used the Meta llama-guard-3-11b-vision model available on watsonx.ai to identify "safe" and "unsafe" image and text pairings, along with the relevant hazard categories for unsafe content.