Configuring AI guardrails in watsonx.ai

You can set AI guardrails in watsonx.ai to moderate the input text provided to a foundation model and the output generated by the model in multiple ways.

You can remove harmful content when you're working with foundation models in watsonx.ai by using the following methods:

  • Configure AI guardrails in the Prompt Lab
  • Configure AI guardrails programmatically with the REST API or the Python SDK

Configuring AI guardrails in the Prompt Lab

To remove harmful content when you're working with foundation models in the Prompt Lab, set the AI guardrails switcher to On.

The AI guardrails feature is enabled automatically for all natural language foundation models in English.

To configure AI guardrails in the Prompt Lab, complete the following steps:

  1. With AI guardrails enabled, click the AI guardrails settings icon.

  2. Configure the filters to apply to the user input and model output, and adjust the filter sensitivity where applicable:

    • HAP filter

      The HAP filter flags hateful, abusive, or profane language. To change the sensitivity of the filter, move the HAP sliders. To disable the HAP filter, set the slider to 1.

    • PII filter

      To enable the PII filter, set the PII switcher to On.

    • Granite Guardian model as a filter

      Granite Guardian moderation is disabled by default. To change the sensitivity of the filter, move the Granite Guardian sliders.

    Experiment with adjusting the sliders to find the best settings for your needs. The sketch after these steps illustrates how a filter threshold maps to flagging behavior.

  3. Click Save.
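
Conceptually, each filter scores a span of text and flags the span when the score crosses the configured threshold, which is why setting the HAP slider to 1 disables that filter and lower values flag more content. The following Python sketch is illustrative only; it is not the watsonx.ai filter implementation, and the function and variable names are invented for this example.

  def is_flagged(score: float, threshold: float) -> bool:
      # Illustrative rule: a span is flagged when the filter's score for it
      # exceeds the configured threshold. A threshold of 1 flags nothing;
      # lower thresholds flag more content.
      return score > threshold

  # A span that scores 0.7 is flagged at threshold 0.5 but not at 0.75.
  print(is_flagged(0.7, 0.5))   # True
  print(is_flagged(0.7, 0.75))  # False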

Configuring AI guardrails programmatically

You can set AI guardrails programmatically to moderate the input text that you submit to a foundation model and the output that the model generates by using the REST API or the Python SDK.

REST API

You can use watsonx.ai API endpoints to configure and apply AI guardrails to natural language input and output text.


Note: The Granite Guardian filter only supports flagging jailbreaking content.
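
For example, guardrails can be requested by including a moderations object in the request body of a text generation call. The following Python sketch shows one way to do that with the requests library; the region URL, version date, model ID, project ID, and the exact moderations schema (field names and thresholds) are assumptions for illustration, so verify them against the watsonx.ai API reference before use.

  import requests

  # Assumed values: replace the region URL, version date, access token,
  # project ID, and model ID with your own.
  API_URL = "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation"
  ACCESS_TOKEN = "<IAM access token>"

  payload = {
      "model_id": "ibm/granite-13b-instruct-v2",
      "project_id": "<your project ID>",
      "input": "Summarize the customer complaint in one sentence.",
      "parameters": {"max_new_tokens": 200},
      # Illustrative moderations object: enables the HAP and PII filters on
      # both input and output. Lower thresholds flag more content. Confirm
      # the exact field names in the watsonx.ai API reference.
      "moderations": {
          "hap": {
              "input": {"enabled": True, "threshold": 0.5},
              "output": {"enabled": True, "threshold": 0.5},
          },
          "pii": {
              "input": {"enabled": True},
              "output": {"enabled": True},
          },
      },
  }

  response = requests.post(
      API_URL,
      params={"version": "2024-05-31"},
      headers={
          "Authorization": f"Bearer {ACCESS_TOKEN}",
          "Content-Type": "application/json",
      },
      json=payload,
      timeout=60,
  )
  response.raise_for_status()
  print(response.json())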

Python

You can use the watsonx.ai Python SDK to configure and apply AI guardrails to natural language input and output text.
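
As a starting point, the following sketch applies guardrails through the SDK's ModelInference class by passing guardrails=True to generate_text. The credentials, project ID, and model ID are placeholders, and finer-grained HAP and PII options depend on your SDK version, so treat the details as assumptions to check against the SDK documentation.

  from ibm_watsonx_ai import Credentials
  from ibm_watsonx_ai.foundation_models import ModelInference

  # Placeholder credentials and IDs: substitute your own values.
  credentials = Credentials(
      url="https://us-south.ml.cloud.ibm.com",
      api_key="<your IBM Cloud API key>",
  )

  model = ModelInference(
      model_id="ibm/granite-13b-instruct-v2",
      credentials=credentials,
      project_id="<your project ID>",
  )

  # guardrails=True asks the service to moderate both the prompt and the
  # generated text with the default filters.
  text = model.generate_text(
      prompt="Summarize the customer complaint in one sentence.",
      guardrails=True,
  )
  print(text)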

For more information, see watsonx.ai Python SDK.

Learn more