Configuring AI guardrails in watsonx.ai

You can set AI guardrails in watsonx.ai to moderate the input text provided to a foundation model and the output generated by the model in multiple ways.

You can remove harmful content when you're working with foundation models in watsonx.ai by using the following methods:

  • Configure AI guardrails in the Prompt Lab
  • Configure AI guardrails programmatically with the REST API or the Python SDK

Configuring AI guardrails in the Prompt Lab

To remove harmful content when you're working with foundation models in the Prompt Lab, set the AI guardrails switcher to On.

The AI guardrails feature is enabled automatically for all natural language foundation models in English.

To configure AI guardrails in the Prompt Lab, complete the following steps:

  1. With AI guardrails enabled, click the AI guardrails settings icon.

  2. Configure the filters to apply to the user input and model output, and adjust the filter sensitivity where applicable:

    • HAP filter

      The HAP filter flags hateful, abusive, or profane language. To change the sensitivity of the filter, move the HAP sliders. To disable the HAP filter, set the slider to 1.

    • PII filter

      To enable the PII filter, set the PII switcher to On.

    • Granite Guardian model as a filter

      Granite Guardian moderation is disabled by default. To change the sensitivity of the filter, move the Granite Guardian sliders.

    Experiment with adjusting the sliders to find the best settings for your needs. The sketch after these steps illustrates how a filter threshold maps to flagging behavior.

  3. Click Save.
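
Conceptually, each filter scores a span of text and flags the span when the score crosses the configured threshold, which is why setting the HAP slider to 1 disables that filter and lower values flag more content. The following Python sketch is illustrative only; it is not the watsonx.ai filter implementation, and the function and variable names are invented for this example.

  def is_flagged(score: float, threshold: float) -> bool:
      # Illustrative rule: a span is flagged when the filter's score for it
      # exceeds the configured threshold. A threshold of 1 flags nothing;
      # lower thresholds flag more content.
      return score > threshold

  # A span that scores 0.7 is flagged at threshold 0.5 but not at 0.75.
  print(is_flagged(0.7, 0.5))   # True
  print(is_flagged(0.7, 0.75))  # False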

Configuring AI guardrails programmatically

You can set AI guardrails programmatically to moderate the input text that you submit to a foundation model and the output that the model generates by using the REST API or the Python SDK.

REST API

You can use watsonx.ai API endpoints to configure and apply AI guardrails to natural language input and output text.


Note: The Granite Guardian filter only supports flagging jailbreaking content.
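
For example, guardrails can be requested by including a moderations object in the request body of a text generation call. The following Python sketch shows one way to do that with the requests library; the region URL, version date, model ID, project ID, and the exact moderations schema (field names and thresholds) are assumptions for illustration, so verify them against the watsonx.ai API reference before use.

  import requests

  # Assumed values: replace the region URL, version date, access token,
  # project ID, and model ID with your own.
  API_URL = "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation"
  ACCESS_TOKEN = "<IAM access token>"

  payload = {
      "model_id": "ibm/granite-13b-instruct-v2",
      "project_id": "<your project ID>",
      "input": "Summarize the customer complaint in one sentence.",
      "parameters": {"max_new_tokens": 200},
      # Illustrative moderations object: enables the HAP and PII filters on
      # both input and output. Lower thresholds flag more content. Confirm
      # the exact field names in the watsonx.ai API reference.
      "moderations": {
          "hap": {
              "input": {"enabled": True, "threshold": 0.5},
              "output": {"enabled": True, "threshold": 0.5},
          },
          "pii": {
              "input": {"enabled": True},
              "output": {"enabled": True},
          },
      },
  }

  response = requests.post(
      API_URL,
      params={"version": "2024-05-31"},
      headers={
          "Authorization": f"Bearer {ACCESS_TOKEN}",
          "Content-Type": "application/json",
      },
      json=payload,
      timeout=60,
  )
  response.raise_for_status()
  print(response.json())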

Python

You can use the watsonx.ai Python SDK to configure and apply AI guardrails to natural language input and output text.
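
As a starting point, the following sketch applies guardrails through the SDK's ModelInference class by passing guardrails=True to generate_text. The credentials, project ID, and model ID are placeholders, and finer-grained HAP and PII options depend on your SDK version, so treat the details as assumptions to check against the SDK documentation.

  from ibm_watsonx_ai import Credentials
  from ibm_watsonx_ai.foundation_models import ModelInference

  # Placeholder credentials and IDs: substitute your own values.
  credentials = Credentials(
      url="https://us-south.ml.cloud.ibm.com",
      api_key="<your IBM Cloud API key>",
  )

  model = ModelInference(
      model_id="ibm/granite-13b-instruct-v2",
      credentials=credentials,
      project_id="<your project ID>",
  )

  # guardrails=True asks the service to moderate both the prompt and the
  # generated text with the default filters.
  text = model.generate_text(
      prompt="Summarize the customer complaint in one sentence.",
      guardrails=True,
  )
  print(text)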

For more information, see watsonx.ai Python SDK.

Learn more