This recipe showcases a Granite Guardian model designed to detect hate, abuse, and profanity (HAP), either in a prompt or in LLM output. This is an example of a "guard rail" used for safety in generative AI applications. The model used in this recipe has been fine-tuned on several English HAP benchmarks and is built on the slate.38m.english.distilled base model. You will need a Hugging Face token to run this recipe in Colab; instructions for obtaining this credential can be found here.
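The guard-rail pattern described above can be sketched with a few lines of `transformers` code: load the fine-tuned classifier, score a piece of text, and block it if the HAP probability exceeds a threshold. The Hub model id (`ibm-granite/granite-guardian-hap-38m`), the label convention (class index 1 = HAP), and the 0.5 threshold below are assumptions for illustration; check the recipe's sample code and the model card for the exact values.

```python
# Minimal guard-rail sketch, assuming the HAP classifier is published on the
# Hugging Face Hub under the id below (verify against the model card).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "ibm-granite/granite-guardian-hap-38m"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def hap_score(text: str) -> float:
    """Return the probability that `text` contains hate, abuse, or profanity."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumed label convention: class index 1 is the HAP class.
    return torch.softmax(logits, dim=-1)[0, 1].item()

THRESHOLD = 0.5  # illustrative guard-rail threshold, not a recommended value
score = hap_score("Have a wonderful day!")
print(f"HAP score: {score:.4f}, blocked: {score > THRESHOLD}")
```

In a real application the same check would run twice: once on the user's prompt before it reaches the LLM, and once on the LLM's output before it reaches the user.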
Get started: explore the sample code in a GitHub repo.
Try it out: execute the sample code in Colab.