This recipe showcases a Granite Guardian model designed to detect hate, abuse, and profanity (HAP), either in a prompt or in LLM output. This is an example of a "guard rail" used for safety in generative AI applications. The model used in this recipe has been fine-tuned on several English HAP benchmarks and is built on the slate.38m.english.distilled base model. You will need a Hugging Face token to run this recipe in Colab; instructions for obtaining this credential can be found here.
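The guard-rail pattern described above can be sketched with a few lines of `transformers` code: load the fine-tuned classifier, score a piece of text, and block it if the HAP probability exceeds a threshold. The Hub model id (`ibm-granite/granite-guardian-hap-38m`), the label convention (class index 1 = HAP), and the 0.5 threshold below are assumptions for illustration; check the recipe's sample code and the model card for the exact values.

```python
# Minimal guard-rail sketch, assuming the HAP classifier is published on the
# Hugging Face Hub under the id below (verify against the model card).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "ibm-granite/granite-guardian-hap-38m"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def hap_score(text: str) -> float:
    """Return the probability that `text` contains hate, abuse, or profanity."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumed label convention: class index 1 is the HAP class.
    return torch.softmax(logits, dim=-1)[0, 1].item()

THRESHOLD = 0.5  # illustrative guard-rail threshold, not a recommended value
score = hap_score("Have a wonderful day!")
print(f"HAP score: {score:.4f}, blocked: {score > THRESHOLD}")
```

In a real application the same check would run twice: once on the user's prompt before it reaches the LLM, and once on the LLM's output before it reaches the user.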
Get started: explore the sample code in a GitHub repo.
Try it out: execute the sample code in Colab.