The Granite Guardian models are a collection of models designed to detect risks in prompts and responses. Built by fine-tuning instruction-tuned Granite language models, they can help with risk detection along many key dimensions catalogued in the IBM AI Risk Atlas. The models are trained on unique data comprising human annotations from people of socioeconomically diverse backgrounds, together with synthetic data informed by internal red-teaming.
Granite Guardian is provided as six models spanning two architectures (see the table below for guidance on choosing a size). The models cover three risk detection use cases:

- Detecting harm-related risks in prompt text or model responses (as guardrails). These are two fundamentally different use cases: the former assesses user-supplied text, while the latter evaluates model-generated text. See the first sketch after this list.
- RAG (retrieval-augmented generation) evaluation, where the guardian model assesses three key issues: context relevance (whether the retrieved context is relevant to the query), groundedness (whether the response is accurate and faithful to the provided context), and answer relevance (whether the response directly addresses the user's query). See the second sketch after this list.
- Function-calling risk detection within agentic workflows, where Granite Guardian evaluates intermediate steps for syntactic and semantic hallucinations. This includes assessing the validity of function calls and detecting fabricated information, particularly during query translation. See the third sketch after this list.
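To make the guardrail use case concrete, here is a minimal sketch using the Hugging Face `transformers` library. The checkpoint name and the `guardian_config`/`risk_name` template field are assumptions modeled on the published model card examples; verify them against the variant you deploy.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; substitute the Granite Guardian variant you use.
model_path = "ibm-granite/granite-guardian-3.1-2b"

model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Guardrail on the input side: screen a user prompt for harm before it
# reaches the main model. The "harm" risk name follows the model card.
messages = [{"role": "user", "content": "How can I hurt someone and get away with it?"}]

input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config={"risk_name": "harm"},  # assumed template field
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=20, do_sample=False)

# The guardian replies with a Yes/No risk label.
label = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True).strip()
print(label)  # "Yes" -> risk detected, "No" -> safe
```

To screen a model response rather than a prompt, append an assistant turn (e.g. `{"role": "assistant", "content": response_text}`) to `messages`; the guardian then evaluates the last turn under the same risk definition.

For the RAG use case, the same interface can be pointed at the retrieval triad. The risk names (`context_relevance`, `groundedness`, `answer_relevance`) and the `context` role are assumptions based on the card's conventions; this continues from the sketch above, reusing its `model` and `tokenizer`.

```python
# Continues the sketch above: `model` and `tokenizer` are already loaded.
# Risk names and the "context" role are assumptions from the model card.
context = "The Eiffel Tower was completed in 1889 and stands 330 metres tall."
query = "How tall is the Eiffel Tower?"
answer = "The Eiffel Tower is roughly 500 metres tall."  # deliberately ungrounded

for risk_name in ("context_relevance", "groundedness", "answer_relevance"):
    messages = [
        {"role": "user", "content": query},
        {"role": "context", "content": context},  # assumed role for retrieved passages
        {"role": "assistant", "content": answer},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        guardian_config={"risk_name": risk_name},
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=20, do_sample=False)
    label = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True).strip()
    print(f"{risk_name}: {label}")  # expect a "Yes" (risk) for groundedness here
```

For agentic workflows, a similar check can flag a hallucinated function call. The `function_call` risk name is an assumption from the model card; passing the tool inventory through `apply_chat_template`'s `tools` parameter is standard `transformers` usage, though whether the guardian template consumes it should be verified.

```python
# Continues the first sketch: `model` and `tokenizer` are already loaded.
tools = [
    {
        "name": "get_stock_price",
        "description": "Return the latest price for a ticker symbol.",
        "parameters": {"ticker": {"type": "string"}},
    }
]
messages = [
    {"role": "user", "content": "What is IBM trading at today?"},
    # The agent invents an argument name ("company"): a semantic hallucination.
    {"role": "assistant", "content": '{"name": "get_stock_price", "arguments": {"company": "IBM"}}'},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config={"risk_name": "function_call"},  # assumed risk name
    tools=tools,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True).strip())
```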
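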
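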
The table below summarizes the available model sizes and when to choose each:

| Model sizes | Recommended for |
| --- | --- |
| 2B / 3B | Ideal for edge devices |
| 8B / 5B | Ideal for limited computational power and resources, faster training times |