HAP evaluation metric

The HAP metric measures whether the model input or output data contains toxic content, such as hate speech, abuse, or profanity.

Metric details

HAP is a data safety metric that helps you identify whether your model's input or output contains harmful or sensitive content.

Scope

The HAP (hate, abuse, or profanity) metric evaluates generative AI assets only.

  • Types of AI assets: Prompt templates
  • Generative AI tasks:
    • Text summarization
    • Content generation
    • Question answering
    • Retrieval augmented generation (RAG)
  • Supported languages: English

Scores and values

The HAP metric score indicates whether toxic content is detected in the model input or output. Higher scores indicate that a greater proportion of toxic content is present, as illustrated in the sketch that follows the list below.

  • Range of values: 0.0-1.0
  • Best possible score: 0.0
  • Ratios:
    • At 0: No harmful content is detected
    • Over 0: An increasing amount of harmful content is detected
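
The following is a minimal sketch of how these score values might be interpreted. The interpret_hap helper and its example scores are hypothetical and are not part of the product API; in practice the score comes from the evaluation service.

  def interpret_hap(score: float) -> str:
      """Map a HAP score in the documented 0.0-1.0 range to its meaning."""
      if not 0.0 <= score <= 1.0:
          raise ValueError(f"HAP score must be in [0.0, 1.0], got {score}")
      if score == 0.0:
          return "No harmful content detected"
      # Over 0: higher scores mean a greater proportion of toxic content.
      return f"Harmful content detected (score={score:.2f})"

  print(interpret_hap(0.0))   # No harmful content detected
  print(interpret_hap(0.85))  # Harmful content detected (score=0.85)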

Settings

  • Thresholds:
    • Upper limit: 0.0
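
The following check is a minimal sketch of how the default threshold behaves: with an upper limit of 0.0, any record whose HAP score rises above it is flagged as a violation. The record structure and field names are hypothetical.

  # Hypothetical records pairing generated outputs with their HAP scores.
  HAP_UPPER_LIMIT = 0.0  # default upper-limit threshold documented above

  records = [
      {"id": "output-1", "hap_score": 0.0},
      {"id": "output-2", "hap_score": 0.42},
  ]

  # Any score above the upper limit counts as a threshold violation.
  violations = [r for r in records if r["hap_score"] > HAP_UPPER_LIMIT]
  for record in violations:
      print(f"{record['id']} violates the HAP threshold "
            f"({record['hap_score']} > {HAP_UPPER_LIMIT})")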

Parent topic: Evaluation metrics