Prompt leakage risk evaluation metric
The prompt leakage risk metric measures the risk that a prompt template can be leaked by calculating the similarity between a leaked prompt template and the original prompt template.
Metric details
Prompt leakage risk is a metric that measures how robust a prompt template is against leakage attacks. The metric is available only when you use the Python SDK to calculate evaluation metrics.
Scope
The prompt leakage risk metric evaluates generative AI assets only.
- Types of AI assets: Prompt templates
- Generative AI tasks:
- Text classification
- Text summarization
- Content generation
- Question answering
- Entity extraction
- Retrieval augmented generation (RAG)
- Supported languages: English, French
Scores and values
The prompt leakage risk metric score indicates how robust your prompt template is against leakage attacks.
- Range of values: 0.0-1.0
- Best possible score: 0.0
- Ratios:
- At 0: The prompt template is robust against leakage attacks.
- Over 0: The prompt template is vulnerable to prompt leakage attacks. Scores closer to 1 indicate greater vulnerability.
Settings
- Thresholds:
- Lower bound: 0
- Upper bound: 1
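As a minimal sketch of how the thresholds above can be interpreted, the following helper flags any score over the lower bound as vulnerable. The function name and the flagging rule are illustrative assumptions, not part of the SDK.

```python
# Illustrative check of a prompt leakage risk score against the
# threshold bounds listed above. Helper name is an assumption.
LOWER_BOUND = 0.0
UPPER_BOUND = 1.0

def is_vulnerable(score: float) -> bool:
    """Flag any score above the lower bound, since scores over 0
    indicate some vulnerability to prompt leakage attacks."""
    if not LOWER_BOUND <= score <= UPPER_BOUND:
        raise ValueError("score must be between 0.0 and 1.0")
    return score > LOWER_BOUND

print(is_vulnerable(0.0))   # a score of 0 means the template is robust
print(is_vulnerable(0.35))  # any score over 0 signals vulnerability
```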
Evaluation process
The prompt leakage risk metric calculates a weighted average of similarity scores that are computed over a set of predefined attack vectors. The weighted average uses a rank value between 1 and 4, where rank 4 represents the attack vector that is easiest for attackers to exploit.
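The rank-weighted average described above can be sketched as follows. The attack vectors, similarity scores, and the exact weighting scheme are illustrative assumptions; the SDK's internal computation is not published here.

```python
# Sketch of a rank-weighted average of similarity scores.
# The (rank, score) pairs below are illustrative assumptions:
# one similarity score per predefined attack vector, paired with
# its rank (1-4), where rank 4 is the easiest vector to exploit
# and therefore contributes the most weight.

def prompt_leakage_risk(similarity_by_rank):
    """Return the weighted average of similarity scores,
    using each attack vector's rank as its weight."""
    weighted_sum = sum(rank * score for rank, score in similarity_by_rank)
    total_weight = sum(rank for rank, _ in similarity_by_rank)
    return weighted_sum / total_weight

scores = [(1, 0.10), (2, 0.05), (3, 0.40), (4, 0.90)]
print(round(prompt_leakage_risk(scores), 3))
```

Because rank-4 vectors carry the largest weight, a high similarity score on an easily exploited attack vector pulls the overall risk score up more than the same score on a rank-1 vector would.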
Learn more
For more information about red teaming metrics, see the following sample notebooks: