Creating Smart Alerts for Generative AI applications
Set up Smart Alerts to monitor your Generative AI applications and receive notifications when metrics like token usage, costs, or request volumes exceed defined thresholds.
Smart Alerts help you proactively monitor the performance and costs of your Generative AI applications. You can create alerts based on various metrics such as token consumption, API costs, and request rates. By filtering and grouping alerts by service or model, you can monitor multiple components with a single alert configuration.
Before you begin
Ensure that your Generative AI application is instrumented and sending telemetry data to Instana. For more information, see Getting started with Generative AI observability.
Creating a Smart Alert
To create a Smart Alert for your Generative AI application, complete the following steps:
Step 1: Navigate to Smart Alerts
-
From the navigation menu in the Instana UI, select Infrastructure.
-
Click the Smart Alerts tab.
-
Click Create Smart Alert.
The Smart Alert configuration dialog opens.
Step 2: Select entity type
In the Entity Type dropdown list, select GenAI app.
This selection ensures that the alert monitors metrics specific to your Generative AI applications.
Step 3: Choose metrics to monitor
In the Metrics section, select the metric you want to monitor. The following metrics are available for Generative AI applications:
-
Input tokens: Monitor the number of tokens sent to the LLM in prompts
-
Output tokens: Monitor the number of tokens generated by the LLM in responses
-
Total tokens: Monitor the combined count of input and output tokens
-
Input token cost: Monitor the cost associated with input tokens
-
Output token cost: Monitor the cost associated with output tokens
-
Total token cost: Monitor the total cost for both input and output tokens
-
Requests: Monitor the number of API requests made to the LLM
Select the metric that is most relevant to your monitoring needs. For example, if you want to control costs, select Total token cost.
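The total metrics combine the corresponding input and output metrics. The following Python sketch illustrates that arithmetic only. The `TokenUsage` class and the per-1,000-token prices are hypothetical and are not taken from Instana; actual costs depend on the model and on how cost calculation is configured.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Hypothetical record of one LLM request (not an Instana type)."""
    input_tokens: int
    output_tokens: int
    # Assumed prices in USD per 1,000 tokens; real prices vary by model.
    input_price_per_1k: float = 0.5
    output_price_per_1k: float = 1.5

    @property
    def total_tokens(self) -> int:
        # Total tokens = input tokens + output tokens
        return self.input_tokens + self.output_tokens

    @property
    def input_token_cost(self) -> float:
        return self.input_tokens / 1000 * self.input_price_per_1k

    @property
    def output_token_cost(self) -> float:
        return self.output_tokens / 1000 * self.output_price_per_1k

    @property
    def total_token_cost(self) -> float:
        # Total token cost = input token cost + output token cost
        return self.input_token_cost + self.output_token_cost


usage = TokenUsage(input_tokens=2000, output_tokens=1000)
print(usage.total_tokens)      # 3000
print(usage.total_token_cost)  # 2.5
```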
Step 4: Filter by service or model
To monitor a specific Generative AI application or model, you need to add filters. Filters help you narrow down the scope of your alert to specific services or models.
Filtering by service name
To filter by a specific Generative AI application (service):
-
In the Filters section, click Add filter.
-
In the filter field, search for metric.tag.service_name. This attribute appears under the Other category in the dropdown list. Alternatively, you can scroll down to the Other section and locate it there.
-
Select the operator (for example, equals).
-
Enter or select your service name (application name).
The service name corresponds to the application name you specified when instrumenting your Generative AI application.
Filtering by model
To filter by a specific LLM model:
-
In the Filters section, click Add filter.
-
In the filter field, search for metric.tag.model_id.
-
Select the operator (for example, equals).
-
Select your model identifier (for example, gpt-4 or claude-3-opus).
You can add multiple filters to create more specific alert conditions. For example, you can filter by both service name and model to monitor a specific model within a particular application.
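To make the effect of multiple filters concrete, the following Python sketch models a filter as a (tag, operator, value) triple and treats a data point as in scope only when every filter matches. The `matches` function and the AND semantics are assumptions made for illustration; they are not part of any Instana API.

```python
from typing import Dict, List, Tuple

# (tag name, operator, value); only the "equals" operator is sketched here.
Filter = Tuple[str, str, str]


def matches(tags: Dict[str, str], filters: List[Filter]) -> bool:
    """Return True when the tags satisfy every filter (assumed AND semantics)."""
    for tag, operator, value in filters:
        if operator != "equals":
            raise ValueError(f"unsupported operator in this sketch: {operator}")
        if tags.get(tag) != value:
            return False
    return True


filters = [
    ("metric.tag.service_name", "equals", "customer-support-bot"),
    ("metric.tag.model_id", "equals", "gpt-4"),
]

in_scope = {
    "metric.tag.service_name": "customer-support-bot",
    "metric.tag.model_id": "gpt-4",
}
out_of_scope = {
    "metric.tag.service_name": "customer-support-bot",
    "metric.tag.model_id": "claude-3-opus",
}

print(matches(in_scope, filters))      # True
print(matches(out_of_scope, filters))  # False
```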
Step 5: Group by service or model (optional)
Grouping allows you to create a single alert that monitors multiple services or models simultaneously. When a threshold is exceeded for any group member, the alert is triggered.
To group your alert, select one of the following options in the Group By section:
-
metric.tag.service_name: Group by service name to monitor all services
-
metric.tag.model_id: Group by model to monitor all models
Grouping is particularly useful when you want to monitor multiple components with a single alert configuration.
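Conceptually, grouping evaluates the threshold once per group rather than once for the whole data set. The Python sketch below shows that idea with made-up request counts. The `violating_groups` helper and the choice of summing values per group are assumptions, not a description of how Instana aggregates metrics internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (tags, metric value) pairs; the sample values are made up.
DataPoint = Tuple[Dict[str, str], float]


def violating_groups(points: Iterable[DataPoint], group_by: str, threshold: float) -> List[str]:
    """Sum the metric per group and return the groups whose sum exceeds the threshold."""
    totals: Dict[str, float] = defaultdict(float)
    for tags, value in points:
        totals[tags.get(group_by, "unknown")] += value
    return sorted(group for group, total in totals.items() if total > threshold)


points = [
    ({"metric.tag.service_name": "customer-support-bot"}, 120.0),
    ({"metric.tag.service_name": "customer-support-bot"}, 90.0),
    ({"metric.tag.service_name": "search-assistant"}, 40.0),
]

print(violating_groups(points, "metric.tag.service_name", threshold=200.0))
# ['customer-support-bot']
```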
Step 6: Complete alert configuration
After configuring the entity type, metrics, filters, and grouping for your Generative AI application, you need to complete the remaining alert configuration steps. These steps are common across all Smart Alerts in Instana and include:
-
Setting threshold values and operators
-
Configuring time thresholds and evaluation windows
-
Adding alert channels for notifications
-
Customizing alert properties (title, description, incident triggering)
-
Adding custom payloads (optional)
For detailed instructions on completing these configuration steps, see Smart Alerts for infrastructure.
After you complete the configuration, click Create to save your Smart Alert.
Example: Monitoring costs across multiple models
This example demonstrates how to create an alert that monitors total token costs across all models for a specific Generative AI application.
Scenario: You want to be notified if any model used by your "customer-support-bot" application exceeds $50 in token costs within a 1-hour period.
Configuration:
-
Entity Type: GenAI app
-
Metric: Total token cost
-
Filter: metric.tag.service_name equals customer-support-bot
-
Group By: metric.tag.model_id
-
Threshold: > 50 (Critical)
-
Time Threshold: 1 hour, 1 consecutive violation
-
Alert Channel: Your preferred notification channel
With this configuration, you receive a single alert if any model (for example, GPT-4, Claude, or Gemini) used by your customer support bot exceeds the $50 cost threshold. The grouping by model ID allows you to monitor all models with one alert while still identifying which specific model triggered the alert.
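The scenario can be walked through with sample data. In the Python sketch below, each `CostSample` is a hypothetical one-hour cost total for a service and model pair, and `models_over_threshold` applies the filter, the grouping, and the threshold in that order. All names and numbers are invented for illustration and do not reflect Instana's internal evaluation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CostSample:
    """Hypothetical one-hour cost total for a service/model pair, in USD."""
    service_name: str
    model_id: str
    total_token_cost: float


def models_over_threshold(samples: List[CostSample], service: str, threshold: float) -> Dict[str, float]:
    """Return the per-model cost for every model of `service` that exceeds `threshold`."""
    per_model: Dict[str, float] = defaultdict(float)
    for sample in samples:
        if sample.service_name == service:  # Filter: service_name equals ...
            per_model[sample.model_id] += sample.total_token_cost  # Group By: model_id
    # Threshold: keep only groups whose cost is strictly greater than the limit.
    return {model: cost for model, cost in per_model.items() if cost > threshold}


samples = [
    CostSample("customer-support-bot", "gpt-4", 62.5),
    CostSample("customer-support-bot", "claude-3-opus", 18.0),
    CostSample("other-app", "gpt-4", 75.0),  # excluded by the service filter
]

print(models_over_threshold(samples, "customer-support-bot", 50.0))
# {'gpt-4': 62.5}
```

In this sample data, only gpt-4 within customer-support-bot crosses the limit. The 75.0 recorded for other-app is ignored because the filter removes it before grouping.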
Combining filters and grouping
You can combine filtering and grouping for more granular alerting strategies:
-
Filter by service + Group by model: Monitor all models within a specific application
-
Filter by model + Group by service: Monitor a specific model across all applications
-
Multiple filters + Grouping: Create complex monitoring scenarios for specific use cases
This flexibility allows you to create alert configurations that match your operational needs and cost management strategies.
Related information
-
To configure thresholds, time windows, alert channels, and properties, see Infrastructure Smart Alerts.
-
To view metrics and traces for your Generative AI applications, see Viewing telemetry data.
-
Cost calculation - Understand how token costs are calculated
-
Alert channels - Configure notification channels for your alerts