Creating Smart Alerts for Generative AI applications

Set up Smart Alerts to monitor your Generative AI applications and receive notifications when metrics like token usage, costs, or request volumes exceed defined thresholds.

Smart Alerts help you proactively monitor the performance and costs of your Generative AI applications. You can create alerts based on various metrics such as token consumption, API costs, and request rates. By filtering and grouping alerts by service or model, you can monitor multiple components with a single alert configuration.

Creating a Smart Alert

To create a Smart Alert for your Generative AI application, complete the following steps:

Step 1: Navigate to Smart Alerts

  1. From the navigation menu in the Instana UI, select Infrastructure.

  2. Click the Smart Alerts tab.

  3. Click Create Smart Alert.

The Smart Alert configuration dialog opens.

Step 2: Select entity type

In the Entity Type dropdown list, select GenAI app.

This selection ensures that the alert monitors metrics specific to your Generative AI applications.

Step 3: Choose metrics to monitor

In the Metrics section, select the metric you want to monitor. The following metrics are available for Generative AI applications:

  • Input tokens: Monitor the number of tokens sent to the LLM in prompts

  • Output tokens: Monitor the number of tokens generated by the LLM in responses

  • Total tokens: Monitor the combined count of input and output tokens

  • Input token cost: Monitor the cost associated with input tokens

  • Output token cost: Monitor the cost associated with output tokens

  • Total token cost: Monitor the total cost for both input and output tokens

  • Requests: Monitor the number of API requests made to the LLM

Select the metric that is most relevant to your monitoring needs. For example, if you want to control costs, select Total token cost.
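The relationship between the token and cost metrics can be sketched in a few lines of Python. This is only an illustration of how the totals relate to their input and output parts, not how Instana computes them. The TokenUsage class, the function names, and the per-1,000-token prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Token counts for one LLM request (hypothetical structure)."""
    input_tokens: int
    output_tokens: int


# Hypothetical prices in USD per 1,000 tokens; real prices depend on the model.
INPUT_PRICE_PER_1K = 0.01
OUTPUT_PRICE_PER_1K = 0.03


def total_tokens(usage: TokenUsage) -> int:
    """Total tokens: combined count of input and output tokens."""
    return usage.input_tokens + usage.output_tokens


def input_token_cost(usage: TokenUsage) -> float:
    """Cost associated with the input tokens."""
    return usage.input_tokens / 1000 * INPUT_PRICE_PER_1K


def output_token_cost(usage: TokenUsage) -> float:
    """Cost associated with the output tokens."""
    return usage.output_tokens / 1000 * OUTPUT_PRICE_PER_1K


def total_token_cost(usage: TokenUsage) -> float:
    """Total token cost: input cost plus output cost."""
    return input_token_cost(usage) + output_token_cost(usage)


if __name__ == "__main__":
    usage = TokenUsage(input_tokens=2000, output_tokens=1000)
    print(total_tokens(usage), round(total_token_cost(usage), 4))
```

Because output tokens are often priced differently from input tokens, alerting on Total token cost can surface spend that a token-count alert alone would not show.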

Step 4: Filter by service or model

To monitor a specific Generative AI application or model, add filters. Filters narrow the scope of your alert to specific services or models.

Filtering by service name

To filter by a specific Generative AI application (service):

  1. In the Filters section, click Add filter.

  2. In the filter field, search for metric.tag.service_name.

    This attribute appears under the Other category in the dropdown list. Alternatively, you can scroll down to the Other section and locate it there.

  3. Select the operator (for example, equals).

  4. Enter or select your service name (application name).

The service name corresponds to the application name you specified when instrumenting your Generative AI application.

Filtering by model

To filter by a specific model:

  1. In the Filters section, click Add filter.

  2. In the filter field, search for metric.tag.model_id.

  3. Select the operator (for example, equals).

  4. Select your model identifier (for example, gpt-4 or claude-3-opus).

You can add multiple filters to create more specific alert conditions. For example, you can filter by both service name and model to monitor a specific model within a particular application.
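The effect of combining filters can be modeled as follows. This is a minimal sketch that assumes multiple filters are combined with AND and that every filter uses the equals operator. The data point layout, the apply_filters helper, and the search-assistant service name are hypothetical; only the two tag names come from the steps above.

```python
from typing import Any, Dict, List

SERVICE_TAG = "metric.tag.service_name"
MODEL_TAG = "metric.tag.model_id"


def matches(point: Dict[str, Any], filters: Dict[str, str]) -> bool:
    """Return True when the point satisfies every 'equals' filter."""
    return all(point["tags"].get(tag) == value for tag, value in filters.items())


def apply_filters(points: List[Dict[str, Any]], filters: Dict[str, str]) -> List[Dict[str, Any]]:
    """Keep only the data points that match all filters (assumed AND semantics)."""
    return [point for point in points if matches(point, filters)]


# Hypothetical data points: total tokens reported per service and model.
POINTS = [
    {"tags": {SERVICE_TAG: "customer-support-bot", MODEL_TAG: "gpt-4"}, "value": 1200},
    {"tags": {SERVICE_TAG: "customer-support-bot", MODEL_TAG: "claude-3-opus"}, "value": 800},
    {"tags": {SERVICE_TAG: "search-assistant", MODEL_TAG: "gpt-4"}, "value": 500},
]

if __name__ == "__main__":
    # One filter: everything reported by a single application.
    print(apply_filters(POINTS, {SERVICE_TAG: "customer-support-bot"}))
    # Two filters: one model within that application.
    print(apply_filters(POINTS, {SERVICE_TAG: "customer-support-bot", MODEL_TAG: "gpt-4"}))
```

Each additional filter can only shrink the set of monitored data, which is why adding a model filter on top of a service filter isolates one model within one application.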

Step 5: Group by service or model (optional)

Grouping allows you to create a single alert that monitors multiple services or models simultaneously. When a threshold is exceeded for any group member, the alert is triggered.

To group your alert, select one of the following options in the Group By section:

  • metric.tag.service_name: Group by service name to monitor all services

  • metric.tag.model_id: Group by model to monitor all models

Grouping is particularly useful when you want to monitor multiple components with a single alert configuration.
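Conceptually, grouping splits the monitored data by a tag value and checks the threshold for each group separately. The sketch below models that idea in plain Python. It is an assumption about the general behavior, not Instana's implementation. The data layout, the helper names, the search-assistant service name, and the threshold of 1,000 tokens are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

SERVICE_TAG = "metric.tag.service_name"
MODEL_TAG = "metric.tag.model_id"


def group_totals(points: List[Dict[str, Any]], group_by: str) -> Dict[str, float]:
    """Sum the metric values separately for each value of the grouping tag."""
    totals: Dict[str, float] = defaultdict(float)
    for point in points:
        totals[point["tags"][group_by]] += point["value"]
    return dict(totals)


def violating_groups(points: List[Dict[str, Any]], group_by: str, threshold: float) -> List[str]:
    """Return the group members whose total exceeds the threshold."""
    totals = group_totals(points, group_by)
    return sorted(name for name, total in totals.items() if total > threshold)


# Hypothetical data points: total tokens reported per service and model.
POINTS = [
    {"tags": {SERVICE_TAG: "customer-support-bot", MODEL_TAG: "gpt-4"}, "value": 1200},
    {"tags": {SERVICE_TAG: "customer-support-bot", MODEL_TAG: "claude-3-opus"}, "value": 800},
    {"tags": {SERVICE_TAG: "search-assistant", MODEL_TAG: "gpt-4"}, "value": 500},
]

if __name__ == "__main__":
    print(violating_groups(POINTS, MODEL_TAG, threshold=1000))
    print(violating_groups(POINTS, SERVICE_TAG, threshold=1000))
```

The same data yields different group members depending on the grouping tag, so the choice of tag determines what the alert reports as the source of a violation.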

Step 6: Complete alert configuration

After configuring the entity type, metrics, filters, and grouping for your Generative AI application, you need to complete the remaining alert configuration steps. These steps are common across all Smart Alerts in Instana and include:

  • Setting threshold values and operators

  • Configuring time thresholds and evaluation windows

  • Adding alert channels for notifications

  • Customizing alert properties (title, description, incident triggering)

  • Adding custom payloads (optional)

For detailed instructions on completing these configuration steps, see Smart Alerts for infrastructure.

After you complete the configuration, click Create to save your Smart Alert.

Example: Monitoring costs across multiple models

This example demonstrates how to create an alert that monitors total token costs across all models for a specific Generative AI application.

Scenario: You want to be notified if any model used by your "customer-support-bot" application exceeds $50 in token costs within a 1-hour period.

Configuration:

  1. Entity Type: GenAI app

  2. Metric: Total token cost

  3. Filter: metric.tag.service_name equals customer-support-bot

  4. Group By: metric.tag.model_id

  5. Threshold: > 50 (Critical)

  6. Time Threshold: 1 hour, 1 consecutive violation

  7. Alert Channel: Your preferred notification channel

With this configuration, you are alerted when any model (for example, GPT-4, Claude, or Gemini) used by your customer support bot exceeds the $50 cost threshold. Grouping by model ID lets you monitor all models with one alert configuration while still identifying which specific model triggered the alert.
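The example configuration can be written down as data and evaluated against sample values. This is a rough sketch, assuming the time threshold means that the last N one-hour windows must all exceed the threshold. The dictionary keys, the triggered_models helper, and the hourly cost figures are hypothetical and do not represent Instana's API or its evaluation logic.

```python
from typing import Dict, List

# Values from the example configuration; the key names are hypothetical.
ALERT = {
    "entity_type": "GenAI app",
    "metric": "Total token cost",
    "filter": {"metric.tag.service_name": "customer-support-bot"},
    "group_by": "metric.tag.model_id",
    "threshold": 50.0,            # USD, severity Critical
    "window": "1 hour",
    "consecutive_violations": 1,
}


def triggered_models(
    hourly_cost_by_model: Dict[str, List[float]],
    threshold: float,
    consecutive: int,
) -> List[str]:
    """Return models whose last `consecutive` hourly costs all exceed the threshold."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    triggered = []
    for model, costs in hourly_cost_by_model.items():
        recent = costs[-consecutive:]
        # Require a full set of windows before reporting a violation.
        if len(recent) == consecutive and all(cost > threshold for cost in recent):
            triggered.append(model)
    return sorted(triggered)


# Hypothetical hourly total token cost (USD) per model, oldest window first.
HOURLY_COST = {
    "gpt-4": [12.0, 61.5],
    "claude-3-opus": [20.0, 48.0],
}

if __name__ == "__main__":
    print(triggered_models(HOURLY_COST, ALERT["threshold"], ALERT["consecutive_violations"]))
```

With one consecutive violation, a single expensive hour is enough to trigger the alert for a model. Requiring two or more consecutive violations would suppress short spikes at the cost of a later notification.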

Combining filters and grouping

You can combine filtering and grouping for more granular alerting strategies:

  • Filter by service + Group by model: Monitor all models within a specific application

  • Filter by model + Group by service: Monitor a specific model across all applications

  • Multiple filters + Grouping: Create complex monitoring scenarios for specific use cases

This flexibility allows you to create alert configurations that match your operational needs and cost management strategies.