Defining custom events

A custom event enables you to create issues or incidents based on an individual metric of any given entity.

Creating a custom event

To create a custom event, complete the following steps:

On the sidebar, click Settings.
Click Events -> New Event.
Provide the basic Event Details of the event:
- Enter a name and description for the event. (Avoid hyphens in these fields because they can lead to unexpected results during search.)
- Select the severity for the issue: warning or critical.
- Select whether the event should be considered an incident and set a grace period, which is the period to wait before closing the issue when conditions are no longer met.
Configure the Condition of the custom event:

Create a condition for the custom event by selecting a data source, which provides metrics used for triggering this event:
- Built-in metrics: These are metrics that are available when the corresponding entity is instrumented.
  - For example, when a JVM is being monitored, Instana provides metrics such as the amount of memory in use. Each entity of type JVM has such a memory.used metric out of the box.
  - In addition, some dynamic built-in metrics exist multiple times per entity, once for each sub-entity. An example is the available disk space of a host: for each disk, Instana provides a separate metric fs.{device}.free, such as fs./dev/xvda1.free. Custom events can be defined by specifying which devices to match, for instance, devices whose name starts with /dev.
- Custom metrics: These are metrics that are explicitly exposed by a monitored application. For example, applications can expose the following custom metrics:
- System Rules:
  - Offline event detection: This rule is active when an entity (such as JVM or process) goes offline.
  - Hosts that do not have matching entities running on them: This rule is active when there are no matching entities (such as JVM or process) running on a host that is in scope of the custom event.
  - Host availability detection: This rule is active when a previously seen Host is offline for a specified duration of time.
  - Hosts that have an unexpected number of entities running on them: This rule is active when an unexpected number of matching entities (such as JVM or process) run on a host that is in scope of the custom event.
Depending on the metric type, you have different options for defining the condition that triggers the custom event. For example, you can configure an event to trigger if the error rate within a time window of 5 minutes is greater than 10%.

A maximum of 5 metric conditions can be defined. If the AND logical operator is used, all these conditions need to be fulfilled to trigger the rule. If the OR logical operator is used, only one of the conditions is required to trigger the rule.
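
The AND/OR combination described above can be illustrated with a minimal Python sketch. It is not Instana's implementation: the `Condition` class, the `rule_triggers` function, and the sample thresholds are hypothetical names and values chosen only to show how up to five conditions might be combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

MAX_CONDITIONS = 5  # limit stated in the text above


@dataclass
class Condition:
    """Hypothetical metric condition: a metric name plus a threshold check."""
    metric: str
    predicate: Callable[[float], bool]


def rule_triggers(conditions: List[Condition], operator: str,
                  values: Dict[str, float]) -> bool:
    """Combine per-metric results with AND (all) or OR (any)."""
    if not 1 <= len(conditions) <= MAX_CONDITIONS:
        raise ValueError("between 1 and 5 conditions are allowed")
    results = [c.predicate(values[c.metric]) for c in conditions]
    if operator == "AND":
        return all(results)
    if operator == "OR":
        return any(results)
    raise ValueError("operator must be 'AND' or 'OR'")


# Illustrative values only: CPU is above its limit, memory is not.
conditions = [
    Condition("cpu.used", lambda v: v > 0.9),
    Condition("memory.used", lambda v: v > 1000),
]
values = {"cpu.used": 0.95, "memory.used": 500}
and_result = rule_triggers(conditions, "AND", values)
or_result = rule_triggers(conditions, "OR", values)
```

With these sample values, only one of the two conditions is met, so the AND rule does not trigger and the OR rule does.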

When a dynamic metric is combined with a normal metric, such as fs.{device}.free with cpu.used, the metric of each device is combined with the CPU metric one by one. As a result, if the metric pattern of the dynamic metric matches three devices of a single host, you can see up to three active issues at the same time, each related to one of the following metric pairs:
- fs./dev/first.free and cpu.used
- fs./dev/second.free and cpu.used
- fs./dev/third.free and cpu.used
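
The pairing above can be sketched in Python as follows. This is an illustration under assumptions, not Instana's matching logic: `expand_dynamic_conditions` is a hypothetical helper, and the metric list is sample data built from the names used in this section.

```python
from typing import List, Tuple


def expand_dynamic_conditions(host_metrics: List[str], device_prefix: str,
                              normal_metric: str) -> List[Tuple[str, str]]:
    """Pair every matching fs.{device}.free metric with the normal metric."""
    pairs = []
    for name in sorted(host_metrics):
        if name.startswith("fs.") and name.endswith(".free"):
            # Extract the {device} part between "fs." and ".free".
            device = name[len("fs."):-len(".free")]
            if device.startswith(device_prefix):
                pairs.append((name, normal_metric))
    return pairs


# Sample metrics for one host (assumed data for illustration).
metrics = [
    "fs./dev/first.free",
    "fs./dev/second.free",
    "fs./dev/third.free",
    "fs.tmpfs.free",
    "cpu.used",
]
pairs = expand_dynamic_conditions(metrics, "/dev", "cpu.used")
```

Each resulting pair corresponds to one potential issue, so three matching devices yield at most three simultaneous issues for the host.
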
Define the Scope of the event:

Typically, you don’t want an event to be triggered on all entities in your application or system landscape, but you want to restrict the event to a specific set of entities. The scope lets you define which entities the event will be evaluated for:
- Application perspective: Reference an application perspective.
- Selected entities: Define a Dynamic Focus query (DFQ). Only entities matching that query will be considered when the event is evaluated.
- Selected entities (Scope hosts by tag): Only host entities with matching tags will be considered. The tag has to be defined on the host and not on an entity running on the host.
- All available entities: No restriction, evaluate the event for all entities in your application or system landscape.
Limitation: When a custom event is defined on a service or endpoint and scoped to a specific application, either by selecting an application explicitly or by using a Dynamic Focus query, issue detection is applied to the services and endpoints in that scope. However, the KPIs for each selected service or endpoint are based on all calls to the entity, not only the calls made in the context of the scoped application. The scope therefore determines only which entities are selected; it has no effect on the KPI that is evaluated.
Configure transient events (optional): Many events resolve quickly on their own, often within a few minutes, before any action can be taken. To reduce noise, Instana identifies these short-lived events as potential transients based on historical patterns. You can choose to filter them out in the event view. This feature is enabled by default.
- Enable or disable the feature: Use the toggle to enable or disable the feature (default: enabled).
- Set the transient threshold: Choose the time window (in minutes or hours) that defines what counts as a transient event. If an event is predicted to last less than this threshold, it is considered transient. This threshold reflects how long you are willing to wait before deciding that an issue needs your attention.
- Choose notification behavior:
  - Send an alert immediately: An alert is triggered when the event starts. This is the default behavior for regular events.
  - Only send an alert if an event persists after the threshold: Alerts are suppressed for events that are expected to be transient. If an event lasts longer than the threshold, an alert is sent. If it ends within the threshold, no alert is triggered, which helps reduce noise from short-lived issues.
Note: When you enable the transient event detection feature, it does not generate events. Instead, the feature identifies and labels existing events as transient, which helps you filter and analyze them. For more information, see Configuration summary in the FAQ section.
To save the new custom event, click Create.

FAQ

Why are some custom events marked as deprecated?

Custom events on entities that relate to application perspectives, such as Application, Service or Endpoint, are marked as deprecated in favour of Application Smart Alerts.

As indicated on the Settings page, you will soon no longer be able to create custom events on these three entity types. Do not create new custom events on them; create a Smart Alert instead. For information about custom events that you have already created on these entity types, see the migration to Smart Alerts guide.

Custom events on any other entity type, such as Host, JVM, or Kubernetes Pod, are not affected.

What is a transient event, and how does it work?

A transient event is one that Instana predicts might resolve on its own shortly after it starts, based on historical patterns. The system does not guarantee resolution, but uses past data to estimate which events are likely to be short-lived.

This feature is designed to help teams manage noisy environments in which many events appear briefly and disappear before anyone can meaningfully respond. Such events often resolve before an investigation can begin, and they tend to occur frequently, cluttering dashboards and contributing to alert fatigue. By identifying them as potentially transient, Instana allows users to temporarily filter or mute them, making it easier for SREs to focus on persistent, actionable issues.

Prediction logic

Instana uses historical data to estimate the expected duration of a new event. This prediction depends on the duration of similar past events for a particular configuration on a specific entity. If the system predicts a quick resolution, it labels the event as transient. The result is saved in the event state in the `isTransient` field, which shows whether Instana expects the event to resolve on its own within a set threshold.

Note: This feature applies only to issues, not incidents. The system does not make predictions for events that are linked to actions defined in a policy.

Transient threshold configuration

You can define a threshold (for example, 5 minutes) to determine what qualifies as transient. If an event is expected to resolve within this time, it is marked as transient. This threshold defines how much delay you are willing to tolerate before you look into an issue. For example, if your team is expected to resolve high-severity issues within one day, and most incidents take a few hours to fix, then setting the threshold to one hour means that you are comfortable ignoring events that are likely to resolve on their own within that window.

Event behavior

The system labels predicted transient events as Transient in the State column of the Events table.
If an event persists beyond the threshold, then the event is reclassified as a regular event, and the transient label is removed.
You can select an item in the Transient events section to filter events based on their transient status. Available options include the following items:
- Show all
- Show transient only
- Show non-transient only
This feature allows you to control the visibility of short-lived events based on your operational needs.
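
The labeling and filtering behavior above can be sketched in Python as follows. This is a simplified model under assumptions, not Instana's code: the `Event` class, its fields, and the `mode` strings are hypothetical stand-ins for the State column and the three filter options.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Hypothetical event record used only for this illustration."""
    name: str
    predicted_transient: bool
    elapsed_minutes: float


def is_labeled_transient(event: Event, threshold_minutes: float) -> bool:
    """The label is kept only while the event has not outlived the threshold."""
    return event.predicted_transient and event.elapsed_minutes <= threshold_minutes


def filter_events(events: List[Event], mode: str,
                  threshold_minutes: float) -> List[Event]:
    """Apply one of the three filter options: all, transient, non-transient."""
    if mode == "all":
        return list(events)
    if mode == "transient":
        return [e for e in events if is_labeled_transient(e, threshold_minutes)]
    if mode == "non-transient":
        return [e for e in events if not is_labeled_transient(e, threshold_minutes)]
    raise ValueError("unknown filter mode")


# Sample events (assumed data) evaluated against a 5-minute threshold.
events = [
    Event("short spike", predicted_transient=True, elapsed_minutes=2),
    Event("outlived prediction", predicted_transient=True, elapsed_minutes=12),
    Event("regular issue", predicted_transient=False, elapsed_minutes=1),
]
transient_names = [e.name for e in filter_events(events, "transient", 5)]
regular_names = [e.name for e in filter_events(events, "non-transient", 5)]
```

In this model, the second event was predicted transient but has persisted beyond the threshold, so it appears with the regular events.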

Alerting behavior

You can configure alerts in one of the following two ways:

- Send immediately when the event starts (default behavior).
- Send only if the event persists beyond the threshold, reducing noise from short-lived, self-resolving events.

Configuration summary

When the transient event detection feature is disabled, the following behaviors occur:
- No prediction is made, and no transient tag is added.
- Alerts are always sent immediately.
When the transient event detection feature is enabled with the Send an alert immediately option, the following behaviors occur:
- A transient prediction is made for a new event.
- If predicted transient: An alert is sent, and the event is tagged as transient in the event table.
- If predicted not transient: An alert is sent, with no tag.
When the transient event detection feature is enabled with the option Send an alert only if the event persists after the threshold, the following behaviors occur:
- A transient prediction is made for a new event.
- If predicted transient: The event is tagged as transient, but the alert is held until the threshold is reached.
  - If the event resolves before the threshold: No alert is sent.
  - If the event persists beyond the threshold: An alert is sent, and the transient tag is removed.
- If predicted not transient: An alert is sent immediately, with no tag.
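
The configuration summary can be condensed into a small decision sketch in Python. It restates the cases listed above under assumptions and is not Instana's implementation: the function names, parameters, and the "send"/"hold" strings are hypothetical.

```python
from typing import Tuple


def initial_decision(detection_enabled: bool, only_if_persists: bool,
                     predicted_transient: bool) -> Tuple[str, bool]:
    """Return (alert action at event start, whether the event is tagged transient)."""
    if not detection_enabled:
        # No prediction is made, so there is no tag and the alert goes out at once.
        return ("send", False)
    if not only_if_persists:
        # "Send an alert immediately": always alert, tag only if predicted transient.
        return ("send", predicted_transient)
    if predicted_transient:
        # Alert is held until the threshold is reached.
        return ("hold", True)
    return ("send", False)


def held_alert_is_sent(persisted_beyond_threshold: bool) -> bool:
    """Resolve a held alert: it is sent only if the event outlives the threshold.

    When the alert is sent, the transient tag is also removed.
    """
    return persisted_beyond_threshold


disabled = initial_decision(False, False, True)
immediate = initial_decision(True, False, True)
held = initial_decision(True, True, True)
not_transient = initial_decision(True, True, False)
```

The only case in which an alert is delayed is the one where detection is enabled, the persist-after-threshold option is selected, and the event is predicted to be transient.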