Displaying metric details for an alert

This task describes how to view an alert's metric information obtained by the metric anomaly detection capability.

Procedure

  1. Click the alert that you are interested in from the Alerts page.
  2. If metric data is associated with the alert, an extra tab, Metrics anomaly details, is displayed on the preview pane, under the Actions and Information tabs.
    Note: To find the alerts that have Metrics anomaly details, search for the string Anomaly in the search bar on the Events page.
  3. Click the Metrics anomaly details tab to see the metric data, the associated baseline data for that metric, and any anomalous intervals for the metric. The preview chart can show a maximum of two days of data. By default, the chart shows the two days preceding the LastOccurrence of the alert.
    Use the following controls to change what the graph displays:
    • Click Baseline to show or hide the baseline. If Baseline is selected by default, click it to deselect it. Deselecting Baseline removes the baseline from the graph.
    • Select the metrics anomaly checkbox in the Related alerts table to include the metric anomaly in the Metrics anomaly details graph. Clear the checkbox to remove the metric anomaly from the graph.
    Note: In some cases, selecting the checkbox in the Related alerts table does not add the related anomaly to the Metrics anomaly details graph.
  4. Click the drop-down icon to open a calendar and show data for different days in the graph. The graph shows the value of the metric over time. If the metric strays outside the baseline, it is anomalous and the interval is marked by a red bar.
    Attention: Data retention. By default, metric data is removed from the system 15 days after it is inserted.
    Remember:

    The default retention period can be changed by editing the time to live (TTL) setting on the Cassandra table, where the default value is 1296000 seconds (15 days).

    First, export the Cassandra username and password as environment variables.
    export RELEASE=netcool     # change this for your system
    export CASSANDRA_USER=$(oc get secret $RELEASE-cassandra-auth-secret -o jsonpath --template '{.data.username}' | base64 --decode; echo)
    export CASSANDRA_PASS=$(oc get secret $RELEASE-cassandra-auth-secret -o jsonpath --template '{.data.password}' | base64 --decode; echo)
    
    Then, set the retention to 30 days (2592000 seconds) by running the following command.
    oc rsh ${RELEASE}-cassandra-0 cqlsh -u $CASSANDRA_USER -p $CASSANDRA_PASS --ssl -e "ALTER TABLE tararam.dt_metric_value with default_time_to_live = 2592000"
    If mutual TLS is not set up for Cassandra communication, then remove --ssl from the command.
    oc rsh ${RELEASE}-cassandra-0 cqlsh -u $CASSANDRA_USER -p $CASSANDRA_PASS -e "ALTER TABLE tararam.dt_metric_value with default_time_to_live = 2592000"
    This TTL change affects only the data that is loaded after the setting is changed. Data that is already stored keeps the TTL that it was written with. The change impacts the storage and memory sizing for the Cassandra service.
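
The TTL values in this step are retention periods expressed in seconds. As a minimal sketch, the following helper converts a number of days to seconds; the function name days_to_ttl is hypothetical and is not part of the product.

```shell
# Convert a retention period in days to a TTL value in seconds.
# days_to_ttl is an illustrative helper, not a product command.
days_to_ttl() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_ttl 15   # the default retention period
days_to_ttl 30   # the retention period used in the ALTER TABLE command
```

The result can be substituted for the default_time_to_live value if a different retention period is needed.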
  5. You can zoom in on the graph in two ways:
    • Click and drag on the main chart.
    • Adjust the slider bar in the spark-line.
    The metric data is aggregated at a 5-minute granularity. For example, if you have the following five values:
    15:00 - 10.6
    15:01 - 11.0
    15:02 - 10.2
    15:03 - 10.7
    15:04 - 10.8
    These values are represented as a single data point at 15:00, with the averaged value of 10.66.
    Note: You might see overlapping red anomaly shading in the UI, which appears as a darker red.
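
The averaged value in this example can be reproduced with a short calculation. This is only a sketch that assumes the aggregation is a plain arithmetic mean, as the example suggests; the function name bucket_average is hypothetical.

```shell
# Average the values that fall into one aggregation bucket.
# Assumption: the aggregate is an arithmetic mean, as in the example.
bucket_average() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.2f\n", sum / NR }'
}

# The five one-minute values from 15:00 to 15:04
bucket_average 10.6 11.0 10.2 10.7 10.8
```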

    The following example shows how the overlapping occurs and its resolution.

    • The first datapoint is seen at 6:23. This datapoint is aggregated and aligned to 6:20, because METRIC_SPARK_AGGREGATION_INTERVAL is set to its default of 5 minutes. The red anomalous area for this datapoint is then drawn for the time range from 6:20 to 6:35.
    • In the following interval, the next datapoint is seen at 6:32. This datapoint is aggregated and aligned to 6:30. The red anomalous area for this datapoint is drawn for the time range from 6:30 to 6:45. As a result, the red anomalous areas overlap from 6:30 to 6:35.

    In this example, to avoid this overlap, change the value of METRIC_SPARK_AGGREGATION_INTERVAL to 15 minutes.
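
The alignment in this example can be sketched as rounding each timestamp down to the start of its aggregation interval. The helper shaded_range and the variable SHADE_MINUTES are hypothetical, and the sketch assumes that the red area stays 15 minutes wide regardless of the interval, which is inferred from the example rather than confirmed.

```shell
# Assumption: the red area is always 15 minutes wide, as in the example.
SHADE_MINUTES=15

# Print the shaded range for a datapoint seen at H:MM,
# given the aggregation interval in minutes.
shaded_range() {
  h=${1%%:*}; m=${1##*:}
  h=${h#0}; m=${m#0}   # drop one leading zero so 08 is not read as octal
  total=$(( ${h:-0} * 60 + ${m:-0} ))
  start=$(( total - total % $2 ))   # round down to the interval boundary
  end=$(( start + SHADE_MINUTES ))
  printf '%d:%02d-%d:%02d\n' $(( start / 60 )) $(( start % 60 )) \
    $(( end / 60 )) $(( end % 60 ))
}

shaded_range 6:23 5    # 5-minute interval, first datapoint
shaded_range 6:32 5    # 5-minute interval, second datapoint
shaded_range 6:23 15   # 15-minute interval, first datapoint
shaded_range 6:32 15   # 15-minute interval, second datapoint
```

With the 5-minute interval, the two ranges share 6:30 to 6:35. With the 15-minute interval, the first range ends where the second begins.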

    Edit the deployment as follows:

    oc edit deployment ${RELEASE}-metric-spark-service-metricsparkservice

    Where ${RELEASE} is the custom resource release name of your deployment.

    Search for METRIC_SPARK_AGGREGATION_INTERVAL, and change its value to the minimum aggregation interval for your data.