Setting up training for log anomaly detection - golden signals

Log anomaly detection - golden signals uses the Metric anomaly detection pipeline for log data sets. It takes in raw logs and converts the textual information to log patterns with variables (both known and unknown) and static content.

The patterns are also referred to as templates. Log data after templatization is filtered, based on golden signals, and converted to metric data.
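As a rough illustration of the idea, the following sketch turns raw log lines into templates and then into per-minute counts. It is not the product's algorithm: the masking rules in `MASKS`, the function names, and the one-minute bucket size are all assumptions made for this example, and the golden-signal filtering step is omitted because its criteria are not described here.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical masking rules: each regex replaces a variable token with a placeholder.
# Order matters: IP addresses are masked before plain numbers.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def to_template(message: str) -> str:
    """Replace variable parts of a log message, leaving the static content."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message

def templates_to_metrics(logs):
    """Count occurrences of each template per one-minute bucket."""
    counts = Counter()
    for timestamp, message in logs:
        bucket = timestamp.replace(second=0, microsecond=0)
        counts[(bucket, to_template(message))] += 1
    return counts

# Made-up sample data for illustration only.
logs = [
    (datetime(2024, 1, 1, 12, 0, 5), "Request from 10.0.0.1 took 120 ms"),
    (datetime(2024, 1, 1, 12, 0, 40), "Request from 10.0.0.2 took 95 ms"),
    (datetime(2024, 1, 1, 12, 1, 10), "Request from 10.0.0.1 took 3000 ms"),
]
metrics = templates_to_metrics(logs)
```

All three sample messages collapse to the same template, so the resulting metric is a count of that template per minute, which is the kind of time series a metric anomaly detector can consume.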

Prerequisites

Complete the following actions before you set up training.

  1. To set up training for this algorithm, you need log data that shows normal operational behavior. If your logs include anomalous situations, such as a critical incident or a technical glitch, filter out this anomalous data before you train the algorithm.

  2. In the Cloud Pak for AIOps home page, click the navigation icon (four horizontal bars) to go to the main navigation menu.

  3. In the main navigation menu, click Operate > AI Model Management to open the AI Model Management page.

  4. Disable the Log anomaly detection - natural language and Log anomaly detection - statistical baseline algorithms.

  5. Create a training definition for Metric anomaly detection with a training schedule.
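One possible way to prepare the data described in step 1 is to drop log records that fall inside known incident periods before the logs are ingested. The sketch below assumes that logs are available as `(timestamp, message)` pairs and that incident periods are known as `(start, end)` pairs; the function name and data shapes are hypothetical and are not part of Cloud Pak for AIOps.

```python
from datetime import datetime

def exclude_incident_windows(logs, incident_windows):
    """Return only the log records whose timestamp is outside every incident window."""
    def in_incident(ts):
        # Window boundaries are treated as inclusive.
        return any(start <= ts <= end for start, end in incident_windows)

    return [(ts, msg) for ts, msg in logs if not in_incident(ts)]

# Made-up sample data for illustration only.
logs = [
    (datetime(2024, 1, 1, 9, 0), "service started"),
    (datetime(2024, 1, 1, 10, 30), "database connection refused"),
    (datetime(2024, 1, 1, 12, 0), "request served"),
]
incident_windows = [(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 0))]

normal_logs = exclude_incident_windows(logs, incident_windows)
```

Here only the record logged during the 10:00 to 11:00 incident is removed, and the remaining records represent normal operational behavior.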

Starting the training setup

  1. Within the Log anomaly detection - golden signals tile, click Set up training.

    Note: If this AI algorithm is already set up for training, Set up training is not available. For more information about changing the algorithm training setup, see Editing the training setup for an algorithm.

  2. Click Next to open the Getting started panel. The panel explains this AI algorithm and how it helps in your production environment. The panel also provides a list of integrations that are available to generate a model. At least one integration must exist in this list for the model to collect data and start training.

  3. If no integrations are listed or if you expected to see more integrations, click Integrations to modify your integrations.

  4. Add at least one integration by using the Integrations area of IBM Cloud Pak for AIOps. Alerts are generated after the following conditions are met:

    • At least 100,000 log messages are processed for template quality.
    • The logs span at least 3.5 days, and metric anomaly detection training is run on that data.

    On the Integrations page, check that at least one integration is listed.

    Note: If no integrations are listed, or you expected to see more integrations, click Go to Integrations to modify your integrations.

    Note: Make sure that all the log data integrations that are created in the Integrations section are in Live data for initial AI training or Historical data for initial AI training mode. If you choose Historical data for initial AI training, the source parallelism must be set to 1. You can set up a log data integration before or after you enable Log anomaly detection - golden signals.

  5. Click Next.

  6. In the Training setup panel, click Done. The training begins.
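The two data conditions for alert generation in step 4 can be expressed as a simple check. This is a minimal sketch for reasoning about your own data volumes, not a product API: the function and constant names are hypothetical, and the check does not cover whether Metric anomaly detection training has actually run on the data.

```python
from datetime import datetime, timedelta

# Thresholds taken from the conditions listed in step 4.
MIN_MESSAGES = 100_000
MIN_SPAN = timedelta(days=3.5)

def meets_alert_preconditions(message_count, first_ts, last_ts):
    """Return True when both the message volume and the time span are sufficient."""
    enough_messages = message_count >= MIN_MESSAGES
    enough_span = (last_ts - first_ts) >= MIN_SPAN
    return enough_messages and enough_span

# Made-up example: 250,000 messages collected over four days.
start = datetime(2024, 1, 1, 0, 0)
ready = meets_alert_preconditions(250_000, start, start + timedelta(days=4))

# Made-up example: enough messages, but only three days of data.
not_ready = meets_alert_preconditions(250_000, start, start + timedelta(days=3))
```

In the second example the volume threshold is met but the logs span less than 3.5 days, so the check fails.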

What happens next

Now the training setup is complete.

The template training runs continuously in the background. In parallel, data is sent to Metric anomaly detection.

If you chose the Historical data for initial AI training mode, and the integration status displays as Done, edit your integration and select Live data for initial AI training mode for continuous AI training and anomaly detection.

Upon receiving sufficient data, Metric anomaly detection trains models and generates alerts if any anomalous logs are detected.

Disabling the training

You can turn off the algorithm by toggling the switch to Disabled in the Log anomaly detection - golden signals tile on the AI Model Management page.

To restart, toggle the switch to Enabled.

When the algorithm is enabled, the training pipeline sets la_golden_signals_enabled_flag to True in Elasticsearch. Flink jobs are then enabled to start with the log anomaly golden signal components.

To initiate the change, the training pipeline restarts all the Flink jobs.