Generating an AI model

Generate an AI model by training one of the AI algorithms. The AI model that the training produces acts on live data and generates insights for your SREs.

About this task

You only need to generate an AI model for trainable AI algorithms. Pre-trained AI algorithms, in contrast, analyze your data in real time and derive insights without the need to generate a model. For more details, see Algorithm types.

Before you begin

You must save at least one algorithm training setup, as described in Setting up training for trainable AI algorithms.

Procedure

  1. On the Cloud Pak for AIOps home page, click the navigation icon in the upper-left corner of the screen to open the main navigation menu.

  2. In the main navigation menu, click Operate > AI model management to open the Training page.

    The algorithms are shown as tiles, grouped into sections according to whether they are trainable or pre-trained AI algorithms.

  3. Click 'Set up training' on the tile of the trainable algorithm for which you want to generate an AI model. When the setup is complete for that algorithm, the details from the 'Training Set up' page appear. To understand a particular model (what it does and what kind of data you need to run it), click Description (information icon) next to the tile's title.

    Note: You can run a precheck before you train your model. The precheck checks the quality of the data. If you start training without running the precheck yourself, the precheck still runs first. If the training data changes, the precheck is rerun on the new data.

  4. If the setup is complete for an algorithm, click the algorithm's tile for more details on that AI model and to begin the training.

    When the model detail page opens, you can click Precheck data in the panel on the right side of the screen before you start training. The precheck checks the quality of the data. If you start training without running the precheck yourself, the precheck still runs first. If the training data changes, the precheck is rerun on the new data.

    Note: There is no data precheck for Metric anomaly detection.

  5. In the right side panel, click Train Models.

    After a few moments the IBM Cloud Pak for AIOps console responds as follows:

    • A message appears with the text: Training successfully started.
    • The AI Training tile displays the message: In Progress. Training is now in progress.

  6. As training progresses, it passes through phases, which you can view under Activity in the side panel.

Up to six phases are shown, depending on which algorithm you chose to train:

1. **Queued**
2. **Data retrieving**
3. **Quality check**
4. **Preparing data**
5. **Training**
6. **Saving models**

Only change risk goes through all six phases. When training is complete, the AI Training tile displays the following messages:

  • Complete
  • Deployed (only if the model has also been deployed)

At this point, the AI models have been created and assigned a data quality rating, which takes one of the following values:

  • Good

  • Needs improvements

    Note: No quality check is carried out on the Metric anomaly detection or Temporal grouping models, because the check is not supported by the API that provides data to these algorithms.

For more details, click the information symbol in the Data quality tile. For a full set of data quality measures and messages, see Data quality messages.

For log anomaly detection training, an extra Models tile is displayed containing information on the models that were generated during training. For more details, see Displaying log anomaly detection resources and models.

Change risk tiles

For change risk, there is an additional section titled Ticket data quality, which consists of three tiles:

  • Closed tickets: The number of closed tickets, for example 10000. Historical tickets must be in a terminal (closed) state for the algorithm to train on them.
  • Missing tickets: Data in high-importance fields within a ticket is needed to create good models. The more complete the data in all fields (such as type, business service, sys domain, and short description), the better your models perform. Note: If fewer than 0.5% of the total closed tickets are missing the relevant field (type, business service, and so on), the value is displayed as 0% in the UI.
  • Ticket Word Count: Tickets with a short overall word count (fewer than 5 words) generally don't provide enough information for the algorithm to create good models. This tile combines the long and short word counts across the total number of tickets, for example 10000.
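
The three tiles above can be approximated offline if you want to estimate ticket data quality before training. The following is a minimal sketch, not the product's implementation: the `Ticket` record, its field names, and the `"closed"` state value are hypothetical, counting words in the short description only is an assumption, and rounding to the nearest whole percent is an assumption that is merely consistent with the documented rule that values below 0.5% display as 0%.

```python
from dataclasses import dataclass


# Hypothetical ticket record; field names are illustrative, not a product schema.
@dataclass
class Ticket:
    state: str
    type: str = ""
    business_service: str = ""
    short_description: str = ""


SHORT_WORD_LIMIT = 5  # tickets with fewer than 5 words count as "short"


def closed_tickets(tickets):
    """Keep only tickets in a terminal (closed) state."""
    return [t for t in tickets if t.state == "closed"]


def missing_percent_display(tickets, field_name):
    """Share of closed tickets with an empty field, as a display string.

    Values below 0.5% come out as "0%", matching the documented UI rule.
    Rounding other values to the nearest whole percent is an assumption.
    """
    closed = closed_tickets(tickets)
    if not closed:
        return "0%"
    missing = sum(1 for t in closed if not getattr(t, field_name).strip())
    pct = 100.0 * missing / len(closed)
    return f"{int(pct + 0.5)}%"


def word_count_split(tickets):
    """Return (short, long) counts of closed tickets by word count.

    Counting words in short_description only is an assumption.
    """
    closed = closed_tickets(tickets)
    short = sum(
        1 for t in closed
        if len(t.short_description.split()) < SHORT_WORD_LIMIT
    )
    return short, len(closed) - short
```

For example, with 1000 closed tickets of which 4 have no type, `missing_percent_display(tickets, "type")` returns `"0%"`, because 0.4% is below the 0.5% threshold.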

There is also a Model prediction tile, which shows incident-causing tickets as a percentage.

What to do next

If you selected manual deployment when setting up training for the model, then you can now proceed to deploy the AI models that you created. For more details, see Deploying models.

If you encountered errors in the training, see the following sections for more information:

Reference information

The following information is referenced in this task.

List of algorithm training setups

| Column | Description |
| --- | --- |
| AI algorithm | The AI algorithm that is related to this record. Possible values are: Change risk, Log anomaly detection, Metric anomaly detection, Similar incidents, and Temporal grouping. |
| Latest model version | Latest version of the model. A new version of the model is generated every time that the algorithm training setup is run. |
| Version deployed | Currently deployed version of the model. If training is scheduled and deployment is automatic, then the value in this field always matches the value in the Latest model version field. |
| Date trained | Date and time that this training setup was last run. If it has never been run, this field is blank. |
| Activity | Displays one of the following status values: (a) Not started: this training setup has not yet been run; (b) Training: the training setup is currently running; (c) Training queued: training is waiting for another training job to complete; (d) Training complete: the most recent training setup run was successful; (e) Failed: training failed. |
| Danger zone | Contains the Delete option, which deletes all models for the algorithm. |
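
If you keep your own record of training setups outside the console, the columns above map onto a small record type. The following is a minimal sketch under that assumption: the `Activity` and `TrainingSetupRow` names, the string types for versions and dates, and both helper methods are hypothetical and are not part of any Cloud Pak for AIOps API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Activity(Enum):
    """Status values listed for the Activity column."""
    NOT_STARTED = "Not started"
    TRAINING = "Training"
    TRAINING_QUEUED = "Training queued"
    TRAINING_COMPLETE = "Training complete"
    FAILED = "Failed"


@dataclass
class TrainingSetupRow:
    """One row of the list of algorithm training setups (hypothetical model)."""
    ai_algorithm: str
    latest_model_version: Optional[str]  # None until a model is generated
    version_deployed: Optional[str]      # None until a model is deployed
    date_trained: Optional[str]          # None (blank) if never run
    activity: Activity

    def has_run(self) -> bool:
        # The Date trained field is blank when the setup has never been run.
        return self.date_trained is not None

    def needs_deployment(self) -> bool:
        # With manual deployment, the latest trained version can be ahead of
        # the deployed version after a successful training run.
        return (
            self.activity is Activity.TRAINING_COMPLETE
            and self.latest_model_version is not None
            and self.version_deployed != self.latest_model_version
        )
```

A row whose latest model version differs from its deployed version after a successful run is a candidate for the manual deployment step described under What to do next.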