Setting up training for log anomaly detection - natural language
Set up training for this AI algorithm to detect unusual patterns in your logs and notify you by generating an alert when they occur.
About this task
For more information about how the different types of log anomaly detection work, see About log anomaly detection algorithms.
To set up training for this algorithm, you need log data that shows normal operational behavior. If your logs include anomalous situations, such as a critical incident, technical glitch, or some other change that deviates from the norm, you can filter that data out before training the algorithm.
You set up training by completing a series of setup tasks. When the training setup is complete, you go to a different part of AI Model Management to train the model.
Note: If both the natural language and statistical baseline log anomaly detection algorithms are active, any log anomalies that both algorithms discover are reconciled so that only one alert is generated.
Before you begin
When providing log data, select a date range that includes at least 10,000 lines of messages for each resource that you want insights from. The ideal amount is around 100,000 lines. Longer date ranges can take more time to process and might impact the system.
To check that you have met these requirements, review the prerequisites for this AI algorithm. For more details, see the rows that are associated with Log anomaly detection in the prerequisites table on this page: Prerequisites.
In addition, you must perform the following tasks:
- Configure resources such as CPU and memory to ensure good training performance. For more details, see Configuring resources for log anomaly detection.
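The line-count guidance above can be checked before you start the setup. The following is a minimal sketch, not part of the product: it assumes that you can export your logs as (resource name, log line) pairs, and the function name `classify_resources` and the three labels are hypothetical. Only the 10,000 and 100,000 thresholds come from this page.

```python
from collections import Counter

MIN_LINES = 10_000     # minimum lines per resource, from the guidance above
IDEAL_LINES = 100_000  # approximate ideal lines per resource, from the guidance above


def classify_resources(records):
    """Count log lines per resource and label each resource.

    `records` is an iterable of (resource_name, log_line) tuples.
    This input shape is an assumption made for illustration.
    """
    counts = Counter(resource for resource, _line in records)
    report = {}
    for resource, line_count in counts.items():
        if line_count < MIN_LINES:
            report[resource] = "insufficient"
        elif line_count < IDEAL_LINES:
            report[resource] = "sufficient"
        else:
            report[resource] = "ideal"
    return report
```

A resource labeled "insufficient" suggests that a longer date range is needed before that resource can produce insights.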
Running the task
There are multiple parts to this task:
- Starting the training setup
- Checking data integrations
- Specifying which data to train on
- Filter data
- Schedule the training
- Deciding how to deploy the training
- Viewing the running jobs
- What to do next
Starting the training setup
Go to AI Model Management and start the training setup.
- In the Cloud Pak for AIOps home page, click the navigation icon at the upper-left corner of the screen to go to the main navigation menu.
- In the main navigation menu, click Operate > AI model management to open AI Model Management.
- In the AI algorithms tab, click Set up Training within the Log anomaly detection - natural language tile.
Note: If this AI algorithm has already been set up for training, the words Set up appear in grayed-out text at the bottom of the tile. In this case, you can edit the algorithm training setup, as described in Editing the training setup for an algorithm.
Checking data integrations
Review the information on this page. It explains what this AI algorithm is, how it helps in your production environment, and provides a list of integrations that are needed to generate a model.
- Under Integrations, check that at least one integration is listed.
Note: If no integrations are listed, or you expected to see more integrations, click Go to Integrations to modify your integrations. For more details, see this page: Prerequisites.
- Click Next to move to the next panel.
Specifying which data to train on
Review and modify your data integrations, if necessary.
- Review the Data assets table.
This table lists the data integrations from the Getting started panel, and provides the following information for each data integration:
- How is the data being collected? Options are Live or Historical.
- Is the data flow on?
To make any changes, click Go to Integrations to modify your integrations. For more details, see this page: Prerequisites.
Optionally remove a data integration by clicking the checkbox to the left of the data integration name, and clicking the Delete icon that appears.
- Specify a date range to train on. Select Preset for a preset option or Custom for a range of dates.
Tip: If you plan to schedule training for this algorithm, you must select one of the preset options, which retrieve a fixed window of data relative to the current time.
- Preset: Click this option and select one of the following time ranges. With the default values, 14 days of log data is the maximum amount that is supported.
- Past day
- Past 3 days
- Past 7 days
- Past 10 days
- Past 14 days
Figure. Schedule training
- Custom: This is an easy and quick option to use if you are not concerned with training models on data from a very specific window of time. Click this option and specify a fixed date range.
Figure. Custom
Note: When you set custom dates and times, be aware of the following:
- You have the option to specify start and end times in local time or UTC. By default, the time is in local time.
- Ingested log data is stored in Elasticsearch in UTC.
- To ensure that you select the correct data intervals, specify both the start and end times in this panel in UTC.
For example, if you select November 15 12:00 AM through December 15 11:59 PM UTC, the training window includes data from the start of November 15 through the end of December 15. If you instead select December 15 12:00 AM as the end time, the training window includes data from November 15 through December 14 11:59 PM.
- If you choose to use local time, you need to adjust the start and end dates and times to match the period for which log data is available in Elasticsearch.
- Click Next to move to the next panel.
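The date-range guidance in this section can be illustrated with a short sketch. It is not the product's implementation. It assumes that a preset such as Past 7 days means "the 7 days that end at the current time", and it uses a fixed UTC offset for local time, which ignores daylight saving changes. The names `preset_window` and `local_range_to_utc` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Preset labels from this page mapped to a number of days.
PRESET_DAYS = {
    "Past day": 1,
    "Past 3 days": 3,
    "Past 7 days": 7,
    "Past 10 days": 10,
    "Past 14 days": 14,
}


def preset_window(preset, now=None):
    """Return (start, end) in UTC for a preset, relative to `now`.

    Assumption: the window ends at the current time and spans whole days.
    """
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=PRESET_DAYS[preset]), now


def local_range_to_utc(start_local, end_local, utc_offset_hours):
    """Convert naive local start and end times to UTC.

    `utc_offset_hours` is a fixed offset (for example -5), so daylight
    saving transitions inside the range are not handled.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))

    def to_utc(naive):
        return naive.replace(tzinfo=local_tz).astimezone(timezone.utc)

    return to_utc(start_local), to_utc(end_local)
```

For example, a custom range of November 15 12:00 AM through December 15 11:59 PM entered in a UTC-5 local time corresponds to November 15 5:00 AM through December 16 4:59 AM in UTC, which is why specifying the times directly in UTC is less error-prone.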
Filter data
Filter irregular data out of the training. This includes data associated with event storms, maintenance windows, major events, and other unusual outages. Filtering enables the log anomaly detection algorithm to baseline normal behavior and record expected log messages.

- (Optional) Select the date range of irregular data to filter out, and then click Apply dates. Your selection appears in the Applied dates table below.
Tip: Go to your data source to view your log data. This helps you determine when irregular data occurred.
Note: When you set custom dates and times, be aware of the following:
- You have the option to specify start and end times in local time or UTC. By default, the time is in local time.
- Ingested log data is stored in Elasticsearch in UTC.
- To ensure that you select the correct data intervals, specify both the start and end times in this panel in UTC.
For example, if you select November 15 12:00 AM through December 15 11:59 PM UTC, the training window includes data from the start of November 15 through the end of December 15. If you instead select December 15 12:00 AM as the end time, the training window includes data from November 15 through December 14 11:59 PM.
- If you choose to use local time, you need to adjust the start and end dates and times to match the period for which log data is available in Elasticsearch.
- (Optional) Repeat step 1 for each date range of irregular data that you want to filter out.
The data set is listed in the Data sets table at the bottom of the screen.
- Click Next to move to the next panel.
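To reason about how much data remains after filtering, you can subtract the irregular date ranges from the training window. The following sketch shows one way to do that outside the product. The function name `remaining_windows` is hypothetical, and the half-open interval handling is an assumption, not a description of how the algorithm applies the filter.

```python
from datetime import datetime


def remaining_windows(train_start, train_end, excluded):
    """Return the parts of [train_start, train_end) not covered by `excluded`.

    `excluded` is a list of (start, end) datetime pairs. The ranges may
    overlap each other or extend beyond the training window.
    """
    result = []
    cursor = train_start
    for ex_start, ex_end in sorted(excluded):
        # Skip ranges that lie entirely outside what is left of the window.
        if ex_end <= cursor or ex_start >= train_end:
            continue
        # Keep the gap between the cursor and the start of this exclusion.
        if ex_start > cursor:
            result.append((cursor, ex_start))
        cursor = max(cursor, ex_end)
        if cursor >= train_end:
            break
    # Keep whatever remains after the last exclusion.
    if cursor < train_end:
        result.append((cursor, train_end))
    return result
```

If the remaining windows are short, compare them against the line-count guidance in Before you begin and consider selecting a longer training date range.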
Schedule the training
Decide whether to run the training setup on demand or to schedule it to run on an ongoing basis.

- Proceed as follows:
- To run the training setup on demand, ensure that Schedule to run is set to Off, and go to step 3.
- To run the training setup on a schedule, set Schedule to run to Yes, and go to the next step.
- Schedule the run. Click this option to specify a schedule. You can specify a start date with an optional end date, a frequency, and a time based on Coordinated Universal Time (UTC).
Note:
- 'Start' is when you want the scheduler to begin launching training runs. Select 'now' for the scheduler to start running immediately, or specify a custom time by using the 'on date' option.
- 'End' lets you specify how long you want the scheduler to manage the training runs. If you select 'Never', the scheduler keeps running training at the defined frequency indefinitely. You can also specify a custom end date by selecting 'on date'.
- 'Frequency' lets you select how often you want the model to be retrained. The possible options are daily, weekly, bi-weekly, or monthly.
- Click Next to move to the next panel.
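To preview when scheduled training runs would occur, you can enumerate run times from the start, end, and frequency settings described above. This is a sketch under stated assumptions, not the scheduler's actual logic: it treats bi-weekly as every two weeks, it clamps monthly runs to the last day of shorter months, and the names `add_months` and `run_times` are hypothetical.

```python
import calendar
from datetime import datetime, timedelta, timezone

# Fixed-length steps. Assumption: "bi-weekly" means every two weeks.
STEP = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "bi-weekly": timedelta(weeks=2),
}


def add_months(dt, months):
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = dt.month - 1 + months
    year = dt.year + month_index // 12
    month = month_index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def run_times(start, frequency, end=None, limit=5):
    """List up to `limit` run times from `start`, stopping after `end` if set.

    Passing end=None corresponds to the 'Never' end option.
    """
    runs = []
    i = 0
    while len(runs) < limit:
        if frequency == "monthly":
            run = add_months(start, i)
        else:
            run = start + STEP[frequency] * i
        if end is not None and run > end:
            break
        runs.append(run)
        i += 1
    return runs
```

Because the schedule time is based on UTC, pass timezone-aware UTC datetimes as `start` and `end` when you compare the preview against your local working hours.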
Deciding how to deploy the training
Decide whether to review the results of training before deploying the model, or to have the system automatically deploy the model.
- Proceed as follows:
- To review the results of training before deploying the model, ensure that Deployment type is set to Review first.
- To have the system automatically deploy the model, set Deployment type to On completion.
- Click Done to save the training setup for the algorithm.
Viewing the running jobs
- From the main navigation of the console, click AI model management.
- Click the tile of the AI algorithm that you want to view.
- From the Summary tab, click Start training to begin training your AI algorithm. While the training runs, the jobs and their statuses are displayed in the Activity section.
What to do next
Now that the training setup is complete, you can train the algorithm. For more details, see Launching the training.