Data quality messages and alerts

The data quality precheck stage always runs on your input data before that data can be approved for training. When the precheck stage completes, you are presented with an assessment of the data quality.

The data quality precheck examines whether enough data is available to return meaningful results (data sufficiency), and whether the data meets mapping requirements (data formatting). A result is displayed on the console, and this result determines whether model training can proceed: if the precheck fails, training is disabled and you cannot run it.

If you change the selected training data, the precheck stage is always run again.

Mapping data

Poor data formatting is often associated with incorrect field mapping.

Data is collected when you select a source on the Integrations page and add an integration. Within the window for configuring the integration, use the Field mapping section to map source fields to standard fields. For example, the _timestamp field in Mezmo data should be mapped to timestamp in the standard format. One reason that schema validation might fail is that incorrect data mappings are entered in the Mapping field. For more information on the overall data loading and AI training process, see Planning data loading and training.
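Conceptually, a field mapping renames source fields to their standard equivalents, as in the following Python sketch. The function and the mappings shown here are hypothetical illustrations only; real mappings are configured in the Field mapping section of the integration, not in code.

```python
# Hypothetical sketch: rename source fields to standard fields.
# Real mappings are configured in the integration's Field mapping
# section of the console, not in code.
def apply_field_mapping(record, mapping):
    """Return a copy of the record with source keys renamed per the mapping."""
    return {mapping.get(key, key): value for key, value in record.items()}

# Example: Mezmo's _timestamp source field mapped to the standard
# timestamp field (the _host -> host mapping is an illustrative assumption).
mezmo_mapping = {"_timestamp": "timestamp", "_host": "host"}
record = {"_timestamp": 1694701821, "_host": "web-01", "message": "GET /"}
print(apply_field_mapping(record, mezmo_mapping))
# {'timestamp': 1694701821, 'host': 'web-01', 'message': 'GET /'}
```

Fields that are not listed in the mapping pass through unchanged, which is why a missing or misspelled source field name can cause schema validation to fail.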

Good data quality

Good data quality means that requirements for data sufficiency and data formatting are met. Good data quality is represented on the Cloud Pak for AIOps console using the following label: Good data.

Data quality: Good data
Recommendation: Precheck came back with good data. You are now ready to start training models.
Description: Requirements for data sufficiency and data formatting are met.

Poor data quality

Poor data quality messages vary with the AI algorithm that you are training. Poor data quality is represented on the Cloud Pak for AIOps console using the following label: Data needs improvement.

For each of the following AI algorithms, this topic describes the possible data quality messages, provides a brief description of each message, and points to any relevant documentation task that can support remediation.

Change risk

This algorithm takes as input change and incident data from ServiceNow. To meet overall data quality requirements, both change and incident data must meet data sufficiency and data formatting requirements.

Data quality alert: Data needs improvement
Recommendation: Part of your data did not pass precheck. View recommendations to see how you might improve data quality.
Description: Data precheck issue: Insufficient number of problematic change tickets for a good model. For a closed change ticket count of 150 or fewer, you need at least three problematic tickets. For a closed change ticket count greater than 150, at least two percent of change tickets must be problematic. A problematic change ticket is defined as a change ticket that is either associated with incident tickets or has an 'unsuccessful' value set in the change ticket close code field.
Link: See ServiceNow integration

Data quality alert: Data needs improvement
Recommendation: Change data: The data is not valid according to schema. Check input data mapping or data source.
Description: Data formatting issue: Change data does not meet data formatting needs. You might need to modify the data's field mapping.
Link: See ServiceNow integration

Data quality alert: Data needs improvement
Recommendation: More data is needed. You need at least _X_ change tickets for a good model.
Description: Data sufficiency issue: Change data does not meet data sufficiency needs. More change ticket data is needed.
Link: See ServiceNow integration

Data quality alert: Data needs improvement
Recommendation: Incident data: The data is not valid according to schema. Check input data mapping or data source.
Description: Data formatting issue: Incident data does not meet data formatting needs. You might need to modify the data's field mapping.
Link: See ServiceNow integration

Data quality alert: Data needs improvement
Recommendation: More data is needed. You need at least 1 incident data point for a good model.
Description: Data sufficiency issue: Incident data does not meet data sufficiency needs. More incident ticket data is needed.
Link: See ServiceNow integration

Data quality alert: Data needs improvement
Recommendation: Data is insufficient for a good model. You need at least 2 percent of change tickets to either be associated with incident tickets or have an 'unsuccessful' value set in the change ticket close code field.
Description: Data sufficiency issue: Data does not meet data sufficiency needs. You must provide data with a change ticket failure rate of 2% or higher.
Link: See ServiceNow integration
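The problematic-change-ticket threshold described above (at least three problematic tickets when 150 or fewer change tickets are closed, otherwise at least two percent) can be expressed as a small check. This is an illustrative sketch only; the function name and signature are assumptions, not part of the product.

```python
def change_data_sufficient(closed_tickets, problematic_tickets):
    """Illustrative check of the problematic-change-ticket rule:
    - 150 or fewer closed change tickets: at least 3 must be problematic.
    - More than 150 closed change tickets: at least 2% must be problematic.
    """
    if closed_tickets <= 150:
        return problematic_tickets >= 3
    return problematic_tickets >= 0.02 * closed_tickets

print(change_data_sufficient(100, 3))    # True: 3 problematic of <= 150
print(change_data_sufficient(100, 2))    # False: fewer than 3 problematic
print(change_data_sufficient(1000, 20))  # True: exactly 2% of 1000
```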

Log anomaly detection

This algorithm takes as input log data from one or more log systems. The log anomaly detection algorithm itself has multiple underlying microservices. Data sufficiency and data formatting requirements for this algorithm work as follows:

  • Each of these microservices requires a minimum of 2000 data points. To meet the minimum data sufficiency requirement for the algorithm as a whole, at least 75% of the microservices must meet this 2000 data point requirement. For example, if only two microservices out of 10 have fewer than 2000 data points, and the rest have more than 2000 data points, then the data sufficiency requirement for the algorithm as a whole is met.
  • To meet the data formatting requirement for the algorithm as a whole, at least 75% of all data (from all of the microservices) must be valid with respect to the schema.
Data quality alert: Data needs improvement
Recommendation: Part of your data did not pass precheck. There were not enough logs for X out of Y resources. Models were only created for those where sufficient data existed. You can expand the date range to gather more data and train again to expand coverage.
Description: Data precheck issue: Data does not meet data precheck needs. You might need to modify the data's quality before running the training.
Link: See Train model precheck

Data quality alert: Data needs improvement
Recommendation: There were not enough logs for _X_ out of _Y_ log_anomaly_detection components. Models were only created for those where sufficient data existed. You can expand the date range to gather more data and train again to expand coverage.
Description: Data sufficiency issue: Data does not meet data sufficiency needs. More than 25% of the component microservices have less than the required 2000 data points. More log data is needed. Try expanding the date range of the data provided.
Link: See Natural language log anomaly detection: data to train on

Data quality alert: Data needs improvement
Recommendation: We have detected that a portion of this data set is in an unsupported language, which could affect the quality of this model. We recommend that you remove this data from the data set and retrain the model.
Description: Language support issue: Log anomaly detection only supports the following languages and language combinations: English, Spanish, German, English and Spanish, English and German. Your data contains either an unsupported language or an unsupported language combination. For more information, see Language support.
Link: To remediate this issue, see Managing unsupported languages.
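The per-microservice minimum and the two 75% thresholds described earlier in this section can be sketched as follows. The function names are illustrative assumptions, not part of the product.

```python
MIN_POINTS = 2000  # minimum data points required per microservice

def sufficiency_met(points_per_service):
    """At least 75% of microservices must each have >= 2000 data points."""
    ok = sum(1 for n in points_per_service if n >= MIN_POINTS)
    return ok >= 0.75 * len(points_per_service)

def formatting_met(valid_records, total_records):
    """At least 75% of all records must be valid against the schema."""
    return valid_records >= 0.75 * total_records

# 8 of 10 microservices (80%) meet the 2000-point minimum, so the
# algorithm-wide sufficiency requirement is met.
print(sufficiency_met([2500] * 8 + [100, 500]))  # True
```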

Similar tickets

This algorithm takes as input incident data from ServiceNow. To meet overall data quality requirements, the incident data must meet data sufficiency and data formatting requirements.

  • To meet data sufficiency requirements, a minimum of five closed incidents with associated resolution is required.
Data quality alert: Data needs improvement
Recommendation: Part of your data did not pass precheck. View recommendations to see how you might improve data quality.
Description: Data precheck issue: More data is needed. You need at least five closed incidents with resolution for a good model.
Link: See ServiceNow integration

Data quality alert: Data needs improvement
Recommendation: More data is needed. You need at least 5 closed incidents with resolution for a good model.
Description: Data sufficiency issue: Data does not meet data sufficiency needs. More closed incident ticket data is needed.
Link: See ServiceNow integration
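The five-closed-incidents rule above can be sketched as a simple check. The `state` and `resolution` field names are assumptions for illustration; actual ServiceNow incident fields may differ.

```python
def similar_tickets_sufficient(incidents):
    """Illustrative check: at least five closed incidents that have
    an associated resolution are required for training.
    The 'state' and 'resolution' keys are assumed field names."""
    closed_with_resolution = [
        i for i in incidents
        if i.get("state") == "closed" and i.get("resolution")
    ]
    return len(closed_with_resolution) >= 5

# Five closed incidents, each with a resolution, meet the minimum.
print(similar_tickets_sufficient(
    [{"state": "closed", "resolution": "restarted pod"}] * 5))  # True
```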

Note: A data quality alert appears in the console if the precheck is launched without any data loaded (this applies to all three algorithms). The alert states 'Precheck failed', and the recommendation is 'Cannot find any data'.