Managing unsupported languages

If natural language log anomaly detection training reports that the model needs improvement because of unsupported language issues, you can remove the offending data from the data set by deleting the logs of the affected components, and then retrain.

About this task

Log anomaly detection only supports the following languages and language combinations:

  • English
  • English and French
  • English and German
  • English and Italian
  • English and Spanish
  • French
  • German
  • Italian
  • Spanish

If your data contains an unsupported language or an unsupported language combination, then you receive the following data quality alert when training completes:

Needs improvement: We have detected that a portion of this data set is in an unsupported language, which could affect the quality of this model. We recommend that you remove this data from the data set and retrain the model.

To remediate the situation, you must delete all the logs from any component whose logs are wholly or partially in an unsupported language.

Procedure

Perform the following steps to remediate the situation.

  1. Log in to your cluster.

    oc login -u kubeadmin -p <password>
    

    For more information, see Logging in to the OpenShift CLI.

  2. Run the oc project command to set the context to the project where Cloud Pak for AIOps is deployed.

  3. Find the name of the apiserver pod.

    oc get pod | grep api-server
    
  4. Open a terminal on that pod.

    oc exec -it <api-server_podname> -- bash
    
  5. Run the following curl command to retrieve the data quality alert details.

    curl -k -u $ES_USERNAME:$ES_PASSWORD -X GET "https://$ES_ENDPOINT/prechecktrainingdetails/_search?pretty" -H 'Content-Type: application/json' -d'{"_source":"dataQualityDetails.languageInfo.components"}' 
    

    In this command:

    • ES_USERNAME is the Elasticsearch username.
    • ES_PASSWORD is the Elasticsearch password.
    • ES_ENDPOINT is the Elasticsearch endpoint environment variable in the pod.
    • https://$ES_ENDPOINT/prechecktrainingdetails is the Elasticsearch endpoint for the prechecktrainingdetails index, which stores the language information associated with the data set sampled for each component.
  6. In the generated output, navigate to the dataQualityDetails field, and find the nested languageInfo field. Then retrieve the name of each component whose language list includes unknown as one of the values. For example, in the following output you would select abc.

    "components" : [
                 {
                   "name" : "abc",
                   "language" : [
                       "English",
                       "unknown"
                   ]
                 }
    ]
    
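The selection in this step can also be scripted. The following is a minimal sketch, assuming the query response has been saved to a file (here /tmp/language_info.json, a hypothetical path) and that python3 is available; the sample component names are hypothetical:

```shell
# Hypothetical sample of the languageInfo structure returned by the query
cat > /tmp/language_info.json <<'EOF'
{
  "components": [
    { "name": "abc", "language": [ "English", "unknown" ] },
    { "name": "xyz", "language": [ "English" ] }
  ]
}
EOF

# Print the name of every component whose language list contains "unknown"
offending=$(python3 - <<'EOF'
import json

with open("/tmp/language_info.json") as f:
    doc = json.load(f)

for component in doc["components"]:
    if "unknown" in component["language"]:
        print(component["name"])
EOF
)
echo "$offending"
```

Each printed name is a component whose logs must be deleted in the next step.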
  7. For each component that you identified in the previous step, delete the logs from all log data indices by running the following command.

    curl -k -u $ES_USERNAME:$ES_PASSWORD -X POST "https://$ES_ENDPOINT/*-logtrain/_delete_by_query" -H 'Content-Type: application/json' -d'{"query": {"bool": {"must": [{"term": {"instance_id.keyword": "abc"}}]}}}'
    

    Replace abc with the name of the component.

    Note If all components are detected as having logs in an unsupported language, then this step deletes all data from the Elasticsearch indices for log anomaly training, and you must wait for fresh data to arrive before you can retrain.
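    When several components are affected, the delete command can be generated in a loop. The following is a minimal dry-run sketch that only prints the commands for review rather than executing them; the component names abc and def are hypothetical:

```shell
# Components flagged as containing unsupported languages (hypothetical names)
components="abc def"

for component in $components; do
  # Build the delete_by_query body for this component
  query=$(printf '{"query": {"bool": {"must": [{"term": {"instance_id.keyword": "%s"}}]}}}' "$component")
  # Print the command instead of running it, so it can be reviewed first
  echo "curl -k -u \$ES_USERNAME:\$ES_PASSWORD -X POST \"https://\$ES_ENDPOINT/*-logtrain/_delete_by_query\" -H 'Content-Type: application/json' -d'$query'"
done
```

    Removing the echo runs the deletions directly from the terminal opened on the api-server pod.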

  8. After you delete the relevant logs, restart training of your natural language log anomaly detection algorithm, as described in Generating AI models.