Managing unsupported languages
If natural language log anomaly training reports that the model needs improvement because of unsupported language issues, you can remove the offending data from the data set by deleting component-specific logs, and then retrain.
About this task
Log anomaly detection supports only the following languages and language combinations:
- English
- English and French
- English and German
- English and Italian
- English and Spanish
- French
- German
- Italian
- Spanish
If your data contains either an unsupported language or an unsupported language combination, then you will get the following data quality alert on completion of training:
Needs improvement: We have detected that a portion of this data set is in an unsupported language, and could affect the quality of this model. We recommend that you remove this data from the data set and retrain the model.
To remediate the situation, you must delete all the logs from any component whose logs are wholly or partially in an unsupported language.
Procedure
Perform the following steps to remediate the situation.
- Log in to your cluster.
  oc login -u kubeadmin -p <password>
  For more information, see Logging in to the OpenShift CLI.
- Run the oc project command to set the context to the project where Cloud Pak for AIOps is deployed.
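  For example, if Cloud Pak for AIOps is deployed in a project named cp4aiops (a placeholder; substitute the actual project name in your environment), you would run:
  oc project cp4aiops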
- Find the name of the api-server pod.
  oc get pod | grep api-server
- Open a terminal on that pod.
  oc exec -it <api-server_podname> bash
- Run the following curl command exactly as specified to retrieve the data quality alert details.
  curl -k -u $ES_USERNAME:$ES_PASSWORD -X GET "https://$ES_ENDPOINT/prechecktrainingdetails/_search?pretty" -H 'Content-Type: application/json' -d'{"_source":"dataQualityDetails.languageInfo.components"}'
  In this command:
  - ES_USERNAME is the Elasticsearch username.
  - ES_PASSWORD is the Elasticsearch password.
  - $ES_ENDPOINT is the Elasticsearch environment variable in the pod.
  - https://$ES_ENDPOINT/prechecktrainingdetails is the Elasticsearch endpoint for the prechecktrainingdetails index, which stores language information associated with the data set sampled for a specific component.
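  These variables are expected to already be set in the api-server pod. If the curl command fails with an authentication or connection error, you can first confirm that they are present (an optional quick check):
  env | grep -E '^ES_'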
- In the output that is generated, navigate to the dataQualityDetails field and find the nested languageInfo field. Then retrieve the names of the components whose language list has unknown as one of the values. For example, in the output below you would select abc.
  "components" : [
    {
      "name" : "abc",
      "language" : [ "English", "unknown" ],
- For each component that you identified in the previous step, delete its logs from all log data indices by running the following command.
  curl -k -u $ES_USERNAME:$ES_PASSWORD -X POST "https://$ES_ENDPOINT/*-logtrain/_delete_by_query" -H 'Content-Type: application/json' -d'{"query": {"bool": {"must": [{"term": {"instance_id.keyword": "abc"}}]}}}'
  Where abc must be replaced with the name of the component.
  Note: If all components are detected as having logs in an unsupported language, then this step deletes all data from the Elasticsearch indices for log anomaly training. You will need to train again when fresh data arrives.
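  Optionally, you can verify that the deletion completed by counting the remaining documents for that component. The following sketch reuses the index pattern and field from the delete command; it should return a count of 0 once the deletion has finished, unless new log data has arrived in the meantime:
  curl -k -u $ES_USERNAME:$ES_PASSWORD -X GET "https://$ES_ENDPOINT/*-logtrain/_count" -H 'Content-Type: application/json' -d'{"query": {"bool": {"must": [{"term": {"instance_id.keyword": "abc"}}]}}}'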
- Once you have deleted the relevant logs, you must relaunch the training of your natural language log anomaly detection algorithm, as described in Generating AI models.