Managing unsupported languages
If natural language log anomaly training reports that the model needs improvement because of unsupported language issues, you can remove the offending data from the data set by deleting component-specific logs, and then retrain.
About this task
Log anomaly detection only supports the following languages and language combinations:
- English
- English and French
- English and German
- English and Italian
- English and Spanish
- French
- German
- Italian
- Spanish
If your data contains either an unsupported language or an unsupported language combination, then you will get the following data quality alert on completion of training:
Needs improvement: We have detected that a portion of this data set is in an unsupported language, and could affect the quality of this model. We recommend that you remove this data from the data set and retrain the model.
To remediate the situation, you must delete all logs from every component whose logs are wholly or partly in an unsupported language.
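The supported-language rule above can be sketched as a small shell function. This is only an illustration and not part of the product: the function name is_supported_combo is hypothetical, and it assumes that language names are spelled exactly as in the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a product command): succeeds if the languages given
# as arguments form one of the supported languages or language combinations.
is_supported_combo() {
  local key
  # Sort and de-duplicate the arguments so that order does not matter,
  # then join them with "+" to form a lookup key such as "English+French".
  key=$(printf '%s\n' "$@" | LC_ALL=C sort -u | paste -s -d '+' -)
  case "$key" in
    English|French|German|Italian|Spanish) return 0 ;;
    English+French|English+German|English+Italian|English+Spanish) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_combo German English` succeeds, while `is_supported_combo English unknown` and `is_supported_combo French German` fail, which corresponds to the cases that trigger the data quality alert.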
Procedure
Perform the following steps to remediate the situation.
-
Log in to your cluster.
oc login -u kubeadmin -p <password>
For more information, see Logging in to the OpenShift CLI.
-
Run the oc project command to set the context to the project where Cloud Pak for AIOps is deployed.
-
Find the name of the apiserver pod.
oc get pod | grep api-server
-
Open a terminal on that pod.
oc exec -it <api-server_podname> bash
-
Run the following curl command exactly as shown to retrieve data quality alert details.
curl -k -u $ES_USERNAME:$ES_PASSWORD -X GET "https://$ES_ENDPOINT/prechecktrainingdetails/_search?pretty" -H 'Content-Type: application/json' -d'{"_source":"dataQualityDetails.languageInfo.components"}'
In this command:
ES_USERNAME is the Elasticsearch username.
ES_PASSWORD is the Elasticsearch password.
$ES_ENDPOINT is the environment variable in the pod that holds the Elasticsearch endpoint.
https://$ES_ENDPOINT/prechecktrainingdetails is the Elasticsearch endpoint for the prechecktrainingdetails index, which stores language information associated with the data set sampled for a specific component.
-
In the output that is generated, navigate to the dataQualityDetails field, and find the nested languageInfo field. Then retrieve the names of the components where the language list has unknown as one of the values. For example, in the following output you would select abc.
"components" : [
  {
    "name" : "abc",
    "language" : [ "English", "unknown" ],
-
For each component that you identified in the previous step, delete the logs from all log data indices by running the following command.
curl -k -u $ES_USERNAME:$ES_PASSWORD -X POST "https://$ES_ENDPOINT/*-logtrain/_delete_by_query" -H 'Content-Type: application/json' -d'{"query": {"bool": {"must": [{"term": {"instance_id.keyword": "abc"}}]}}}'
Replace abc with the name of the component.
Note: If all components are detected as having logs in an unsupported language, then this step deletes all data from the Elasticsearch indices for log anomaly training. You will need to train again when fresh data arrives.
-
Once you have deleted the relevant logs, you must relaunch the training of your natural language log anomaly detection algorithm, as described in Generating AI models.
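The steps that identify affected components and build the delete request could be partly scripted. The following is a minimal sketch under stated assumptions, not a documented procedure: the function names unknown_language_components and delete_query_body are hypothetical, jq is assumed to be installed where the script runs, and the search response is assumed to contain component objects with the name and language fields shown in the example output. The sketch only prints request bodies. It does not delete anything.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and jq is assumed to be installed.

# Read the JSON search response on stdin and print the name of every
# component whose language list contains "unknown", one name per line.
unknown_language_components() {
  jq -r '.. | objects
         | select(has("name") and has("language"))
         | select(any(.language[]?; . == "unknown"))
         | .name' | sort -u
}

# Print the delete-by-query request body for one component name.
# Assumes the name contains no characters that need JSON escaping.
delete_query_body() {
  printf '{"query": {"bool": {"must": [{"term": {"instance_id.keyword": "%s"}}]}}}\n' "$1"
}

# Example (response.json is a hypothetical file holding the saved search output):
#   unknown_language_components < response.json | while read -r name; do
#     delete_query_body "$name"
#   done
```

Printing the bodies first gives you a chance to review which components would be affected before you run the delete command, which matters because the deletion removes all training logs for each listed component.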