Examples of statistical baseline log anomaly detection

This topic provides examples of log anomalies generated by the statistical baseline log anomaly detection algorithm and when they are likely to be found.

Examples presented in this topic

The following examples are described in this topic.

Example: unexpected 500 error

You recently installed the sock-shop application, which is made up of multiple microservices, such as front-end and orders. Most of the microservices that make up this application generate their own log files.

Note For the statistical baseline log anomaly detection algorithm to work, there is no formal minimum log line count. The main requirement is that all microservices generate a reasonable amount of log data.

The example proceeds as follows:

  1. Initially the microservices are all running OK and generate clean log files, without any text entities that might indicate an error. See Clean log file lines for an example of what a log like this might look like.

  2. Shortly after this, some of the pods associated with the microservices start shutting down unexpectedly; as a result of this, HTTP 500 errors appear for the first time in the live log stream.

  3. Live log data is analyzed in 10 second slots. Now, when the latest logs are compared to the baseline reference values, several error entities are identified: 500, error, and illegIllegalStateException. The algorithm determines that the number 500 is an error, as it is accompanied by the entity status, which points to an HTTP 500 error, a general HTTP status code indicating that something has gone wrong on a server but the server could not be more specific about what the exact problem is. These entities have not been identified before during normal behavior and since they are statistically significant (trivially so in this example) thus point to an anomaly. See Error log file lines for an example of what a log like this might look like.

    Note If this was an error message that occurs on a regular basis, say an optional library that is not used in a given piece of code, then the statistical baseline log anomaly detection algorithm will automatically learn this, and consequently treat this error message as normal and therefore no longer alert on it.

  4. Log anomaly alerts are generated for each log anomaly and sent to the in-cluster ObjectServer. See Log anomaly alert format for an example of what a log anomaly alert might look like.

Reference information

The following information is referenced in this task.

Clean log file lines

These example JSON log snippets display clean log lines for the front-end and orders microservices within the sock-shop application. The log algorithm extracts entities from the _line keyword within these log lines and then performs the analysis described in Workflow of statistical baseline log anomaly detection on these extracted entities.

Clean log file lines from the front-end microservice

The following log files lines do contain numbers but they are not accompanied by error entities. The numbers in these log message are actually part of user address data (for example 322 Main Street) and credit card data, and can be seen in the _line parameter in the code snippet below. You might need to scroll to the right to see the full content of the _line parameter.

[
  {
    "_app":"front-end",
    "_file":"/var/log/containers/front-end-85c54f4fb-jvzqn_sock-shop_front-end-e19f0a886c8a03aa63e8138def1df8f33a60981ec41cdbc2aca7c120d16e7a42.log",
    "_host":"front-end-85c54f4fb-jvzqn",
    "_ip":"172.30.148.208",
    "_ipremote":"52.117.50.10",
    "_lid":"1264263893737455616",
    "_line":"Posting Address: {\"number\":\"\",\"street\":\"322 Main Street\",\"city\":\"Gotham City\",\"postcode\":\"12345\",\"country\":\"United States\",\"userID\":\"1q2w3e4r5t6y7u8i9o0p\"}",
    "_logtype":"json",
    "_mac":"00:00:00:00:94:60",
    "_tag":["k8s"],"_ts":1601424000273,
    "city":"Gotham City",
    "container":"front-end",
    "containerid":"e19f0a886c8a03aa63e8138def1df8f33a60981ec41cdbc2aca7c120d16e7a42",
    "country":"United States",
    "namespace":"sock-shop",
    "node":"kube-bqvhtmdd031d903ud4g0-simulationd-default-00000105",
    "number":"",
    "pod":"front-end-85c54f4fb-jvzqn",
    "postcode":"12345",
    "street":"322 Main Street",
    "userID":"0p9o8i7u6y5t4r3e2w1q"
  },
  {
    "_app":"front-end",
    "_file":"/var/log/containers/front-end-85c54f4fb-jvzqn_sock-shop_front-end-e19f0a886c8a03aa63e8138def1df8f33a60981ec41cdbc2aca7c120d16e7a42.log",
    "_host":"front-end-85c54f4fb-jvzqn",
    "_ip":"172.30.148.208",
    "_ipremote":"52.117.50.10",
    "_lid":"1264263893737455617",
    "_line":"\u001B[0mPOST /addresses \u001B[32m200 \u001B[0m9.663 ms - -\u001B[0m",
    "_logtype":"customapp",
    "_mac":"00:00:00:00:94:60",
    "_tag":["k8s"],
    "_ts":1601424000282,
    "container":"front-end",
    "containerid":"e19f0a886c8a03aa63e8138def1df8f33a60981ec41cdbc2aca7c120d16e7a42",
    "namespace":"sock-shop",
    "node":"kube-bqvhtmdd031d903ud4g0-simulationd-default-00000105",
    "pod":"front-end-85c54f4fb-jvzqn"
  }
]
Clean log file lines from the orders microservice

The following log files lines do contain numbers but they are not accompanied by error entities. The numbers in these log message are actually part of user address data (for example 322 Main Street) and credit card data, and can be seen in the _line parameter in the code snippet below. You might need to scroll to the right to see the full content of the _line parameter.

[
    {"_app":"orders",
    "_file":"/var/log/containers/orders-b8cb69f5c-gn87g_sock-shop_orders-9d699fcc72f3283bf66ecb0bb850251e73dd5a7f5a0026b4fcb6facc6c533baf.log",
    "_host":"orders-b8cb69f5c-gn87g",
    "_ip":"172.30.148.208",
    "_ipremote":"52.117.50.10",
    "_lid":"1264263925219901440",
    "_line":"2020-09-30 00:00:07.959  INFO [orders,4ecc59a2bbf00a18,276a9b5feb58dcd3,true] 7 --- [p-nio-80-exec-8] w.w.s.o.controllers.OrdersController     : Sending payment request: PaymentRequest{address=Address{id=null, number='', street='322 Main Street', city='Gotahm City', country='United States', postcode='12345'}, card=Card{id=null, longNum='1234567812345678', expires='12/56', ccv='321'}, customer=Customer{id=null, firstName='John', lastName='Doe', username='John_Doe', addresses=[], cards=[]}}",
    "_logtype":"json",
    "_mac":"00:00:00:00:94:60",
    "_tag":["k8s"],
    "_ts":1601424007959,
    "addresses":"[],",
    "cards":"[]}}",
    "container":"orders",
    "containerid":"9d699fcc72f3283bf66ecb0bb850251e73dd5a7f5a0026b4fcb6facc6c533baf",
    "firstName":"'John',",
    "lastName":"'Doe',",
    "level":"INFO",
    "message":"[orders,4ecc59a2bbf00a18,276a9b5feb58dcd3,true] 7 --- [p-nio-80-exec-8] w.w.s.o.controllers.OrdersController     : Sending payment request: PaymentRequest{address=Address{id=null, number='', street='322 Main Street', city='Gotham City', country='United States', postcode='12345'}, card=Card{id=null, longNum='1234567812345678', expires='12/56', ccv='321'}, customer=Customer{id=null, firstName='John', lastName='Doe', username='John_Doe', addresses=[], cards=[]}}",
    "namespace":"sock-shop",
    "node":"kube-bqvhtmdd031d903ud4g0-simulationd-default-00000105",
    "pod":"orders-b8cb69f5c-gn87g",
    "username":"'John_Doe',"}
]

Error log file lines

The following example JSON log snippet displays error log lines for the front-end microservice within the sock-shop application. The log algorithm extracts entities from the _line keyword within these log lines and then performs the analysis described in Workflow of statistical baseline log anomaly detection on these extracted entities.

Error log file line from the front-end microservice

The following log line contains numbers accompanied by error entities; in particular, the _line parameter contains the number 500, as well as the error entities status, exception, and IllegalStateException. You might need to scroll to the right to see the full content of the _line parameter.

[
    {"_account": "6a8e297e13", 
    "_cluster": "ld70", 
    "_host": "front-end-85c54f4fb-jvzqn", 
    "_ingester": "mezmo-agent/1.6.2 (Ubuntu/16.04) k8s", 
    "_logtype": "json", 
    "_mac": "00:00:00:00:94:60", 
    "_tag": ["k8s"], 
    "_file": "/var/log/containers/front-end-85c54f4fb-jvzqn_sock-shop_front-end-e19f0a886c8a03aa63e8138def1df8f33a60981ec41cdbc2aca7c120d16e7a42.log", 
    "_line": "Order response: {\"timestamp\":1599236098757,\"status\":500,\"error\":\"Internal Server Error\",\"exception\":\"java.lang.IllegalStateException\",\"message\":\"Unable to create order due to unspecified IO error.\",\"path\":\"/orders\"}", 
    "_ts": 1599236098759, 
    "_app": "front-end", 
    "pod": "front-end-85c54f4fb-jvzqn", 
    "namespace": "sock-shop", 
    "container": "front-end", 
    "containerid": "e19f0a886c8a03aa63e8138def1df8f33a60981ec41cdbc2aca7c120d16e7a42", 
    "node": "kube-bqvhtmdd031d903ud4g0-simulationd-default-00000105", 
    "_ip": "172.30.148.208", 
    "_ipremote": "52.117.50.10", 
    "__key": "logline:6a8e297e13:front-end:front-end-85c54f4fb-jvzqn:ld70", 
    "_bid": "e4975ea4-6634-4f88-accc-8eba410d2617:61301:ld70", 
    "level": null, 
    "timestamp": 1599236098757, 
    "error": "Internal Server Error", 
    "exception": "java.lang.IllegalStateException", 
    "message": "Unable to create order due to unspecified IO error.", 
    "_status": 500, 
    "_path": "/orders", 
    "_id": "1255087168836907020"}
]

Log anomaly alert example

This example alert incorporates data from the identified log anomalies into an ObjectServer alert structure.

  • This alert has been assigned a severity of 5.
  • The summary parameter explains that 24 errors have been found in the front-end component.
  • The details.log_anomaly_detector.text_dict.template_list parameter lists the error entities identified.
[
    {"id": "d708346f-046a-35ae-b0ab-1a9ad8f1bafa", 
    "occurrenceTime": "2020-09-04T16:15:40.000Z", 
    "sender": {"name": "Log Anomaly"}, 
    "event_link": "https://api.us-south.logging.cloud.ibm.com/v1/export?to=1599236150000&from=1599236140000", 
    "summary": "24 errors found in component: front-end", 
    "description": "500: 8\njava.lang.IllegalStateException: 8\nerror_log: 8", 
    "severity": 5, 
    "type": 
        {"eventType": "problem", 
        "classification": "log anomaly", 
        "condition": "c9680808-14bb-11ec-b72f-acde48001122"}, 
    "resource": 
        {"name": "front-end", 
        "type": "application", 
        "application": "front-end", 
        "entity": "front-end"}, 
    "expirySeconds": 900, 
    "details": 
        {"log_anomaly_detector": 
            {"start_timestamp": 1599236140000, 
            "end_timestamp": 1599236150000, 
            "original_group_id": "c9680808-14bb-11ec-b72f-acde48001122", 
            "causality": 
                {"service": ["front-end"]}, 
            "detected_at": 1632338288103.0, 
            "source_application_id": "front-end", 
            "log_anomaly_confidence": 1.0, 
            "log_anomaly_model": ["oob_entities"], 
            "prediction_error": 
                {"oob_entities_error": 24}, 
            "error_templates": [0, 1, 2], 
            "template_list": ["500", "java.lang.IllegalStateException", "error_log"], 
            "count_vector": [8, 8, 8], 
            "text_dict": 
                {"template_ids": ["oob_entities_500", "oob_entities_java.lang.IllegalStateException", "oob_entities_error_log"], 
                "template_list": ["500", "java.lang.IllegalStateException", "error_log"], 
                "count_vector": [8, 8, 8], 
                "expected_count_vector": [0.29, 0.29, 0.48], 
                "expected_count_lower_bound": [0, 0, 0], 
                "expected_count_upper_bound": [3.0, 3.0, 3.21]
                }, 
                "application_group_id": "ss", 
                "application_id": "1", 
                "model_version": "5", 
                "severity_from_model": 0, 
                "description": "500: 8\njava.lang.IllegalStateException: 8\nerror_log: 8"
            }
        }, 
    "timestamp": 1632338288103.0}
  ]

Example: WebSphere log anomaly

The following example shows a WebSphere log. In log anomalies for a specific domain like WebSphere, entities are included in the logs. WebSphere log anomaly detection identifies which entities are error indicators. Message ID, ibm_messageId, and log levels, loglevel, are examples of WebSphere entities.

{
  "type": "was_message",
  "host": "abc-123.myserver.com",
  "ibm_cellName": "gsV900Cell",
  "ibm_nodeName": "node4",
  "ibm_serverName": "nodeagent",
  "ibm_sequence": "1616166373781_0000000000002",
  "message": "TRAS0017I: The startup trace state is *=info.",
  "ibm_datetime": "2021-03-19T11:06:13.781-0400",
  "ibm_messageId": "TRAS0017I",
  "ibm_threadId": "00000001",
  "module": "com.ibm.ejs.ras.ManagerAdmin",
  "loglevel": "INFO",
  "unix_ts": 1616166373781,
  "start": 1616166370000,
  "end": 1616166380000
}

On the Incidents and alerts page, the following example from the Summary parameter explains that two logs were found in the qotd-qrcode container. This parameter also says that a primary message code of SSLC0008E is contributing to the error. This alert is further explained by the Alert details panel. For more information, see Domain-specific log anomaly alert details.

Log Anomaly found 2 logs containing errors for container(s): qotd-qrcode with a primary message code of SSLC0008E

Example: IBM MQ log anomaly

The following example shows an IBM MQ log. In log anomalies for a specific domain like IBM MQ, entities are included in the logs. IBM MQ log anomaly detection identifies which entities are error indicators. Message ID, ibm_messageId, and log levels, loglevel, are examples of IBM MQ entities.

{
  "ibm_messageId": "AMQ6287I",
  "ibm_arithInsert1": 0,
  "ibm_arithInsert2": 0,
  "ibm_commentInsert1": "Linux 4.18.0-193.70.1.el8_2.x86_64 (Cloud Pak Linux (x86-64 platform) 64-bit)",
  "ibm_commentInsert2": "/opt/mqm (Installation1)",
  "ibm_commentInsert3": "9.3.0.0 (p930-L220526)",
  "ibm_datetime": "2022-06-01T01:13:49.673Z",
  "ibm_serverName": "QMA",
  "type": "mq_log",
  "host": "slot11-qma-ibm-mq-0",
  "loglevel": "INFO",
  "module": "amqxeida.c:6755",
  "ibm_sequence": "1654046029_673594295",
  "ibm_processId": "37",
  "ibm_threadId": "3",
  "ibm_version": "9.3.0.0",
  "ibm_processName": "amqzmuc0",
  "ibm_userName": "mqm",
  "ibm_installationName": "Installation1",
  "ibm_installationDir": "/opt/mqm",
  "message": "AMQ6287I: IBM MQ V9.3.0.0 (p930-L220526)."
}

On the Incidents and alerts page, the following example from the Summary parameter explains that two logs were found in the qotd-qrcode container. This parameter also says that a primary message code of SSLC0008E is contributing to the error. This alert is further explained by the Alert details panel. For more information, see Domain-specific log anomaly alert details.

Log Anomaly found 2 logs containing errors for container(s): qotd-qrcode with a primary message code of SSLC0008E