Two events may or may not have similar descriptions, but if their underlying logs are similar, the events are most likely related. This is the key hypothesis behind using logs to find similar events.

Each application consists of several microservices, and some of these services depend on other services, forming a graph. If one service fails, any service upstream or downstream of it may also emit error log lines. It is therefore important to identify the error log lines corresponding to each failed microservice and collate them into a log signature for a particular event. We obtain the log lines for each event from a window of ±5 minutes around the outage start time (i.e., 10 minutes of log data). Each of these log lines is passed to a pretrained error classifier, which outputs 0 (error) or 1 (non-erroneous). The error classifier lets us separate the log lines that reflect a healthy state of the system and the corresponding microservice from the erroneous log lines.
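The windowing and filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `LogLine` record, the function `error_lines_for_event`, and the keyword-based `keyword_classifier` are hypothetical names, and the keyword check merely stands in for the pretrained error classifier, whose details the text does not give. The label convention (0 for error, 1 for non-erroneous) follows the description above.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, NamedTuple


class LogLine(NamedTuple):
    """Hypothetical record for one log line."""
    timestamp: datetime
    application_id: str
    message: str


def error_lines_for_event(
    logs: Iterable[LogLine],
    outage_start: datetime,
    classify: Callable[[str], int],
    half_window: timedelta = timedelta(minutes=5),
) -> List[LogLine]:
    """Keep lines within +/- half_window of outage_start that are labeled 0 (error)."""
    lo, hi = outage_start - half_window, outage_start + half_window
    return [
        line
        for line in logs
        if lo <= line.timestamp <= hi and classify(line.message) == 0
    ]


def keyword_classifier(message: str) -> int:
    """Assumed stand-in for the pretrained classifier: 0 = error, 1 = non-erroneous."""
    lowered = message.lower()
    return 0 if ("error" in lowered or "exception" in lowered) else 1
```

Passing the classifier as a callable keeps the windowing logic independent of the model, so the keyword stand-in can be swapped for a real classifier without touching the filter.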

To use error log lines for event similarity, each log line is processed and templatized, and the resulting templates are collated to form a log signature for each event. The objective of templatization is to normalize log lines that differ only in variable details to a common id, called a template id. As a result, each event has a set of template ids and their corresponding application ids. We propose a log-signature representation for each event, built from its template ids and corresponding application ids, and use it for event similarity.
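To make the idea of templatization concrete, here is a minimal sketch. It assumes a simple approach in which variable fields (IP addresses, hexadecimal values, numbers) are masked with placeholders and the masked text is hashed to obtain a template id. The text does not say which templatization method is actually used, so the masking rules and the names `templatize`, `template_id`, and `collect_template_ids` are illustrative assumptions only.

```python
import hashlib
import re
from typing import Iterable, Set, Tuple

# Assumed masking rules; order matters (IP addresses before bare numbers).
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]


def templatize(message: str) -> str:
    """Replace variable fields with placeholders so similar lines collapse together."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def template_id(message: str) -> str:
    """Derive a short, stable id from the masked message (hashing is an assumption)."""
    digest = hashlib.sha1(templatize(message).encode("utf-8")).hexdigest()
    return digest[:12]


def collect_template_ids(
    error_lines: Iterable[Tuple[str, str]],
) -> Set[Tuple[str, str]]:
    """Map (application_id, message) pairs to a set of (application_id, template_id)."""
    return {(app, template_id(msg)) for app, msg in error_lines}
```

With these rules, two lines such as "timeout after 30 ms" and "timeout after 45 ms" from the same application map to a single template id, which is the normalization the paragraph above describes.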

The example below shows a log signature for an event. There are three log template ids: template_id_a, template_id_b, and template_id_c. Two of them (template_id_a and template_id_b) belong to application_id_a, and one (template_id_c) belongs to application_id_b. This representation is called the log signature of an event:

{
  "templates": [{
    "application_id": "application_id_a",
    "template": "template_id_a"
  }, {
    "application_id": "application_id_a",
    "template": "template_id_b"
  }, {
    "application_id": "application_id_b",
    "template": "template_id_c"
  }]
}

Once we have a log signature for each event, the similarity between two events is calculated by first computing the overlap between their application ids. For each application id that the two events share, we then compute the overlap between the corresponding template ids, which yields a score called the log template similarity score.
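The two-level overlap can be sketched as follows. The text does not give the exact formula, so this example makes two labeled assumptions: the template overlap for a shared application is measured with the Jaccard index, and the per-application values are averaged over the shared applications. The function names are hypothetical, and the input follows the log-signature layout shown earlier.

```python
from collections import defaultdict
from typing import Dict, Set


def templates_by_app(signature: dict) -> Dict[str, Set[str]]:
    """Group the template ids of a log signature by application id."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for entry in signature["templates"]:
        grouped[entry["application_id"]].add(entry["template"])
    return grouped


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def log_template_similarity(sig_a: dict, sig_b: dict) -> float:
    """Assumed scoring: mean template Jaccard over the application ids both events share."""
    apps_a, apps_b = templates_by_app(sig_a), templates_by_app(sig_b)
    shared = set(apps_a) & set(apps_b)
    if not shared:
        return 0.0  # no overlapping application ids, so no template overlap to score
    return sum(jaccard(apps_a[app], apps_b[app]) for app in shared) / len(shared)


event_1 = {"templates": [
    {"application_id": "application_id_a", "template": "template_id_a"},
    {"application_id": "application_id_a", "template": "template_id_b"},
    {"application_id": "application_id_b", "template": "template_id_c"},
]}
event_2 = {"templates": [
    {"application_id": "application_id_a", "template": "template_id_a"},
    {"application_id": "application_id_c", "template": "template_id_d"},
]}

score = log_template_similarity(event_1, event_2)  # 0.5 under these assumptions
```

Here the two events share only application_id_a, whose template sets overlap in one of two distinct templates, giving a score of 0.5. Other choices, such as weighting by the fraction of shared applications, would fit the description equally well.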