Configuring name similarity
Configure the name similarity feature by tuning parameters that govern how the system processes multiple resources when performing pattern matching.
About this task
Pattern matching enables Event Analytics to identify types of events that tend to occur together on a specific network resource. The name similarity feature extends pattern matching by enabling it to identify types of events that tend to occur together on more than one resource, where the resources within the pattern have a similar name. For examples of similar resource names that might be discovered by the name similarity feature, see Examples of name similarity.
Depending on how name similarity is configured, pattern matching treats resource names like those in the examples as similar and creates a single pattern that includes events from all of the matching resources.
Similarity threshold value: Name similarity is determined in three steps. First, a third-party algorithm calculates an edit distance. The edit distance is the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character, or a transposition of two adjacent characters. Next, the algorithm calculates a normalized similarity distance in the range 0.0 to 1.0, where 0.0 means that the strings are identical and 1.0 means that the strings are completely different. The normalized similarity distance is a contribution of the edit distance, weighted according to the length of the first string, the length of the second string, and the number of transpositions. Finally, the name similarity algorithm calculates a normalized threshold value, also in the range 0.0 to 1.0, by subtracting the normalized similarity distance from 1.0. A threshold value of 0.0 means that the strings can be completely different. A threshold value of 1.0 means that the strings must match exactly.
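The following Python sketch illustrates the three steps described above. It is not the product's implementation: the third-party algorithm and its exact weighting are not specified here, so the sketch assumes the restricted edit distance known as optimal string alignment and assumes a simple normalization that divides by the length of the longer string. The function names edit_distance, normalized_distance, and similarity are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions, and
    transpositions of two adjacent characters (optimal string alignment).
    Assumption: the product's third-party algorithm may differ in detail."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def normalized_distance(a: str, b: str) -> float:
    """0.0 means identical, 1.0 means completely different.
    Assumption: normalize by the longer string's length; the documented
    weighting also involves the number of transpositions."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return edit_distance(a, b) / longest


def similarity(a: str, b: str) -> float:
    """Value comparable to the threshold: 1.0 minus the normalized distance."""
    return 1.0 - normalized_distance(a, b)


print(similarity("router01", "router02"))  # 0.875
```

Under these assumptions, the names router01 and router02 differ by one substitution in eight characters, so the similarity is 0.875. A pair like this would be grouped at a threshold of 0.8 but not at a threshold of 0.9.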
By default, name similarity is configured so that only very similarly named resources are grouped together: resources whose names are 90% similar and that have the same first character. This behavior is controlled by the name_similarity_default_threshold and name_similarity_default_lead_restriction parameters, which are described in more detail in Table 1.
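As a rough illustration of how the two default settings might combine, the following Python sketch decides whether two resource names are grouped. It makes several assumptions that are not confirmed here: that the lead restriction is a count of leading characters that must be identical (1 by default), that a similarity equal to the threshold is accepted, and that the comparison is case-sensitive. The function names_group_together is hypothetical, and the similarity score is passed in rather than calculated.

```python
def names_group_together(
    name_a: str,
    name_b: str,
    similarity_score: float,
    threshold: float = 0.9,     # stands in for name_similarity_default_threshold
    lead_restriction: int = 1,  # stands in for name_similarity_default_lead_restriction
) -> bool:
    """Return True if two resource names would be treated as similar.

    Assumptions: lead_restriction is the number of leading characters
    that must match exactly, and a score equal to the threshold passes.
    """
    # Names shorter than the restriction cannot satisfy it.
    if len(name_a) < lead_restriction or len(name_b) < lead_restriction:
        return False
    # The leading characters must be identical.
    if name_a[:lead_restriction] != name_b[:lead_restriction]:
        return False
    # The names must be at least as similar as the threshold requires.
    return similarity_score >= threshold


# Illustrative scores only; they are not produced by the product.
print(names_group_together("edge-router-01", "edge-router-02", 0.93))  # True
print(names_group_together("edge-router-01", "xdge-router-01", 0.93))  # False
print(names_group_together("edge-router-01", "edge-switch-07", 0.60))  # False
```

In this sketch, the second pair is rejected because the first characters differ even though the score exceeds the threshold, and the third pair is rejected because the score is below 0.9.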