As IT systems become increasingly mission critical, companies need uninterrupted access to those systems to ensure business continuity and keep IT operations running smoothly. Automating problem detection and resolution, and preventing issues in the first place, are key to achieving this. Existing solutions analyze data sources such as alerts, logs and metrics in silos, without providing the broader context of an incident. This makes the site reliability engineer’s (SRE) life more difficult.
In IBM Cloud Pak® for Watson AIOps, we link signals from multiple sources of data to generate a holistic problem context. This linking is done by extracting common ‘signatures’ from each data type. These signatures are also referred to as ‘entities.’ In this article, we give a brief view into how entity extraction works in Watson AIOps.
Entity extraction in Watson AIOps is the process of identifying key elements from various data sources, such as alerts, logs and incidents. For example, Figure 1 shows some important entities that appear in an alert payload. Entities could be server ids, server names, pod ids, error codes, exception messages, etc.
These entities play a key role in bringing together the disparate data sources in AIOps by breaking down the silos across the IT operations lifecycle. Common entities provide the links that connect the pieces of the puzzle, enabling us to create a holistic problem context so that SREs can diagnose a problem efficiently.
Watson AIOps leverages entities for the following use cases:
Use case 1: Event grouping and fault localization
Event grouping leverages entities as one of the clues to create ‘incident stories’ from a group of relevant events. The entities identified in a given incident story are then used to locate the faulty components and identify the blast radius. Figure 2 shows an example of entities extracted from an anomaly detected in logs and from an independent alert that arrived from another system via PagerDuty messaging. Both alerts refer to the same ts-travel2-mango service as the source of anomalies, so the likelihood that they belong to the same incident is high. This information helps us group the two alerts together.
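The grouping idea can be illustrated with a minimal sketch. It assumes each event has already been reduced to a set of extracted entities and simply merges events that share at least one entity. The event ids, the "HTTP 503" entity and the ts-order-service name are made up for illustration, and this is not the grouping algorithm Watson AIOps uses, which treats entities as only one of several clues.

```python
from collections import defaultdict


def group_events_by_entity(events):
    """Group event ids that share at least one entity, using union-find."""
    parent = {event["id"]: event["id"] for event in events}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # entity -> id of the first event that mentioned it
    for event in events:
        for entity in event["entities"]:
            if entity in first_seen:
                union(first_seen[entity], event["id"])
            else:
                first_seen[entity] = event["id"]

    groups = defaultdict(list)
    for event in events:
        groups[find(event["id"])].append(event["id"])
    return sorted(groups.values())


# Hypothetical events; only the ts-travel2-mango service comes from the article.
events = [
    {"id": "log-anomaly-1", "entities": {"ts-travel2-mango"}},
    {"id": "pagerduty-alert-7", "entities": {"ts-travel2-mango", "HTTP 503"}},
    {"id": "metric-alert-3", "entities": {"ts-order-service"}},
]

for group in group_events_by_entity(events):
    print(group)
```

Here the log anomaly and the PagerDuty alert end up in one group because both mention ts-travel2-mango, while the unrelated metric alert stays on its own.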
Use case 2: Action recommendation
In addition to creating a holistic context about the current incident via entity linking, entity extraction also plays an important role in identifying a suitable problem resolution. Reducing the mean time to resolve an ongoing incident is one of the important goals of AIOps.
Extracting insights from the diagnosis and resolution actions of prior relevant incidents can help SREs in deriving a suitable set of next best actions for an ongoing incident. However, given the large amount of information buried in prior incidents, manually processing all of the prior incident ticket data can be a very tedious task for an SRE, even after narrowing down the set of related prior incident tickets to a smaller set.
Entity extraction from an ongoing incident helps in framing a query to find suitable matches among prior incident tickets. Once relevant prior incident tickets are identified, extracting entity-action phrases from those tickets further helps in retrieving the most relevant problem resolution phrases from what might be a long document filled with notes from SREs detailing the full process of incident resolution. This problem resolution phrase extraction saves time for SREs, as they don’t have to read the entire text of the retrieved ticket records to identify what action was taken to fix the problem in the past. Figure 3 shows sample entities extracted from the closing notes of an incident ticket recorded in the ServiceNow ticket management system.
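As a rough illustration of using entities as a query, the sketch below ranks prior tickets by the Jaccard overlap between their entity sets and the entities of the ongoing incident. The article does not describe the actual matching method, so the similarity measure, the ticket ids and the entity values are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of entities."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_prior_tickets(incident_entities, tickets, top_k=3):
    """Return ids of the top_k prior tickets sharing entities with the incident."""
    scored = [(jaccard(incident_entities, t["entities"]), t["id"]) for t in tickets]
    # Drop tickets with no shared entity, then sort by score (ties by id).
    scored = [(score, tid) for score, tid in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tid for _, tid in scored[:top_k]]


# Hypothetical data for illustration only.
incident = {"ts-travel2-mango", "OutOfMemoryError"}
prior_tickets = [
    {"id": "INC001", "entities": {"ts-travel2-mango", "OutOfMemoryError", "restart"}},
    {"id": "INC002", "entities": {"ts-travel2-mango"}},
    {"id": "INC003", "entities": {"db-primary"}},
]

print(rank_prior_tickets(incident, prior_tickets))
```

A production system would likely weight entity types differently (an exact exception match probably matters more than a shared host name), but the overlap score conveys the basic idea.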
Entity extraction techniques
The nature of IT operations data is different from that of human-written data: IT operations data is a mix of machine-generated and human-generated data [1]. Because of this, we leverage various techniques to extract entities from IT operations data, ranging from rule- and dictionary-based techniques to advanced natural language processing (NLP) techniques.
Rule-based entity extraction
The rule-based approach leverages predefined patterns that can be captured easily using regular expressions and dictionaries. Developers define these patterns in advance, and they are then used at runtime to extract entities. Entity types that can be handled via predefined rules include IP address, error code, exception, stack trace, file name, URL and date/time.
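To make the idea concrete, here is a minimal sketch of rule-based extraction using Python's standard re module as a stand-in for the production rule engine. The patterns are simplified assumptions written for this example (they do not validate IP octet ranges, for instance), and the log line is invented.

```python
import re

# Simplified, illustrative rules; real rules would be stricter and more numerous.
RULES = {
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    "exception": re.compile(r"\b(?:[A-Za-z_]\w*\.)*[A-Z]\w*(?:Exception|Error)\b"),
    "error_code": re.compile(r"\bHTTP [45]\d{2}\b"),
    "timestamp": re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\b"),
}


def extract_entities(text):
    """Apply every rule to the text and return the matches per entity type."""
    found = {}
    for entity_type, pattern in RULES.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        if matches:
            found[entity_type] = matches
    return found


# Hypothetical log line for illustration.
log_line = (
    "2021-03-04 10:15:32 ERROR java.lang.NullPointerException calling "
    "http://orders.example.com/api from 10.0.4.17, got HTTP 503"
)

for entity_type, values in extract_entities(log_line).items():
    print(entity_type, values)
```

Keeping the rules in a dictionary keyed by entity type makes it easy to add or tighten a pattern without touching the extraction loop.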
In addition to these entity types, cloud-native entity types like Pod and Slot information can be captured by combining dictionaries of application/service names with predefined patterns. For example, in Figure 1, the pod name ts-travel2-service-75df4c5cd6-vxngm can be captured as the application name ts-travel2-service, followed by a specific pattern of alphanumeric characters in a single token. These dictionaries are automatically populated with topological information extracted from the corresponding environments.
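The dictionary-plus-pattern combination can be sketched as follows. The sketch assumes the dictionary is a plain set of application names and that the suffix consists of two hyphen-separated lowercase alphanumeric segments, as in the pod name quoted above. The helper names and the second application name are hypothetical.

```python
import re


def build_pod_pattern(app_names):
    """Compile a pattern matching '<app name>-<suffix>' as a single token."""
    # Longest names first so a longer application name wins over its prefix.
    ordered = sorted(app_names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(
        rf"(?<![\w-])(?P<app>{alternatives})-(?P<suffix>[a-z0-9]+-[a-z0-9]+)(?![\w-])"
    )


def extract_pods(text, app_names):
    """Return (application name, full pod name) pairs found in the text."""
    if not app_names:
        return []
    pattern = build_pod_pattern(app_names)
    return [(m.group("app"), m.group(0)) for m in pattern.finditer(text)]


# In practice the dictionary would be populated from topology data;
# here it is hard-coded, and ts-order-service is a made-up entry.
app_dictionary = {"ts-travel2-service", "ts-order-service"}
alert_text = "Readiness probe failed for pod ts-travel2-service-75df4c5cd6-vxngm"

print(extract_pods(alert_text, app_dictionary))
```

Anchoring the match on a known application name is what keeps the otherwise very loose alphanumeric suffix pattern from firing on arbitrary hyphenated tokens.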
While regular expressions can be used to write these rules, researchers at IBM have developed a more efficient regular-expression execution engine that can scale for Big Data. This work was done under the System-T (System-Text) project [2] at IBM Research. System-T specifies an Annotation Query Language (AQL), a rule language similar to SQL, and prescribes a specific runtime to efficiently run rules that are written in AQL. This has been codified in IBM Watson’s Natural Language Processing service as a library under the name oneNLP. Our rule-based entity extraction leverages the oneNLP platform to execute the regular-expression-based rules in a scalable way.
Advanced NLP-based entity extraction
While rule-based approaches can tackle entities with predefined patterns, NLP-based techniques can help extract insights from unstructured data, such as incident/closing notes, resolutions and Slack conversations written by SREs.
As shown in Figure 3, entity extraction extracts the action and the component(s) the action is being performed on. Extracting these insights requires a deep understanding of the input text.
For this, entity extraction leverages the Watson NLP expanded shallow semantic parsing (ESSP) [3]. Shallow parsing analyzes a sentence by first identifying its part-of-speech tags (nouns, verbs) and then linking them to higher-order units (noun phrases). Given a sentence, ESSP identifies the Agent, Verb and Theme of the sentence and their interactions. In our problem context, Verb represents an action, Agent represents who performed the action and Theme represents the component the action is performed on. Figure 4 shows these components on a resolution text.
Entity extraction uses this output from ESSP and further processes it to identify domain-specific action-components and their linkages using domain-specific component phrase extraction and action word dictionary generation.
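The post-processing step can be pictured with a small sketch. It does not call Watson NLP. Instead, it assumes the parser output has already been converted into simple Agent/Verb/Theme records with lemmatized verbs, and then keeps only the frames whose verb is in an action dictionary and whose theme mentions a known component phrase. The Frame class, the sample frames and the component phrases are hypothetical. Only the action words "restart" and "increase" come from this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    """Hypothetical stand-in for one Agent/Verb/Theme frame from the parser."""
    agent: Optional[str]
    verb: str  # assumed to be lemmatized already, e.g. "restart"
    theme: Optional[str]


def link_actions(frames, action_words, component_phrases):
    """Return (action, component) pairs for frames that describe an action."""
    pairs = []
    for frame in frames:
        verb = frame.verb.lower()
        if verb not in action_words or frame.theme is None:
            continue  # not a domain action, or nothing was acted upon
        theme = frame.theme.lower()
        # Link the action to every known component mentioned in the theme.
        for component in sorted(c for c in component_phrases if c in theme):
            pairs.append((verb, component))
    return pairs


action_words = {"restart", "increase"}
component_phrases = {"ts-travel2-mango", "memory limit"}
frames = [
    Frame(agent="SRE", verb="restart", theme="the ts-travel2-mango pod"),
    Frame(agent="SRE", verb="increase", theme="memory limit of the service"),
    Frame(agent="customer", verb="report", theme="slow response"),
]

print(link_actions(frames, action_words, component_phrases))
```

The third frame is dropped because "report" does not change the state of anything, which is the distinction the action dictionary is meant to capture.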
Domain-specific component phrase extraction
This step extracts key phrases from the documents by analyzing NLP features, such as part-of-speech tags. It then uses linguistic and statistical features, such as document-level relevance metrics, to find relevant phrases, and uses general-domain phrases extracted from various knowledge sources to filter out generic phrases [4].
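The filtering and ranking part of this step can be sketched as below. The sketch assumes candidate phrases have already been produced by an upstream part-of-speech-based chunker (not shown), uses a TF-IDF-style score as one possible document-level relevance metric, and treats the generic phrase list as given. The function name, the scoring formula and all sample phrases are assumptions for illustration, not the method from [4].

```python
import math
from collections import Counter


def rank_component_phrases(doc_candidates, corpus_candidates, generic_phrases, top_k=5):
    """Rank a document's candidate phrases, dropping generic ones."""
    n_docs = len(corpus_candidates)
    doc_freq = Counter()
    for phrases in corpus_candidates:
        doc_freq.update(set(phrases))  # count each phrase once per document

    term_freq = Counter(doc_candidates)
    scores = {}
    for phrase, tf in term_freq.items():
        if phrase in generic_phrases:
            continue  # filtered out as a general-domain phrase
        # Smoothed inverse document frequency, so rarer phrases score higher.
        idf = math.log((1 + n_docs) / (1 + doc_freq[phrase])) + 1.0
        scores[phrase] = tf * idf
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_k]


# Hypothetical candidate phrases per document.
corpus = [
    ["connection pool", "root cause", "memory limit"],
    ["root cause", "database node"],
    ["connection pool", "next step"],
]
document = ["connection pool", "connection pool", "root cause", "memory limit"]
generic = {"root cause", "next step"}

print(rank_component_phrases(document, corpus, generic))
```

In this toy data, "root cause" is frequent but carries no component information, so the generic-phrase filter removes it before ranking.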
Action-word dictionary generation
Entity extraction defines an action as the process of changing something that results in a state change (e.g., restart and increase). To capture actions specific to the IT operations domain, we generate and curate dictionaries relevant to the domain using a domain-specific corpus.
Visit the IBM Cloud Pak for Watson AIOps website to learn more.
Read the blog: “Leveraging Log Data for Incident Management in AIOps”
[1] Rama Akkiraju and Ruchir Puri. “Implications of training machine learning models from machine-generated data and human-authored data.”
[2] Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, and Huaiyu Zhu. 2008. “SystemT: A System for Declarative Information Extraction.” SIGMOD Record 37, 7-13. doi:10.1145/1519103.1519105.
[3] Huaiyu Zhu, Yunyao Li, and Laura Chiticariu. 2019. “Towards universal semantic representation.” In Proceedings of the First International Workshop on Designing Meaning Representations.
[4] Prateeti Mohapatra, Yu Deng, Abhirut Gupta, Gargi Dasgupta, Amit Paradkar, Ruchi Mahindru, Daniela Rosu, Shu Tao, and Pooja Aggarwal. 2018. “Domain Knowledge Driven Key Term Extraction for IT Services.” In International Conference on Service-Oriented Computing. Springer, 489–504.