December 8, 2020
By Xiaotong Liu, Anbang Xu and Rama Akkiraju
5 min read

As IT complexity grows and the use of AI technologies expands, enterprises are looking to bring in the power of AI to transform how they develop, deploy and operate their IT.

Our past work on Sentiment Analysis and Entity Recognition has shown that artificial intelligence (AI) models customized with cross-lingual data on top of Language Models outperform those that are trained on general-purpose data alone. We were curious to see if we could replicate similar results while solving problems like anomaly predictions in the IT Operations Management domain. So, we conducted experiments to test this hypothesis. In this article, we share our experimental results in which we note that the anomaly prediction models built with advanced Language Models that are trained with IT data as features outperform the ones built with general-purpose data.  

Introduction

Language Models are critical components in Natural Language Processing (NLP). They can learn to predict the probability of a sequence of words. A 1-gram language model predicts the probability of a single missing word in a sentence. For example, in the sentence “Ana _ to get a book to read,” an English-language-trained Language Model might predict the word ‘went’ to fill in the blank with a probability of 99%.

A 2-gram language model predicts the probability of a sequence of two missing words at a time. For example, in the sentence “Ana _ _ get a book to read,” a trained Language Model might predict word sequences such as ‘went to’ or ‘had gone,’ each with its own probability. The same idea extends to n-grams.
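One common way to estimate such probabilities is to count which words follow which in a corpus. The sketch below is a minimal illustration under stated assumptions: the toy corpus and the function names `train_bigram_model` and `next_word_probabilities` are hypothetical, and the model conditions only on the preceding word, which simplifies the fill-in-the-blank setup described above.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            follow_counts[prev][nxt] += 1
    return follow_counts

def next_word_probabilities(follow_counts, prev):
    """Return P(next word | previous word) as relative frequencies."""
    counts = follow_counts.get(prev, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}

# Toy corpus, made up for illustration only.
corpus = [
    "ana went to get a book",
    "ana went to the library",
    "ana wanted to read",
    "ben went home",
]
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "ana")
print(probs)
```

The probabilities of all candidate next words sum to 1, so a word seen more often after “ana” receives a larger share.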

To perform this task, Language Models internally convert words to real-valued vector representations, because mathematical models operate more easily on numbers. These representations are called Word Embeddings or Word Vectors, and they are widely used in NLP tasks.

To create Word Embeddings, words or phrases from the vocabulary of a language are mapped to vectors of real numbers, so that each word or phrase is associated with a feature vector of a fixed dimension. Typically, Embeddings are pre-trained on large text corpora such as Wikipedia, tweets and news articles, and are tested on Language Modeling tasks, which assign a probability distribution over sequences of words.
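To make the idea of a fixed-dimension feature vector concrete, here is a minimal sketch. The three-dimensional vectors are invented for illustration (real embeddings are learned from a corpus and typically have hundreds of dimensions), and `cosine_similarity` is a helper defined here, not part of any particular library.

```python
import math

# Toy 3-dimensional embeddings; the values are made up for illustration.
embeddings = {
    "error":     [0.9, 0.1, 0.0],
    "exception": [0.8, 0.2, 0.1],
    "book":      [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_related = cosine_similarity(embeddings["error"], embeddings["exception"])
sim_unrelated = cosine_similarity(embeddings["error"], embeddings["book"])
print(sim_related, sim_unrelated)
```

With these toy values, “error” and “exception” point in nearly the same direction while “error” and “book” do not, which is the property downstream models rely on when embeddings are used as features.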

An IT operations environment generates many kinds of data. These include metrics, alerts, events, logs, tickets, application and infrastructure topology, deployment configurations, and chat conversations, among others. Our goal in this experiment is to pre-train Language Models with IT domain vocabulary that occurs in logs, tickets, metrics, alerts, events, and chats, such as errors, exceptions, messages, service names, server names, pods, container IDs, node IDs, incidents, tickets, root causes, causal factors and topology. Word Embeddings derived from such IT domain-specific Language Models could serve as richer features for the machine-learning-based AI models in our system.

Applying Language Models to Log Anomaly Prediction in IBM Watson AIOps

In IBM Watson AIOps, there are many AI pipelines for processing different types of data and generating insights from them. For example, application and infrastructure logs and metrics are parsed and processed to predict anomalies early in the process. These are handled by Log Anomaly and Metric Anomaly Prediction models, respectively.

Anomalies that are raised, along with other events and alerts that may be generated via rules, are then grouped into their corresponding incident buckets by leveraging various techniques, including entity linking and spatial, temporal, and topological algorithms, to reduce event noise. This is done by Event Grouping AI models. Faults are diagnosed and localized by Fault Localization AI models. The set of impacted components is identified by Blast Radius AI models. Similar incidents from past incident records are identified and next-best-actions are derived by Incident Similarity AI models.

Each one is an AI model that employs different algorithms. Some are deep-learning algorithms, and some are unsupervised machine-learning algorithms. The features used in all these models could benefit from a deeper understanding of the IT domain. Figure 1 shows our approach to using language models for different IT operations management prediction tasks:

Figure 1: An illustration of language models for different IT Operations management prediction tasks.

Anomaly detection from logs is a fundamental IT Operations management task that aims to detect anomalous system behaviors and find signals that can provide clues to the reasons for a system’s failure. In our experiment, we tested whether anomaly detection models built with features derived from Word Embeddings from Language Models trained on IT data outperform the ones built with general-purpose data.
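As a rough illustration of how Word Embeddings can become features for anomaly detection, the sketch below averages the vectors of the words in a log line and scores the line by its distance from the centroid of known-normal lines. This is an assumption-laden toy, not the model used in our experiment: the two-dimensional vectors, the sample log lines and the names `featurize`, `centroid` and `anomaly_score` are all hypothetical.

```python
import math

# Hypothetical 2-dimensional embeddings; a real system would load pre-trained vectors.
embeddings = {
    "request": [1.0, 0.0],
    "served":  [0.9, 0.1],
    "ok":      [1.0, 0.1],
    "timeout": [0.0, 1.0],
    "failed":  [0.1, 0.9],
}

def featurize(line, dim=2):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vectors = [embeddings[t] for t in line.lower().split() if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]

def anomaly_score(line, center):
    """Euclidean distance between a line's feature vector and the normal centroid."""
    return math.dist(featurize(line), center)

normal_logs = ["request served ok", "request ok"]
center = centroid([featurize(line) for line in normal_logs])

score_normal = anomaly_score("request served", center)
score_anomalous = anomaly_score("request timeout failed", center)
print(score_normal, score_anomalous)
```

A line that resembles the normal logs lands close to the centroid, while a line dominated by failure vocabulary lands farther away. The quality of such a score depends directly on how well the embeddings represent IT vocabulary, which is the motivation for domain-specific pre-training.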

To pre-train language models in the IT Operations domain, we first process the input IT data into a normalized format using pre-defined rules — extracting the most informative texts, such as log messages, ticket descriptions, and so on. We also remove duplicates of texts, which may be auto-generated multiple times by the system for the same event. Next, we randomly sample data from each data source and use the data samples to learn the vocabulary of the IT Operations domain. After that, we pre-train the Language Model using the sampled data and tune the parameters based on model evaluation. An overview of the pre-training pipeline is shown in Figure 2:
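The data-preparation steps of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions: the normalization rules (dropping a leading timestamp, masking numbers), the sample log lines and all function names are hypothetical, and the pre-training step itself is omitted.

```python
import random
import re
from collections import Counter

def normalize(line):
    """Drop a leading timestamp and mask numbers (assumed rules, for illustration)."""
    line = re.sub(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\S*\s*", "", line)
    line = re.sub(r"\d+", "<num>", line)
    return line.strip().lower()

def deduplicate(lines):
    """Remove repeated texts while preserving first-seen order."""
    return list(dict.fromkeys(lines))

def sample(lines, k, seed=0):
    """Draw up to k lines at random, with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return rng.sample(lines, min(k, len(lines)))

def build_vocabulary(lines, min_count=1):
    """Collect whitespace-separated tokens that occur at least min_count times."""
    counts = Counter(tok for line in lines for tok in line.split())
    return {tok for tok, n in counts.items() if n >= min_count}

# Made-up log lines standing in for one data source.
raw_logs = [
    "2020-12-08 10:00:01 ERROR pod 17 restarted",
    "2020-12-08 10:00:05 ERROR pod 42 restarted",
    "2020-12-08 10:01:00 INFO service checkout healthy",
]
cleaned = deduplicate([normalize(line) for line in raw_logs])
sampled = sample(cleaned, k=2)
vocab = build_vocabulary(sampled)
print(cleaned)
print(sorted(vocab))
```

After normalization, the two restart messages become identical and collapse into one entry, which is the effect of removing texts that the system auto-generates multiple times for the same kind of event.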

Figure 2: The pipeline of pre-training language models using IT Operations domain data.

We trained a number of anomaly detection models using different pre-trained features. In Table 1, we report the accuracy results of anomaly prediction on two benchmark datasets for two models. One is a machine-learning model built with fastText Word Embeddings that are trained on general-purpose data (e.g., Wikipedia, news articles, etc.). The other is a machine-learning model built using embeddings trained with diverse IT Operations domain data as features. Our experimental results indicate that the fastText model customized with IT domain logs outperforms the model built with domain-independent, general-purpose data on both datasets.

Conclusion

As IT complexity grows and the use of AI technologies expands, enterprises are looking to bring in the power of AI to transform how they develop, deploy and operate their IT. IBM Watson AIOps adopts a new approach to leverage advanced Language Models for IT Operations tasks, such as log anomaly prediction. With the power of Watson AIOps, we can accelerate the development of text-based AI models for optimizing IT Operations management tasks at a large scale.
