Operations management components and data flow

Operations management in IBM Concert® for Z is built on a microservices framework: a set of independent architectural components, each with a single purpose, that communicate over a common lightweight API. Running the components as containers on Kubernetes makes the architecture easier to scale and extend, and provides a lightweight method for deploying and managing the components.

Operations management agent and control plane

Two components of operations management are the agent and the control plane, both of which must be deployed in Kubernetes clusters. You must configure these components during the deployment process. Figure 1 illustrates these components.
agent
The agent collects and consolidates data from the integrated products and normalizes the data for further analysis and visualization by the control plane.
control plane
The control plane aggregates the normalized data that is exported from the agent and presents it in the GUI. The control plane detects anomalies, generates event groups and advice, and provides the capability for users to act on events.
The control plane includes the following components:
GUI
A web-based user interface that provides an overview of system health. It enables you to view consolidated events and resolve events based on expert advice that is provided in context.
OpenSearch
The OpenSearch database stores the normalized data that is received from the agent and supports analysis and querying of that data. The stored data can then be visualized in the GUI.
PostgreSQL
The PostgreSQL database stores resource topology information and user authentication, permissions, and authorization information.
Redis
The Redis in-memory database provides the cache layer for hot data.
Apache Kafka
Operations management uses Apache Kafka to process large volumes of event-related messages. Kafka also carries the messages that trigger actions through the integrated products, such as querying system information or issuing commands on z/OS® systems.
Figure 1. High-level overview of the operations management agent and control plane components
The illustration shows the components that are described in the text.
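
To make the agent's normalization role more concrete, the following minimal Python sketch shows one way that records from different sources could be mapped onto a common shape before they are exported to the control plane. It is an illustration only: the `NormalizedRecord` fields, the `normalize` function, the input keys (`ts`, `time`, `system`, `host`), and the source names are hypothetical and do not represent the product's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical input keys that carry the timestamp and the resource name.
_TIME_KEYS = ("ts", "time")
_RESOURCE_KEYS = ("system", "host")


@dataclass(frozen=True)
class NormalizedRecord:
    """Hypothetical common shape for events, logs, and metrics."""
    source: str          # integrated product that produced the record
    resource: str        # topology resource that the record is associated with
    kind: str            # "event", "log", or "metric"
    timestamp: datetime  # always stored as a timezone-aware UTC value
    body: dict           # remaining source-specific fields


def _first_present(raw: dict, keys: tuple):
    """Return the value of the first key that exists in raw, or None."""
    for key in keys:
        if key in raw:
            return raw[key]
    return None


def normalize(source: str, kind: str, raw: dict) -> NormalizedRecord:
    """Map a source-specific record onto the common shape."""
    epoch = _first_present(raw, _TIME_KEYS)
    resource = _first_present(raw, _RESOURCE_KEYS)
    if epoch is None or resource is None:
        raise ValueError("record is missing a timestamp or a resource name")
    skip = set(_TIME_KEYS) | set(_RESOURCE_KEYS)
    body = {key: value for key, value in raw.items() if key not in skip}
    return NormalizedRecord(
        source=source,
        resource=str(resource).upper(),  # one spelling per resource
        kind=kind,
        timestamp=datetime.fromtimestamp(epoch, tz=timezone.utc),
        body=body,
    )


if __name__ == "__main__":
    log = normalize("log-source", "log",
                    {"ts": 1700000000, "system": "sys1", "msg": "job ended"})
    metric = normalize("metric-source", "metric",
                       {"time": 1700000060, "host": "SYS1", "cpu_pct": 91.5})
    # Both records now refer to the same resource name and use UTC timestamps.
    print(log.resource == metric.resource)
```

In this sketch, giving every record the same resource spelling and a UTC timestamp is what would allow data from different products to be stored, queried, and associated with the topology in a uniform way.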

Data flow among the operations management components and the integrated products

The following information describes and illustrates the data flow among the operations management components and the products that are integrated with operations management. The step numbers correspond to the numbers that are shown in Figure 2.
  1. Data collection from integrated products: The OpenTelemetry collectors consolidate and normalize events, logs, and metrics from the integrated products and associate them with the resource topology. To collect data from multiple domains, tools, or sources, operations management can be integrated with the following products (which are also listed in Table 1):
    • IBM Z® Common Data Provider
    • IBM Z OMEGAMON® Data Provider (and IBM® Tivoli® Enterprise Portal Server)
    • IBM Z Performance and Capacity Analytics
    • IBM Z Workload Scheduler
  2. Data store: Events, logs, metrics, and topology are piped into the data store for storage and querying. The data store is a centralized warehouse for the large volume of normalized data, model training results, and user authentication information.
  3. Model training: Z domain experts identify the logs and key performance metrics (also known as key performance indicators, or KPIs) that can indicate problems, so that only the relevant log messages and metrics are monitored. Based on historical logs and metrics, the model training component builds a machine learning model that represents the normal patterns and thresholds for the logs and metrics in your environment. The training results are stored in the data store.
  4. Model inferencing: Near real-time logs and metrics are sent to the model inferencing component. This component applies the trained model to the logs and metrics to calculate a score, which is a measurement that indicates how much a log or metric deviates from the normal behavior in your environment. New events are created for highly anomalous logs or metrics that exceed predefined thresholds. This process is intended to improve detection accuracy and minimize false positives.
  5. Event correlation and creation of event groups: Events that are detected and created by operations management are sent to the event correlation component. This component combines events from operations management and the products that are integrated with operations management and compresses these events into event groups based on time and topology.
  6. Enablement of GUI presentation of event groups: Event groups are sent to Apache Kafka topics to ultimately be presented in the GUI.
  7. GUI presentation of infrastructure health: Event groups, events, logs, metrics, topology, and thresholds from the model training are available through the API server to be presented in the GUI. The GUI presents a high-level view of the infrastructure health.
  8. Enablement of user actions in GUI to be performed through integrated products: You can initiate some actions in the GUI. These actions are passed as messages to Apache Kafka topics. Actuators receive the messages and perform the actions through the integrated products. The following products (which are also listed in Table 1) are examples of integrated products through which the actuators can perform actions:
    IBM Z NetView
    When IBM Z NetView is integrated with operations management, you can issue MVS commands to z/OS systems through the command console that is part of the operations management GUI.
    IBM watsonx Assistant™ for Z
    When IBM watsonx Assistant for Z is integrated with operations management, you can use a chat interface to ask questions and receive answers in context, based on generative AI and built-in IBM Z expertise.
    ServiceNow
    When ServiceNow is integrated with operations management, you can create ServiceNow tickets from within the operations management GUI.
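
The relationship between model training (step 3) and model inferencing (step 4) can be illustrated with a deliberately simple stand-in. The following Python sketch learns a mean and standard deviation from historical metric values and then scores new values by their distance from that baseline. This z-score technique is an assumption chosen for illustration; it is not the machine learning model that operations management uses, and the function names, sample values, and default threshold of 3.0 are hypothetical.

```python
from statistics import mean, stdev


def train_baseline(history: list[float]) -> tuple[float, float]:
    """Learn the normal level and spread of a metric from historical values."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    return mean(history), stdev(history)


def anomaly_score(value: float, baseline: tuple[float, float]) -> float:
    """Return how many standard deviations the value lies from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any change at all is maximally unusual.
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


def detect(values: list[float], baseline: tuple[float, float],
           threshold: float = 3.0) -> list[dict]:
    """Create a hypothetical event for each value whose score exceeds the threshold."""
    events = []
    for index, value in enumerate(values):
        score = anomaly_score(value, baseline)
        if score > threshold:
            events.append({"index": index, "value": value, "score": score})
    return events


if __name__ == "__main__":
    # "Training": historical CPU utilization samples (made-up values).
    baseline = train_baseline([50.0, 52.0, 48.0, 51.0, 49.0, 50.0])
    # "Inferencing": score near real-time samples against the learned baseline.
    for event in detect([51.0, 49.5, 75.0], baseline):
        print(event)
```

With these sample values, only the last reading is far enough from the learned baseline to produce an event. The point of the sketch is the split of responsibilities: the baseline is computed once from history and stored, and each incoming value is then scored against it and compared with a threshold.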
Figure 2. Data flow among the operations management components and the integrated products
The illustration shows the flow of data among the operations management components and the integrated products, as described in the text.
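
Step 5 describes compressing events into event groups based on time and topology. The following Python sketch shows one simple way that such grouping could work: an event joins an existing group when it occurs within a time window of a group member and is on the same or a directly connected resource. The greedy algorithm, the `Event` class, the `neighbors` map, the 300-second window, and the resource names are all assumptions for illustration and are not the product's correlation logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    resource: str     # topology resource on which the event occurred
    timestamp: float  # seconds since the epoch


def related(a: Event, b: Event, neighbors: dict, window_seconds: float) -> bool:
    """Return True if two events are close in time and close in topology."""
    close_in_time = abs(a.timestamp - b.timestamp) <= window_seconds
    same_or_adjacent = (
        a.resource == b.resource
        or b.resource in neighbors.get(a.resource, set())
        or a.resource in neighbors.get(b.resource, set())
    )
    return close_in_time and same_or_adjacent


def correlate(events: list, neighbors: dict,
              window_seconds: float = 300.0) -> list:
    """Greedily assign each event to the first group that has a related member.

    Simplification: groups are never merged after they are created.
    """
    groups: list = []
    for event in sorted(events, key=lambda e: e.timestamp):
        for group in groups:
            if any(related(event, member, neighbors, window_seconds)
                   for member in group):
                group.append(event)
                break
        else:
            # No existing group is related to this event, so start a new one.
            groups.append([event])
    return groups


if __name__ == "__main__":
    # Hypothetical topology: a CICS region that runs on LPAR1.
    topology = {"LPAR1": {"CICS1"}}
    sample = [
        Event("e1", "LPAR1", 0.0),
        Event("e2", "CICS1", 120.0),
        Event("e3", "DB2A", 4000.0),
        Event("e4", "LPAR1", 4100.0),
    ]
    for group in correlate(sample, topology):
        print([event.event_id for event in group])
```

In this example, the first two events fall into one group because they are two minutes apart on connected resources, while the later events stay separate: one is on an unrelated resource, and the other is on a known resource but far outside the time window.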