Hardware requirements

Learn about the hardware requirements for a deployment of IBM Cloud Pak® for AIOps on Red Hat® OpenShift® Container Platform. Your hardware must be able to support Red Hat OpenShift, IBM Cloud Pak for AIOps, and your chosen storage solution.

Before you begin

  • You cannot change your selected deployment size after installation.
  • Multi-zone high availability disaster recovery (HADR) is available as a nonproduction technology preview. For more information, see Installing IBM Cloud Pak for AIOps on a multi-zone architecture (multi-zone HADR).
  • A vCPU is a virtual core that results from splitting a physical x86 CPU core into virtual cores. It is assumed that one physical x86 core can be split into two logical vCPUs; for example, a node with 8 physical cores provides 16 vCPUs.
  • If Red Hat OpenShift is installed on VMware virtual machines, set the value of the sched.cpu.latencySensitivity parameter to high (see the example after this list).
  • If you are planning to install IBM Cognos® Analytics after you have installed IBM Cloud Pak for AIOps, then you will require extra resources. For more information, see Before you begin.
  • Persistent storage is also required. For more information, see Storage requirements.
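
The latency sensitivity setting can be applied as advanced VM configuration. The following is a minimal sketch that uses the VMware govc CLI, assuming govc is installed and authenticated, and a node VM named ocp-worker-1 (a hypothetical name; repeat for each node VM):

# Set latency sensitivity to high on one VM (hypothetical VM name).
govc vm.change -vm ocp-worker-1 -e "sched.cpu.latencySensitivity=high"

# Verify by printing the VM's extra configuration entries.
govc vm.info -e ocp-worker-1 | grep sched.cpu.latencySensitivity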

You can deploy a starter, production, or custom sized deployment of IBM Cloud Pak for AIOps. The processing abilities and hardware requirements vary for each deployment size. Review the information in the following sections:

Hardware requirements - Red Hat OpenShift

A Red Hat OpenShift cluster has master nodes and worker nodes. Tables 1 and 2 show the minimum node requirements for Red Hat OpenShift to withstand the demands of starter and production deployments of IBM Cloud Pak for AIOps.

IBM Cloud Pak for AIOps requires your Red Hat OpenShift cluster to have at least three master nodes. IBM Cloud Pak for AIOps has many Kubernetes operators that interface with the API server and etcd storage, and so master nodes must be adequately sized.

The number of worker nodes that are required varies, and depends on the storage and processing requirements of your IBM Cloud Pak for AIOps deployment. The size of each of your worker nodes can vary, but the combined resources of your worker nodes must comply with the resource totals in Tables 3 and 4. Each worker node must meet the minimum requirements in Table 2 to accommodate the placement of larger IBM Cloud Pak for AIOps pods.

A higher recommended vCPU is also given in Table 2. When you select the size of your worker nodes, balance cost considerations against the benefits of improved resiliency and easier workload scheduling. Over-allocating resources improves resiliency by ensuring that sufficient resources remain available if a worker node fails, and well-sized worker nodes make the placement of workloads easier. If your largest worker node becomes unavailable, resiliency is improved if your smallest worker node can handle the largest node's workloads. The more you over-allocate resources, the wider the range of failure scenarios that your cluster can accommodate.
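
To compare your cluster's combined worker capacity against these totals, you can list each node's capacity from the command line. A minimal sketch, assuming the oc CLI is installed and logged in to the cluster:

# List CPU and memory capacity for each worker node.
oc get nodes -l node-role.kubernetes.io/worker \
  -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory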

Table 1. Master node minimum requirements
Resource               Starter   Production
Master node count      3         3
vCPU per node          4         4
Memory per node (GB)   16        16
Disk per node (GB)     120       120

Table 2. Worker node minimum requirements
Resource                    Starter   Production
Minimum vCPU per node       8         16
Recommended vCPU per node   16        16
Memory per node (GB)        12        20
Disk per node (GB)          120       120

Note: The numbers that are specified are minimums. For high production workloads, you might need a larger disk per node (GB) for your workloads.


Hardware requirements - IBM Cloud Pak for AIOps

Tables 3 and 4 show the hardware requirements for starter and production deployments of IBM Cloud Pak for AIOps, for both a base deployment and an extended deployment. For more information about the differences between a base deployment and an extended deployment, see Incremental adoption.

The resource requirements are given for the following scenarios:

  • Table 3: IBM Cloud Pak for AIOps only - you are deploying IBM Cloud Pak for AIOps on an existing Red Hat OpenShift cluster.
  • Table 4: IBM Cloud Pak for AIOps and Red Hat OpenShift - you are creating a new Red Hat OpenShift cluster and deploying IBM Cloud Pak for AIOps onto it.
Table 3. Minimum requirements for starter and production deployments, without Red Hat OpenShift (IBM Cloud Pak for AIOps only)
Resource                    Base (Starter)   Extended (Starter)   Base (Production)   Extended (Production)
Minimum worker node count   3                3                    6                   6
Total vCPU                  47               55                   130                 156
Total memory (GB)           123              136                  310                 368

Table 4. Minimum requirements for starter and production deployments, with Red Hat OpenShift (IBM Cloud Pak for AIOps and Red Hat OpenShift)
Resource                    Base (Starter)   Extended (Starter)   Base (Production)   Extended (Production)
Master node count           3                3                    3                   3
Minimum worker node count   3                3                    6                   6
Total vCPU                  59               67                   142                 168
Total memory (GB)           171              184                  358                 416

The Red Hat OpenShift master and worker nodes must still meet the minimum size requirements in Hardware requirements - Red Hat OpenShift.

Important: These values do not include CPU and memory resources for hosting a storage provider, such as Red Hat® OpenShift® Data Foundation or Portworx. Storage providers can require more nodes, more resources, or both, to run. The extra resources that are needed can vary based on your selected storage provider. A general recommendation is one extra worker node for a starter deployment, and three extra worker nodes for a production deployment. Consult your storage provider's documentation for exact requirements.

In addition to the default starter and production size deployments, you can choose to deploy a custom-sized deployment of IBM Cloud Pak for AIOps. For more information about customizing the size of your IBM Cloud Pak for AIOps deployment, see Custom sizing.

Requests and limits

In Kubernetes, requests define the resources that are guaranteed to a workload, and limits define the maximum resources that the workload can consume. Resource quotas can optionally be used to cap the totals of all requests and limits in a namespace. If a resource quota is set, it must be at an adequate level to ensure that workload placement is not inhibited. If you do not deploy resource quotas, skip this section.
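
To check whether a quota is already applied to the intended installation namespace, you can query it directly. A minimal sketch, assuming the oc CLI and the cp4aiops namespace that is used in the example later in this section:

# List any resource quotas in the installation namespace.
oc get resourcequota -n cp4aiops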

The following table shows the requests and limits that the IBM Cloud Pak for AIOps installation namespace requires for a typical starter or production deployment.

Table 5. Requests and limits requirements for the IBM Cloud Pak for AIOps installation namespace
Resource name                    Starter   Production
CPU Request                      75        227
Memory Request (Gi)              195       466
Ephemeral Storage Request (Gi)   105       200
CPU Limit                        210       436
Memory Limit (Gi)                260       533
Ephemeral Storage Limit (Gi)     340       600

Important: Deployment of additional services impacts these values. Review these values regularly to ensure that they still meet your requirements. Extra resources are required for upgrade; for more information, see Ensure cluster readiness.

Use the Red Hat OpenShift command line to create or edit a ResourceQuota object with these values. For more information, see Quotas in the Red Hat OpenShift documentation.

The following example shows the YAML to create a ResourceQuota object for the IBM Cloud Pak for AIOps installation namespace in a production deployment of IBM Cloud Pak for AIOps:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: aiops-large
  namespace: cp4aiops
spec:
  hard:
    requests.cpu: "227"
    requests.memory: 466Gi
    requests.ephemeral-storage: 200Gi
    limits.cpu: "436"
    limits.memory: 533Gi
    limits.ephemeral-storage: 600Gi
    pods: "275"
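
For example, if the preceding YAML is saved as aiops-resourcequota.yaml (a hypothetical file name), it can be applied and verified as follows:

# Create or update the ResourceQuota object.
oc apply -f aiops-resourcequota.yaml

# Confirm the quota and view current consumption against its hard limits.
oc describe resourcequota aiops-large -n cp4aiops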

Processing abilities

Expand the following sections to find out about the processing abilities of starter and production sized deployments of IBM Cloud Pak for AIOps. Higher rates are supported by custom sizes. For more information about customizing the size of your IBM Cloud Pak for AIOps deployment according to your processing and footprint requirements, see Custom sizing.

Supported resource number and throughput rates for starter and production deployments


The following table details the number of records, log messages, events, Key Performance Indicators (KPIs), and resources that can be processed by IBM Cloud Pak for AIOps for each of the deployment sizes. This includes resource and throughput values for the AI algorithms.

Table 6. Processing abilities for starter and production deployments
Component                              Resource                                                      Starter   Production
Change risk                            Incidents and change request records per second               30        30
Metric anomaly detection               Maximum throughput (KPIs) for all metric integrations         30,000    120,000
Log anomaly detection (non-Kafka)      Maximum throughput (log messages per second)                  1000      8000
Log anomaly detection (Kafka)          Maximum throughput (log messages per second)                  1000      25,000
Events (through Netcool integration)   Steady state event rate throughput per second                 20        150
                                       Burst rate event throughput per second                        100       250
Automation runbooks                    Fully automated runbooks run per second                       1         2
Topology management                    Maximum number of topology resources                          200,000   5,000,000
UI users                               Active users supported                                        5         20
Standing alert count                   Number of alerts stored in the system at any point in time    20,000    200,000

Notes:

  • Event rates in the preceding table assume a deduplication rate of 10 to 1 (10% unique events). For example, a rate of 100 alerts per second sent to IBM Cloud Pak for AIOps can be the end result of an initial 1,000 events per second before deduplication and other filtering are applied.
  • For metric anomaly detection, the number of key performance indicators (KPIs) that can be processed for each deployment size is shown for an aggregation period of 5 minutes and a training period of 2 weeks. IBM Cloud Pak for AIOps can process 120,000 metrics in a 5-minute interval.
  • If you are using additional integrations for metric anomaly detection and log anomaly detection with IBM Cloud Pak for AIOps, you can use default available policies to further refine the volume of data routed for issue resolution lifecycle actions by your users. You can also create custom policies tailored for your environment. For instance, you can use custom suppression policies to help determine which anomalies should be raised as alerts for user action. For more information about custom policies, see Suppress alerts.
  • The events (through Netcool integration) throughput rates represent a refined volume of alerts that corresponds to a worst-case scenario in which the ratio of IBM Tivoli Netcool/OMNIbus events to IBM Cloud Pak for AIOps alerts has no deduplication, essentially a 1:1 mapping of events to alerts. However, in most production deployments, correlation and deduplication on the IBM Tivoli Netcool/OMNIbus server side reduce the volume of alert data that requires processing within IBM Cloud Pak for AIOps. To further optimize the workload of data presented to IBM Cloud Pak for AIOps, additional IBM Tivoli Netcool/OMNIbus probe rules can filter out events of no interest to IBM Cloud Pak for AIOps. For instance, typical IBM Tivoli Netcool/OMNIbus maintenance events are filtered out because they are not relevant on the IBM Cloud Pak for AIOps side.
  • The number of alerts in your system varies as alerts are cleared or expire. In addition, alerts include a variety of event types, so you might not always see the same alerts when you view the Alert Viewer in the UI.


Event, alert, and incident rates


IBM Cloud Pak for AIOps includes robust capabilities for managing events from your various applications, services, and devices. If you integrate IBM Cloud Pak for AIOps with IBM Tivoli Netcool/OMNIbus, the benefits that you can leverage for event management increase significantly. This integration gives you end-to-end alert processing with an on-premises IBM Tivoli Netcool/OMNIbus server, so that you can complete part of the event and incident management lifecycle on the IBM Tivoli Netcool/OMNIbus server before events are processed and delivered for action in IBM Cloud Pak for AIOps.

By default, IBM Tivoli Netcool/OMNIbus policies and triggers, such as correlation and deduplication activities, can run to "pre-process" event workloads, thereby reducing the overall volume of active events on the IBM Tivoli Netcool/OMNIbus server. This refined (event) workload then feeds subsequent processing within the overall incident resolution (IR) lifecycle. On the IBM Cloud Pak for AIOps side, automation policies run on the remaining events that flow from the IBM Tivoli Netcool/OMNIbus server. IBM Cloud Pak for AIOps applies additional suppression and grouping filters to minimize effort, runs runbooks to automatically resolve events where warranted, and promotes the remaining events to alerts and carefully refined incidents so that ITOps can act on the most critical concerns.

To help you understand the end-to-end event processing benefits of this deployment pattern in your environment, and where to invest in policies to optimize throughput and response time, review the following event management and impact scenarios:

  • As a basic example, a small production IBM Tivoli Netcool/OMNIbus environment with an average incoming event rate of 50 events per second, and a correlation and deduplication ratio of 10:1 (raw events to correlated incidents), can result in a refined volume of 5 alerts per second being sent to IBM Cloud Pak for AIOps for subsequent processing. With a combination of default available issue resolution (IR) policies and analytics, the alerts can be further reduced (by 90% noise reduction) to less than 1 incident per second over time on the IBM Cloud Pak for AIOps side.
  • As a second, larger example, a production IBM Tivoli Netcool/OMNIbus environment with an average event rate of 500 events per second (and the same 10:1 correlation and deduplication ratio) presents a refined volume of 50 alerts per second to IBM Cloud Pak for AIOps. With the same combination of default available issue resolution (IR) policies and analytics, the alerts can be further reduced by 90% noise reduction, with a resultant 5 incidents per second raised in IBM Cloud Pak for AIOps. Additional issue resolution (IR) policies can be authored to further reduce and refine incident creation. By leveraging other advanced capabilities within IBM Cloud Pak for AIOps, such as fully automated runbooks, the volume of actionable incidents that are presented for user interaction can be further reduced.



Custom sizing

The default starter and production deployment sizes enable the full capabilities of IBM Cloud Pak for AIOps to be used with the workload volumes that are stated in the Processing abilities section. If different workload volumes are required or resource constraints are an issue, then specific capabilities such as Metric Anomaly Detection, Log Anomaly Detection, and Runbook Automation can be sized accordingly. IBM Sales representatives and Business Partners have access to a custom sizing tool that can assess your runtime requirements and provide a custom profile that scales IBM Cloud Pak for AIOps components. The custom profile is applied when you install IBM Cloud Pak for AIOps.

Table 7 shows the processing abilities of some example custom-sized deployments of IBM Cloud Pak for AIOps.

Example 1 (Minimum scale) represents a minimally sized deployment of IBM Cloud Pak for AIOps for the evaluation of event management. It demonstrates event analytics, noise reduction, and the automation of issue resolution on a small topology. Metric and log anomaly detection and change risk assessment are de-emphasized. Example 1 requires 49 vCPU and 143 GB memory.

Example 2 (Event management focused) represents a production deployment of IBM Cloud Pak for AIOps which is focused on event management capabilities. It supports event analytics, noise reduction, and the automation of issue resolution on a large topology. Metric and log anomaly detection and change risk assessment are de-emphasized. Example 2 requires 176 vCPU and 366 GB memory.

Example 3 (Maximum scale) shows the upper limits that IBM Cloud Pak for AIOps can be scaled to, across all of its capabilities.

Table 7. Processing abilities of example custom-sized deployments
Component                              Resource                                                                 Example 1   Example 2   Example 3
Change risk                            Incidents and change request records per second                          0           0           N/A
Metric anomaly detection               Maximum throughput (KPIs) for all metric integrations                    0           0           5,000,000
Log anomaly detection                  Maximum throughput (log messages per second) for all log integrations    0           0           25,000
Events (through Netcool integration)   Steady state event rate throughput per second                            10          600         700
                                       Burst rate event throughput per second                                                           1000
Automation runbooks                    Fully automated runbooks run per second                                  1           2           4
Topology management                    Maximum number of topology resources                                     5,000       5,000,000   15,000,000
UI users                               Active users supported                                                   5           40          40

Infrastructure Automation

Expand the following sections to review details about the minimum resource requirements for installing Infrastructure Automation. Use this information for sizing your environment when you are deploying Infrastructure Automation. Infrastructure Automation can be installed with IBM Cloud Pak for AIOps or stand-alone (Infrastructure Automation only). Multi-region and multi-zone clusters are not supported.

Tip: Work with your IBM Sales representative (or Business Partner) to ensure that the allocated hardware resources for your environment are sufficient to best fit your business needs. For requirements for larger production deployments and to make sure that you meet the right level of resiliency, an IBM representative can work with you on your deployment plans. For Infrastructure Automation, you need to provide the following information:

  • The number of cloud environments that you connect to.
  • The number of OpenStack environments that you manage.
  • The average number of VMs in each of the cloud environments.
  • The number of on-premises virtualized environments that you manage. Identify whether it is VMware or KVM, and the number of VMs in each virtualized environment.

Resource allocation for the Red Hat OpenShift Container Platform control plane master nodes


The following table details the minimum requirements for the Red Hat OpenShift control plane master nodes that are necessary for a Red Hat OpenShift cluster to withstand the demands of an Infrastructure Automation deployment. IBM Cloud Pak for AIOps has many Kubernetes operators that interface with the API server and etcd storage, and so master nodes must be adequately sized.

Node count   vCPU (per node)   Memory (GB) (per node)   vCPU (all nodes)   Memory (GB) (all nodes)
3            8                 16                       24                 48

Resource allocation for the Red Hat OpenShift Container Platform worker nodes


The following table provides the required Red Hat OpenShift worker resources per node for a sample Infrastructure Automation deployment.

Node count   vCPU (per node)   Memory (GB) (per node)   vCPU (all nodes)   Memory (GB) (all nodes)   Disk space (GB)   vCPU (extra capacity)   Memory (GB) (extra capacity)
3            16                32                       48                 96                        212               17                      18

The following table lists the hardware requirements for Infrastructure Automation only, and does not include the requirements for the Red Hat OpenShift control plane.

Resource                  Managed Services and Infrastructure Management,   Managed Services and Infrastructure Management,
                          no infrastructure management providers            1 infrastructure management provider
                          (not under load)                                  (not under load)
Worker node count         3                                                 3
Total vCPUs               14                                                14
Total memory (GB)         35                                                38
Persistent storage (Gi)   70                                                70

Notes:

  • These estimates are for an Infrastructure Automation installation with both Managed Services and Infrastructure Management deployed.
  • The requirements for Infrastructure Automation are in addition to the requirements for IBM Cloud Pak for AIOps.
  • You might need extra worker nodes, resource capacity, or both, depending on your own requirements. The CPU, memory, and disk resources that are required for Infrastructure Automation depend on the number of provider instances and the total number of instances and VMs that are managed by the providers. Other worker node counts and sizes can be used based on your available hardware configurations, but the following conditions must be met:
      • The combined available vCPU and memory of the worker nodes must equal or exceed the total requirements.
      • Each worker node must have at least the minimum vCPU and memory that is given in the preceding worker node table.

Warning: These requirements do not include CPU and memory resources for hosting a storage provider. Storage providers can require more nodes, resources, or both, to run. The extra resources that are needed can vary based on your selected storage provider. A general recommendation is one extra worker node for a starter deployment, and three extra worker nodes for a production deployment where HA is required. Consult your storage provider's documentation for exact requirements.


Additional requirements for offline (air-gapped) deployments


If you are installing in an air-gapped environment (offline), you must ensure that you have adequate space to download the Infrastructure Automation images to a local registry in your offline environment. The Infrastructure Automation images total 87 GB.

  • Bastion host method - the local registry and the bastion host must each have at least 87 GB of storage space.
  • Portable compute device method - the local registry and the portable compute device must each have at least 87 GB of storage space.
  • Portable storage device method - the local registry, connected compute device, portable storage device, and the local compute device must each have at least 87 GB of storage space.
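
As a quick check before mirroring, confirm the free space on the file system that backs your local registry. A minimal sketch, assuming the registry data is stored under /opt/registry (a hypothetical path; substitute your registry's storage location):

# Show free space on the file system that backs the local registry.
df -h /opt/registry

# After mirroring, check how much space the images consume.
du -sh /opt/registry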