Online installation of IBM Cloud Pak for AIOps on Linux

Learn about installing IBM Cloud Pak for AIOps on Linux.

Overview

You can install a production deployment of IBM Cloud Pak for AIOps on a Linux platform, without installing Red Hat OpenShift Container Platform.

The deployment is configured to collect usage data. If you want to disable usage data collection, you can do so after installation. For more information, see Updating usage data collection preferences.

Limitations

  • Starter-size deployments are not supported.
  • The following features and capabilities are only available for deployments of IBM Cloud Pak for AIOps on Red Hat OpenShift, and are not available for deployments of IBM Cloud Pak for AIOps on Linux:
  • The aiopsctl tool is supported only on x86_64 (amd64) architecture.
Important: If you use Instana to monitor IBM Cloud Pak for AIOps, take the following actions to avoid operational issues and installation and upgrade failures:

Before you begin

Ensure that you meet the following prerequisites:

  • To function properly, distributed applications such as IBM Cloud Pak for AIOps require the system clocks of all of their nodes to be highly synchronized with one another. Discrepancies between the clocks can cause IBM Cloud Pak for AIOps to experience operational issues. All Linux systems that are used by the IBM Cloud Pak for AIOps installation must be configured to connect to an NTP server to synchronize their clocks. Some examples of NTP clients are chrony, systemd-timesyncd, and ntpd. Verify that the clocks are synchronized between systems before you install IBM Cloud Pak for AIOps.
  • While not required, as a best practice set all systems to use the same time zone.
  • Your cluster meets all of the requirements that are detailed in Planning an installation of IBM Cloud Pak for AIOps on Linux. The Linux cluster must be reserved for the sole use of IBM Cloud Pak for AIOps, and you must have a minimum of three control plane nodes.
  • Local storage is configured in accordance with the instructions in Configuring local volumes.
  • You have the credentials for the root user. You must install IBM Cloud Pak for AIOps as the root user.
  • The worker nodes and the client machine that you are running the installation from have network connectivity to the control plane nodes.
  • You have a load balancer configured in accordance with the details in Load balancing.
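
As a coarse cross-check of the clock synchronization prerequisite, the following sketch compares epoch timestamps collected from each node. The max_skew helper and the node names are hypothetical and are not part of aiopsctl. The example also assumes passwordless SSH as root to every node.

```shell
# max_skew T1 T2 ...: print the difference, in seconds, between the largest
# and smallest of the given epoch timestamps (hypothetical helper).
max_skew() (
  min=$1
  max=$1
  for t in "$@"; do
    if [ "$t" -lt "$min" ]; then min=$t; fi
    if [ "$t" -gt "$max" ]; then max=$t; fi
  done
  echo $(( max - min ))
)

# Example use (node names are placeholders):
#   stamps=$(for h in node1.example.com node2.example.com; do ssh "root@$h" date +%s; done)
#   max_skew $stamps
```

Because the timestamps are collected one after another, SSH latency inflates the result. Treat a large value as a prompt to inspect your NTP client on each node, not as a precise measurement.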

If an IBM Sales representative or Business Partner supplied you with a custom profile ConfigMap to customize your deployment, then you must follow their instructions to apply it during installation. The custom profile cannot be applied after installation, and attempting to do so can break your IBM Cloud Pak for AIOps deployment. For more information about custom sizing, see Custom sizing.

Prerequisites

Allow access to the following sites and ports:

Allow access to the following hosts on port 443 to enable access to the IBM Cloud Container Registry, CASE OCI artifact, and IBM Cloud Pak foundational services catalog source:
  • icr.io
  • cp.icr.io
  • dd0.icr.io
  • dd2.icr.io
  • dd4.icr.io
  • dd6.icr.io
If you are located in China, also allow access to the following hosts on port 443:
  • dd1-icr.ibm-zh.com
  • dd3-icr.ibm-zh.com
  • dd5-icr.ibm-zh.com
  • dd7-icr.ibm-zh.com
Also allow access to the following sites:
  • github.com: GitHub houses CASE files, IBM Cloud Pak tools, and scripts.
  • mirror.openshift.com: Required for the oc CLI.

You must be able to download content from GitHub. If you are not able to, verify that your network or proxy settings permit access to GitHub's file server domain and if needed contact your network administrator to allow it.
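
One way to confirm that the required sites are reachable is a small probe loop such as the following sketch. The probe and report_unreachable helpers are hypothetical. The sketch assumes that curl is installed, and that github.com and mirror.openshift.com are reached over HTTPS on port 443, which the list of sites does not state explicitly.

```shell
# Hosts taken from the list of required sites. Add the ibm-zh.com hosts
# if you are located in China.
REQUIRED_HOSTS="icr.io cp.icr.io dd0.icr.io dd2.icr.io dd4.icr.io dd6.icr.io github.com mirror.openshift.com"

# probe HOST: succeed if an HTTPS connection to HOST on port 443 can be made.
# Any HTTP response, including an error status, counts as reachable.
probe() {
  curl --silent --output /dev/null --connect-timeout 5 "https://$1:443/"
}

# report_unreachable HOST...: print each host for which the probe fails.
report_unreachable() {
  for host in "$@"; do
    probe "$host" || echo "$host"
  done
}

# Example use:
#   report_unreachable $REQUIRED_HOSTS
```

No output means that every host answered. If your environment uses a proxy, curl honors the usual proxy environment variables, so run the check with the same settings that the nodes will use.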

1. Retrieve your entitlement key

Obtain the IBM entitlement key that is assigned to your IBMid. The key is needed to pull the IBM Cloud Pak for AIOps images from the IBM® Entitled Registry.

  1. Log in to MyIBM Container Software Library with the IBMid and password details that are associated with the entitled software.

  2. In the Entitlement key section, select Copy to copy the entitlement key to the clipboard.

2. Optionally configure a custom certificate

If you want to use your own custom certificate for IBM Cloud Pak for AIOps instead of the default cluster certificate, then use the following steps to create a certificate and key that you can supply as parameters at installation time.

  1. Ensure that you have the following three PEM-encoded X.509 certificate files:

    • caintermediate.pem: The intermediate certificate that issued your server certificate.
    • aiops.pem: An IBM Cloud Pak for AIOps certificate, which includes the two fully qualified domain names (FQDNs) for aiops-cpd and cp-console-aiops in the Subject Alternative Name (SAN) list.
    • aiops.key.pem: A key file for the signed certificate in aiops.pem.
    Tip: You can create the FQDN strings to use for aiops-cpd and cp-console-aiops by prepending aiops-cpd and cp-console-aiops to your load balancer's host name. For example, if your load balancer host name is loadbalancerhost.acme.com, then the FQDN strings are aiops-cpd.loadbalancerhost.acme.com and cp-console-aiops.loadbalancerhost.acme.com.
  2. Concatenate the server and intermediate certificates into one file called aiops-certificate-chain.pem.

    cat aiops.pem caintermediate.pem > aiops-certificate-chain.pem 
    

If you do not install IBM Cloud Pak for AIOps with a custom certificate, you can switch to using a custom certificate after installation. For more information, see Using a custom certificate (IBM Cloud Pak for AIOps on Linux).
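
If you prepare a custom certificate, it can help to sanity-check the files before you pass them to the installer. The following sketch derives the two console FQDNs from the load balancer host name, looks for them in the certificate's SAN list, and confirms that the key belongs to the certificate. The helper names are hypothetical, and the sketch assumes that the openssl command is available and that the key file is not passphrase-protected.

```shell
# console_fqdns LB_HOST: print the two console FQDNs for the given
# load balancer host name, one per line.
console_fqdns() {
  printf '%s\n' "aiops-cpd.$1" "cp-console-aiops.$1"
}

# cert_has_san CERT_FILE FQDN: succeed if FQDN appears as a DNS entry
# in the text form of the certificate.
cert_has_san() {
  openssl x509 -in "$1" -noout -text | grep -F -q "DNS:$2"
}

# key_matches_cert CERT_FILE KEY_FILE: succeed if both files carry the
# same public key.
key_matches_cert() {
  [ "$(openssl x509 -in "$1" -noout -pubkey)" = "$(openssl pkey -in "$2" -pubout)" ]
}

# Example use:
#   for fqdn in $(console_fqdns loadbalancerhost.acme.com); do
#     cert_has_san aiops.pem "$fqdn" || echo "missing SAN: $fqdn"
#   done
#   key_matches_cert aiops.pem aiops.key.pem || echo "key does not match certificate"
```

The SAN check is a plain substring match, so it is only a quick screen and does not replace a full certificate validation.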

3. Create environment variables

Create and then source a shell script named aiops_var.sh, which defines the environment variables that provide the installation parameters for your deployment. Use the following code block as a template, replacing each placeholder in angle brackets < ... > with a value for your environment. It is important that you keep this file.

For more information about choosing a deployment type, see Incremental adoption. Subject to further hardware requirements, you can update the deployment type post-install. For more information, see Updating the deployment type.

Important:
  • If you used an alternative path for APP_STORAGE_PATH or PLATFORM_STORAGE_PATH when you set up your local storage in Configuring local volumes, then you must change the values of these environment variables in the following code block.
  • If you are not using the default CIDR ranges for IBM Cloud Pak for AIOps, then you must update the values of the environment variables CLUSTER_CIDR, SERVICE_CIDR, and CLUSTER_DNS in the following code block.
  • If your environment uses a proxy, then you must update the values of the environment variables HTTP_PROXY, HTTPS_PROXY, and NO_PROXY in the following code block.
#================================================================================================================================
# IBM Cloud Pak for AIOps installation variables (Linux)
#================================================================================================================================
export TARGET_USER="root"
export ACCEPT_LICENSE=false # Set to true to agree to the license terms.

# --------------------------------------------------------------------------------------------------------------------------------
# IBM Entitled Registry
# --------------------------------------------------------------------------------------------------------------------------------
export IBM_ENTITLEMENT_KEY="<ibm-entitlement-key>" # Set to the entitlement key retrieved in step 1.

# --------------------------------------------------------------------------------------------------------------------------------
# Hostnames
# `<load_balancer_hostname>` - the hostname of your load balancer
# `<control_plane_node_n>` - the FQDN or IP address of each control plane node. For example, "control_plane_node_1.example.com"
# `<worker_n>` - the FQDN or IP address of each worker node. For example, "worker_1.example.com"
# --------------------------------------------------------------------------------------------------------------------------------
export LOAD_BALANCER_HOST="<load_balancer_hostname>"
export CONTROL_PLANE_NODE="<control_plane_node_1>"
export ADDITIONAL_CONTROL_PLANE_NODES=(
  "<control_plane_node_2>"
  "<control_plane_node_3>"
)
export WORKER_NODES=(
  "<worker_1>"
  "<worker_2>"
  "<worker_3>"
  "<worker_4>"
  "<worker_5>"
  "<worker_6>"
  "<worker_7>"
)

# -----------------------------------------------------------------------------------------------------------
# Incremental adoption - set your deployment type.
# Set to `extended` to install an extended deployment with log anomaly detection and ticket analysis capabilities
# Set to `base` to install a base deployment without log anomaly detection and ticket analysis capabilities
# -----------------------------------------------------------------------------------------------------------
export DEPLOY_TYPE="base"

# -------------------------------------------------------------------------------------------------------------------------------
# Storage
# -------------------------------------------------------------------------------------------------------------------------------
export APP_STORAGE_PATH="/var/lib/aiops/storage"
export PLATFORM_STORAGE_PATH="/var/lib/aiops/platform"

# -------------------------------------------------------------------------------------------------------------------------------
# Network configuration
# Leave as empty strings to use the default values, otherwise update to your required values.
# -------------------------------------------------------------------------------------------------------------------------------
export CLUSTER_CIDR="" # Default: 10.42.0.0/16
export SERVICE_CIDR="" # Default: 10.43.0.0/16
export CLUSTER_DNS="" # Default: 10.43.0.10, must be within SERVICE_CIDR range

# -------------------------------------------------------------------------------------------------------------------------------
# Proxy configuration
# Leave as empty strings if you are not using a proxy, otherwise update to your required values.
# -------------------------------------------------------------------------------------------------------------------------------
export HTTP_PROXY=""   # HTTP proxy URL (For example http://your-proxy.example.com:8888)
export HTTPS_PROXY=""  # HTTPS proxy URL (For example https://your-proxy.example.com:8888)
export NO_PROXY=""     # A comma-separated list of hosts to bypass the proxy. Must include the IP address ranges for the public
                       # and private IPs of the cluster nodes. (For example 127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16).

Run the following command to source your script and set the environment variables:

. ./aiops_var.sh
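
The template notes that CLUSTER_DNS must be within the SERVICE_CIDR range. If you override the defaults, a check such as the following sketch can catch a mismatch before you register any nodes. The ip_to_int and ip_in_cidr helpers are hypothetical and handle IPv4 addresses only.

```shell
# ip_to_int A.B.C.D: print the IPv4 address as a single integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# ip_in_cidr IP NETWORK/BITS: succeed if IP lies inside the CIDR range.
ip_in_cidr() (
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  ip_int=$(ip_to_int "$1")
  net_int=$(ip_to_int "$net")
  [ $(( $ip_int & $mask )) -eq $(( $net_int & $mask )) ]
)

# Example use after sourcing aiops_var.sh with non-default values:
#   ip_in_cidr "$CLUSTER_DNS" "$SERVICE_CIDR" || echo "CLUSTER_DNS is outside SERVICE_CIDR"
```

With the default values from the template, ip_in_cidr 10.43.0.10 10.43.0.0/16 succeeds.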

4. Install the aiopsctl tool and register cluster nodes

  1. Run the following commands to install aiopsctl on your cluster, and then register the control plane nodes and worker nodes.

    AIOPSCTL_TAR="aiopsctl-linux_amd64.tar.gz"
    AIOPSCTL_INSTALL_URL="https://github.com/IBM/aiopsctl/releases/download/v4.13.0/${AIOPSCTL_TAR}"
    ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} curl -LO "${AIOPSCTL_INSTALL_URL}"
    ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} tar xf "${AIOPSCTL_TAR}"
    ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} mv aiopsctl /usr/local/bin/aiopsctl 
    
    CLUSTER_CIDR_FLAG=$(if [ -n "${CLUSTER_CIDR}" ]; then echo "--cluster-cidr=${CLUSTER_CIDR} "; fi)
    SERVICE_CIDR_FLAG=$(if [ -n "${SERVICE_CIDR}" ]; then echo "--service-cidr=${SERVICE_CIDR} "; fi)
    CLUSTER_DNS_FLAG=$(if [ -n "${CLUSTER_DNS}" ]; then echo "--cluster-dns=${CLUSTER_DNS} "; fi) 
    
    HTTP_PROXY_FLAG=$(if [ -n "${HTTP_PROXY}" ]; then echo "--http-proxy=${HTTP_PROXY} "; fi)
    HTTPS_PROXY_FLAG=$(if [ -n "${HTTPS_PROXY}" ]; then echo "--https-proxy=${HTTPS_PROXY} "; fi)
    NO_PROXY_FLAG=$(if [ -n "${NO_PROXY}" ]; then echo "--no-proxy=${NO_PROXY} "; fi)
    
    echo "Installing main control plane node ${CONTROL_PLANE_NODE}"
    ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl cluster node up --accept-license=${ACCEPT_LICENSE} --role=control-plane --registry-token="${IBM_ENTITLEMENT_KEY}" --app-storage="${APP_STORAGE_PATH}" --platform-storage="${PLATFORM_STORAGE_PATH}" --load-balancer-host="${LOAD_BALANCER_HOST}" ${CLUSTER_CIDR_FLAG}${SERVICE_CIDR_FLAG}${CLUSTER_DNS_FLAG}${HTTP_PROXY_FLAG}${HTTPS_PROXY_FLAG}${NO_PROXY_FLAG}
    
    K3S_TOKEN=$(ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl cluster node info --token-only)
    K3S_HOST=$(ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl cluster node info --server-url-only)
    
    echo "Installing additional control plane nodes"
    for CP_NODE in "${ADDITIONAL_CONTROL_PLANE_NODES[@]}"; do
      ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} curl -LO "${AIOPSCTL_INSTALL_URL}"
      ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} tar xvf "${AIOPSCTL_TAR}"
      ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} mv aiopsctl /usr/local/bin/aiopsctl
      ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${CP_NODE} aiopsctl cluster node up --accept-license=${ACCEPT_LICENSE} --role=control-plane --server-url="${K3S_HOST}" --token="${K3S_TOKEN}" --registry-token="${IBM_ENTITLEMENT_KEY}" --app-storage="${APP_STORAGE_PATH}" --platform-storage="${PLATFORM_STORAGE_PATH}" --load-balancer-host="${LOAD_BALANCER_HOST}" ${CLUSTER_CIDR_FLAG}${SERVICE_CIDR_FLAG}${CLUSTER_DNS_FLAG}${HTTP_PROXY_FLAG}${HTTPS_PROXY_FLAG}${NO_PROXY_FLAG}
    done
    
    echo "Installing worker nodes"
    K3S_LB_HOST="https://${LOAD_BALANCER_HOST}:6443"
    for WORKER_NODE in "${WORKER_NODES[@]}"; do
      ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${WORKER_NODE} curl -LO "${AIOPSCTL_INSTALL_URL}"
      ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${WORKER_NODE} tar xvf "${AIOPSCTL_TAR}"
      ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${WORKER_NODE} mv aiopsctl /usr/local/bin/aiopsctl
      ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} ssh ${TARGET_USER}@${WORKER_NODE} aiopsctl cluster node up --accept-license=${ACCEPT_LICENSE} --role=worker --server-url="${K3S_LB_HOST}" --token="${K3S_TOKEN}" --registry-token="${IBM_ENTITLEMENT_KEY}" --app-storage="${APP_STORAGE_PATH}" ${HTTP_PROXY_FLAG}${HTTPS_PROXY_FLAG}${NO_PROXY_FLAG}
    done

    The preceding commands install the Red Hat OpenShift CLI (oc) if it is not already installed. If it is already installed, then you must ensure that it is at version 4.16 or higher.

  2. If you're using a proxy, check that your system environment configuration settings are set up correctly for your proxy.

    Run the following command on control plane nodes:

    cat /etc/systemd/system/k3s.service.env
    

    Run the following command on worker nodes:

    cat /etc/systemd/system/k3s-agent.service.env
    
    Example output:
    HTTP_PROXY=<Your proxy>
    HTTPS_PROXY=<Your proxy>
    NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
    

    Check that the NO_PROXY variable includes the IP address ranges for the public and private IPs of the cluster nodes.

    If the proxy variables are missing or incorrect, run the following steps:
    1. Check and correct the Proxy configuration section in the aiops_var.sh script, as described in step 3 Create environment variables.

    2. After updating and sourcing the aiops_var.sh script, re-run this step, step 4 Install the aiopsctl tool and register cluster nodes to configure the nodes with the proxy.

  3. Run the following command to see information about your cluster:

    aiopsctl cluster node info
    
    Example output:
    #  aiopsctl cluster node info
    Node Details
    Role: Control plane
    Server URL: https://test-server.acme.com:6443
    Token: ***** (Use --show-secrets to reveal token)
    
  4. Check for network connectivity issues.

    VMware vSphere virtual machines (VMs) with Red Hat Enterprise Linux can have connectivity issues. If you are not using vSphere VMs for your IBM Cloud Pak for AIOps deployment, then skip this step. Otherwise, run the following lookup command:

    nslookup kubernetes.default.svc.cluster.local 10.43.0.10
    

    If the lookup is successful, then proceed to the next step. If the lookup fails, then run the following command on each of your VMs to rectify connectivity problems:

    ethtool -K flannel.1 tx-checksum-ip-generic off
    

    For more information, see Installation on Linux fails with network connectivity errors when using aiopsctl to install on vSphere VMs.
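
For the proxy check in this step, a small helper can flag proxy variables that are missing from a node's environment file. This is a sketch only: missing_proxy_vars is a hypothetical helper, and it assumes the NAME=value layout shown in the example output.

```shell
# missing_proxy_vars FILE: print each of HTTP_PROXY, HTTPS_PROXY, and NO_PROXY
# that is absent from FILE or has an empty value.
missing_proxy_vars() {
  for name in HTTP_PROXY HTTPS_PROXY NO_PROXY; do
    grep -q "^${name}=..*" "$1" || echo "$name"
  done
}

# Example use on a control plane node:
#   missing_proxy_vars /etc/systemd/system/k3s.service.env
# Example use on a worker node:
#   missing_proxy_vars /etc/systemd/system/k3s-agent.service.env
```

No output means that all three variables are set. You still need to confirm by eye that NO_PROXY covers the IP address ranges of your cluster nodes.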

5. Evaluate storage performance

Use the following procedure to evaluate whether your storage performance is sufficient to withstand the demands of a production deployment of IBM Cloud Pak for AIOps.

  1. Run the following commands to create the aiops namespace and benchmark your storage.

    ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} oc create namespace aiops
    ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl benchmark storage
    

    The tool selects a node in your cluster and benchmarks the performance of the node's application storage disk. The process takes around 8 minutes to run.

    If you think that the storage performance between your nodes varies significantly, then you can use the --node <node_name> argument to pass in the name of the node that you want the tool to run on.

  2. Verify that your benchmarking results meet or exceed the required metrics.

    The following table identifies the storage performance metrics that must be achieved to support a deployment of IBM Cloud Pak for AIOps. If your deployment is custom-sized to support higher rates than the default production rates listed in Processing abilities, then your storage performance must exceed these metrics.

    Metric                                                                   Read       Write
    Minimum sequential IOPS (higher is better, lower is worse)               5000       5000
    Minimum sequential bandwidth (higher is better, lower is worse)          20 Mi/sec  20 Mi/sec
    Maximum average sequential latency (lower is better, higher is worse)    500 usec   1000 usec
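
The thresholds in the table can be expressed as a simple pass or fail check, as in the following sketch. The storage_meets_minimums helper is hypothetical. The output format of aiopsctl benchmark storage is not described here, so the sketch assumes that you copy the six measured values by hand as whole numbers.

```shell
# storage_meets_minimums READ_IOPS WRITE_IOPS READ_BW WRITE_BW READ_LAT WRITE_LAT
# Bandwidth values are in Mi/sec and latency values are in usec.
# Succeeds only if every value meets the threshold from the table.
storage_meets_minimums() {
  [ "$1" -ge 5000 ] && [ "$2" -ge 5000 ] &&
  [ "$3" -ge 20 ] && [ "$4" -ge 20 ] &&
  [ "$5" -le 500 ] && [ "$6" -le 1000 ]
}

# Example use with made-up measurements:
#   storage_meets_minimums 7200 6100 28 24 410 870 && echo "storage is sufficient"
```

For a custom-sized deployment that supports higher rates, these values are a floor and not a target.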

6. Install IBM Cloud Pak for AIOps

The control plane node must be able to resolve the domain name provided by "${LOAD_BALANCER_HOST}". If the control plane node cannot resolve this domain name through a Domain Name System (DNS), then you must add an entry to the control plane node's /etc/hosts file for the domain name provided by "${LOAD_BALANCER_HOST}" before you continue.

Run the aiopsctl tool from the control plane node to install IBM Cloud Pak for AIOps.

If you did not configure a custom certificate in step 2, then run the following command:

ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl server up --load-balancer-host="${LOAD_BALANCER_HOST}" --mode "${DEPLOY_TYPE}"

If you configured a custom certificate in step 2, then run the following command:

ssh ${TARGET_USER}@${CONTROL_PLANE_NODE} aiopsctl server up --load-balancer-host="${LOAD_BALANCER_HOST}" --mode "${DEPLOY_TYPE}" --certificate-file aiops-certificate-chain.pem --key-file aiops.key.pem

Running aiopsctl server up automatically runs a prerequisite check to verify if your cluster is correctly set up for an IBM Cloud Pak for AIOps installation.

7. Verify your installation

The installation takes one to two hours to complete. If the installation is unsuccessful, an error message is displayed and a nonzero exit code is returned.

Run the following command to check the status of the components of your IBM Cloud Pak for AIOps installation:

aiopsctl status
Example output for a healthy installation:
$ aiopsctl status
o- [12 Aug 24 08:40 PDT] Getting cluster status
Control Plane Node(s):
    test-server-1.acme.com Ready
    test-server-2.acme.com Ready
    test-server-3.acme.com Ready

Worker Node(s):
    test-agent-1.acme.com Ready
    test-agent-2.acme.com Ready
    test-agent-3.acme.com Ready
    test-agent-4.acme.com Ready
    test-agent-5.acme.com Ready
    test-agent-6.acme.com Ready
    test-agent-7.acme.com Ready

o- [13 Mar 26 08:40 PDT] Checking AIOps installation status

   17 Ready Components
    cassandra
    commonservice
    aimanager
    cluster.aiops-orchestrator-postgres
    aiopsui
    zenservice
    cluster.opensearch
    aiopsedge
    baseui
    lifecycletrigger
    lifecycleservice
    rediscp
    aiopsanalyticsorchestrator
    kafka
    issueresolutioncore
    zookeeper
    asm

  AIOps installation healthy

If the installation fails, or is not complete and is not progressing, then see Troubleshooting installation and upgrade and Known Issues to help you identify any installation problems.
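
Because the installation can take up to two hours, you might prefer to poll the status instead of rerunning the command by hand. The following sketch assumes that the line AIOps installation healthy from the example output appears only when the installation is healthy. The helper names, retry count, and delay are illustrative choices.

```shell
# installation_healthy: read `aiopsctl status` output on stdin and succeed
# if it contains the healthy summary line from the example output.
installation_healthy() {
  grep -q "AIOps installation healthy"
}

# wait_for_healthy [TRIES] [DELAY_SECONDS]: poll `aiopsctl status` until it
# reports a healthy installation or the attempts run out.
# The defaults give 60 attempts that are 120 seconds apart (about two hours).
wait_for_healthy() {
  tries=${1:-60}
  delay=${2:-120}
  while [ "$tries" -gt 0 ]; do
    if aiopsctl status | installation_healthy; then
      echo "healthy"
      return 0
    fi
    tries=$(( tries - 1 ))
    sleep "$delay"
  done
  return 1
}

# Example use on a control plane node:
#   wait_for_healthy || echo "installation is not healthy yet; see Troubleshooting"
```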

8. Set the Postgres operator replica count

Set the replica count for the Postgres operator to 2 to help ensure high availability.

  1. Run the following command:

    kubectl patch csv cloud-native-postgresql.v1.25.5 --type json -p '[{"op": "replace", "path": "/spec/install/spec/deployments/0/spec/replicas", "value": 2}]' -n aiops

    Example output:

    clusterserviceversion.operators.coreos.com/cloud-native-postgresql.v1.25.5 patched
  2. Run the following command to verify that there are 2 Postgres replicas:
    kubectl wait deployment/postgresql-operator-controller-manager-1-25-5 --for=jsonpath='{.status.readyReplicas}'=2 -n aiops
    After a few seconds, you will see the following output if the patch command was successful:
    deployment.apps/postgresql-operator-controller-manager-1-25-5 condition met
    If the command is not successful, the following output is displayed after 30 seconds. If you see this output, contact IBM Support.
    error: timed out waiting for the condition on deployments/postgresql-operator-controller-manager-1-25-5
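
The patch command names a specific ClusterServiceVersion (cloud-native-postgresql.v1.25.5). If you want to confirm the name that is present in your cluster before you patch it, a lookup such as the following sketch might help. The postgres_csv_name helper is hypothetical, and it assumes that kubectl get csv -o name prints names in the same resource/name form as the patch output shown in this step.

```shell
# postgres_csv_name: read `kubectl get csv -o name` output on stdin and print
# the bare name of the first cloud-native-postgresql ClusterServiceVersion.
postgres_csv_name() {
  sed -n 's|^.*/\(cloud-native-postgresql\.v[0-9][0-9.]*\)$|\1|p' | head -n 1
}

# Example use on a control plane node:
#   kubectl get csv -n aiops -o name | postgres_csv_name
```

If the name that is printed differs from the one in the patch command, check that you are following the instructions for your installed version before you continue.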

9. Access the Cloud Pak for AIOps console

Run the following command on a control plane node to see the URL, username, and password for your IBM Cloud Pak for AIOps deployment.

aiopsctl server info --show-secrets
Example output:
aiopsctl server info --show-secrets

Cluster Access Details
URL:      aiops-cpd.test-server.acme.com
Username: cpadmin
Password: abcdefghijklmno

Ensure that your environment's Domain Name System (DNS) is correctly configured to resolve the hosts for accessing the Cloud Pak for AIOps console, and for any integration endpoints such as IBM Tivoli Netcool/Impact. For more information, see DNS requirements.
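
To check name resolution from the machine where you open the console, you can test the load balancer host name together with the two console host names that are derived from it. The following sketch assumes that the getent command is available, as it is on most Linux distributions. The resolve and unresolved_console_hosts helpers are hypothetical, and integration endpoints are not covered.

```shell
# resolve HOST: succeed if the system resolver returns an address for HOST.
resolve() {
  getent hosts "$1" >/dev/null
}

# unresolved_console_hosts LB_HOST: print each of the load balancer host name
# and the two console host names that does not resolve.
unresolved_console_hosts() {
  for host in "$1" "aiops-cpd.$1" "cp-console-aiops.$1"; do
    resolve "$host" || echo "$host"
  done
}

# Example use:
#   unresolved_console_hosts "${LOAD_BALANCER_HOST}"
```

No output means that all three names resolve on the machine where you ran the check.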

What to do next

Any commands that use oc must be run from a control plane node.
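
Step 4 requires oc version 4.16 or higher if oc was already installed. The following sketch compares the major and minor parts of a version string against that minimum. The version_at_least helper is hypothetical. The example use assumes that the first line of oc version --client has the form Client Version: x.y.z, which you should confirm on your system.

```shell
# version_at_least HAVE WANT: succeed if the major.minor part of HAVE is
# greater than or equal to the major.minor part of WANT.
version_at_least() (
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%.*}
  want_major=${2%%.*}
  want_rest=${2#*.}
  want_minor=${want_rest%%.*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
)

# Example use on a control plane node (the output format is an assumption):
#   oc_version=$(oc version --client | awk '/Client Version/ {print $3}')
#   version_at_least "$oc_version" 4.16 || echo "oc is older than 4.16"
```

The comparison is numeric, so version 4.9 is correctly treated as older than 4.16.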

To uninstall your IBM Cloud Pak for AIOps on Linux deployment, follow the instructions in Uninstalling a deployment of IBM Cloud Pak for AIOps on Linux.