Changing required node settings

Some services that run on IBM® Cloud Pak for Data require specific settings on the nodes in the cluster. To ensure that the cluster has the required settings for these services, an operating system administrator with root privileges must review and adjust the settings on the appropriate nodes in the cluster.

Machine Config Operator
The Machine Config Operator is a cluster-level operator that you can use to manage the operating system and keep the cluster up to date and configured.

For more information, see Using MachineConfig objects to configure nodes.
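
For example, you can list the MachineConfig objects and machine config pools that the operator manages; both are standard oc commands:

  oc get machineconfig
  oc get mcp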

Node Tuning Operator
On Red Hat® OpenShift®, you can use the Node Tuning Operator to manage node-level tuning by applying custom Tuned profiles.

For more information, see Using the Node Tuning Operator.
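
For example, you can list the Tuned objects that the operator manages; the operator creates a default profile in its own namespace:

  oc get tuned -n openshift-cluster-node-tuning-operator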

Node settings for services

The following list shows the node settings that require changes for some services, and where to find the instructions for changing each setting.

HAProxy timeout settings for the load balancer
  Services that require the change:
  • Watson™ Discovery
  • Watson Knowledge Catalog
  • OpenPages®
  Also recommended if you are working with large data sets or you have slower network speeds.
  Environments: All environments
  Instructions: See Load balancer timeout settings.

CRI-O container settings
  Services that require the change:
  • DataStage®
  • Watson Discovery
  • Watson Knowledge Catalog
  • Watson Studio
  Environments: All environments except IBM Cloud
  Instructions: See CRI-O container settings.

Kernel parameter settings
  Services that require the change:
  • Watson Discovery
  • Watson Knowledge Catalog
  • Watson Studio
  • DataStage
  • Db2®
  • Db2 Warehouse
  • Db2 Big SQL
  • Data Virtualization
  Environments: All environments except IBM Cloud
  Instructions: See Kernel parameter settings.

GPU settings
  Services that require the change:
  • Jupyter Notebooks with Python 3.7 for GPU
  • Watson Machine Learning Accelerator (requires that the NVIDIA GPU Operator is installed)
  Environments: All environments

Load balancer timeout settings

To prevent connections from being closed before processes complete, you might need to adjust the timeout settings on your load balancer node. The recommended timeout is at least 5 minutes. In some situations, you might need to set the timeout even higher. For more information about timeout settings in Watson Knowledge Catalog, see Processes time out before completing.

This setting is required if you plan to install the Watson Knowledge Catalog service or the OpenPages service. However, this setting is also recommended if you are working with large data sets or you have slower network speeds.

The following steps assume that you are using HAProxy. If you are using a different load balancer, see the documentation for your load balancer.

On premises or private cloud

  1. On the load balancer node, check the HAProxy timeout settings in the /etc/haproxy/haproxy.cfg file.
    The recommended values are at least:
    timeout client          300s 
    timeout server          300s 
  2. If the timeout values are less than 300 seconds (5 minutes), update the values:
    • To change the timeout client setting, enter the following command:
      sed -i -e "/timeout client/s/ [0-9].*/ 5m/" /etc/haproxy/haproxy.cfg
    • To change the timeout server setting, enter the following command:
      sed -i -e "/timeout server/s/ [0-9].*/ 5m/" /etc/haproxy/haproxy.cfg
  3. Run the following command to apply the changes that you made to the HAProxy configuration:
    systemctl restart haproxy
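
Before and after the restart, you can optionally validate the configuration; both commands are standard HAProxy and shell usage:

    # check the configuration file for syntax errors
    haproxy -c -f /etc/haproxy/haproxy.cfg
    # confirm the new timeout values
    grep -E 'timeout (client|server)' /etc/haproxy/haproxy.cfg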

On IBM Cloud

If you are setting HAProxy timeout settings for Cloud Pak for Data on IBM Cloud, you can configure route timeouts by using the oc annotate command.

  1. Use the following command to set the server-side timeout for the HAProxy route to 360 seconds:
    oc annotate route zen-cpd --overwrite haproxy.router.openshift.io/timeout=360s

    If you don't provide the units, ms is the default.

  2. Optionally, customize other route-specific settings. For more information, see Route-specific annotations.
Note: On a Virtual Private Cloud (VPC) Gen2 cluster, the load balancer timeout is set to 30s by default. If you use the annotate command to set a timeout value greater than 50s, the value is capped at 50s; you cannot customize the timeout beyond that limit, so the server might time out during long-running transactions. For more information, see Connection timeouts.
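
To confirm that the annotation is set on the route, you can read it back with a standard oc query, for example:

  oc get route zen-cpd -o jsonpath='{.metadata.annotations.haproxy\.router\.openshift\.io/timeout}'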

CRI-O container settings

To ensure that services can run correctly, you must adjust values in the CRI-O container settings to specify the maximum number of processes and the maximum number of open files.

These settings are required if you are using the CRI-O container runtime, which is the default on the OpenShift Container Platform.

Note: If you install Cloud Pak for Data on IBM Cloud, the CRI-O container settings are automatically applied to your cluster as part of the installation. You do not need to manually change these settings.

To change CRI-O settings, you modify the contents of the crio.conf file and pass those updates to your nodes as a machine config.

  1. Obtain a copy of the existing crio.conf file from a node. For example, run the following command, replacing $node with one of the worker nodes. You can obtain the worker nodes by using the oc get nodes command.
    scp core@$node:/etc/crio/crio.conf /tmp/crio.conf

    If the crio.conf file doesn't exist in the path /etc/crio/crio.conf, use the path /etc/crio/crio.conf.d/00-default instead.

    If you don't have access by using the scp command, ask your cluster administrator for the crio.conf file.

    Make sure that you obtain the latest version of the crio.conf file. You can verify that the file is the latest version by running the oc get mcp command and checking that the worker machine config pool is not being updated (UPDATING = False).

  2. In the crio.conf file, make the following changes in the [crio.runtime] section (uncomment the lines if necessary):
    • To set the maximum number of open files, change the default_ulimits setting to at least 66560, as follows:
      ...
      [crio.runtime]
      default_ulimits = [
              "nofile=66560:66560"
      ]
      ...
    • To set the maximum number of processes, change the pids_limit setting to at least 12288, as follows:
      ...
      # Maximum number of processes allowed in a container.
      pids_limit = 12288
      ...
      
  3. Create a machineconfig object YAML file, as follows, and apply it.
    cat << EOF | oc apply -f -
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 99-worker-cp4d-crio-conf
    spec:
      config:
        ignition:
          version: 3.1.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,$(cat /tmp/crio.conf | base64 -w0)
            mode: 0644
            path: /etc/crio/crio.conf
    EOF
    
  4. Monitor all of the nodes to ensure that the changes are applied by using the following command:
    watch oc get nodes
    You can also use the following command to confirm that the MachineConfig sync is complete:
    watch oc get mcp
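
The following minimal sketch ties steps 1 and 4 together, assuming that you can reach the nodes over SSH as the core user; the node-selection one-liner is illustrative:

    # pick the first worker node that oc get nodes reports
    node=$(oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}')
    # step 1: copy the current crio.conf from that node
    scp core@$node:/etc/crio/crio.conf /tmp/crio.conf
    # after step 4 completes, confirm that the new values are on the node
    oc debug node/$node -- chroot /host grep -E 'pids_limit|nofile' /etc/crio/crio.conf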

Kernel parameter settings

To ensure that certain services can run correctly, you must verify the kernel parameters.

Note: If you install Cloud Pak for Data and services on IBM Cloud by using the IBM Cloud Catalog, the kernel parameter settings are automatically applied to your cluster as part of the installation. You do not need to manually change these settings. If you do not use the IBM Cloud Catalog for the installation, contact IBM Support.

These settings depend on the RAM size of the machine and the OS page size. The following steps assume that the worker nodes have 64 GB of RAM on an x86 platform with a 4 KB OS page size. If the worker nodes have 128 GB of RAM each, you must double the values for kernel.shmmax and kernel.shmall.
  • Virtual memory limit (vm.max_map_count)
  • Message limits (kernel.msgmax, kernel.msgmnb, and kernel.msgmni)
  • Shared memory limits (kernel.shmmax, kernel.shmall, and kernel.shmmni)
    The following settings are recommended:
    • kernel.shmmni: 256 * <size of RAM in GB>
    • kernel.shmmax: <size of RAM in bytes>
    • kernel.shmall: 2 * <size of RAM in OS pages>, that is, twice the RAM size in bytes divided by the OS page size
  • Semaphore limits (kernel.sem)

    As of Red Hat Enterprise Linux® version 7.8 and Red Hat Enterprise Linux version 8.1, the kernel.shmmni and kernel.msgmni settings, and the semmni value in kernel.sem, must be set to 32768. If the boot parameter ipcmni_extend is specified, the maximum value is 8388608 and the minimum value is 32768. Use 256 * <size of RAM in GB> to calculate possible values for kernel.shmmni and semmni. Use 1024 * <size of RAM in GB> to calculate a possible value for kernel.msgmni. For more information, see On RHEL servers, changing the semaphore value fails with a message "setting key "kernel.sem": Numerical result out of range".

    • The kernel.sem value for SEMMNS must be 1024000 for the Watson Knowledge Catalog service.
    • The kernel.sem value for SEMOPM must be at least 100 for the Data Virtualization service.
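
As a worked example of these formulas, the following sketch computes the recommended shared memory values for a worker node with 64 GB of RAM and a 4 KB page size; the variable names are illustrative:

    ram_gb=64
    page_size=4096                               # typical x86 default; confirm with getconf PAGE_SIZE
    shmmax=$(( ram_gb * 1024 * 1024 * 1024 ))    # RAM in bytes -> 68719476736
    shmall=$(( 2 * shmmax / page_size ))         # 2 * RAM in pages -> 33554432
    shmmni=$(( 256 * ram_gb ))                   # -> 16384; use at least 32768 on RHEL 7.8/8.1 and later
    echo "kernel.shmmax=$shmmax kernel.shmall=$shmall kernel.shmmni=$shmmni"
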
For more information about changing kernel parameter settings for Db2, see Deploying Db2 with limited privileges; for Db2 Warehouse, see Updating kernel semaphore settings - Db2 Big SQL.
Use the Node Tuning Operator to change the kernel parameter settings. The following steps affect all services and all worker nodes on the cluster. You might need to manage node-level profiles for each worker node in the cluster based on the services that are installed. You can limit node tuning to specific nodes. For more information, see Managing nodes.
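
For example, to apply the profile only to specific nodes rather than to every worker, the match section of the Tuned object in the following step could select a custom label instead of node-role.kubernetes.io/worker. The cp4d-tuning label here is illustrative; you would first apply it to the target nodes with oc label node <node-name> cp4d-tuning=true:

      recommend:
      - match:
        - label: cp4d-tuning
          value: "true"
        priority: 10
        profile: cp4d-wkc-ipc
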
  1. Create a custom node-level Tuned object with the following content.
    Important: If your current settings are less than the recommendations, adjust the settings. The following command assumes that you have worker nodes with 64 GB of RAM.
    cat <<EOF |oc apply -f -
    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: cp4d-wkc-ipc
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - name: cp4d-wkc-ipc
        data: |
          [main]
          summary=Tune IPC Kernel parameters on OpenShift Worker Nodes running WKC Pods
          [sysctl]
          kernel.shmall = 33554432
          kernel.shmmax = 68719476736
          kernel.shmmni = 32768
          kernel.sem = 250 1024000 100 32768
          kernel.msgmax = 65536
          kernel.msgmnb = 65536
          kernel.msgmni = 32768
          vm.max_map_count = 262144
      recommend:
      - match:
        - label: node-role.kubernetes.io/worker
        priority: 10
        profile: cp4d-wkc-ipc
    EOF
  2. Configure kubelet to allow Db2U to make syscalls as needed:
    1. Update all of the nodes to use a custom KubeletConfig:
      cat << EOF | oc apply -f -
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: db2u-kubelet
      spec:
        machineConfigPoolSelector:
          matchLabels:
            db2u-kubelet: sysctl
        kubeletConfig:
          allowedUnsafeSysctls:
            - "kernel.msg*"
            - "kernel.shm*"
            - "kernel.sem"
      EOF
    2. Update the label on the machineconfigpool:
      oc label machineconfigpool worker db2u-kubelet=sysctl
    3. Wait for the cluster to reboot. Then, run the following command to verify that the machineconfigpool is updated:
      oc get machineconfigpool
      The command should return output with the following format:
      NAME     CONFIG   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master   master   True      False      False      3              3                   3                     0                      139m
      worker   worker   False     True       False      5              1                   1                     0                      139m

      Wait until all of the worker nodes are updated and ready.
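
After both machine config pools report UPDATED = True, you can optionally confirm on a worker node that the tuned kernel parameters are in effect. This is a sketch; the node-selection one-liner is illustrative:

      node=$(oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}')
      oc debug node/$node -- chroot /host sysctl kernel.sem kernel.shmmax kernel.shmall vm.max_map_count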