Changing required node settings

Some services that run on IBM® Cloud Pak for Data require specific settings on the nodes in the cluster. To ensure that the cluster has the required settings for these services, an operating system administrator with root privileges must review and adjust the settings on the appropriate nodes in the cluster.

The Machine Config Operator is a cluster-level operator that you can use to manage the operating system and keep the cluster up to date and configured. For more information, see the instructions for using MachineConfig objects to configure nodes in the Red Hat® OpenShift® Container Platform documentation.
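
To see what the Machine Config Operator currently manages on your cluster, you can list its objects with standard oc commands:

    # List the MachineConfig objects that the Machine Config Operator manages
    oc get machineconfig

    # List the machine config pools (typically master and worker) and their rollout status
    oc get machineconfigpool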

Node settings for services

The following list shows the services that require changes to specific node settings. Instructions for changing each setting are in the sections that follow:

HAProxy timeout settings for the load balancer
  Services that require changes to the setting:
    • Db2® Data Gate
    • OpenPages®
    • Watson™ Discovery
    • Watson Knowledge Catalog
    • Watson Speech
    Also recommended if you are working with large data sets or you have slower network speeds.
  Environments: All environments

CRI-O container settings
  Services that require changes to the setting:
    • Cognos® Analytics
    • Data Virtualization
    • Db2
    • Db2 Big SQL
    • Db2 Warehouse
    • Watson Discovery
    • Watson Knowledge Catalog
    • Watson Studio
    • Watson Machine Learning Accelerator
  Environments: All environments except IBM Cloud

Kernel parameter settings
  Services that require changes to the setting:
    • Data Virtualization
    • Db2
    • Db2 Big SQL
    • Db2 Warehouse
    • Watson Knowledge Catalog
    • Watson Studio
  Environments: All environments

Power settings
  Services that require changes to the setting: See the Power settings section below.
  Environments: Power® Systems

GPU settings
  Services that require changes to the setting:
    • Jupyter Notebooks with Python 3.9 for GPU
    • Watson Machine Learning Accelerator (requires that the NVIDIA GPU Operator is installed)
  Environments: All environments

Load balancer timeout settings

To prevent connections from being closed before processes complete, you might need to adjust the timeout settings on your load balancer node.

This setting is required if you plan to install the following services:
  • Db2 Data Gate
  • OpenPages
  • Watson Discovery
  • Watson Knowledge Catalog
  • Watson Speech

This setting is also recommended if you are working with large data sets or you have slower network speeds. For example, you might need to increase this value if you receive a timeout or failure when you upload a large file.

The following procedures show how to change the timeout settings if you are using HAProxy. If you are using a load balancer other than HAProxy, see the documentation for your load balancer for information about how to configure the timeout settings.

If you are using HAProxy, the load balancer node is the OpenShift cluster public node.
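
For reference, the client and server timeouts are typically defined in the defaults section of the /etc/haproxy/haproxy.cfg file. The following fragment is illustrative only (the values shown are the Watson Speech minimums from the procedure below; other options are omitted):

    defaults
        # Maximum inactivity time on the client and server sides
        timeout client          1800s
        timeout server          1800s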


Changing timeout settings in on-premises and private cloud deployments
  1. On the load balancer node, check the HAProxy timeout settings in the /etc/haproxy/haproxy.cfg file. The recommended minimum values are as follows:
    Db2 Data Gate
    timeout client          7500s 
    timeout server          7500s 
    OpenPages
    timeout client          300s 
    timeout server          300s 
    Watson Discovery
    timeout client          300s 
    timeout server          300s 
    Watson Knowledge Catalog
    timeout client          300s 
    timeout server          300s 
    Watson Speech
    timeout client          1800s 
    timeout server          1800s 
  2. If necessary, change the timeout values by running the following commands. In these examples, replace 5m with a value that meets the recommended minimum for your services (for example, 7500s for Db2 Data Gate):
    • To change the timeout client setting, enter the following command:
      sed -i -e "/timeout client/s/ [0-9].*/ 5m/" /etc/haproxy/haproxy.cfg
    • To change the timeout server setting, enter the following command:
      sed -i -e "/timeout server/s/ [0-9].*/ 5m/" /etc/haproxy/haproxy.cfg
  3. Run the following command to apply the changes that you made to the HAProxy configuration:
    systemctl restart haproxy
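
You can also validate the edited configuration before the restart and confirm that the service came back up afterward; haproxy provides a configuration check mode:

    # Check the configuration file for syntax errors (run before restarting)
    haproxy -c -f /etc/haproxy/haproxy.cfg

    # Confirm that the service is active after the restart
    systemctl status haproxy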


Changing timeout settings on IBM Cloud

To set HAProxy timeouts for Cloud Pak for Data on IBM Cloud, configure the route timeouts by using the oc annotate command.

  1. Use the following command to set the server-side timeout for the HAProxy route to 360 seconds:
    oc annotate route zen-cpd --overwrite haproxy.router.openshift.io/timeout=360s

    If you don't provide the units, ms is the default.

  2. Optionally, customize other route-specific settings. For more information, see Route-specific annotations.
Note: On a Virtual Private Cloud (VPC) Gen2 cluster, the load balancer timeout is set to 30s by default. You can use the annotate command to set the timeout value to a maximum of 50s. If you need a timeout value higher than 50s, open a support ticket with the Load Balancer service team. Otherwise, the server might time out during long-running transactions. For more information, see Connection timeouts.
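
To confirm that the annotation was applied to the route, you can inspect its metadata:

    # Display the route and check for the haproxy.router.openshift.io/timeout annotation
    oc get route zen-cpd -o yaml | grep haproxy.router.openshift.io/timeout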

CRI-O container settings

To ensure that services can run correctly, you must adjust values in the CRI-O container settings to specify the maximum number of processes and the maximum number of open file descriptors.

These settings are required for the CRI-O container runtime on OpenShift Container Platform.

Note: If you install Cloud Pak for Data on IBM Cloud, the CRI-O container settings are automatically applied to your cluster as part of the installation. You do not need to manually change these settings.

To change CRI-O settings, you modify the contents of the crio.conf file and pass those updates to your nodes as a machine config.

  1. Obtain a copy of the existing crio.conf file from a worker node. For example, run the following command, replacing $node with one of the worker nodes. You can obtain the worker nodes by using the oc get nodes command.
    scp core@$node:/etc/crio/crio.conf /tmp/crio.conf

    If the crio.conf file doesn't exist in the path /etc/crio/crio.conf, use the path /etc/crio/crio.conf.d/00-default instead.

    If you don't have access by using the scp command, ask your cluster administrator for the crio.conf file.

    Make sure that you obtain the latest version of the crio.conf file.

  2. In the crio.conf file, make the following changes in the [crio.runtime] section (uncomment the lines if necessary):
    • To set the maximum number of open files, change the default_ulimits setting to at least 66560, as follows.
      Note: When you set the default_ulimits parameter in the crio.conf file, make sure that the ulimit -n settings in the /etc/security/limits.conf files on the worker machines are also set to at least 66560.
      ...
      [crio.runtime]
      default_ulimits = [
              "nofile=66560:66560"
      ]
      ...
    • To set the maximum number of processes, change the pids_limit setting to at least 12288, as follows.
      ...
      # Maximum number of processes allowed in a container.
      pids_limit = 12288
      ...
      
  3. Create a MachineConfig object YAML file, as follows, and apply it.
    Note: If you are using Cloud Pak for Data on OpenShift Container Platform version 4.6, the ignition version is 3.1.0. If you are using Cloud Pak for Data on OpenShift Container Platform version 4.8, change the ignition version to 3.2.0.
    Note: On macOS systems, remove -w0 at the end of the source value so that you do not receive an error when you apply the MachineConfig object YAML file (the base64 command on macOS does not support the -w option).
    cat << EOF | oc apply -f -
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 99-worker-cp4d-crio-conf
    spec:
      config:
        ignition:
          version: 3.1.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,$(cat /tmp/crio.conf | base64 -w0)        
            mode: 0644
            overwrite: true
            path: /etc/crio/crio.conf
    EOF
    
  4. Monitor all of the nodes to ensure that the changes are applied by using the following command:
    watch oc get nodes
    You can also use the following command to confirm that the MachineConfig sync is complete:
    watch oc get mcp
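
After the machine config pool finishes updating, you can spot-check a worker node to confirm that the new values are in place. A minimal check, assuming $node is one of your worker nodes:

    # Open a debug shell on the node and read the applied CRI-O settings
    oc debug node/$node -- chroot /host grep -E "pids_limit|nofile" /etc/crio/crio.conf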

Kernel parameter settings


Enabling unsafe sysctls in on-premises and private cloud deployments
Configure the kubelet to allow Db2U to make unsafe sysctl calls so that Db2 can manage its required memory settings.
For more information, see the instructions for enabling unsafe sysctls in the Red Hat OpenShift Container Platform documentation.
Note: This procedure applies to on-premises and private cloud deployments of Cloud Pak for Data. It does not apply to Cloud Pak for Data deployments on IBM Cloud.
  1. Update all of the nodes to use a custom KubeletConfig:
    cat << EOF | oc apply -f -
    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: db2u-kubelet
    spec:
      machineConfigPoolSelector:
        matchLabels:
          db2u-kubelet: sysctl
      kubeletConfig:
        allowedUnsafeSysctls:
          - "kernel.msg*"
          - "kernel.shm*"
          - "kernel.sem"
    EOF
  2. Update the label on the machineconfigpool:
    oc label machineconfigpool worker db2u-kubelet=sysctl
  3. Wait for the cluster to restart and then run the following command to verify that the machineconfigpool is updated:
    oc get machineconfigpool
    The command should return output with the following format:
    NAME     CONFIG   UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master   master   True      False      False      3              3                   3                     0                      139m
    worker   worker   False     True       False      5              1                   1                     0                      139m

    Wait until all of the worker nodes are updated and ready.
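
You can also confirm that the custom kubelet configuration was accepted by inspecting the resource that you created in step 1:

    # Verify that the KubeletConfig resource exists and check its status conditions
    oc get kubeletconfig db2u-kubelet -o yaml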



Changing kernel parameter settings on IBM Cloud
Note: If you install Cloud Pak for Data and services on IBM Cloud by using the IBM Cloud Catalog, the kernel parameter settings are automatically applied to your cluster as part of the installation. You do not need to manually change these settings.

If you install Cloud Pak for Data and services on IBM Cloud without using the IBM Cloud Catalog, you must manually change the kernel parameter settings by applying a custom Kubernetes daemon set. For more information, see Modifying default worker node settings to optimize performance in the IBM Cloud documentation. Update the values in the daemon set based on the recommended settings for Cloud Pak for Data. For more information, see Kernel parameter requirements (Linux).
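
The daemon set approach runs a privileged container on every worker node that applies the sysctl values at startup. The following is a minimal sketch only, not the exact daemon set from the IBM Cloud documentation; the image and the kernel.sem value are illustrative placeholders, so substitute the values from Kernel parameter requirements (Linux):

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: kernel-settings
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          name: kernel-settings
      template:
        metadata:
          labels:
            name: kernel-settings
        spec:
          # Share the node's namespaces so that the sysctl values apply to the node itself
          hostPID: true
          hostIPC: true
          initContainers:
          - name: sysctl
            image: alpine:3.18
            securityContext:
              privileged: true          # needed to set node-level kernel parameters
            command:
            - sh
            - -c
            # Placeholder value; apply the settings recommended for Cloud Pak for Data
            - sysctl -w kernel.sem="250 1024000 100 16384"
          containers:
          - name: pause
            image: alpine:3.18
            # Keep the pod running so that the daemon set remains scheduled
            command: ["sh", "-c", "sleep infinity"]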



Changing kernel parameter settings in environments that do not support kubelet settings

In some environments, you might not be able to use kubelet settings to change kernel parameters. For example, security restrictions might prevent you from creating KubeletConfig and MachineConfig objects. In these cases, you can verify the kernel parameters to ensure that certain services can run correctly, and you can use the Red Hat OpenShift Node Tuning Operator to calculate the correct kernel parameters. For more information, see Using the Red Hat OpenShift Node Tuning Operator to set kernel parameters.
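
The Node Tuning Operator applies settings through Tuned custom resources. A minimal sketch, assuming that you only need to set sysctl values on worker nodes (the profile name and the kernel.shmmni value are illustrative):

    apiVersion: tuned.openshift.io/v1
    kind: Tuned
    metadata:
      name: cp4d-kernel-settings
      namespace: openshift-cluster-node-tuning-operator
    spec:
      profile:
      - name: cp4d-kernel-settings
        data: |
          [main]
          summary=Kernel parameter settings for Cloud Pak for Data services
          include=openshift-node
          [sysctl]
          # Illustrative value; use the settings from Kernel parameter requirements (Linux)
          kernel.shmmni=32768
      recommend:
      - match:
        - label: node-role.kubernetes.io/worker
        priority: 10
        profile: cp4d-kernel-settings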


Power settings

On Power® Systems, you must complete the following steps to change the simultaneous multithreading (SMT) settings and to set the kernel argument slub_max_order to 0. These steps apply to small core, Kernel-based Virtual Machine (KVM) capable systems (LC922, IC922, AC922) and to big core, PowerVM® capable systems (L922, E950, E980, S922).
Note: You need to set the kernel argument slub_max_order to 0 only if your OpenShift Container Platform version is earlier than 4.8. Remove the kernel argument setting from the YAML file if your OpenShift Container Platform version is 4.8 or later.
  1. Label all small core KVM capable worker nodes that are not running Db2 Warehouse workloads with SMT=2. For example:
    oc label node <node> SMT=2 --overwrite
  2. Label all small core KVM capable worker nodes that are running Db2 Warehouse workloads with SMT=4. For example:
    oc label node <node> SMT=4 --overwrite
  3. Label all big core PowerVM capable worker nodes that are not running Db2 Warehouse workloads with SMT=4. For example:
    oc label node <node> SMT=4 --overwrite
  4. Label all big core PowerVM capable worker nodes that are running Db2 Warehouse workloads with SMT=8. For example:
    oc label node <node> SMT=8 --overwrite
  5. Create a YAML file, smt.yaml, with the following content:
    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 99-worker-smt
    spec:
      kernelArguments:
      - slub_max_order=0
      config:
        ignition:
          version: 3.1.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKZXhwb3J0IFBBVEg9L3Jvb3QvLmxvY2FsL2Jpbjovcm9vdC9iaW46L3NiaW46L2JpbjovdXNyL2xvY2FsL3NiaW46L3Vzci9sb2NhbC9iaW46L3Vzci9zYmluOi91c3IvYmluCmV4cG9ydCBLVUJFQ09ORklHPS92YXIvbGliL2t1YmVsZXQva3ViZWNvbmZpZwpDT1JFUFM9JCgvYmluL2xzY3B1IHwgL2Jpbi9hd2sgLUY6ICcgJDEgfiAvXkNvcmVcKHNcKSBwZXIgc29ja2V0JC8ge3ByaW50ICQyfSd8L2Jpbi94YXJncykKU09DS0VUUz0kKC9iaW4vbHNjcHUgfCAvYmluL2F3ayAtRjogJyAkMSB+IC9eU29ja2V0XChzXCkkLyB7cHJpbnQgJDJ9J3wvYmluL3hhcmdzKQpsZXQgVE9UQUxDT1JFUz0kQ09SRVBTKiRTT0NLRVRTCk1BWFRIUkVBRFM9JCgvYmluL2xzY3B1IHwgL2Jpbi9hd2sgLUY6ICcgJDEgfiAvXkNQVVwoc1wpJC8ge3ByaW50ICQyfSd8L2Jpbi94YXJncykKbGV0IE1BWFNNVD0kTUFYVEhSRUFEUy8kVE9UQUxDT1JFUwpDVVJSRU5UU01UPSQoL2Jpbi9sc2NwdSB8IC9iaW4vYXdrIC1GOiAnICQxIH4gL15UaHJlYWRcKHNcKSBwZXIgY29yZSQvIHtwcmludCAkMn0nfC9iaW4veGFyZ3MpCgpTTVRMQUJFTD0kKC9iaW4vb2MgZ2V0IG5vZGUgJEhPU1ROQU1FIC1MIFNNVCAtLW5vLWhlYWRlcnMgfC9iaW4vYXdrICd7cHJpbnQgJDZ9JykKaWYgW1sgLW4gJFNNVExBQkVMIF1dCiAgdGhlbgogICAgY2FzZSAkU01UTEFCRUwgaW4KICAgICAgMSkgVEFSR0VUU01UPTEKICAgIDs7CiAgICAgIDIpIFRBUkdFVFNNVD0yCiAgICA7OwogICAgICA0KSBUQVJHRVRTTVQ9NAogICAgOzsKICAgICAgOCkgVEFSR0VUU01UPTgKICAgIDs7CiAgICAgICopIFRBUkdFVFNNVD0kQ1VSUkVOVFNNVCA7IGVjaG8gIlNNVCB2YWx1ZSBtdXN0IGJlIDEsIDIsIDQsIG9yIDggYW5kIHNtYWxsZXIgdGhhbiBNYXhpbXVtIFNNVC4iCiAgICA7OwogICAgZXNhYwogIGVsc2UKICAgIFRBUkdFVFNNVD0kTUFYU01UCmZpCgpDVVJSRU5UU01UPSQoL2Jpbi9sc2NwdSB8IC9iaW4vYXdrIC1GOiAnICQxIH4gL15UaHJlYWRcKHNcKSBwZXIgY29yZSQvIHtwcmludCAkMn0nfC9iaW4veGFyZ3MpCgppZiBbWyAkQ1VSUkVOVFNNVCAtbmUgJFRBUkdFVFNNVCBdXQogIHRoZW4KICAgIElOSVRPTlRIUkVBRD0wCiAgICBJTklUT0ZGVEhSRUFEPSRUQVJHRVRTTVQKICAgIGlmIFtbICRNQVhTTVQgLWdlICRUQVJHRVRTTVQgXV0KICAgICAgdGhlbgogICAgICAgIHdoaWxlIFtbICRJTklUT05USFJFQUQgLWx0ICRNQVhUSFJFQURTIF1dCiAgICAgICAgZG8KICAgICAgICAgIE9OVEhSRUFEPSRJTklUT05USFJFQUQKICAgICAgICAgIE9GRlRIUkVBRD0kSU5JVE9GRlRIUkVBRAoKICAgICAgICAgIHdoaWxlIFtbICRPTlRIUkVBRCAtbHQgJE9GRlRIUkVBRCBdXQogICAgICAgICAgZG8KICAgICAgICAgICAgL2Jpbi9lY2hvIDEgPiAvc3lzL2RldmljZXMvc3lzdGVtL2NwdS9jcHUkT05USFJFQUQvb25saW5lCiAgICAgICAgICAgIGxldCBPTlRIUkVBRD0kT05USFJFQUQrMQogICAgICAgICAgZG9uZQogICAgICAgICAgbGV0IElOSVRPTlRIUkVBRD0kSU5JVE9OVEhSRUFEKyRNQVhTTVQKICAgICAgICAgIHdoaWxlIFtbICRPRkZUSFJFQUQgLWx0ICRJTklUT05USFJFQUQgXV0KICAgICAgICAgIGRvCiAgICAgICAgICAgIC9iaW4vZWNobyAwID4gL3N5cy9kZXZpY2VzL3N5c3RlbS9jcHUvY3B1JE9GRlRIUkVBRC9vbmxpbmUKICAgICAgICAgICAgbGV0IE9GRlRIUkVBRD0kT0ZGVEhSRUFEKzEKICAgICAgICAgIGRvbmUKICAgICAgICAgIGxldCBJTklUT0ZGVEhSRUFEPSRJTklUT0ZGVEhSRUFEKyRNQVhTTVQKICAgICAgICBkb25lCiAgICAgIGVsc2UKICAgICAgICBlY2hvICJUYXJnZXQgU01UIG11c3QgYmUgc21hbGxlciBvciBlcXVhbCB0aGFuIE1heGltdW0gU01UIHN1cHBvcnRlZCIKICAgIGZpCmZp
              verification: {}
            filesystem: root
            mode: 0755
            overwrite: true
            path: /usr/local/bin/powersmt
        systemd:
          units:
            - name: smt.service
              enabled: true
              contents: |
                [Unit]
                Description=Set SMT
                After=network-online.target
                Before=crio.service
                [Service]
                Type=oneshot
                RemainAfterExit=yes
                ExecStart=/usr/local/bin/powersmt
                [Install]
                WantedBy=multi-user.target
    
  6. Run the oc create command to apply the changes.

    Note: You must ensure that the cluster master nodes (or control plane) are in Ready status before you issue this command.

    oc create -f smt.yaml
    Your worker nodes perform a rolling reboot to update the kernel argument slub_max_order and set the labeled SMT level.
    Note:
    • All the worker nodes are rebooted after the command is issued. The slub_max_order=0 kernel argument and the specified SMT level are applied to all the worker nodes after the reboot completes. The SMT level on the worker nodes that are not labeled will be set to the default value.
    • After this process is done, if the SMT level on a particular worker node needs to be changed, you must label that worker node with the desired SMT level and manually reboot it.
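
After the reboot completes, you can confirm the SMT level that a worker node is running; the "Thread(s) per core" value from lscpu reflects the active SMT level:

    # Check the active SMT level on a node (replace $node with the node name)
    oc debug node/$node -- chroot /host lscpu | grep -i "thread(s) per core"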