Known issues and limitations

Review the known issues for version 3.2.0.

Installation, configuration, and upgrade

Container fails to start due to Docker issue

Installation fails during container creation due to a Docker 18.03.1 issue. If you have a subpath in the volume mount, you might receive the following error from the kubelet service, which fails to start the container:

Error: failed to start container "heketi": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/7e9cb34c-b2bf-11e8-a9eb-0050569bdc9f/volume-subpaths/heketi-db-secret/heketi/0\\\" to rootfs \\\"/var/lib/docker/overlay2/ca0a54812c6f5718559cc401d9b73fb7ebe43b2055a175ee03cdffaffada2585/merged\\\" at \\\"/var/lib/docker/overlay2/ca0a54812c6f5718559cc401d9b73fb7ebe43b2055a175ee03cdffaffada2585/merged/backupdb/heketi.db.gz\\\" caused \\\"no such file or directory\\\"\"": unknown

For more information, see the Kubernetes documentation.

To resolve this issue, delete the failed pod and try the installation again.
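
A minimal sketch of the workaround; the namespace and pod name are placeholders that you must adapt to your cluster:

kubectl get pods --all-namespaces | grep -vE "Running|Completed"
kubectl -n <namespace> delete pod <failed_pod_name>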

Incorrect version displays after fix pack rollback

If you roll back to version 3.2.0 from a fix pack version, the version file can still show the fix pack version. For instance, if you roll back from fix pack version 3.2.0.2001 to version 3.2.0, the version file /opt/ibm/cfc/version still shows version 3.2.0.2001. However, the console shows the correct version level.
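
To confirm the discrepancy, you can check the version file directly on the node, for example:

cat /opt/ibm/cfc/version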

Incorrect status displays after fix pack rollback

If you roll back to version 3.2.0 from a fix pack version, the Status column on the Helm releases page in the console might not show the status properly for some Helm releases. The status for these releases might incorrectly show as still loading, however, the associated nodes are still available.

Security and compliance

Vulnerability Advisor cross-architecture image scanning does not work with glibc version earlier than 2.22

Vulnerability Advisor (VA) now supports cross-architecture image scanning with QEMU (Quick EMUlator). You can scan Linux® on Power® (ppc64le) CPU architecture images with VA running on Linux® nodes. Alternatively, you can scan Linux CPU architecture images with VA running on Linux® on Power® (ppc64le) nodes.

When you are scanning Linux images, you must use glibc version 2.22 or later. If you use a glibc version earlier than 2.22, the scan might not work when VA runs on Linux® on Power® (ppc64le) nodes. glibc versions earlier than 2.22 make certain system calls (time/vgetcpu/gettimeofday) by using the vsyscall mechanism. The system call implementation attempts to access a hardcoded static address, which QEMU fails to translate while it runs in emulation mode.
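
To check the glibc version that an image ships, you can run a sketch like the following command; it assumes that the image contains the ldd tool:

docker run --rm <image_name> ldd --version | head -n 1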

Vulnerability Advisor policy resets to default setting after upgrade from 3.1.2 in ppc64le cluster

If you enabled Vulnerability Advisor (VA) on your Linux® on Power® (ppc64le) cluster in 3.1.2, the Vulnerability Advisor policy resets to the default setting when you upgrade to 3.2.0. To fix this issue, reset the VA policy in the management console.

ACME HTTP issuer cannot issue certificates in OpenShift clusters

IBM Cloud Private Version 3.2.0 does not apply the required permissions to the default service account for the certificate manager service in OpenShift clusters. This limitation prevents the ACME HTTP issuer from being able to process challenge requests, which prevents certificates from being issued from this issuer.

ACME HTTP issuer image is not copied to the worker nodes

The ACME HTTP issuer is added in IBM Cloud Private Version 3.2.0. You can configure the ACME HTTP issuer in your cluster to create certificates from a trusted certificate authority (CA). This feature is optional. If you choose to configure this feature in your cluster, you must complete either of the following steps:

Cannot get secret by using kubectl command when encryption of secret data at rest is enabled

When you enable encryption of secret data at rest, and use kubectl command to get the secret, sometimes you might not be able to get the secret. You might see the following error message in kube-apiserver:

Internal error occurred: invalid padding on input

This error occurs because kube-apiserver failed to decrypt the encrypted data in etcd. For more information about the issue, see Random "invalid padding on input" errors when attempting various kubectl operations.

To resolve the issue, delete the secret and re-create it. Use the following command:

kubectl -n <namespace> delete secret <secret>

For more information about encrypting secret data at rest, see Encrypting Secret Data at Rest.

LDAP user names are case-sensitive

User names are case-sensitive. You must use the name exactly the way it is configured in your LDAP directory.

Vulnerability Advisor cannot scan unsupported container images

Container images that are not supported by the Vulnerability Advisor fail the security scan.

The Security Scan column on the Container Images page in the management console displays Failed. When you select a failed container image name to view more details, zero issues are detected.

Network

Cookie affinity does not work when FIPS is enabled

When Federal Information Processing Standard (FIPS) is enabled, cookie affinity does not work because nginx.ingress.kubernetes.io/session-cookie-hash can be set only to sha1, md5, or index, none of which is supported in FIPS mode.
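
For illustration, an ingress that uses cookie affinity of the following form cannot work while FIPS is enabled; the affinity annotation shown here is an assumption about your ingress definition, and sha1, md5, and index are the only values that session-cookie-hash accepts:

  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-hash: "sha1"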

IPv6 is not supported

IBM Cloud Private cannot use IPv6 networks. Comment out the IPv6 settings in the /etc/hosts file on each cluster node. For more information, see Configuring your cluster.
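
For example, IPv6 entries in a typical /etc/hosts file can be commented out as follows; the exact entries vary by distribution:

# ::1         localhost ip6-localhost ip6-loopback
# ff02::1     ip6-allnodes
# ff02::2     ip6-allrouters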

Calico prefix limitation on Linux® on Power® (ppc64le) nodes

If you install IBM Cloud Private on PowerVM Linux LPARs and your virtual Ethernet devices use the ibmveth prefix, you must set the network adapter to use Calico networking. During installation, be sure to set a calico_ip_autodetection_method parameter value in the config.yaml file. The setting resembles the following content:

calico_ip_autodetection_method: interface=<device_name>

The <device_name> parameter is the name of your network adapter. You must specify the ibmveth0 interface on each node of the cluster, including the worker nodes.

Note: If you used PowerVC to deploy your cluster node, this issue does not affect you.

Enable Ingress Controller to use a new annotation prefix

Intermittent failure when you log in to the management console in HA clusters that use NSX-T 2.3 or 2.4

In HA clusters that use NSX-T 2.3 or 2.4, you might not be able to log in to the management console. After you specify the login credentials, you are redirected to the login page. You might have to try logging in multiple times until you succeed. This issue is intermittent.

Encrypting cluster data network traffic with IPSec does not work on SLES 12 SP3 operating system

strongSwan version 5.3.3 or later is required to deploy the IPSec mesh configuration for cluster data network traffic encryption. In SUSE Linux Enterprise Server (SLES) 12 SP3, the default strongSwan version is 5.1.3, which is not suitable for the IPSec mesh configuration.
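
To check the installed strongSwan version on a node, you can use either of the following commands, assuming the strongswan package is installed and provides the ipsec wrapper:

ipsec version
zypper info strongswan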

Elasticsearch does not work with GlusterFS

Elasticsearch does not work correctly with GlusterFS that is configured in an IBM® Cloud Private environment. This issue is due to the following AlreadyClosedException error. For more information, see Red Hat Bugzilla – Bug 1430659.

[2019-01-17T10:53:49,750][WARN ][o.e.c.a.s.ShardStateAction] [logging-elk-master-7df4b7bdfc-5spqc] \
[logstash-2019.01.16][3] received shard failed for shard id [[logstash-2019.01.16][3]], allocation id \
[n9ZpABWfS4qJCyUIfEgHWQ], primary term [0], message [shard failure, reason \
[already closed by tragic event on the index writer]], \
failure [AlreadyClosedException[Underlying file changed by an external force at 2019-01-17T10:44:48.410502Z,\
(lock=NativeFSLock(path=/usr/share/elasticsearch/data/nodes/0/indices/R792nkojQ7q1UCYSEO4trQ/3/index/write.lock,\
impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2019-01-17T10:44:48.410324Z))]]
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2019-01-17T10:44:48.410502Z,\
(lock=NativeFSLock(path=/usr/share/elasticsearch/data/nodes/0/indices/R792nkojQ7q1UCYSEO4trQ/3/index/write.lock,\
impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2019-01-17T10:44:48.410324Z))

NGINX ingress rewrite-target annotation fails when you upgrade to IBM Cloud Private Version 3.2.0

IBM® Cloud Private Version 3.2.0 uses NGINX Ingress Controller Version 0.23.0. Starting in NGINX Ingress Controller Version 0.22.0, ingress definitions that use the annotation nginx.ingress.kubernetes.io/rewrite-target are not compatible with an earlier version. For more information, see Rewrite Target.

When you upgrade to IBM Cloud Private Version 3.2.0, you must replace the ingress.kubernetes.io/rewrite-target annotation with the following piece of code:

    ingress.kubernetes.io/use-regex: "true"
    ingress.kubernetes.io/configuration-snippet: |
      rewrite "(?i)/old/(.*)" /new/$1 break;
      rewrite "(?i)/old$" /new/ break;

Where old is the path that is defined in your ingress resource, and new is the URI to access your application.

For example, if web/nginx is the path for your NGINX application in the ingress resource, and the URI to access your application is /, then you rewrite the annotation as shown in the following example:

# kubectl get ingress nginx -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/configuration-snippet: |
      rewrite "(?i)/web/nginx/(.*)" /$1 break;
      rewrite "(?i)/web/nginx$" / break;
    ingress.kubernetes.io/use-regex: "true"
  name: nginx
  namespace: default
spec:
  rules:
  - host: demo.nginx.net
    http:
      paths:
      - backend:
          serviceName: nginx
          servicePort: 80
        path: /web/nginx
status:
  loadBalancer:
    ingress:
    - ip: 9.30.118.39

# curl http://demo.nginx.net/web/nginx
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>

Pods not reachable from NGINX ingress controller in OpenShift Version 3.11 multitenant mode

In OpenShift Version 3.11 clusters with multitenant isolation mode, each project is isolated by default. Network traffic is not allowed between pods or services in different projects.

To resolve the issue, disable network isolation in the kube-system project.

oc adm pod-network make-projects-global kube-system

Storage

Cannot restart node when using vSphere storage that has no replica

Shutting down a node in an IBM Cloud Private environment that uses a vSphere Cloud Provider moves the pod to another node in your cluster. However, the vSphere volume that the pod uses on the original node is not detached from that node. An error might occur when you try to restart the node.

To resolve the issue, first detach the volume from the node. Then, restart the node.

GlusterFS cluster becomes unusable if you configure a vSphere Cloud Provider after installing IBM Cloud Private

By default, the kubelet uses the IP address of the node as the node name. When you configure a vSphere Cloud Provider, kubelet uses the host name of the node as the node name. If you had your GlusterFS cluster set up during installation of IBM Cloud Private, Heketi creates a topology by using the IP address of the node.

When you configure a vSphere Cloud Provider after you install IBM Cloud Private, your GlusterFS cluster becomes unusable. This issue occurs because the kubelet identifies nodes by their host names, but Heketi still uses IP addresses to identify the nodes.

If you plan to use both GlusterFS and a vSphere Cloud Provider in your IBM Cloud Private cluster, ensure that you set kubelet_nodename: hostname in the config.yaml file during installation.
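
For example, the relevant setting in the config.yaml file resembles the following line:

kubelet_nodename: hostname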

Monitoring and logging

Elasticsearch type mapping limitations

The IBM Cloud Private logging component uses Elasticsearch to store and index logs that are received from all the running containers in the cluster. If containers emit logs in JSON format, each field in the JSON is indexed by Elasticsearch to allow queries to use the fields. However, if two containers define the same field while they send different data types, Elasticsearch is not able to index the field correctly. The first type that is received for a field each day sets the accepted type for the rest of the day. This action results in two problems:

Alerting, logging, or monitoring pages displays 500 Internal Server Error

To resolve this issue, complete the following steps from the master node:

  1. Create an alias for kubectl in the kube-system namespace by running the following command:

    alias kc='kubectl -n kube-system'
    
  2. Edit the configuration map for Kibana. Run the following command:

    kc edit cm kibana-nginx-config
    

    In the configuration map, locate the upstream kibana block and change localhost to 127.0.0.1:

     upstream kibana {
     server 127.0.0.1:5602;
     }
    
  3. Locate and restart the Kibana pod by running the following commands:

     kc get pod | grep -i kibana
    
     kc delete pod <kibana-POD_ID>
    
  4. Edit the configuration map for Grafana by running the following command:

    kc edit cm grafana-router-nginx-config
    

    In the configuration map, locate the upstream grafana block and change localhost to 127.0.0.1:

    upstream grafana {
    server 127.0.0.1:3000;
    }
    
  5. Locate and restart the Grafana pod by running the following commands:

    kc get pod | grep -i monitoring-grafana
    
    kc delete pod <monitoring-grafana-POD_ID>
    
  6. Edit the configuration map for the alert manager by running the following command:

    kc edit cm alertmanager-router-nginx-config
    

    In the configuration map, locate the upstream alertmanager block and change localhost to 127.0.0.1:

    upstream alertmanager {
    server 127.0.0.1:9093;
    }
    
  7. Locate and restart the alert manager by running the following commands:

    kc get pod | grep -i monitoring-prometheus-alertmanager
    
    kc delete pod <monitoring-prometheus-alertmanager-POD_ID>
    

Monitoring data is not retained if you use a dynamically provisioned volume during upgrade

If you use a dynamically provisioned persistent volume to store monitoring data, the data is lost after you upgrade the monitoring service from 2.1.0.2 to 2.1.0.3.

Prometheus data source in Grafana is lost during a rollback of IBM Cloud Private

When you roll back from IBM Cloud Private Version 3.2.0 to 3.1.2, the Prometheus data source in Grafana is lost. The Grafana dashboards do not display any metric.

To resolve the issue, add back the Prometheus data source by configuring the data source. For more information, see Manually configure a Prometheus data source in Grafana.

Data in Prometheus database is lost during a rollback of IBM Cloud Private

With IBM Cloud Private Version 3.2.0, Prometheus is updated from version 2.3.1 to version 2.8.0 and is no longer backwards compatible. Data is retained when you upgrade from an earlier version of IBM Cloud Private to IBM Cloud Private Version 3.2.0. However, during a rollback of the monitoring service to a previous version of IBM Cloud Private the Prometheus database is retained, but the data within the database is lost. The data is lost as the version of Prometheus that is used with the previous version of IBM Cloud Private does not support the data format that is used with IBM Cloud Private Version 3.2.0.

Logging ELK pods are in CrashLoopBackOff state

Logging ELK pods continue to appear in the CrashLoopBackOff state after you upgrade to the current version and increase memory.

This is a known issue in Elasticsearch 5.5.1.

Note: If you have more than one data pod, repeat steps 1-8 for each pod. For example, logging-elk-data-0, logging-elk-data-1, or logging-elk-data-2.

Complete the following steps to resolve this issue.

  1. Check the log to find the problematic file that contains the permission issue.

    java.io.IOException: failed to write in data directory [/usr/share/elasticsearch/data/nodes/0/indices/dT4Nc7gvRLCjUqZQ0rIUDA/0/translog] write permission is required
    
  2. Get the IP address of the management node where the logging-elk-data-1 pod is running.

    kubectl -n kube-system get pods -o wide | grep logging-elk-data-1
    
  3. Use SSH to log in to the management node.

  4. Navigate to the /var/lib/icp/logging/elk-data directory.

    cd /var/lib/icp/logging/elk-data
    
  5. Find all .es_temp_file files.

    find ./ -name "*.es_temp_file"
    
  6. Delete all *.es_temp_file files that you found in step 5.

    find ./ -name "*.es_temp_file" -delete
    
  7. Delete the old logging-elk-data-1 pod.

    kubectl -n kube-system delete pods logging-elk-data-1
    
  8. Wait 3-5 minutes for the new logging-elk-data-1 pod to restart.

    kubectl -n kube-system get pods -o wide | grep logging-elk-data-1
    

Logs not working after logging pods are restarted

You might encounter the following problems:

To resolve this issue, complete the following steps to run a Search Guard initialization job:

  1. Save the existing Search Guard initialization job to a file.

      kubectl get job.batch/<RELEASE_PREFIX>-elasticsearch-searchguard-init -n kube-system -o yaml > sg-init-job.yaml
    

    The logging service in IBM Cloud Private Version 3.2.0 removes the job after it completes. If you do not have an existing job from which to extract the settings to a file, you can save the following YAML content to the sg-init-job.yaml file.

      apiVersion: batch/v1
      kind: Job
      metadata:
        labels:
          app: <RELEASE_PREFIX>-elasticsearch
          chart: ibm-icplogging-2.2.0 # Update to the correct version of logging installed. Current chart version can be found in the Service Catalog
          component: searchguard-init
          heritage: Tiller
          release: logging
        name: <RELEASE_PREFIX>-elasticsearch-searchguard-my-init-job # change this to a unique value
        namespace: kube-system
      spec:
        backoffLimit: 6
        completions: 1
        parallelism: 1
        template:
          metadata:
            creationTimestamp: null
            labels:
              app: <RELEASE_PREFIX>-elasticsearch
              chart: ibm-icplogging
              component: searchguard-init
              heritage: Tiller
              release: logging
              role: initialization
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: beta.kubernetes.io/arch
                      operator: In
                      values:
                      - amd64
                      - ppc64le
                      - s390x
                    - key: management
                      operator: In
                      values:
                      - "true"
            containers:
            - env:
              - name: APP_KEYSTORE_PASSWORD
                value: Y2hhbmdlbWU=
              - name: CA_TRUSTSTORE_PASSWORD
                value: Y2hhbmdlbWU=
              - name: ES_INTERNAL_PORT
                value: "9300"
              image: ibmcom/searchguard-init:2.0.1-f2 # This value might be different from the one on your system; double-check by running docker images | grep searchguard-init
              imagePullPolicy: IfNotPresent
              name: searchguard-init
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              volumeMounts:
              - mountPath: /usr/share/elasticsearch/config/searchguard
                name: searchguard-config
              - mountPath: /usr/share/elasticsearch/config/tls
                name: certs
                readOnly: true
            dnsPolicy: ClusterFirst
            restartPolicy: OnFailure
            schedulerName: default-scheduler
            securityContext: {}
            terminationGracePeriodSeconds: 30
            tolerations:
            - effect: NoSchedule
              key: dedicated
              operator: Exists
            volumes:
            - configMap:
                defaultMode: 420
                name: <RELEASE_PREFIX>-elasticsearch-searchguard-config
              name: searchguard-config
            - name: certs
              secret:
                defaultMode: 420
                secretName: <RELEASE_PREFIX>-certs
    

    Notes:

    1. Modify your chart version to the version that is installed on your system. You can find the current chart version in the Service Catalog.
    2. This image might be different from the one on your system: image: ibmcom/searchguard-init:2.0.1-f2. Run the docker images | grep searchguard-init command to confirm that the correct image is installed on your system.
    3. The <RELEASE_PREFIX> value for managed mode logging instances is different from the value for standard mode logging instances.
      • For managed logging instances that are installed with IBM Cloud Private installer, the value is logging-elk.
      • For standard logging instances that are installed after IBM Cloud Private installation from either the Service Catalog or by using the Helm CLI, the value is <RELEASE-NAME>-ibm-icplogging. <RELEASE-NAME> is the name that is given to the Helm release when this logging instance is installed.
  2. Edit the job file.

    1. Remove everything under metadata.* except for the following parameters:
      • metadata.name
      • metadata.namespace
      • metadata.labels.*
    2. Change metadata.name and spec.template.metadata.job-name to new names.
    3. Remove spec.selector and spec.template.metadata.labels.controller-uid.
    4. Remove status.*
  3. Save the file.

  4. Run the job.
    kubectl apply -f sg-init-job.yaml
    

Kibana page displays error when OS SELinux and Docker SELinux are enabled

When OS SELinux and Docker SELinux are enabled, the Kibana page displays the following error:

No matching indices found: No indices match pattern "logstash-*"

To fix this problem, you must enable hostPID: true for the Filebeat daemonset. After IBM Cloud Private installation, run the kubectl edit ds logging-elk-filebeat-ds command to add hostPID: true. For example:

securityContext:
  runAsUser: 0
hostPID: true

After you edit the logging-elk-filebeat-ds daemonset, run the oc -n kube-system delete po <filebeat pod name> command to re-create the Filebeat pods.

Prometheus container fails due to OOMKilled error during startup

You might encounter this problem when too much data remains in the /var/lib/prometheus/data/wal path inside the Prometheus container. During Prometheus startup, the data is loaded into memory. The load attempt fails with an OOMKilled error because there is not enough available memory to load the data.

To work around this problem, you can either increase Prometheus memory, or delete the /wal folder from the volume.
Note: Deleting the /wal folder can lead to data loss.
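
A minimal sketch of the deletion workaround, assuming the Prometheus pod runs in the kube-system namespace and its container is named prometheus; verify both names in your cluster before you run the commands:

kubectl -n kube-system get pods | grep monitoring-prometheus
kubectl -n kube-system exec <prometheus_pod_name> -c prometheus -- rm -rf /var/lib/prometheus/data/wal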

Logging-elk-kibana deployment is unhealthy and pod displays "Liveness probe errored"

After you install IBM Cloud Private-CE (Community Edition) on Linux on Power (ppc64le), the logging-elk-kibana deployment appears to be unhealthy. The kubectl describe pod logging-elk-kibana-7cbb9d996f-pspv5 -n kube-system command displays the following errors:

  Warning  Unhealthy  13m (x159 over 16h)   kubelet, 9.114.192.99  Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  Unhealthy  109s (x161 over 16h)  kubelet, 9.114.192.99  Readiness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded

Details of the test environment:

To work around this problem, issue the following command to check the free memory on your master and management nodes:

free -h

If the free memory is low, and available memory is high, issue the following command to release memory:

sync; echo 3 > /proc/sys/vm/drop_caches

Logs older than what is specified in log retention policy are re-created if Filebeat is restarted

A curator background job is deployed as part of the IBM Cloud Private logging service. To free disk space, the job runs once a day to remove old log data based on your retention settings.

If Filebeat pods are restarted, Filebeat finds all existing log files, and reprocesses and reingests them. This activity includes log entries that are older than what is specified by the log retention policy. This behavior can cause older logs to be reindexed to Elasticsearch, and appear in the logging console until the curator job runs again. If this behavior is problematic, you can manually delete indices older than your retention settings. For more information, see Manually removing log indices.
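
If you prefer not to wait for the next scheduled curator run, one option is to trigger the job manually. This is a sketch that assumes the curator is deployed as a CronJob in the kube-system namespace; verify its name with the first command:

kubectl -n kube-system get cronjobs | grep curator
kubectl -n kube-system create job --from=cronjob/<curator_cronjob_name> curator-manual-run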

VolumeMounts error when you upgrade a monitoring chart

When you upgrade a monitoring chart from the version that is used in IBM Cloud Private Version 3.2.0 or earlier, the Helm upgrade fails with an error that resembles the following message:

Error: UPGRADE FAILED: Failed to recreate resource: Deployment.apps "monitoring-grafana" is invalid: spec.template.spec.containers[0].volumeMounts[5].name: Not found: "monitoring-certs"

To resolve the problem during a cluster upgrade, add the following section to your config.yaml:

upgrade_override:
  monitoring:
    tls:
      enabled: true

Platform management

Resource quota might not update

You might find that the resource quota is not updating in the cluster. This behavior is due to an issue in the kube-controller-manager. The workaround is to stop the kube-controller-manager leader container on the master nodes and let it restart. If high availability is configured for the cluster, you can check the kube-controller-manager log to find the leader. Only the leader kube-controller-manager is working. The other controllers wait to be elected as the new leader once the current leader is down.

For example:

# docker ps | grep hyperkube | grep controller-manager
97bccea493ea        4c7c25836910                                                                                              "/hyperkube controll…"   7 days ago          Up 7 days                               k8s_controller-manager_k8s-master-9.111.254.104_kube-system_b0fa31e0606015604c409c09a057a55c_2
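
To identify the leader, you can inspect each controller-manager container log for leader-election messages; the exact wording depends on the Kubernetes version:

docker logs 97bccea493ea 2>&1 | grep -iE "leaderelection|acquired lease"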

To stop the leader, run the following command with the ID of the Docker process:

docker rm -f 97bccea493ea

The Key Management Service must deploy to a management node in a Linux® platform

The Key Management Service is deployed to the management node and is supported only on the Linux® platform. If there is no amd64 management node in the cluster, the Key Management Service is not deployed.
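
To verify that the cluster has an amd64 management node, you can list nodes by label. This sketch uses the beta.kubernetes.io/arch and management labels that appear in the node affinity settings elsewhere in this section; the labels might differ in your cluster:

kubectl get nodes -l beta.kubernetes.io/arch=amd64,management=true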

Synchronizing repositories might not update Helm chart contents

Synchronizing repositories takes several minutes to complete. While synchronization is in progress, there might be an error if you try to display the readme file. After synchronization completes, you can view the readme file and deploy the chart.

Helm repository names cannot contain DBCS GB18030 characters

Do not use DBCS GB18030 characters in the Helm repository name when you add the repository.

Container fails to operate or a kernel panic occurs

The following error might appear in the IBM Cloud Private node console or kernel log:

  kernel:unregister_netdevice: waiting for <eth0> to become free.

If you receive this error, check whether the log displays the kernel:unregister_netdevice: waiting for <eth0> to become free message and whether containers fail to operate. If both conditions are met, reboot the node.

See https://github.com/kubernetes/kubernetes/issues/64743 to learn about the Linux kernel bug that causes the error.

Timeouts and blank screens when displaying more than 80 namespaces

If a cluster has a large number of namespaces (more than 80), you might see the following issues:

Cloning an IBM Cloud Private worker node is not supported

IBM Cloud Private does not support cloning an existing IBM Cloud Private worker node. You cannot change the host name and IP address of a node on your existing cluster.

You must add a worker node. For more information, see Adding an IBM Cloud Private cluster node.

LDAP search suggestions do not appear automatically

When you add users or user groups to your team, you can search for individual users and groups. As you type in the LDAP search bar, suggestions that are associated with the search query do not appear automatically. You must press Enter to obtain results from the LDAP server. For more information, see Create teams.

Pod liveness or readiness check might fail because Docker failed to run some commands in the container

After you upgrade from IBM Cloud Private version 3.1.0, 3.1.1, or 3.1.2 to version 3.2.0, the readiness or liveness checks for some pods might fail. This failure can also happen when you deploy a workload in the cluster, or when you shut down or restart the management node in the cluster. This issue might occur on Prometheus pods, Grafana pods, or other pods.

Depending on factors such as network stability, the readiness and liveness probes can take longer to start than the time that is allowed before a readiness request is sent. If the probes are not started when a request is sent, they do not return a ready status.

The returned status looks similar to the following example:

# kubectl get pods -o wide --all-namespaces |grep monitor
kube-system    monitoring-grafana-59bfb7859b-f9zrd                 2/3     Running     0          43m     10.1.1.1   9.1.1.1    <none>           <none>
kube-system    monitoring-prometheus-75b7444496-zzl7b                         3/4     Running     0          43m     10.1.1.1    9.1.1.1    <none>           <none>

Check the event log for the pod to see whether there are entries that are similar to the following content:

Events:
  Type     Reason     Age                    From                   Message
  ----     ------     ----                   ----                   -------
  Warning  Unhealthy  2m29s (x23 over 135m)  kubelet, 9.1.1.1  Readiness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  Unhealthy  2m13s (x23 over 135m)  kubelet, 9.1.1.1  Liveness probe errored: rpc error: code = DeadlineExceeded desc = context deadline exceeded

To work around this issue, remove the failed pod and let it deploy again. You can also restart the Docker service on the cluster.
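
A sketch of both workarounds; the pod name is a placeholder, and the Docker restart must be run on the affected node:

kubectl -n kube-system delete pod <failed_pod_name>
systemctl restart docker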

Pods show CreateContainerConfigError

After you install IBM Cloud Private, the following pods show CreateContainerConfigError error:

# kubectl get pods -o wide --all-namespaces |grep -v "Running" |grep -v "Completed"
NAMESPACE      NAME                                     READY    STATUS                            
kube-system    logging-elk-kibana-init-6z95k             0/1     CreateContainerConfigError       
kube-system    metering-dm-79d6f5894d-q2qpm              0/1     Init:CreateContainerConfigError   
kube-system    metering-reader-4tzgz                     0/1     Init:CreateContainerConfigError  
kube-system    metering-reader-5hjvm                     0/1     Init:CreateContainerConfigError  
kube-system    metering-reader-gsm44                     0/1     Init:CreateContainerConfigError  
kube-system    metering-ui-7dd45b4b6c-th2pg              0/1     Init:CreateContainerConfigError  
kube-system    secret-watcher-6bd4675db7-mcb64           0/1     CreateContainerConfigError       
kube-system    security-onboarding-262cp                 0/1     CreateContainerConfigError

The issue occurs when the pods are unable to create the IAM API key secret.

To resolve the issue, restart the iam-onboarding pod.

Complete the following steps:

  1. Install kubectl. For more information, see Installing the Kubernetes CLI (kubectl).

  2. Get the iam-onboarding pod ID and make a note of the pod ID.

    kubectl -n kube-system get pods -o wide | grep iam-onboarding
    
  3. Delete the iam-onboarding pod.

    kubectl -n kube-system delete pod <iam-onboarding-pod-id>
    
  4. Wait for 2 minutes and check the pod status.

    kubectl -n kube-system get pods -o wide | grep iam-onboarding
    

    The pod status shows as Running.

Some Pods not starting or log TLS handshake errors in IBM Power environment

In some cases when you are using IP-IP tunneling in an IBM Power environment, some of your Pods do not start or contain log entries that indicate TLS handshake errors. If you notice either of these issues, complete the following steps to resolve the issue:

  1. Run the ifconfig command or the netstat command to view the statistics of the tunnel device. The tunnel device is often named tunl0.

  2. View the changes in the TX dropped count that is displayed when you run the ifconfig command or the netstat command.

    If you use the netstat command, enter a command similar to the following command:

     netstat --interface=tunl0
    

    The output should be similar to the following content:

     Kernel Interface table
     Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
     tunl0     1300   904416      0      0 0        714067      0    806      0 ORU
    

    If you use the ifconfig command, run a command similar to the following command:

     ifconfig tunl0
    

    The output should be similar to the following content:

     tunl0: flags=193<UP,RUNNING,NOARP>  mtu 1300
     inet 10.1.125.192  netmask 255.255.255.255
     tunnel   txqueuelen 1000  (IPIP Tunnel)
     RX packets 904377  bytes 796710714 (759.8 MiB)
     RX errors 0  dropped 0  overruns 0  frame 0
     TX packets 714034  bytes 125963495 (120.1 MiB)
     TX errors 0  dropped 806 overruns 0  carrier 0  collisions 0
    
  3. Run the command again. View the change in the TX dropped count that is displayed when you run the ifconfig command. Alternatively, view the change in the TX-DRP count that is displayed when you run the netstat command.

    If the value is continuously increasing, there is an MTU issue. To resolve it, lower the MTU settings of the tunnel and Pod interfaces, based on your network characteristics.

  4. Complete the following steps to change the Calico IP-IP tunnel MTU after it is deployed:

    1. Update the setting for veth_mtu and tunnel_mtu by running the following command (a sketch of the edited ConfigMap follows these steps):

      kubectl edit cm calico-config -n kube-system
      
    2. Restart the calico-node PODs for the changes to take effect by entering the following command:

      kubectl patch ds calico-node -n kube-system -p '{"spec":{"template":{"spec":{"containers":[{"name":"calico-node","env":[{"name":"RESTART_","value":"'$(date +%s)'"}]}]}}}}'
      
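A minimal sketch of the edited calico-config ConfigMap from step 4.1; the MTU values are illustrative and must be derived from your own network characteristics:

apiVersion: v1
kind: ConfigMap
metadata:
  name: calico-config
  namespace: kube-system
data:
  veth_mtu: "1350"
  tunnel_mtu: "1350"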

The web terminal alerts the user with goodbye if no permission is assigned

The web terminal does not work for users who do not have permission to at least one namespace. The result is early termination, with goodbye displayed. You need access to at least one namespace to use the web terminal. Contact a cluster administrator for access.

Azure Load Balancer and public IP are not removed when you uninstall IBM Cloud Private

If you deployed IBM Cloud Private on Azure cloud with the Azure cloud provider enabled and you created a LoadBalancer service type, such as istio-gateway, the Azure Load Balancer and public IP resource are not removed when you uninstall the IBM Cloud Private cluster.

To work around this issue, remove the related Azure Load Balancer and public IP from the Azure portal. Or, before you uninstall your IBM Cloud Private cluster, delete all LoadBalancer service types in all namespaces.
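
For example, to find and then delete the LoadBalancer services before you uninstall; the namespace and service names are placeholders:

kubectl get services --all-namespaces | grep LoadBalancer
kubectl -n <namespace> delete service <loadbalancer_service_name>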

Pod regpod-checking has the termination status in AlertManager

When you scan an image, the regpod-checking pod is created. After the scan completes, the pod is automatically terminated.

Pod goes into a CrashLoopBackOff state when a modified subpath configmap mount fails

A pod goes into a CrashLoopBackOff state during the restart of the Docker service on a worker node. If you run the kubectl get pods command to check the pod that is in the CrashLoopBackOff state, you get the following error message:

level=error msg="Handler for POST /v1.31/containers/4a46aa25deac4af4bf60813d2c763e54499c0d12b9cd28b3d1990843e1e6c3d5/start returned error: OCI runtime create failed: container_linux.go:348: starting container process caused \"process_linux.go:402: container init caused \\\"rootfs_linux.go:58: mounting \\\\\\\"/var/lib/kubelet/pods/78db74ec-2a7b-11ea-8ada-72f600a81a05/volume-subpaths/app-keystore/ich-mobilebanking-secured/6\\\\\\\" to rootfs \\\\\\\"/var/lib/docker/overlay2/31f618303ba3762398bbee05e51657c0f62f096d4cc9f600fb3ec18f047f94ba/merged\\\\\\\" at \\\\\\\"/var/lib/docker/overlay2/31f618303ba3762398bbee05e51657c0f62f096d4cc9f600fb3ec18f047f94ba/merged/opt/ibm/wlp/usr/servers/defaultServer/resources/security/app-truststore.jks\\\\\\\" caused \\\\\\\"no such file or directory\\\\\\\"\\\"\": unknown"

A pod in the CrashLoopBackOff state that uses a subPath mount cannot be recovered if the contents of its volume change.

To recover the pod, delete the pod that has the CrashLoopBackOff error by using the following commands. When you delete the pod, it is re-created in a good state.

  1. Get information about the pods that are in CrashLoopBackOff state.

    kubectl get pods -n <namespace> | grep CrashLoopBackOff
    
  2. Delete the pod that has the CrashLoopBackOff error.

    kubectl delete pod <pod_name> -n <namespace>
    

Management console

Cannot log in to the management console with an LDAP user after restarting the leading master

If you cannot log in to the management console after you restart the leading master node in a high-availability cluster, take the following actions:

  1. Log in to the management console with the cluster administrator credentials. The user name is admin, and the password is admin.
  2. Click Menu > Manage > Identity & Access.
  3. Click Edit and then click Save.

    Note: LDAP users can now log in to the management console.

If the problem persists, MongoDB and the pods that depend on auth-idp might not be running. Follow these instructions to identify the cause.

  1. Check whether MongoDB pod is running without any errors.

    • Use the following command to check the pod status. The pod must show the status as 1/1 Running. Check the logs, if required.

      kubectl -n kube-system get pods | grep -e mongodb
      
    • If the pod does not show the status as 1/1 Running, restart the pod by deleting it.

      kubectl -n kube-system delete pod -l app=icp-mongodb
      

      Wait for a minute or two for the pod to restart. Check the pod status by using the following command. The status must show 1/1 Running.

        kubectl -n kube-system get pods | grep -e mongodb
      
  2. After the MongoDB pod is running, restart the auth-idp pods by deleting them.

    kubectl -n kube-system delete pod -l k8s-app=auth-idp
    

    Wait for a minute or two for the pods to restart. Check the pod status by using the following command. The status must show 4/4 Running.

     kubectl -n kube-system get pods | grep auth-idp
    

The management console displays 502 Bad Gateway Error

The management console displays a 502 Bad Gateway Error after you install IBM Cloud Private or reboot the master node.

If you recently installed IBM Cloud Private, wait a few minutes and then reload the page.

If you rebooted the master node, take the following steps:

  1. Obtain the IP addresses of the icp-ds pods. From the master node, run the following command:

    kubectl get pods -o wide  -n kube-system | grep "icp-ds"
    

    The output resembles the following text:

    icp-ds-0                                                  1/1       Running       0          1d        10.1.231.171   10.10.25.134
    

    In this example, 10.1.231.171 is the IP address of the pod.

    In high availability (HA) environments, an icp-ds pod exists for each master node.

  2. From the master node, ping the icp-ds pods. Check the IP address for each icp-ds pod by running the following command for each IP address:

    ping 10.1.231.171
    

    If the output resembles the following text, you must delete the pod:

    connect: Invalid argument
    
  3. From the master node, delete each pod that is unresponsive by running the following command:

     kubectl delete pods icp-ds-0 -n kube-system
    

    In this example, icp-ds-0 is the name of the unresponsive pod.

    Important: In HA installations, you might have to delete the pod for each master node.

  4. From the master node, obtain the IP address of the replacement pod or pods by running the following command:

    kubectl get pods -o wide  -n kube-system | grep "icp-ds"
    

    The output resembles the following text:

    icp-ds-0                                                  1/1       Running       0          1d        10.1.231.172   10.10.2
    
  5. From the master node, ping the pods again and check the IP address for each icp-ds pod by running the following command for each IP address:

    ping 10.1.231.172
    

    If all icp-ds pods are responsive, you can access the IBM Cloud Private management console when that pod enters the available state.

Truncated labels are displayed on the dashboard for some languages

If you access the IBM Cloud Private dashboard in languages other than English from the Mozilla Firefox browser on a system that uses a Windows™ operating system, some labels might be truncated.

After applying a fix pack, the IBM Cloud Private Welcome page displays instead of the IBM Multicloud Manager Welcome page

When you apply a fix pack and IBM Multicloud Manager is enabled, you might initially view the IBM Cloud Private Welcome page instead of the IBM Multicloud Manager Welcome page when you log in to the management console. Clear your browser cache to view the IBM Multicloud Manager Welcome page.

Management console does not return the login page when the access token expires

When your access token expires, the login page is not returned if you select another page from the management console. Refresh the console and log in again before you select a different page.

If the cluster contains a large number of namespaces, a cluster administrator might not be able to view certain resources

Accessing management console pages that list all namespaces in the drop-down menu might cause an error. For example, the Daemonsets, Deployments, Resource quotas, and similar pages might fail because a large number of namespaces, such as 100 or 200, is tied to a large number of resources. To view these resources, use the Search feature or run kubectl.

For example, to use Search as a workaround to view resource quotas for all namespaces, search for kind:resourcequota. Search is also available for other resources, such as kind:deployment and kind:daemonset.

To use kubectl, run commands similar to the following:

  kubectl get quota --all-namespaces
  kubectl get Resourcequota --all-namespaces

IBM Cloud Private CLI (cloudctl)

IAM resource that was added from the CLI is overwritten by the management console

If you update a team resource that has a Helm release resource assigned to it from both the command-line interface (CLI) and the management console, the resource becomes unassigned. If you manage Helm release resources, add the resource from the CLI. If you manage Helm release resources from the management console, you might notice that a Helm release resource is incorrectly listed as a Namespace. For more information, see Managing Helm releases.

Manage your Helm release resource from the CLI for the most accurate team resource information. For more information, see Working with charts.

Known limitations of IBM Cloud Private on Linux on IBM Z and LinuxONE

IBM Cloud Private on Linux on IBM Z and LinuxONE has the following limitations:

Known limitation of IBM Cloud Private Certificate Manager (cert-manager)

Only one instance of cert-manager is deployed by default. If more than one instance is deployed (more than one pod, for example), remove the extra instances; otherwise, cert-manager cannot function properly.
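
To check for extra instances, list the cert-manager pods; this assumes that the pod names contain cert-manager:

kubectl get pods --all-namespaces | grep cert-manager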