Known issues and limitations
Review the known issues for version 3.1.0.
- English and translated versions of the product documentation might be different
- Kubernetes API Server vulnerability
- Resource quota might not update
- Container fails to start due to Docker issue
- Dynamic configuration does not work for external services
- Dynamic configuration limitation on Linux® on Power® (ppc64le) and IBM® Z nodes
- Sticky sessions must be manually set on Linux® on IBM® Z and LinuxONE and Linux® on Power® (ppc64le)
- OpenTracing plugin for Jaeger
- Tiller 2.7.2 does not support the upgrade or install of Kubernetes 1.9 - 1.10 resources
- Alerting, logging, or monitoring pages display 500 Internal Server Error
- IPv6 is not supported
- Cannot log in to the management console with an LDAP user after restarting the leading master
- Calico prefix limitation on Linux® on Power® (ppc64le) nodes
- Alerts in Slack contain invalid links
- StatefulSets remain in Terminating state after a worker node shuts down
- Limits for the LDAP connection
- Syncing repositories might not update Helm chart contents
- Some features are not available from the new management console
- Cannot show helm chart in the catalog after upgrading from IBM Cloud Private 2.1.0.3 with Fix Pack 1 to IBM Cloud Private 3.1.0 in a HA environment
- Containers fail to start or a kernel panic occurs
- The management console displays 502 Bad Gateway Error
- Enable Ingress Controller to use a new annotation prefix
- Monitoring data is not retained if you use a dynamically provisioned volume during upgrade
- Cannot restart node when using vSphere storage
- Truncated labels are displayed on the dashboard for some languages
- Helm repository names cannot contain DBCS GB18030 characters
- GlusterFS cluster becomes unusable if you configure a vSphere Cloud Provider after installing IBM Cloud Private
- A failed upgrade or rollback of IBM Cloud Private creates two release entries with different statuses
- Prometheus data source is lost during a rollback of IBM Cloud Private
- Matching values for cluster_CA_domain and cluster_lb_address are not supported
- Vulnerability Advisor cross-architecture image scanning does not work with glibc versions earlier than 2.22
- Container fails to operate or a kernel panic occurs
- Intermittent failure when you log in to the management console in HA clusters that use NSX-T 2.3
- Precheck during IBM Cloud Private installation fails when you use a hyphen in interface names
- Vulnerability Advisor policy resets to default setting after upgrade from 2.1.0.3 in ppc64le cluster
- Containers can crash when running IBM Cloud Private on KVM on Power guests.
- Linux kernel memory leak
- In an NSX-T environment, when you restart a master node, the management console becomes inaccessible.
- Logging ELK pods are in CrashLoopBackOff state
- IBM Cloud Private CLI command to load archive fails when you expand the archive
- IBM Cloud Private MongoDB pod fails to deploy with custom cluster_domain
- Cloning an IBM Cloud Private worker node is not supported
- Installation can fail with a helm-api setup error
- Cannot get secret by using kubectl command when encryption of secret data at rest is enabled
English and translated versions of the product documentation might be different
IBM Cloud Private product documentation is translated for participating geographies, but the English version is updated continually. Discrepancies between English and translated versions can appear in between translation cycles. Check the English version to see whether any discrepancies were resolved after the translated versions were published.
Kubernetes API Server vulnerability
IBM Cloud Private has a patch (icp-3.1.0-build508532) on IBM® Fix Central to address the Kubernetes security vulnerability, where the proxy request handling in the Kubernetes API Server can leave vulnerable TCP connections. For
full details, see the Kubernetes kube-apiserver vulnerability issue. After you
apply the patch, you do not need to redeploy either IBM Cloud Private or your Helm releases. You must reapply the patch if you replace your master node.
Resource quota might not update
You might find that the resource quota is not updating in the cluster. This problem is due to an issue in the kube-controller-manager. The workaround is to stop the kube-controller-manager leader container on the master nodes and let it restart. If high availability is configured for the cluster, you can check the kube-controller-manager logs to find the leader. Only the leader kube-controller-manager does work; the other controllers wait to be elected as the new leader when the current leader goes down.
For example:
# docker ps | grep hyperkube | grep controller-manager
97bccea493ea 4c7c25836910 "/hyperkube controll…" 7 days ago Up 7 days k8s_controller-manager_k8s-master-9.111.254.104_kube-system_b0fa31e0606015604c409c09a057a55c_2
To stop the leader, run the following command with the ID of the Docker process:
docker rm -f 97bccea493ea
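As a sketch of the leader check mentioned above, you can inspect each controller-manager container's log for leader-election messages; the grep pattern is an assumption about the log content:
docker ps | grep hyperkube | grep controller-manager
docker logs <controller_manager_container_id> 2>&1 | grep -i leader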
Container fails to start due to Docker issue
Installation fails during container creation due to a Docker 18.03.1 issue. If you have a subPath in the volume mount, you might receive the following error from the kubelet service, which fails to start the container:
Error: failed to start container "heketi": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/7e9cb34c-b2bf-11e8-a9eb-0050569bdc9f/volume-subpaths/heketi-db-secret/heketi/0\\\" to rootfs \\\"/var/lib/docker/overlay2/ca0a54812c6f5718559cc401d9b73fb7ebe43b2055a175ee03cdffaffada2585/merged\\\" at \\\"/var/lib/docker/overlay2/ca0a54812c6f5718559cc401d9b73fb7ebe43b2055a175ee03cdffaffada2585/merged/backupdb/heketi.db.gz\\\" caused \\\"no such file or directory\\\"\"": unknown
For more information, see the Kubernetes documentation.
To resolve this issue, delete the failed pod and try the installation again.
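For example, a failed pod could be located and deleted as follows; the namespace and pod name are placeholders:
kubectl -n <namespace> get pods | grep -v Running
kubectl -n <namespace> delete pod <failed_pod_name>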
Dynamic configuration does not work for external services
For IBM Cloud Private version 3.1.0, NGINX Ingress Controller is upgraded to version 0.16.2. The --enable-dynamic-configuration=true parameter is enabled by default. However, dynamic configuration does not work for external services.
Review the latest developments on the kubernetes/ingress-nginx Issue 2797, where
the fix is applied to Ingress Controller 0.19.0.
Dynamic configuration limitation on Linux® on Power® (ppc64le) and IBM® Z nodes
For IBM Cloud Private version 3.1.0, NGINX Ingress Controller is upgraded to version 0.16.2. Because LuaJIT is not available on IBM® Z (s390x) and Linux® on Power® (ppc64le) architectures, the NGINX Ingress Controller disables the dynamic configuration features during startup. Review the latest developments on the kubernetes/ingress-nginx Issue.
Sticky sessions must be manually set on Linux® on IBM® Z and LinuxONE and Linux® on Power® (ppc64le)
Because LuaJIT is unavailable, Session Affinity is handled by the nginx-sticky-module-ng module. You must enable sticky sessions manually. For more information, see Cannot set the cookie for sticky sessions.
OpenTracing plugin for Jaeger
For IBM Cloud Private version 3.1.0, the NGINX Ingress Controller is upgraded to version 0.16.2. In version 0.16.2, the NGINX Ingress Controller might experience issues if you enable Jaeger as the OpenTracing collector host. Kubernetes has addressed this issue. For details, see kubernetes/ingress-nginx Issue 2738.
Tiller 2.7.2 does not support the upgrade or install of Kubernetes 1.9 - 1.10 resources
Tiller version 2.7.2 is installed with IBM Cloud Private version 3.1.0. Tiller 2.7.2 uses Kubernetes API version 1.8. You cannot install or upgrade Helm charts that use only Kubernetes version 1.9 to version 1.10 resources.
You might encounter a Helm release upgrade error. The error message resembles the following content:
Error: UPGRADE FAILED: failed to create patch: unable to find api field in struct Unstructured for the json field "spec"
If you encounter this error message, you must delete the release and install a new version of the chart.
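A sketch of that recovery with the Helm CLI; the release, repository, and chart names are placeholders, and the --tls flag is shown on the assumption that your Helm CLI is configured for the IBM Cloud Private Tiller endpoint:
helm delete <release_name> --purge --tls
helm install <repo_name>/<chart_name> --name <release_name> --tls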
Alerting, logging, or monitoring pages display 500 Internal Server Error
To resolve this issue, complete the following steps from the master node:
- Create an alias for the insecure kubectl API login by running the following command:
  alias kc='kubectl -n kube-system'
- Edit the configuration map for Kibana. Run the following command:
  kc edit cm kibana-nginx-config
  Add the following updates:
  upstream kibana { server localhost:5602; }
  Change localhost to 127.0.0.1.
- Locate and restart the Kibana pod by running the following commands:
  kc get pod | grep -i kibana
  kc delete pod <kibana-POD_ID>
- Edit the configuration map for Grafana by running the following command:
  kc edit cm grafana-router-nginx-config
  Add the following updates:
  upstream grafana { server localhost:3000; }
  Change localhost to 127.0.0.1.
- Locate and restart the Grafana pod by running the following commands:
  kc get pod | grep -i monitoring-grafana
  kc delete pod <monitoring-grafana-POD_ID>
- Edit the configuration map for the Alertmanager by running the following command:
  kc edit cm alertmanager-router-nginx-config
  Add the following updates:
  upstream alertmanager { server localhost:9093; }
  Change localhost to 127.0.0.1.
- Locate and restart the Alertmanager by running the following commands:
  kc get pod | grep -i monitoring-prometheus-alertmanager
  kc delete pod <monitoring-prometheus-alertmanager-POD_ID>
IPv6 is not supported
IBM Cloud Private cannot use IPv6 networks. Comment out the IPv6 settings in the /etc/hosts file on each cluster node. For more information, see Configuring your cluster.
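For example, typical IPv6 entries in the /etc/hosts file can be commented out as follows; the exact entries on your nodes might differ:
# ::1 localhost ip6-localhost ip6-loopback
# fe00::0 ip6-localnet
# ff00::0 ip6-mcastprefix
# ff02::1 ip6-allnodes
# ff02::2 ip6-allrouters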
Cannot log in to the management console with an LDAP user after restarting the leading master
If you cannot log in to the management console after you restart the leading master node in a high availability cluster, take the following actions:
- Log in to the management console with the cluster administrator credentials. The user name is admin, and the password is admin.
- Click Menu > Manage > Identity & Access.
- Click Edit and then click Save.
Note: LDAP users can now log in to the management console.
If the problem persists, MongoDB, MariaDB, and the pods that depend on auth-idp might not be running. Follow these instructions to identify the cause.
- Check whether the MongoDB and MariaDB pods are running without any errors.
  - Use the following command to check the pod status. All pods must show the status as 1/1 Running. Check the logs, if required.
    kubectl -n kube-system get pods | grep -e mariadb -e mongodb
  - If the pods do not show the status as 1/1 Running, restart all the pods by deleting them.
    kubectl -n kube-system delete pod -l k8s-app=mariadb
    kubectl -n kube-system delete pod -l app=icp-mongodb
    Wait for a minute or two for the pods to restart. Check the pod status by using the following command. The status must show 1/1 Running.
    kubectl -n kube-system get pods | grep -e mariadb -e mongodb
- After the MongoDB and MariaDB pods are running, restart the auth-idp pods by deleting them.
  kubectl -n kube-system delete pod -l k8s-app=auth-idp
  Wait for a minute or two for the pods to restart. Check the pod status by using the following command. The status must show 4/4 Running.
  kubectl -n kube-system get pods | grep auth-idp
Calico prefix limitation on Linux® on Power® (ppc64le) nodes
If you install IBM Cloud Private on PowerVM Linux LPARs and your virtual Ethernet devices use the ibmveth prefix, you must specify the network adapter that Calico uses. During installation, be sure to set the calico_ip_autodetection_method parameter value in the config.yaml file. The setting resembles the following content:
calico_ip_autodetection_method: interface=<device_name>
The <device_name> parameter is the name of your network adapter. You must specify the ibmveth0 interface on each node of the cluster, including the worker nodes.
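For example, with the ibmveth0 interface that this scenario describes, the config.yaml entry might resemble the following line:
calico_ip_autodetection_method: interface=ibmveth0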
Note: If you used PowerVC to deploy your cluster node, this issue does not affect you.
Alerts in Slack contain invalid links
If you integrated a Slack provider with Alertmanager, the links in the Slack messages are invalid. You must open the Alertmanager dashboard at https://<master_ip>:8443/alertmanager to view the alerts.
StatefulSets remain in Terminating state after a worker node shuts down
If the node where a StatefulSet pod is running shuts down, the pod for the StatefulSet enters a Terminating state. You must manually delete the pod that is stuck in the Terminating state to force it to be re-created on another node.
To delete the pod, run the following command:
kubectl -n <namespace> delete pods --grace-period=0 --force <pod_name>
For more information about Kubernetes pod safety management, see Pod Safety, Consistency Guarantees, and Storage Implications in the Kubernetes community feature specs.
Limits for the LDAP connection
You can define only one LDAP connection in IBM Cloud Private. After you add an LDAP connection, you can edit it, but you cannot remove it.
Syncing repositories might not update Helm chart contents
Synchronizing repositories takes several minutes to complete. While synchronization is in progress, there might be an error if you try to display the readme file. After synchronization completes, you can view the readme file and deploy the chart.
Some features are not available from the new management console
IBM Cloud Private 3.1.0 supports the new management console only. Some options from the previous console are not yet available. To access those options, you must use the kubectl CLI.
Cannot show helm chart in the catalog after upgrading from IBM Cloud Private 2.1.0.3 with Fix Pack 1 to IBM Cloud Private 3.1.0 in a HA environment
An uploaded .tgz file in the local-charts repository is present in the PersistentVolume for your master node, but it is not copied to each master node.
After upgrading, if the local-charts repository starts on another master node, it does not find any .tgz files to sync with MongoDB.
Before or after the upgrade, you must manually copy your .tgz files to each master node. Verify the following assumptions before you copy your .tgz files:
- You must run the script from a master node that contains the hosts and ssh_key files.
- The master node on which the script is run must have the Helm CLI.
  Note: After IBM Cloud Private 2.1.0.3 with Fix Pack 1 is updated to IBM Cloud Private 3.1.0, you must use the newest version of the Helm CLI. For more information about downloading Helm, see the Helm community.
- The hosts file must list the [master] nodes before the [worker] and [proxy] nodes. The contents might resemble the following example:
  [master]
  172.16.157.119 vip_iface=ens3
  172.16.160.72 vip_iface=ens3
  172.16.160.225 vip_iface=ens3
  [worker]
  172.16.162.6
  172.16.162.72
  [proxy]
  172.16.157.119 vip_iface=ens3
  172.16.160.72 vip_iface=ens3
  172.16.160.225 vip_iface=ens3
- The ssh_key must allow ssh access to every master node. For example:
  ssh -i ./path/to/ssh/key/file root@172.16.157.119
  ssh -i ./path/to/ssh/key/file root@172.16.160.72
  ssh -i ./path/to/ssh/key/file root@172.16.160.225
After you verify the required assumptions, follow the steps to manually copy your .tgz files on each master node.
- Decompress and copy the helm-repo-tgz-copy-to-masters.sh.zip script onto the master node that contains the hosts file and an ssh_key.
- Make the file executable by running the following command:
  chmod +x ./helm-repo-tgz-copy-to-masters.sh
- Run the file from the same directory as the hosts and ssh_key files, or with parameters for the path to the hosts and ssh_key files, by running one of the following commands:
  ./helm-repo-tgz-copy-to-masters.sh
  ./helm-repo-tgz-copy-to-masters.sh ./path/to/hosts/file ./path/to/ssh/key/file
Containers fail to start or a kernel panic occurs
For Red Hat Enterprise Linux (RHEL) only: Containers fail to start or a kernel panic occurs, and a no space left on device error message is displayed. This issue is a known Docker engine issue that is caused by leaking cgroups. For more information about this issue, see https://github.com/moby/moby/issues/29638 and https://github.com/kubernetes/kubernetes/issues/61937.
To fix this issue, you must restart the host.
The management console displays 502 Bad Gateway Error
The management console displays a 502 Bad Gateway Error after installing or rebooting the master node.
If you recently installed IBM Cloud Private, wait a few minutes and reload the page.
If you rebooted the master node, take the following steps:
- Obtain the IP addresses of the icp-ds pods. From the master node, run the following command:
  kubectl get pods -o wide -n kube-system | grep "icp-ds"
  The output resembles the following text:
  icp-ds-0 1/1 Running 0 1d 10.1.231.171 10.10.25.134
  In this example, 10.1.231.171 is the IP address of the pod.
  In high availability (HA) environments, an icp-ds pod exists for each master node.
- From the master node, ping the icp-ds pods. Check the IP address for each icp-ds pod by running the following command for each IP address:
  ping 10.1.231.171
  If the output resembles the following text, you must delete the pod:
  connect: Invalid argument
- From the master node, delete each pod that is unresponsive by running the following command:
  kubectl delete pods icp-ds-0 -n kube-system
  In this example, icp-ds-0 is the name of the unresponsive pod.
  Important: In HA installations, you might have to delete the pod for each master node.
- From the master node, obtain the IP address of the replacement pod or pods by running the following command:
  kubectl get pods -o wide -n kube-system | grep "icp-ds"
  The output resembles the following text:
  icp-ds-0 1/1 Running 0 1d 10.1.231.172 10.10.2
- From the master node, ping the pods again and check the IP address for each icp-ds pod by running the following command for each IP address:
  ping 10.1.231.172
  If all icp-ds pods are responsive, you can access the IBM Cloud Private management console when the pods enter the available state.
Enable Ingress Controller to use a new annotation prefix
- The NGINX ingress annotation contains a new prefix in version 0.9.0 that is used in IBM Cloud Private 3.1.0: nginx.ingress.kubernetes.io. A flag is used to avoid breaking deployments that are running.
  To avoid breaking a running NGINX ingress controller, add the --annotations-prefix=ingress.kubernetes.io flag to the NGINX ingress controller deployment. The IBM Cloud Private ingress controller includes this flag by default.
- If you want to use the new ingress annotation, update the ingress controller by removing the --annotations-prefix=ingress.kubernetes.io flag. To remove the flag, run the following commands:
  Note: Run the following commands from the master node.
  For Linux® x86_64, run the following command:
  kubectl edit ds nginx-ingress-lb-amd64 -n kube-system
  For Linux® on Power® (ppc64le), run the following command:
  kubectl edit ds nginx-ingress-lb-ppc64le -n kube-system
  Save and exit to implement the change. The ingress controller restarts to receive the new configuration. You can check whether the flag is present by using the command that follows these steps.
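To check whether the flag is currently present, you can search the daemonset spec; a sketch for the amd64 daemonset that is named in the previous steps:
kubectl -n kube-system get ds nginx-ingress-lb-amd64 -o yaml | grep annotations-prefix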
Monitoring data is not retained if you use a dynamically provisioned volume during upgrade
If you use a dynamically provisioned persistent volume to store monitoring data, the data is lost after you upgrade the monitoring service from 2.1.0.2 to 2.1.0.3.
Cannot restart node when using vSphere storage
Shutting down a node in an IBM Cloud Private environment that uses vSphere Cloud Provider moves the pod to another node in your cluster. However, the vSphere volume that the pod uses on the original node is not detached from that node. An error might occur when you try to restart the node.
To resolve the issue, first detach the volume from the node. Then, restart the node.
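To see which vSphere volumes Kubernetes still records as attached to the node before you detach them, you can inspect the node status; a sketch with a placeholder node name:
kubectl get node <node_name> -o jsonpath='{.status.volumesAttached}'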
Truncated labels are displayed on the dashboard for some languages
If you access the IBM Cloud Private dashboard in languages other than English from the Mozilla Firefox browser on a system that uses a Windows™ operating system, some labels might be truncated.
Helm repository names cannot contain DBCS GB18030 characters
Do not use DBCS GB18030 characters in the Helm repository name when you add the repository.
GlusterFS cluster becomes unusable if you configure a vSphere Cloud Provider after installing IBM Cloud Private
By default, the kubelet uses the IP address of the node as the node name. When you configure a vSphere Cloud Provider, kubelet uses the host name of the node as the node name. If you had your GlusterFS cluster set up during installation of IBM Cloud Private, Heketi creates a topology by using the IP address of the node.
When you configure a vSphere Cloud Provider after you install IBM Cloud Private, your GlusterFS cluster becomes unusable because the kubelet identifies nodes by their host names, but Heketi still uses IP addresses to identify the nodes.
If you plan to use both GlusterFS and a vSphere Cloud Provider in your IBM Cloud Private cluster, ensure that you set kubelet_nodename: hostname in the config.yaml file during installation.
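For example, the relevant entry in the config.yaml file might resemble the following line; other installation settings are omitted:
kubelet_nodename: hostname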
A failed upgrade or rollback of IBM Cloud Private creates two release entries with different statuses
A failed upgrade or rollback results in two listed releases with the same name: one for the successful release that was not upgraded or rolled back, and one for the failed upgrade or rollback.
These two releases with the same name are two revisions of the same release, so deleting one deletes the other. This issue of showing more than one revision of a release is a known community Helm 2.7.2 issue. For more information, see https://github.com/kubernetes/helm/issues/2941.
Prometheus data source is lost during a rollback of IBM Cloud Private
When you roll back from IBM Cloud Private Version 3.1.0 to 2.1.0.3, the Prometheus data source in Grafana is lost. The Grafana dashboards do not display any metric.
To resolve the issue, add back the Prometheus data source by completing the steps in the Manually configure a Prometheus data source in Grafana section.
Matching values for cluster_CA_domain and cluster_lb_address are not supported
Using matching values for the cluster_CA_domain and cluster_lb_address parameters during installation is not supported. To fix this issue, use an IP address for the cluster_lb_address parameter or use a different
domain name.
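For example, non-matching values in the config.yaml file might resemble the following lines; both values are placeholders:
cluster_CA_domain: mycluster.icp
cluster_lb_address: 192.0.2.10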
Vulnerability Advisor cross-architecture image scanning does not work with glibc version earlier than 2.22
Vulnerability Advisor (VA) now supports cross-architecture image scanning with QEMU (Quick EMUlator). You can scan Linux® on Power® (ppc64le) CPU architecture images with VA running on Linux® x86_64 nodes. Alternatively, you can scan Linux® x86_64 CPU architecture images with VA running on Linux® on Power® (ppc64le) nodes.
When scanning Linux® x86_64 images, you must use glibc version 2.22 or later. If you use a glibc version earlier than 2.22, the scan might not work when VA runs on Linux® on Power® (ppc64le) nodes. Glibc versions earlier than 2.22 make certain syscalls (time/vgetcpu/gettimeofday) by using vsyscall mechanisms. The syscall implementation attempts to access a hardcoded static address, which QEMU fails to translate while running in emulation mode.
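One way to check the glibc level inside an image before you scan it is to run ldd, which reports the glibc version on glibc-based images; the image name is a placeholder, and the check does not apply to musl-based images:
docker run --rm <image_name> ldd --version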
Container fails to operate or a kernel panic occurs
The following error might occur from the IBM Cloud Private node console or kernel log:
kernel:unregister_netdevice: waiting for <eth0> to become free.
If the kernel log displays this message and containers fail to operate, continue to troubleshoot. If the problem persists, reboot the node.
See https://github.com/kubernetes/kubernetes/issues/64743 to learn about the Linux kernel bug that causes the error.
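To check whether a node is currently hitting this condition, you can search the kernel ring buffer for the message; a sketch:
dmesg -T | grep -i unregister_netdevice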
Intermittent failure when you log in to the management console in HA clusters that use NSX-T 2.3
In HA clusters that use NSX-T 2.3, you might not be able to log in to the management console. After you specify the login credentials, you are redirected to the login page. You might have to try logging in multiple times until you succeed. This issue is intermittent.
Precheck during IBM Cloud Private installation fails when you use a hyphen in interface names
If any node in your IBM Cloud Private cluster has a hyphen in an interface name, the IBM Cloud Private installation fails.
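Before you install, you can list the interface names on each node and confirm that none of them contain a hyphen; a sketch:
ip -o link show | awk -F': ' '{print $2}'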
Vulnerability Advisor policy resets to default setting after upgrade from 2.1.0.3 in ppc64le cluster
If you enabled Vulnerability Advisor (VA) on your Linux® on Power® (ppc64le) cluster in 2.1.0.3, the Vulnerability Advisor policy resets to the default setting when you upgrade to 3.1.0. To fix this issue, reset the VA policy in the management console.
Containers can crash when running IBM Cloud Private on KVM on Power guests.
If you are running IBM Cloud Private on KVM on Power guests, some containers might crash because of an issue with how Transactional Memory is handled. You can work around this issue by using one of the following methods:
- Turn off Transactional Memory support for KVM on Power guests.
- If you are using the QEMU emulator directly to run the virtual machine, enable the cap-htm=off option.
- If you are using the libvirt library, add the following XML element to the domain definition:
  <features> <htm state='off'/> </features>
  See the libvirt documentation for detailed instructions about adding this libvirt attribute. A virsh sketch follows this list.
Note: This issue is specific to KVM on Power guests and does not occur when using POWER9 bare metal or POWER9 PowerVM LPARs.
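If you manage the guest with libvirt, one way to apply the setting is to edit the domain XML with virsh; a sketch with a placeholder domain name:
virsh edit <domain_name>
# In the editor, add <htm state='off'/> inside the <features> section, then save and exit.
virsh shutdown <domain_name>
virsh start <domain_name>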
Linux kernel memory leak
Linux kernels older than release 4.17.17 contain a bug that causes kernel memory leaks in cgroup (community link). When pods in the host are restarted multiple times, the host can run out of kernel memory. This problem causes pod start failures and hung systems.
As shown in the following example, you can check your kernel core dump file and view the core stack:
[700556.898399] Call Trace:
[700556.898406] [<ffffffff8184bdb0>] ? bit_wait+0x60/0x60
[700556.898408] [<ffffffff8184b5b5>] schedule+0x35/0x80
[700556.898411] [<ffffffff8184e746>] schedule_timeout+0x1b6/0x270
[700556.898415] [<ffffffff810f90ee>] ? ktime_get+0x3e/0xb0
[700556.898417] [<ffffffff8184bdb0>] ? bit_wait+0x60/0x60
[700556.898420] [<ffffffff8184ad24>] io_schedule_timeout+0xa4/0x110
[700556.898422] [<ffffffff8184bdcb>] bit_wait_io+0x1b/0x70
[700556.898425] [<ffffffff8184b95f>] __wait_on_bit+0x5f/0x90
[700556.898429] [<ffffffff8119200b>] wait_on_page_bit+0xcb/0xf0
[700556.898433] [<ffffffff810c6de0>] ? autoremove_wake_function+0x40/0x40
[700556.898435] [<ffffffff81192123>] __filemap_fdatawait_range+0xf3/0x160
[700556.898437] [<ffffffff811921a4>] filemap_fdatawait_range+0x14/0x30
[700556.898439] [<ffffffff8119414f>] filemap_write_and_wait_range+0x3f/0x70
[700556.898444] [<ffffffff8129af08>] ext4_sync_file+0x108/0x350
[700556.898447] [<ffffffff812486de>] vfs_fsync_range+0x4e/0xb0
[700556.898449] [<ffffffff8124879d>] do_fsync+0x3d/0x70
[700556.898451] [<ffffffff81248a63>] SyS_fdatasync+0x13/0x20
[700556.898453] [<ffffffff8184f788>] entry_SYSCALL_64_fastpath+0x1c/0xbb
[700599.233973] mptscsih: ioc0: attempting task abort! (sc=ffff880fd344e100)
To work around the failures, you can restart the host. However, you might encounter the problem again. To avoid the problem, upgrade your Linux kernel to release 4.17.17 or later, which contains fixes for the kernel bug.
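To check whether a node is exposed to this issue, you can compare its kernel release against 4.17.17 and watch the cgroup counts; a sketch:
# Kernel release; anything older than 4.17.17 indicates exposure to the leak
uname -r
# Per-subsystem cgroup counts; numbers that keep growing after repeated pod restarts suggest the leak
cat /proc/cgroups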
In an NSX-T environment, when you restart a master node, the management console becomes inaccessible.
In an NSX-T environment, when you restart a master node, the management console is inaccessible even though all the service pods are in a good state. This issue is caused by non-persistent iptables NAT rules, which enable host port and pod communication through the host IP. NSX-T does not support host ports, and IBM Cloud Private uses a host port for the management console.
To resolve the issue, run the following commands on all the master nodes. Use the network CIDR that you specified in the /<installation_directory>/cluster/config.yaml file.
iptables -t nat -N ICP-NSXT
iptables -t nat -A POSTROUTING -j ICP-NSXT
iptables -t nat -A ICP-NSXT ! -s <network_cidr> -d <network_cidr> -j MASQUERADE
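Because the NAT rules are not persistent, you might want to confirm that they are in place after each master node restart; a sketch that uses the chain name from the commands above:
iptables -t nat -L ICP-NSXT -n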
Logging ELK pods are in CrashLoopBackOff state
Logging ELK pods continue to appear in CrashLoopBackOff state after upgrading to the current version and increasing memory.
This is a known issue in Elasticsearch 5.5.1.
Note: If you have more than one data-pod, repeat steps 1-8 for each pod. For example, logging-elk-data-0, logging-elk-data-1, or logging-elk-data-2.
Complete the following steps to resolve this issue.
1. Check the log to find the problematic file that contains the permission issue.
   java.io.IOException: failed to write in data directory [/usr/share/elasticsearch/data/nodes/0/indices/dT4Nc7gvRLCjUqZQ0rIUDA/0/translog] write permission is required
2. Get the IP address of the management node where the logging-elk-data-1 pod is running.
   kubectl -n kube-system get pods -o wide | grep logging-elk-data-1
3. Use SSH to log in to the management node.
4. Navigate to the /var/lib/icp/logging/elk-data directory.
   cd /var/lib/icp/logging/elk-data
5. Find all .es_temp_file files.
   find ./ -name "*.es_temp_file"
6. Delete all *.es_temp_file files that you found in step 5.
   rm -rf *.es_temp_file
7. Delete the old logging-elk-data-1 pod.
   kubectl -n kube-system delete pods logging-elk-data-1
8. Wait 3-5 minutes for the new logging-elk-data-1 pod to restart.
   kubectl -n kube-system get pods -o wide | grep logging-elk-data-1
IBM Cloud Private CLI command to load archive fails when you expand the archive
If the cloudctl catalog load-archive command returns the following error, you need to reconstruct your archive file:
Expanding archive
FAILED
Extract the archive file and repackage it by using the following commands. The repackaged archive must contain the directory entries.
mkdir /tmp/cloudctl_archive
tar -xzvf <archive_file> -C /tmp/cloudctl_archive
tar -czvf <new_archive_file> -C /tmp/cloudctl_archive .
cloudctl catalog load-archive -a <new_archive_file>
rm -rf /tmp/cloudctl_archive
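To confirm that the repackaged archive includes directory entries before you load it, you can list its contents; directory entries appear with a trailing slash:
tar -tzvf <new_archive_file> | head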
IBM Cloud Private MongoDB pod fails to deploy with custom cluster_domain
For amd64 architecture, IBM Cloud Private has a patch (icp-3.1.0-build508873) on IBM® Fix Central to address a bug in the peer-finder application within the icp-mongodb-install image. To resolve the issue, you must
apply the patch before installation or upgrade. The patch replaces the icp-mongodb-install image with a new image that contains an updated version of the peer-finder application.
Cloning an IBM Cloud Private worker node is not supported
IBM Cloud Private does not support cloning an existing IBM Cloud Private worker node. You cannot change the host name and IP address of a node on your existing cluster.
You must add a new worker node. For more information, see Adding an IBM Cloud Private cluster node.
Installation can fail with a helm-api setup error
Installation can fail with the following errors during the initial deployment of the helm-api chart:
stderr: 'Error: secrets "rudder-secret" already exists'
You can view these errors in the install logs in the cluster/logs directory.
The errors occur because the Kubernetes secrets for rudder and helm-api are created in a pre-install hook. Problems such as network timeouts, incorrect installation configuration, and insufficient resources on the master node can prevent the use of the secrets that were created by the pre-install hook. When the deployment fails, the installer tries to deploy the chart four more times. On each retry, it fails when it tries to re-create the secrets.
To resolve this issue, complete the following steps:
- Run the uninstall procedure to remove all components of the installation.
- Verify that the installation settings are correctly configured, including the config.yaml file, the hosts file, and other settings.
- Run the installation procedure again.
Cannot get secret by using kubectl command when encryption of secret data at rest is enabled
When encryption of secret data at rest is enabled and you use the kubectl command to get a secret, the command sometimes fails. You might see the following error message in kube-apiserver:
Internal error occurred: invalid padding on input
This error occurs because kube-apiserver fails to decrypt the encrypted data in etcd. For more information about the issue, see Random "invalid padding on input" errors when attempting various kubectl operations.
To resolve the issue, delete the secret and re-create it. Use the following command:
kubectl -n <namespace> delete secret <secret>
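After you delete the secret, re-create it with its original data. A minimal sketch for a generic secret; the names, keys, and values are placeholders:
kubectl -n <namespace> create secret generic <secret> --from-literal=<key>=<value>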
For more information about encrypting secret data at rest, see Encrypting Secret Data at Rest.