Known issues and limitations
Review the known issues for version 3.1.0.
- English and translated versions of the product documentation might be different
- Kubernetes API Server vulnerability
- Resource quota might not update
- Container fails to start due to Docker issue
- Dynamic configuration does not work for external services
- Dynamic configuration limitation on Linux® on Power® (ppc64le) and IBM® Z nodes
- Sticky sessions must be manually set on Linux® on IBM® Z and LinuxONE and Linux® on Power® (ppc64le)
- OpenTracing plugin for Jaeger
- Tiller 2.7.2 does not support the upgrade or install of Kubernetes 1.9 - 1.10 resources
- Alerting, logging, or monitoring pages display 500 Internal Server Error
- IPv6 is not supported
- Cannot log in to the management console with an LDAP user after restarting the leading master
- Calico prefix limitation on Linux® on Power® (ppc64le) nodes
- Alerts in Slack contain invalid links
- StatefulSets remain in Terminating state after a worker node shuts down
- Limits for the LDAP connection
- Syncing repositories might not update Helm chart contents
- Some features are not available from the new management console
- Cannot show helm chart in the catalog after upgrading from IBM Cloud Private 2.1.0.3 with Fix Pack 1 to IBM Cloud Private 3.1.0 in a HA environment
- Containers fail to start or a kernel panic occurs
- The management console displays 502 Bad Gateway Error
- Enable Ingress Controller to use a new annotation prefix
- Monitoring data is not retained if you use a dynamically provisioned volume during upgrade
- Cannot restart node when using vSphere storage
- Truncated labels are displayed on the dashboard for some languages
- Helm repository names cannot contain DBCS GB18030 characters
- GlusterFS cluster becomes unusable if you configure a vSphere Cloud Provider after installing IBM Cloud Private
- A failed upgrade or rollback of IBM Cloud Private creates two release entries with different statuses
- Prometheus data source is lost during a rollback of IBM Cloud Private
- Matching values for cluster_CA_domain and cluster_lb_address are not supported
- Vulnerability Advisor cross-architecture image scanning does not work with glibc versions earlier than 2.22
- Container fails to operate or a kernel panic occurs
- Intermittent failure when you log in to the management console in HA clusters that use NSX-T 2.3
- Precheck during IBM Cloud Private installation fails when you use a hyphen in interface names
- Vulnerability Advisor policy resets to default setting after upgrade from 2.1.0.3 in ppc64le cluster
- Containers can crash when running IBM Cloud Private on KVM on Power guests.
- Linux kernel memory leak
- In an NSX-T environment, when you restart a master node, the management console becomes inaccessible.
- Logging ELK pods are in CrashLoopBackOff state
- IBM Cloud Private CLI command to load archive fails when you expand the archive
- IBM Cloud Private MongoDB pod fails to deploy with custom cluster_domain
- Cloning an IBM Cloud Private worker node is not supported
- Installation can fail with a helm-api setup error
- Cannot get secret by using kubectl command when encryption of secret data at rest is enabled
English and translated versions of the product documentation might be different
IBM Cloud Private product documentation is translated for participating geographies, but the English version is updated continually. Discrepancies between English and translated versions can appear in between translation cycles. Check the English version to see whether any discrepancies were resolved after the translated versions were published.
Kubernetes API Server vulnerability
IBM Cloud Private has a patch (icp-3.1.0-build508532) on IBM® Fix Central to address the Kubernetes security vulnerability, where the proxy request handling in the Kubernetes API Server can leave vulnerable TCP connections. For
full details, see the Kubernetes kube-apiserver vulnerability issue. After you
apply the patch, you do not need to redeploy either IBM Cloud Private or your Helm releases. You must reapply the patch if you replace your master node.
Resource quota might not update
You might find that the resource quota is not updating in the cluster. This problem is due to an issue in the kube-controller-manager. The workaround is to stop the kube-controller-manager leader container on the master nodes and let it restart. If high availability is configured for the cluster, you can check the kube-controller-manager logs to find the leader. Only the leader kube-controller-manager does work; the other controllers wait to be elected as the new leader when the current leader goes down.
For example:
# docker ps | grep hyperkube | grep controller-manager
97bccea493ea 4c7c25836910 "/hyperkube controll…" 7 days ago Up 7 days k8s_controller-manager_k8s-master-9.111.254.104_kube-system_b0fa31e0606015604c409c09a057a55c_2
To stop the leader, run the following command with the ID of the Docker process:
docker rm -f 97bccea493ea
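As a sketch of the leader check mentioned above, you can inspect each controller-manager container's log for leader-election messages; the grep pattern is an assumption about the log content:
docker ps | grep hyperkube | grep controller-manager
docker logs <controller_manager_container_id> 2>&1 | grep -i leader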
Container fails to start due to Docker issue
Installation fails during container creation due to a Docker 18.03.1 issue. If you have a subPath in the volume mount, you might receive the following error from the kubelet service, which fails to start the container:
Error: failed to start container "heketi": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/7e9cb34c-b2bf-11e8-a9eb-0050569bdc9f/volume-subpaths/heketi-db-secret/heketi/0\\\" to rootfs \\\"/var/lib/docker/overlay2/ca0a54812c6f5718559cc401d9b73fb7ebe43b2055a175ee03cdffaffada2585/merged\\\" at \\\"/var/lib/docker/overlay2/ca0a54812c6f5718559cc401d9b73fb7ebe43b2055a175ee03cdffaffada2585/merged/backupdb/heketi.db.gz\\\" caused \\\"no such file or directory\\\"\"": unknown
For more information, see the Kubernetes documentation.
To resolve this issue, delete the failed pod and try the installation again.
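For example, a failed pod could be located and deleted as follows; the namespace and pod name are placeholders:
kubectl -n <namespace> get pods | grep -v Running
kubectl -n <namespace> delete pod <failed_pod_name>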
Dynamic configuration does not work for external services
For IBM Cloud Private version 3.1.0, NGINX Ingress Controller is upgraded to version 0.16.2. The --enable-dynamic-configuration=true parameter is enabled by default. However, dynamic configuration does not work for external services.
Review the latest developments on the kubernetes/ingress-nginx Issue 2797, where
the fix is applied to Ingress Controller 0.19.0.
Dynamic configuration limitation on Linux® on Power® (ppc64le) and IBM® Z nodes
For IBM Cloud Private version 3.1.0, NGINX Ingress Controller is upgraded to version 0.16.2. Because LuaJIT is not available on IBM® Z (s390x) and Linux® on Power® (ppc64le) architectures, the NGINX Ingress Controller disables the dynamic configuration features during startup. Review the latest developments on the kubernetes/ingress-nginx Issue.
Sticky sessions must be manually set on Linux® on IBM® Z and LinuxONE and Linux® on Power® (ppc64le)
Because LuaJIT is unavailable, Session Affinity is handled by the nginx-sticky-module-ng module. You must enable sticky sessions manually. For more information, see Cannot set the cookie for sticky sessions.
OpenTracing plugin for Jaeger
For IBM Cloud Private version 3.1.0, the NGINX Ingress Controller is upgraded to version 0.16.2. In version 0.16.2, the NGINX Ingress Controller might experience issues if you enable Jaeger as the OpenTracing collector host. Kubernetes has addressed this issue. For details, see kubernetes/ingress-nginx Issue 2738.
Tiller 2.7.2 does not support the upgrade or install of Kubernetes 1.9 - 1.10 resources
Tiller version 2.7.2 is installed with IBM Cloud Private version 3.1.0. Tiller 2.7.2 uses Kubernetes API version 1.8. You cannot install or upgrade Helm charts that use only Kubernetes version 1.9 to version 1.10 resources.
You might encounter a Helm release upgrade error. The error message resembles the following content:
Error: UPGRADE FAILED: failed to create patch: unable to find api field in struct Unstructured for the json field "spec"
If you encounter this error message, you must delete the release and install a new version of the chart.
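A sketch of that recovery with the Helm CLI; the release, repository, and chart names are placeholders, and the --tls flag is shown on the assumption that your Helm CLI is configured for the IBM Cloud Private Tiller endpoint:
helm delete <release_name> --purge --tls
helm install <repo_name>/<chart_name> --name <release_name> --tls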
Alerting, logging, or monitoring pages display 500 Internal Server Error
To resolve this issue, complete the following steps from the master node:
- Create an alias for the insecure kubectl API login by running the following command:
  alias kc='kubectl -n kube-system'
- Edit the configuration map for Kibana. Run the following command:
  kc edit cm kibana-nginx-config
  Add the following updates:
  upstream kibana { server localhost:5602; }
  Change localhost to 127.0.0.1.
- Locate and restart the Kibana pod by running the following commands:
  kc get pod | grep -i kibana
  kc delete pod <kibana-POD_ID>
- Edit the configuration map for Grafana by running the following command:
  kc edit cm grafana-router-nginx-config
  Add the following updates:
  upstream grafana { server localhost:3000; }
  Change localhost to 127.0.0.1.
- Locate and restart the Grafana pod by running the following commands:
  kc get pod | grep -i monitoring-grafana
  kc delete pod <monitoring-grafana-POD_ID>
- Edit the configuration map for the Alertmanager by running the following command:
  kc edit cm alertmanager-router-nginx-config
  Add the following updates:
  upstream alertmanager { server localhost:9093; }
  Change localhost to 127.0.0.1.
- Locate and restart the Alertmanager by running the following commands:
  kc get pod | grep -i monitoring-prometheus-alertmanager
  kc delete pod <monitoring-prometheus-alertmanager-POD_ID>
IPv6 is not supported
IBM Cloud Private cannot use IPv6 networks. Comment out the IPv6 settings in the /etc/hosts file on each cluster node. For more information, see Configuring your cluster.
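For example, typical IPv6 entries in the /etc/hosts file can be commented out as follows; the exact entries on your nodes might differ:
# ::1 localhost ip6-localhost ip6-loopback
# fe00::0 ip6-localnet
# ff00::0 ip6-mcastprefix
# ff02::1 ip6-allnodes
# ff02::2 ip6-allrouters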
Cannot log in to the management console with an LDAP user after restarting the leading master
If you cannot log in to the management console after you restart the leading master node in a high availability cluster, take the following actions:
- Log in to the management console with the cluster administrator credentials. The user name is admin, and the password is admin.
- Click Menu > Manage > Identity & Access.
- Click Edit and then click Save.
Note: LDAP users can now log in to the management console.
If the problem persists, MongoDB, MariaDB, and the pods that depend on auth-idp might not be running. Follow these instructions to identify the cause.
- Check whether the MongoDB and MariaDB pods are running without any errors.
  - Use the following command to check the pod status. All pods must show the status as 1/1 Running. Check the logs, if required.
    kubectl -n kube-system get pods | grep -e mariadb -e mongodb
  - If the pods do not show the status as 1/1 Running, restart all the pods by deleting them.
    kubectl -n kube-system delete pod -l k8s-app=mariadb
    kubectl -n kube-system delete pod -l app=icp-mongodb
    Wait for a minute or two for the pods to restart. Check the pod status by using the following command. The status must show 1/1 Running.
    kubectl -n kube-system get pods | grep -e mariadb -e mongodb
- After the MongoDB and MariaDB pods are running, restart the auth-idp pods by deleting them.
  kubectl -n kube-system delete pod -l k8s-app=auth-idp
  Wait for a minute or two for the pods to restart. Check the pod status by using the following command. The status must show 4/4 Running.
  kubectl -n kube-system get pods | grep auth-idp
Calico prefix limitation on Linux® on Power® (ppc64le) nodes
If you install IBM Cloud Private on PowerVM Linux LPARs and your virtual Ethernet devices use the ibmveth prefix, you must specify the network adapter that Calico uses. During installation, be sure to set the calico_ip_autodetection_method parameter value in the config.yaml file. The setting resembles the following content:
calico_ip_autodetection_method: interface=<device_name>
The <device_name> parameter is the name of your network adapter. You must specify the ibmveth0 interface on each node of the cluster, including the worker nodes.
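For example, with the ibmveth0 interface that this scenario describes, the config.yaml entry might resemble the following line:
calico_ip_autodetection_method: interface=ibmveth0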
Note: If you used PowerVC to deploy your cluster node, this issue does not affect you.
Alerts in Slack contain invalid links
If you integrated a Slack provider with Alertmanager, the links in the Slack messages are invalid. You must open the Alertmanager dashboard at https://<master_ip>:8443/alertmanager to view the alerts.
StatefulSets remain in Terminating state after a worker node shuts down
If the node where a StatefulSet pod is running shuts down, the pod for the StatefulSet enters a Terminating state. You must manually delete the pod that is stuck in the Terminating state to force it to be re-created on another node.
To delete the pod, run the following command:
kubectl -n <namespace> delete pods --grace-period=0 --force <pod_name>
For more information about Kubernetes pod safety management, see Pod Safety, Consistency Guarantees, and Storage Implications in the Kubernetes community feature specs.
Limits for the LDAP connection
You can define only one LDAP connection in IBM Cloud Private. After you add an LDAP connection, you can edit it, but you cannot remove it.
Syncing repositories might not update Helm chart contents
Synchronizing repositories takes several minutes to complete. While synchronization is in progress, there might be an error if you try to display the readme file. After synchronization completes, you can view the readme file and deploy the chart.
Some features are not available from the new management console
IBM Cloud Private 3.1.0 supports the new management console only. Some options from the previous console are not yet available. To access those options, you must use the kubectl CLI.
Cannot show helm chart in the catalog after upgrading from IBM Cloud Private 2.1.0.3 with Fix Pack 1 to IBM Cloud Private 3.1.0 in a HA environment
An uploaded .tgz file in the local-charts repository is present in the PersistentVolume for your master node, but it is not copied to each master node.
After upgrading, if the local-charts repository starts on another master node, it does not find any .tgz files to sync with MongoDB.
Before or after the upgrade, you must manually copy your .tgz files to each master node. Verify the following assumptions before you copy your .tgz files:
- You must run the script from a master node that contains the hosts and ssh_key files.
- The master node on which the script is run must have the Helm CLI.
  Note: After IBM Cloud Private 2.1.0.3 with Fix Pack 1 is updated to IBM Cloud Private 3.1.0, you must use the newest version of the Helm CLI. For more information about downloading Helm, see the Helm community.
- The hosts file must list the [master] nodes before the [worker] and [proxy] nodes. The contents might resemble the following example:
  [master]
  172.16.157.119 vip_iface=ens3
  172.16.160.72 vip_iface=ens3
  172.16.160.225 vip_iface=ens3
  [worker]
  172.16.162.6
  172.16.162.72
  [proxy]
  172.16.157.119 vip_iface=ens3
  172.16.160.72 vip_iface=ens3
  172.16.160.225 vip_iface=ens3
- The ssh_key must allow ssh access to every master node. For example:
  ssh -i ./path/to/ssh/key/file root@172.16.157.119
  ssh -i ./path/to/ssh/key/file root@172.16.160.72
  ssh -i ./path/to/ssh/key/file root@172.16.160.225
After you verify the required assumptions, follow the steps to manually copy your .tgz files on each master node.
- Decompress and copy the helm-repo-tgz-copy-to-masters.sh.zip script onto the master node that contains the hosts file and an ssh_key.
- Make the file executable by running the following command:
  chmod +x ./helm-repo-tgz-copy-to-masters.sh
- Run the file from the same directory as the hosts and ssh_key files, or with parameters for the path to the hosts and ssh_key files, by running one of the following commands:
  ./helm-repo-tgz-copy-to-masters.sh
  ./helm-repo-tgz-copy-to-masters.sh ./path/to/hosts/file ./path/to/ssh/key/file
Containers fail to start or a kernel panic occurs
For Red Hat Enterprise Linux (RHEL) only: Containers fail to start or a kernel panic occurs, and a no space left on device error message is displayed. This issue is a known Docker engine issue that is caused by leaking cgroups. For more information about this issue, see https://github.com/moby/moby/issues/29638 and https://github.com/kubernetes/kubernetes/issues/61937.
To fix this issue, you must restart the host.
The management console displays 502 Bad Gateway Error
The management console displays a 502 Bad Gateway Error after installing or rebooting the master node.
If you recently installed IBM Cloud Private, wait a few minutes and reload the page.
If you rebooted the master node, take the following steps:
- Obtain the IP addresses of the icp-ds pods. From the master node, run the following command:
  kubectl get pods -o wide -n kube-system | grep "icp-ds"
  The output resembles the following text:
  icp-ds-0 1/1 Running 0 1d 10.1.231.171 10.10.25.134
  In this example, 10.1.231.171 is the IP address of the pod.
  In high availability (HA) environments, an icp-ds pod exists for each master node.
- From the master node, ping the icp-ds pods. Check the IP address for each icp-ds pod by running the following command for each IP address:
  ping 10.1.231.171
  If the output resembles the following text, you must delete the pod:
  connect: Invalid argument
- From the master node, delete each pod that is unresponsive by running the following command:
  kubectl delete pods icp-ds-0 -n kube-system
  In this example, icp-ds-0 is the name of the unresponsive pod.
  Important: In HA installations, you might have to delete the pod for each master node.
- From the master node, obtain the IP address of the replacement pod or pods by running the following command:
  kubectl get pods -o wide -n kube-system | grep "icp-ds"
  The output resembles the following text:
  icp-ds-0 1/1 Running 0 1d 10.1.231.172 10.10.2
- From the master node, ping the pods again and check the IP address for each icp-ds pod by running the following command for each IP address:
  ping 10.1.231.172
  If all icp-ds pods are responsive, you can access the IBM Cloud Private management console when the pods enter the available state.
Enable Ingress Controller to use a new annotation prefix
- The NGINX ingress annotation contains a new prefix in version 0.9.0 that is used in IBM Cloud Private 3.1.0: nginx.ingress.kubernetes.io. A flag is used to avoid breaking deployments that are running.
  To avoid breaking a running NGINX ingress controller, add the --annotations-prefix=ingress.kubernetes.io flag to the NGINX ingress controller deployment. The IBM Cloud Private ingress controller includes this flag by default.
- If you want to use the new ingress annotation, update the ingress controller by removing the --annotations-prefix=ingress.kubernetes.io flag. To remove the flag, run the following commands:
  Note: Run the following commands from the master node.
  For Linux® x86_64, run the following command:
  kubectl edit ds nginx-ingress-lb-amd64 -n kube-system
  For Linux® on Power® (ppc64le), run the following command:
  kubectl edit ds nginx-ingress-lb-ppc64le -n kube-system
  Save and exit to implement the change. The ingress controller restarts to receive the new configuration. You can check whether the flag is present by using the command that follows these steps.
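To check whether the flag is currently present, you can search the daemonset spec; a sketch for the amd64 daemonset that is named in the previous steps:
kubectl -n kube-system get ds nginx-ingress-lb-amd64 -o yaml | grep annotations-prefix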
Monitoring data is not retained if you use a dynamically provisioned volume during upgrade
If you use a dynamically provisioned persistent volume to store monitoring data, the data is lost after you upgrade the monitoring service from 2.1.0.2 to 2.1.0.3.
Cannot restart node when using vSphere storage
Shutting down a node in an IBM Cloud Private environment that uses vSphere Cloud Provider moves the pod to another node in your cluster. However, the vSphere volume that the pod uses on the original node is not detached from that node. An error might occur when you try to restart the node.
To resolve the issue, first detach the volume from the node. Then, restart the node.
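To see which vSphere volumes Kubernetes still records as attached to the node before you detach them, you can inspect the node status; a sketch with a placeholder node name:
kubectl get node <node_name> -o jsonpath='{.status.volumesAttached}'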
Truncated labels are displayed on the dashboard for some languages
If you access the IBM Cloud Private dashboard in languages other than English from the Mozilla Firefox browser on a system that uses a Windows™ operating system, some labels might be truncated.
Helm repository names cannot contain DBCS GB18030 characters
Do not use DBCS GB18030 characters in the Helm repository name when you add the repository.
GlusterFS cluster becomes unusable if you configure a vSphere Cloud Provider after installing IBM Cloud Private
By default, the kubelet uses the IP address of the node as the node name. When you configure a vSphere Cloud Provider, kubelet uses the host name of the node as the node name. If you had your GlusterFS cluster set up during installation of IBM Cloud Private, Heketi creates a topology by using the IP address of the node.
When you configure a vSphere Cloud Provider after you install IBM Cloud Private, your GlusterFS cluster becomes unusable because the kubelet identifies nodes by their host names, but Heketi still uses IP addresses to identify the nodes.
If you plan to use both GlusterFS and a vSphere Cloud Provider in your IBM Cloud Private cluster, ensure that you set kubelet_nodename: hostname in the config.yaml file during installation.
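For example, the relevant entry in the config.yaml file might resemble the following line; other installation settings are omitted:
kubelet_nodename: hostname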
A failed upgrade or rollback of IBM Cloud Private creates two release entries with different statuses
A failed upgrade or rollback results in two listed releases with the same name: one for the successful release that was not upgraded or rolled back, and one for the failed upgrade or rollback.
These two releases with the same name are two revisions of the same release, so deleting one deletes the other. This issue of showing more than one revision of a release is a known community Helm 2.7.2 issue. For more information, see https://github.com/kubernetes/helm/issues/2941.
Prometheus data source is lost during a rollback of IBM Cloud Private
When you roll back from IBM Cloud Private Version 3.1.0 to 2.1.0.3, the Prometheus data source in Grafana is lost. The Grafana dashboards do not display any metric.
To resolve the issue, add back the Prometheus data source by completing the steps in the Manually configure a Prometheus data source in Grafana section.
Matching values for cluster_CA_domain and cluster_lb_address are not supported
Using matching values for the cluster_CA_domain and cluster_lb_address parameters during installation is not supported. To fix this issue, use an IP address for the cluster_lb_address parameter or use a different
domain name.
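For example, non-matching values in the config.yaml file might resemble the following lines; both values are placeholders:
cluster_CA_domain: mycluster.icp
cluster_lb_address: 192.0.2.10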
Vulnerability Advisor cross-architecture image scanning does not work with glibc version earlier than 2.22
Vulnerability Advisor (VA) now supports cross-architecture image scanning with QEMU (Quick EMUlator). You can scan Linux® on Power® (ppc64le) CPU architecture images with VA running on Linux® x86_64 nodes. Alternatively, you can scan Linux® x86_64 CPU architecture images with VA running on Linux® on Power® (ppc64le) nodes.
When scanning Linux® x86_64 images, you must use glibc version 2.22 or later. If you use a glibc version earlier than 2.22, the scan might not work when VA runs on Linux® on Power® (ppc64le) nodes. Glibc versions earlier than 2.22 make certain syscalls (time/vgetcpu/gettimeofday) by using vsyscall mechanisms. The syscall implementation attempts to access a hardcoded static address, which QEMU fails to translate while running in emulation mode.
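One way to check the glibc level inside an image before you scan it is to run ldd, which reports the glibc version on glibc-based images; the image name is a placeholder, and the check does not apply to musl-based images:
docker run --rm <image_name> ldd --version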
Container fails to operate or a kernel panic occurs
The following error might occur from the IBM Cloud Private node console or kernel log:
kernel:unregister_netdevice: waiting for <eth0> to become free.
If the kernel log displays this message and containers fail to operate, continue to troubleshoot. If the problem persists, reboot the node.
See https://github.com/kubernetes/kubernetes/issues/64743 to learn about the Linux kernel bug that causes the error.
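To check whether a node is currently hitting this condition, you can search the kernel ring buffer for the message; a sketch:
dmesg -T | grep -i unregister_netdevice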
Intermittent failure when you log in to the management console in HA clusters that use NSX-T 2.3
In HA clusters that use NSX-T 2.3, you might not be able to log in to the management console. After you specify the login credentials, you are redirected to the login page. You might have to try logging in multiple times until you succeed. This issue is intermittent.
Precheck during IBM Cloud Private installation fails when you use a hyphen in interface names
If any node in your IBM Cloud Private cluster has a hyphen in an interface name, the IBM Cloud Private installation fails.
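Before you install, you can list the interface names on each node and confirm that none of them contain a hyphen; a sketch:
ip -o link show | awk -F': ' '{print $2}'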
Vulnerability Advisor policy resets to default setting after upgrade from 2.1.0.3 in ppc64le cluster
If you enabled Vulnerability Advisor (VA) on your Linux® on Power® (ppc64le) cluster in 2.1.0.3, the Vulnerability Advisor policy resets to the default setting when you upgrade to 3.1.0. To fix this issue, reset the VA policy in the management console.
Containers can crash when running IBM Cloud Private on KVM on Power guests.
If you are running IBM Cloud Private on KVM on Power guests, some containers might crash because of an issue with how Transactional Memory is handled. You can work around this issue by using one of the following methods:
- Turn off Transactional Memory support for KVM on Power guests.
- If you are using the QEMU emulator directly to run the virtual machine, enable the cap-htm=off option.
- If you are using the libvirt library, add the following XML element to the domain definition:
  <features> <htm state='off'/> </features>
  See the libvirt documentation for detailed instructions about adding this libvirt attribute. A virsh sketch follows this list.
Note: This issue is specific to KVM on Power guests and does not occur when using POWER9 bare metal or POWER9 PowerVM LPARs.
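If you manage the guest with libvirt, one way to apply the setting is to edit the domain XML with virsh; a sketch with a placeholder domain name:
virsh edit <domain_name>
# In the editor, add <htm state='off'/> inside the <features> section, then save and exit.
virsh shutdown <domain_name>
virsh start <domain_name>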
Linux kernel memory leak
Linux kernels older than release 4.17.17 contain a bug that causes kernel memory leaks in cgroup (community link). When pods in the host are restarted multiple times, the host can run out of kernel memory. This problem causes pod start failures and hung systems.
As shown in the following example, you can check your kernel core dump file and view the core stack:
[700556.898399] Call Trace:
[700556.898406] [<ffffffff8184bdb0>] ? bit_wait+0x60/0x60
[700556.898408] [<ffffffff8184b5b5>] schedule+0x35/0x80
[700556.898411] [<ffffffff8184e746>] schedule_timeout+0x1b6/0x270
[700556.898415] [<ffffffff810f90ee>] ? ktime_get+0x3e/0xb0
[700556.898417] [<ffffffff8184bdb0>] ? bit_wait+0x60/0x60
[700556.898420] [<ffffffff8184ad24>] io_schedule_timeout+0xa4/0x110
[700556.898422] [<ffffffff8184bdcb>] bit_wait_io+0x1b/0x70
[700556.898425] [<ffffffff8184b95f>] __wait_on_bit+0x5f/0x90
[700556.898429] [<ffffffff8119200b>] wait_on_page_bit+0xcb/0xf0
[700556.898433] [<ffffffff810c6de0>] ? autoremove_wake_function+0x40/0x40
[700556.898435] [<ffffffff81192123>] __filemap_fdatawait_range+0xf3/0x160
[700556.898437] [<ffffffff811921a4>] filemap_fdatawait_range+0x14/0x30
[700556.898439] [<ffffffff8119414f>] filemap_write_and_wait_range+0x3f/0x70
[700556.898444] [<ffffffff8129af08>] ext4_sync_file+0x108/0x350
[700556.898447] [<ffffffff812486de>] vfs_fsync_range+0x4e/0xb0
[700556.898449] [<ffffffff8124879d>] do_fsync+0x3d/0x70
[700556.898451] [<ffffffff81248a63>] SyS_fdatasync+0x13/0x20
[700556.898453] [<ffffffff8184f788>] entry_SYSCALL_64_fastpath+0x1c/0xbb
[700599.233973] mptscsih: ioc0: attempting task abort! (sc=ffff880fd344e100)
To work around the failures, you can restart the host. However, you might encounter the problem again. To avoid the problem, upgrade your Linux kernel to release 4.17.17 or later, which contains fixes for the kernel bug.
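To check whether a node is exposed to this issue, you can compare its kernel release against 4.17.17 and watch the cgroup counts; a sketch:
# Kernel release; anything older than 4.17.17 indicates exposure to the leak
uname -r
# Per-subsystem cgroup counts; numbers that keep growing after repeated pod restarts suggest the leak
cat /proc/cgroups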
In an NSX-T environment, when you restart a master node, the management console becomes inaccessible.
In an NSX-T environment, when you restart a master node, the management console is inaccessible even though all the service pods are in a good state. This issue is caused by non-persistent iptables NAT rules, which enable host port and pod communication through the host IP. NSX-T does not support host ports, and IBM Cloud Private uses a host port for the management console.
To resolve the issue, run the following commands on all the master nodes. Use the network CIDR that you specified in the /<installation_directory>/cluster/config.yaml file.
iptables -t nat -N ICP-NSXT
iptables -t nat -A POSTROUTING -j ICP-NSXT
iptables -t nat -A ICP-NSXT ! -s <network_cidr> -d <network_cidr> -j MASQUERADE
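Because the NAT rules are not persistent, you might want to confirm that they are in place after each master node restart; a sketch that uses the chain name from the commands above:
iptables -t nat -L ICP-NSXT -n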
Logging ELK pods are in CrashLoopBackOff state
Logging ELK pods continue to appear in CrashLoopBackOff state after upgrading to the current version and increasing memory.
This is a known issue in Elasticsearch 5.5.1.
Note: If you have more than one data-pod, repeat steps 1-8 for each pod. For example, logging-elk-data-0, logging-elk-data-1, or logging-elk-data-2.
Complete the following steps to resolve this issue.
1. Check the log to find the problematic file that contains the permission issue.
   java.io.IOException: failed to write in data directory [/usr/share/elasticsearch/data/nodes/0/indices/dT4Nc7gvRLCjUqZQ0rIUDA/0/translog] write permission is required
2. Get the IP address of the management node where the logging-elk-data-1 pod is running.
   kubectl -n kube-system get pods -o wide | grep logging-elk-data-1
3. Use SSH to log in to the management node.
4. Navigate to the /var/lib/icp/logging/elk-data directory.
   cd /var/lib/icp/logging/elk-data
5. Find all .es_temp_file files.
   find ./ -name "*.es_temp_file"
6. Delete all *.es_temp_file files that you found in step 5.
   rm -rf *.es_temp_file
7. Delete the old logging-elk-data-1 pod.
   kubectl -n kube-system delete pods logging-elk-data-1
8. Wait 3-5 minutes for the new logging-elk-data-1 pod to restart.
   kubectl -n kube-system get pods -o wide | grep logging-elk-data-1
IBM Cloud Private CLI command to load archive fails when you expand the archive
If the cloudctl catalog load-archive command returns the following error, you need to reconstruct your archive file:
Expanding archive
FAILED
Extract the archive file and repackage it by using the following commands. The repackaged archive must contain the directory entries.
mkdir /tmp/cloudctl_archive
tar -xzvf <archive_file> -C /tmp/cloudctl_archive
tar -czvf <new_archive_file> -C /tmp/cloudctl_archive .
cloudctl catalog load-archive -a <new_archive_file>
rm -rf /tmp/cloudctl_archive
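To confirm that the repackaged archive includes directory entries before you load it, you can list its contents; directory entries appear with a trailing slash:
tar -tzvf <new_archive_file> | head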
IBM Cloud Private MongoDB pod fails to deploy with custom cluster_domain
For amd64 architecture, IBM Cloud Private has a patch (icp-3.1.0-build508873) on IBM® Fix Central to address a bug in the peer-finder application within the icp-mongodb-install image. To resolve the issue, you must
apply the patch before installation or upgrade. The patch replaces the icp-mongodb-install image with a new image that contains an updated version of the peer-finder application.
Cloning an IBM Cloud Private worker node is not supported
IBM Cloud Private does not support cloning an existing IBM Cloud Private worker node. You cannot change the host name and IP address of a node on your existing cluster.
You must add a new worker node. For more information, see Adding an IBM Cloud Private cluster node.
Installation can fail with a helm-api setup error
Installation can fail with the following errors during the initial deployment of the helm-api chart:
stderr: 'Error: secrets "rudder-secret" already exists'
You can view these errors in the install logs in the cluster/logs directory.
The errors occur because the Kubernetes secrets for rudder and helm-api are created in a pre-install hook. Problems such as network timeouts, incorrect installation configuration, and insufficient resources on the master node can prevent the use of the secrets that were created by the pre-install hook. When the deployment fails, the installer tries to deploy the chart four more times. On each retry, it fails when it tries to re-create the secrets.
To resolve this issue, complete the following steps:
- Run the uninstall procedure to remove all components of the installation.
- Verify that the installation settings are correctly configured, including the config.yaml file, the hosts file, and other settings.
- Run the installation procedure again.
Cannot get secret by using kubectl command when encryption of secret data at rest is enabled
When encryption of secret data at rest is enabled and you use the kubectl command to get a secret, the command sometimes fails. You might see the following error message in kube-apiserver:
Internal error occurred: invalid padding on input
This error occurs because kube-apiserver fails to decrypt the encrypted data in etcd. For more information about the issue, see Random "invalid padding on input" errors when attempting various kubectl operations.
To resolve the issue, delete the secret and re-create it. Use the following command:
kubectl -n <namespace> delete secret <secret>
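After you delete the secret, re-create it with its original data. A minimal sketch for a generic secret; the names, keys, and values are placeholders:
kubectl -n <namespace> create secret generic <secret> --from-literal=<key>=<value>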
For more information about encrypting secret data at rest, see Encrypting Secret Data at Rest.