IBM Support

Portworx 2.5.0.1 / 2.5.5 deployment failed with error "Failed to load PX filesystem dependencies for kernel 4.18.0-147.8.1.el8_1.x86_64"

Troubleshooting


Problem

A Portworx deployment fails to start when the Portworx kernel module is older than the CoreOS kernel version and Portworx is unable to pull the latest images from the Portworx image repository. This prevents the storage nodes from starting.
This scenario is commonly seen in air-gapped environments, where there is no direct access to http://mirrors.portworx.com to update the px kernel module to match the CoreOS kernel.
Example:
The deployment fails with the error "Failed to load PX filesystem dependencies for kernel 4.18.0-147.8.1.el8_1.x86_64" (or the same error for kernel "4.18.0-193.13.2.el8_2.x86_64").

Symptom

Portworx storage nodes fail to come up:
  # oc get pods -n kube-system
  portworx-api-5wcdk                   0/1     Running   1          6d2h
  portworx-api-72fk4                   0/1     Running   1          6d2h
  portworx-api-rj5c4                   0/1     Running   1          6d2h
  portworx-api-vvhm8                   0/1     Running   1          6d2h
  portworx-api-zgqnm                   0/1     Running   15         6d2h
  portworx-operator-59b65cb986-pxk66   1/1     Running   3          22h
  px-storage-cluster-9wdbp             0/1     Running   96         22h
  px-storage-cluster-fxg7v             0/1     Running   88         22h
  px-storage-cluster-gk4fk             0/1     Running   88         22h
  px-storage-cluster-rphkx             0/1     Running   90         22h
  px-storage-cluster-tzbzz             0/1     Running   89         22h
  stork-6dbf96dcf5-f57f8               1/1     Running   0          22h
  stork-6dbf96dcf5-wnsqr               1/1     Running   0          22h
  stork-6dbf96dcf5-xwr9q               1/1     Running   4          22h
  stork-scheduler-774d869d85-9sx5n     1/1     Running   0          22h
  stork-scheduler-774d869d85-sp5lx     1/1     Running   4          22h
  stork-scheduler-774d869d85-wnzc2     1/1     Running   0          22h
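On a busy cluster it can help to filter the pod listing down to the pods that are not fully ready. The following is a minimal sketch, not part of the original procedure; the function name not_ready_pods is hypothetical, and it assumes the default column layout of oc get pods (NAME, READY, STATUS, ...).

```shell
# List pods whose READY column shows fewer ready containers than expected.
# Reads `oc get pods` output on stdin and prints one pod name per line.
# The header row is skipped because its second field is not of the form n/m.
not_ready_pods() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
         split($2, r, "/")
         if (r[1] != r[2]) print $1
       }'
}

# Usage (assumes a logged-in oc session):
#   oc get pods -n kube-system | not_ready_pods
```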
The storage node logs show that the node failed to access mirrors.portworx.com and failed to load the PX filesystem dependencies for the kernel:
  # oc get pods -n kube-system | grep px-storage
  # oc logs <storagenode-pod> -n kube-system
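To scan every storage pod for this failure, the two commands above can be combined with a small filter. This is a sketch under assumptions: the function name px_kernel_errors is hypothetical, and the search string is the error text quoted in this document, so log lines worded differently will not match.

```shell
# Print only the log lines that contain the kernel-dependency failure.
# Reads log text on stdin; matches the fixed string quoted in this document.
px_kernel_errors() {
  grep -F 'Failed to load PX filesystem dependencies for kernel'
}

# Usage (assumes a logged-in oc session and pod names containing px-storage):
#   for pod in $(oc get pods -n kube-system -o name | grep px-storage); do
#     echo "== $pod"
#     oc logs "$pod" -n kube-system | px_kernel_errors
#   done
```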

Cause

The Portworx images used in the installation are older than the CoreOS kernel version. For example, the Portworx image being installed, PX 2.5.0.1, is compatible with OpenShift 4.3.8, but it is being installed on CoreOS 4.2.23.

Environment

  • OpenShift 4.3
  • Portworx 2.5.0.1

Resolving The Problem

  1. The kernel module used in Portworx is older than the CoreOS kernel, which causes Portworx to try to pull the latest module from mirrors.portworx.com.
  2. Download http://mirrors.portworx.com/build-results/pxfuse/for-installer/x86_64/4.18.0-147.el8_1.x86_64/version/8/px.ko
  3. Copy the kernel module to all the storage nodes (worker nodes) as /var/lib/osd/pxfs/latest/8.px.ko
  4. If the CoreOS version was only just released, you might not find a matching version on the mirrors.portworx.com site. In this scenario, extract the kernel module from the supplied tar file and copy it to /var/lib/osd/pxfs/latest/
    1. tar --strip-components 4 -C /var/lib/osd/pxfs/latest -xvf /opt/pwx/oci/rootfs/pxlib_data/px-fslibs/px_modules.8.tgz x86_64/<4.18.0-193.13.2.el8_2.x86_64>/version/8/px.ko && mv /var/lib/osd/pxfs/latest/{,8.}px.ko
    2. Repeat the preceding steps on all the Portworx storage nodes
  5. Restart the Portworx service on all worker nodes
    1. ssh core@<workernode>
       sudo sh
       systemctl restart portworx
       systemctl status portworx
  6. Verify that all Portworx pods start successfully
  •   oc get pods -n kube-system
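Steps 2 and 3 can be scripted along the following lines. This is a minimal sketch rather than a supported tool: the function names px_module_url and install_px_module are hypothetical, the URL layout is taken from the example in step 2, and this document does not confirm that the mirror directory name always equals the output of uname -r, so check the mirror listing for the exact directory first.

```shell
# Build the mirror URL for a kernel directory name, following the layout of
# the example URL in step 2: .../x86_64/<kernel>/version/8/px.ko
px_module_url() {
  printf 'http://mirrors.portworx.com/build-results/pxfuse/for-installer/x86_64/%s/version/8/px.ko\n' "$1"
}

# Copy a downloaded px.ko into place as 8.px.ko.
# The destination defaults to the directory from step 3; run as root on the node.
install_px_module() {
  src=$1
  dest_dir=${2:-/var/lib/osd/pxfs/latest}
  mkdir -p "$dest_dir" && cp "$src" "$dest_dir/8.px.ko"
}

# Usage, on a machine with internet access:
#   curl -fLo px.ko "$(px_module_url 4.18.0-147.el8_1.x86_64)"
# then transfer px.ko to each storage node and run there:
#   install_px_module px.ko
```

Keeping the destination directory as an optional second argument makes it possible to try the copy against a scratch directory before touching /var/lib/osd/pxfs/latest on a node.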
Reboot the worker nodes to confirm that the Portworx kernel modules persist on the nodes and that the Portworx storage nodes restart successfully after the reboot. Once all nodes are back up, run pxctl status on any one of the storage (worker) nodes:
  1. oc get nodes | grep compute
  2. ssh core@<worker-node>
  3. sudo su
  4. pxctl status
The output should show all the storage nodes in online status.
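Before running pxctl status after a reboot, it may be worth confirming that the module file itself survived. The check below is a sketch; the function name px_module_persisted is hypothetical, and the default path is the one used in the resolution steps above.

```shell
# Succeed if the Portworx kernel module file exists and is non-empty.
# The directory defaults to the path used in the resolution steps.
px_module_persisted() {
  dir=${1:-/var/lib/osd/pxfs/latest}
  [ -s "$dir/8.px.ko" ]
}

# Usage on a worker node after reboot:
#   px_module_persisted && echo "module present" || echo "module missing"
```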
Note: The Portworx registry server must be whitelisted in the firewall to allow access to the URL http://mirrors.portworx.com
Portworx case: https://portworx.atlassian.net/servicedesk/customer/portal/2/PSP-6264

Document Location

Worldwide


Document Information

Modified date:
29 September 2020

UID

ibm16245614