Master nodes show NotReady state

Master nodes show NotReady state after upgrade.

Symptoms

After the upgrade-k8s operation completes, master nodes go into NotReady state.
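
To confirm the symptom, you can filter the node list for nodes whose status is not Ready. The following is a minimal sketch that assumes `kubectl` is already configured for the cluster. The function name `list_not_ready` is hypothetical and is not part of IBM Cloud Private.

```shell
# Hypothetical helper: print the names of nodes whose STATUS column
# does not start with "Ready". It reads the output of
# `kubectl get nodes --no-headers` from standard input
# (columns: NAME STATUS ROLES AGE VERSION).
list_not_ready() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (assumes kubectl can reach the cluster):
#   kubectl get nodes --no-headers | list_not_ready
```

An empty result suggests that no node is currently reported as NotReady.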

Causes

In IBM Cloud Private versions earlier than 3.2.1 on Amazon Web Services (AWS), you could install IBM Cloud Private even when the node name that AWS provided was different from the hostname. The AWS node name is based on the internal DNS name. If you upgrade such a cluster to version 3.2.1, master nodes can become NotReady after the upgrade-k8s operation completes. This happens because the kubelet certificates on the master nodes are generated from the hostname in the Ansible built-in variables. The mismatch between the node name and the hostname prevents the kubelet service on the master nodes from joining the cluster, so the nodes remain in NotReady state.
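
To check whether a master node is affected, you can compare its AWS internal DNS name with its hostname. The following is a minimal sketch. The function name `compare_names` is hypothetical, and the usage line assumes that the AWS instance metadata endpoint is reachable from the node without a session token.

```shell
# Hypothetical helper: compare the node name that Kubernetes expects
# with the operating-system hostname and report the result.
compare_names() {
  node_name="$1"
  host_name="$2"
  if [ "$node_name" = "$host_name" ]; then
    echo "match"
  else
    echo "mismatch: node=$node_name host=$host_name"
  fi
}

# Usage on a master node. The URL is the AWS instance metadata path for
# the internal DNS name; reachability without a token is an assumption:
#   compare_names "$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)" "$(hostname)"
```

A `mismatch` result indicates the condition that is described in this section.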

Resolving the problem

As a workaround, regenerate the kubelet certificates after you run the upgrade-k8s operation. This action corrects the certificates before you run the upgrade-chart operation.

  1. Access the installer container on your boot node. For example,

    docker run -e LICENSE=accept --rm -it --net host -v $(pwd):/installer/cluster ibmcom/icp-inception:3.2.1-ee bash
    

    Replace ibmcom/icp-inception:3.2.1-ee with the name of your installer image.

  2. Regenerate the Kubernetes certificates in a different directory. In the following example, replace the node names with the AWS node names of your master nodes.

    cd playbook/roles/kubernetes-certs/files/
    
    export CERT_DIR=/installer/cluster/cfc-certs/aws-kubernetes
    export ROOT_CA_CRT=/installer/cluster/cfc-certs/root-ca/ca.crt
    export ROOT_CA_KEY=/installer/cluster/cfc-certs/root-ca/ca.key
    
    ./make-ca-cert.sh "ip-10-38-74-106.ap-southeast-2.compute.internal ip-10-38-74-157.ap-southeast-2.compute.internal ip-10-38-74-34.ap-southeast-2.compute.internal" "127.0.0.1" "DNS:kubernetes,DNS:kubernetes.default,DNS:kubernetes.default.svc"
    
  3. Verify that the Kubernetes certificates that you created are on your boot node. For example,

    # ls -l cluster/cfc-certs/aws-kubernetes/
    total 108
    -rw------- 1 root root 6109 Jun 17 20:18 kube-controller-manager.crt
    -rw------- 1 root root 1704 Jun 17 20:18 kube-controller-manager.key
    -rw------- 1 root root 6080 Jun 17 20:18 kube-proxy.crt
    -rw------- 1 root root 1708 Jun 17 20:18 kube-proxy.key
    -rw------- 1 root root 6088 Jun 17 20:18 kube-scheduler.crt
    -rw------- 1 root root 1704 Jun 17 20:18 kube-scheduler.key
    -rw------- 1 root root 6108 Jun 17 20:18 kubecfg.crt
    -rw------- 1 root root 1704 Jun 17 20:18 kubecfg.key
    -rw------- 1 root root 6073 Jun 17 20:18 kubelet-client.crt
    -rw------- 1 root root 1704 Jun 17 20:18 kubelet-client.key
    -rw------- 1 root root 6228 Jun 17 20:18 kubelet-ip-10-38-74-106.ap-southeast-2.compute.internal.crt
    -rw------- 1 root root 1708 Jun 17 20:18 kubelet-ip-10-38-74-106.ap-southeast-2.compute.internal.key
    -rw------- 1 root root 6228 Jun 17 20:18 kubelet-ip-10-38-74-157.ap-southeast-2.compute.internal.crt
    -rw------- 1 root root 1704 Jun 17 20:18 kubelet-ip-10-38-74-157.ap-southeast-2.compute.internal.key
    -rw------- 1 root root 6227 Jun 17 20:18 kubelet-ip-10-38-74-34.ap-southeast-2.compute.internal.crt
    -rw------- 1 root root 1704 Jun 17 20:18 kubelet-ip-10-38-74-34.ap-southeast-2.compute.internal.key
    -rw------- 1 root root 6394 Jun 17 20:18 server.cert
    -rw------- 1 root root 1704 Jun 17 20:18 server.key
    

    Note: The files that you need are the kubelet certificates and keys. For example,

    -rw------- 1 root root 6228 Jun 17 20:18 kubelet-ip-10-38-74-106.ap-southeast-2.compute.internal.crt
    -rw------- 1 root root 1708 Jun 17 20:18 kubelet-ip-10-38-74-106.ap-southeast-2.compute.internal.key
    -rw------- 1 root root 6228 Jun 17 20:18 kubelet-ip-10-38-74-157.ap-southeast-2.compute.internal.crt
    -rw------- 1 root root 1704 Jun 17 20:18 kubelet-ip-10-38-74-157.ap-southeast-2.compute.internal.key
    -rw------- 1 root root 6227 Jun 17 20:18 kubelet-ip-10-38-74-34.ap-southeast-2.compute.internal.crt
    -rw------- 1 root root 1704 Jun 17 20:18 kubelet-ip-10-38-74-34.ap-southeast-2.compute.internal.key
    
  4. Copy the kubelet certificate and key for each node name to the /etc/cfc/kubelet/ directory on the corresponding master node.

  5. Update the kubeconfig file, /etc/cfc/kubelet/kubelet-config, on each master node so that it points to the correct certificate and key. For example,

    users:
    - name: kubelet
      user:
        client-certificate: /etc/cfc/kubelet/kubelet-ip-10-38-74-34.ap-southeast-2.compute.internal.crt
        client-key: /etc/cfc/kubelet/kubelet-ip-10-38-74-34.ap-southeast-2.compute.internal.key
    
  6. Restart the kubelet service on each master node. Check the node status to make sure that the node is in Ready state.

  7. After the kubelet on the master node joins the cluster, verify that the kubelet server certificate is in the /etc/cfc/kubelet/ directory. This certificate is generated automatically and is used to serve the HTTPS kubelet service that listens on port 10250. For example,

    # ls -l /etc/cfc/kubelet/kubelet-serv*
    -rw------- 1 root root 1655 Jun  9 22:45 /etc/cfc/kubelet/kubelet-server-2020-06-09-22-45-38.pem
    lrwxrwxrwx 1 root root   55 Jun  9 22:45 /etc/cfc/kubelet/kubelet-server-current.pem -> /etc/cfc/kubelet/kubelet-server-2020-06-09-22-45-38.pem
    
  8. Verify the status of your cluster.

    kubectl -n kube-system logs <pod name>
    kubectl -n kube-system exec -it <pod name> -c <container name> -- <command>
    kubectl -n kube-system port-forward pod/<pod name> <local port>:<pod port>
    helm list --tls
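
Steps 3 and 4 depend on a certificate and key being present for every master node name. The following is a minimal sketch of a check that reports any missing file. The function name `missing_kubelet_files` is hypothetical, and the `kubelet-<node name>.crt` and `kubelet-<node name>.key` naming is taken from the directory listing in step 3.

```shell
# Hypothetical helper: for each node name, print the path of any kubelet
# certificate or key that is missing from the certificate directory.
# Usage: missing_kubelet_files <cert_dir> <node_name>...
missing_kubelet_files() {
  cert_dir="$1"
  shift
  for node in "$@"; do
    for ext in crt key; do
      file="$cert_dir/kubelet-$node.$ext"
      [ -f "$file" ] || echo "$file"
    done
  done
}

# Usage on the boot node, with the node names from the example above:
#   missing_kubelet_files cluster/cfc-certs/aws-kubernetes \
#     ip-10-38-74-106.ap-southeast-2.compute.internal \
#     ip-10-38-74-157.ap-southeast-2.compute.internal \
#     ip-10-38-74-34.ap-southeast-2.compute.internal
```

If the function prints nothing, every listed node name has both files in the directory.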
    

You can now proceed to run upgrade-chart to continue the IBM Cloud Private upgrade.