NSX-T troubleshooting

Learn how to troubleshoot NSX-T network issues.

Known issues

In HA clusters that use NSX-T 2.2, you might not be able to log in to the management console. After you enter your login credentials, you are redirected back to the login page, and you might have to try logging in multiple times until you succeed. This issue is intermittent and is caused by a known VMware issue: for Kubernetes services of type ClusterIP, client IP-based session affinity is not supported. For more information, see the NSX Container Plugin 2.4.1 Release Notes.

In an NSX-T environment, when you restart a master node, the management console becomes inaccessible even though all the service pods are in a good state. This issue occurs because the iptables NAT rules that enable host port to pod communication through the host IP are not persistent. NSX-T does not support host ports, but IBM Cloud Private uses a host port for the management console.

To resolve the issue, run the following commands on all the master nodes. Use the network CIDR that you specified in the /<installation_directory>/cluster/config.yaml file.

    iptables -t nat -N ICP-NSXT
    iptables -t nat -A POSTROUTING -j ICP-NSXT
    iptables -t nat -A ICP-NSXT ! -s <network_cidr> -d <network_cidr> -j MASQUERADE
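
Because these rules do not persist, they are lost whenever the node restarts, so you might want to reapply them from a boot-time script. The following is a minimal, idempotent sketch that assumes a placeholder pod network CIDR of 10.1.0.0/16; substitute the network_cidr value from your config.yaml file.

    # Placeholder CIDR; replace 10.1.0.0/16 with the network_cidr from config.yaml.
    CIDR=10.1.0.0/16

    # Create the chain only if it does not already exist.
    iptables -t nat -N ICP-NSXT 2>/dev/null || true

    # Append each rule only if it is not already present (-C checks for a rule).
    iptables -t nat -C POSTROUTING -j ICP-NSXT 2>/dev/null || \
        iptables -t nat -A POSTROUTING -j ICP-NSXT
    iptables -t nat -C ICP-NSXT ! -s $CIDR -d $CIDR -j MASQUERADE 2>/dev/null || \
        iptables -t nat -A ICP-NSXT ! -s $CIDR -d $CIDR -j MASQUERADE

    # Confirm that the chain and rule are in place.
    iptables -t nat -L ICP-NSXT -n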

MustGather

  1. Get the pod status of both the NCP controller and node-agent pods.
    kubectl -n kube-system get pods -o wide -l tier=nsx-networking
    
  2. Get the logs of any pods (NCP controller and node-agent pods) that are not in the ready state; see the collection sketch after this list.
  3. When the node-agent pod is failing on a node, get the kubelet logs and the Open vSwitch status from that node.

    journalctl -u kubelet
    ovs-vsctl show
    
  4. Get the ConfigMap values of nsx-ncp-config and nsx-node-agent-config.

    kubectl -n kube-system get cm nsx-ncp-config -o yaml
    kubectl -n kube-system get cm nsx-node-agent-config -o yaml
    
  5. Get the secret value of nsx-secrets.

    kubectl -n kube-system get secrets nsx-secrets -o yaml
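
To collect all of the preceding MustGather output in one pass, you can use a sketch like the following. The directory and file names are arbitrary choices, and <pod_name> is a placeholder for any pod that is not in the ready state.

    mkdir -p /tmp/nsx-mustgather && cd /tmp/nsx-mustgather

    # Step 1: pod status of the NCP controller and node-agent pods.
    kubectl -n kube-system get pods -o wide -l tier=nsx-networking > pod-status.txt

    # Step 2: logs from any pod that is not ready (placeholder pod name).
    kubectl -n kube-system logs <pod_name> --all-containers > pod.log

    # Step 3: kubelet and Open vSwitch details; run these on the affected node.
    journalctl -u kubelet --no-pager > kubelet.log
    ovs-vsctl show > ovs.txt

    # Steps 4 and 5: ConfigMaps and secrets.
    kubectl -n kube-system get cm nsx-ncp-config -o yaml > nsx-ncp-config.yaml
    kubectl -n kube-system get cm nsx-node-agent-config -o yaml > nsx-node-agent-config.yaml
    kubectl -n kube-system get secrets nsx-secrets -o yaml > nsx-secrets.yaml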
    

Troubleshooting

To avoid NSX-T network issues during installation, ensure that the following settings are correctly configured.

  1. Install the NSX-T CNI plug-in on all the nodes. The installation includes a validation check that produces a clear error if the plug-in is missing. See the verification sketch after this list.
  2. Install and configure Open vSwitch on all the nodes. The installation includes a validation check that produces a clear error if Open vSwitch is missing.
  3. Configure the mandatory NSX-T resources that are given in config.yaml in the NSX-T Manager. If they are not configured, the NCP controller does not reach the ready state and the installation hangs at the following task:

    TASK [waitfor : Waiting for kube-dns to start]
  4. Tag the logical switch port with the node name and cluster name in the NSX-T Manager. If the tags are not correct, the node-agent pod for that node does not reach the ready state.
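
A quick way to check these prerequisites on a node is sketched below. The CNI configuration path is the standard /etc/cni/net.d directory and the label selector is the one from the MustGather section; both are reasonable assumptions rather than guaranteed values for every NSX-T version.

    # Check that an NSX-T CNI configuration is present (standard CNI path).
    ls /etc/cni/net.d/

    # Check that Open vSwitch is installed and responding.
    ovs-vsctl show

    # Check that the NCP controller and node-agent pods reach the ready state.
    kubectl -n kube-system get pods -o wide -l tier=nsx-networking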