Failed control nodes inspection

Message: Inspection failed for one or more control nodes.

Diagnostics

Note: Run all commands as the kni user from the provisioner node (also known as RU7 or compute-1-ru7).
  1. Check the .openshift_install.log for errors.
  2. If the logs do not provide the necessary information, continue with the remaining steps in this procedure for further diagnostics.
  3. Ping the node by using the IP address and host name that are reserved for the node in DHCP.
  4. If the ping does not work, do the following steps to access the IMM user interface of the bare metal node:
    Find the IPv6 address of the node (shown in the ipmi:// URL) by using the following command:
    oc get bmh -A -o wide | grep control
    openshift-machine-api   control-1-ru2   OK       externally provisioned   isf-rackae4-2bztv-master-0   ipmi://[fd8c:215d:178e:c0de:a94:efff:fefe:2f95]:623   unknown            true             3h28m
    openshift-machine-api   control-1-ru3   OK       externally provisioned   isf-rackae4-2bztv-master-1   ipmi://[fd8c:215d:178e:c0de:a94:efff:fefd:cecd]:623   unknown            true             3h28m
    openshift-machine-api   control-1-ru4   OK       externally provisioned   isf-rackae4-2bztv-master-2   ipmi://[fd8c:215d:178e:c0de:a94:efff:fefe:3031]:623   unknown            true             3h28m
  5. Note down the IPv6 address of the node you are checking.
  6. Go to the /home/kni/isfconfig folder and open the kickstart-.json file to find the password of the IMM user USERID for the node.
  7. In the file, look for the OCPRole key whose value equals the host name of the node (for example, control-1-ru3). The USERID password is in the users list of that section.
    Sample section:
      "ipv6ULA": "XXXXXXXXXXXXXXXXXXXXXX",
           "ipv6LLA": "XXXXXXXXXXXXX",
           "serialNum": "J1025PXX",
           "mtm": "7D2XCTO1WW",
           "ibmSerialNumber": "rackae402",
           "ibmMTM": "9155-C01",
           "type": "storage",
           "OCPRole": "control-1-ru3",
           "location": "RU2",
           "name": "IMM_RU2",
           "bootDevice": "/dev/sda",
           "users": [
              {
                 "user": "CEUSER",
                 "password": "XXXXX",
                 "group": "Administrator",
                 "number": 2
              },
              {
                 "user": "ISFUSER",
                 "password": "XXXXXX",
                 "group": "Administrator",
                 "number": 3
              },
              {
                 "user": "USERID",
                 "password": "XXXXXX",
                 "group": "Administrator",
                 "number": 1
              }
    
  8. Run the following command to create a tunnel to the IMM of the node:
    ssh -N -f -L :1443:[IPv6 address of IMM]:443 -L :3900:[IPv6 address of IMM]:3900 root@localhost
  9. To access the IMM user interface through the provisioner node, open a browser and go to the following URL:
    https://<provisioner ip>:1443
    Log in with the following credentials:
    • user - Enter USERID.
    • password - Use the USERID password that you found in the kickstart-.json file in step 7.

  10. From the IMM user interface page, open the remote console and check whether the node is up and shows a valid hostname prompt.

    • If the console shows localhost, review your DHCP and DNS settings to verify that the correct reservation exists for the node.
    • If the node shows a Red Hat Linux login prompt instead of CoreOS, it indicates that the node's baseboard management controller (BMC) is not responsive.
    Do the following steps to resolve the issue:
    1. Identify the IMM of the missing node. The nodes map to IMMs as follows:
      control-1-ru2 ==> imm_ru2
      control-1-ru3 ==> imm_ru3
      control-1-ru4 ==> imm_ru4
    2. For each node that did not show up in the previous steps, run the corresponding IMM command to connect to its IMM. For example, if node control-1-ru3 did not show up, run the imm_ru3 command as the kni user from the provisioner node (RU7, compute-1-ru7).
    3. Wait for the connection to the IMM to succeed.
    4. At the IMM system prompt, run the resetsp command.
    5. Wait for 10 minutes.
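
As an optional aid for steps 4, 5, and 8 of the main procedure, the address lookup and the tunnel command can be sketched as two small shell functions. This is a minimal sketch and not part of the product: the function names bmc_ipv6 and tunnel_cmd are hypothetical, and the parsing assumes that the oc get bmh -A -o wide output has the layout shown in step 4 (host name in the second column, BMC address as an ipmi://[...] URL). The password lookup in steps 6 and 7 stays manual.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the IPv6 address from the ipmi:// BMC URL of one host.
# Reads `oc get bmh -A -o wide` output on stdin.
# $1 is the host name, for example control-1-ru3.
bmc_ipv6() {
  awk -v host="$1" '
    $2 == host {
      # The state column can contain a space ("externally provisioned"),
      # so scan every field instead of relying on a fixed column number.
      for (i = 1; i <= NF; i++) {
        if (index($i, "ipmi://[") == 1) {
          lb = index($i, "[")   # position of the opening bracket
          rb = index($i, "]")   # position of the closing bracket
          print substr($i, lb + 1, rb - lb - 1)
        }
      }
    }
  '
}

# Print (do not run) the tunnel command from step 8 for one IMM address.
# $1 is the IPv6 address of the IMM, without brackets.
tunnel_cmd() {
  printf 'ssh -N -f -L :1443:[%s]:443 -L :3900:[%s]:3900 root@localhost\n' "$1" "$1"
}
```

For example, as the kni user on the provisioner node:

    addr=$(oc get bmh -A -o wide | bmc_ipv6 control-1-ru3)
    tunnel_cmd "$addr"

The second command only prints the ssh command so that it can be reviewed before it is run.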

Next actions

In the installation user interface, click Retry to restart the installation.