Troubleshooting
Problem
Using a MAC address file to import compute nodes with Platform Cluster Manager (PCM)
Resolving The Problem
Deploying a large PCM cluster can be challenging when hundreds of compute nodes must PXE boot to be added to the cluster database. To simplify adding a large number of nodes, the kusu-addhost command supports the --file (or -f) option, which specifies a file containing a list of MAC addresses. Using a MAC address file can save significant time when deploying PCM on 100 or more compute nodes. This article describes the best practice for using kusu-addhost with a MAC address file.
I. Introduction
To add compute nodes to your PCM cluster, you typically run the kusu-addhost command and then PXE boot the nodes in the exact order in which you want them added to the cluster. This can be time consuming and error prone when you are adding a large number of nodes. For example, if one node encounters a problem during PXE boot and you do not notice in time, the subsequent nodes are added in the wrong order.
To help with this problem, the kusu-addhost command can import node MAC addresses into the cluster database from a MAC address file. The file can be used to import managed nodes, unmanaged nodes, or both. Managed nodes are nodes in the compute-rhel or compute-imaged node groups; unmanaged nodes are typically network switches or NAS boxes and are added to the unmanaged node group. This article discusses only managed nodes because that is by far the more common use case. For details on adding unmanaged nodes, refer to the kusu-addhost man page.
II. Adding managed nodes via a MAC address file
Follow these steps to add managed nodes using one or more MAC address files:
- Obtain a list of MAC addresses for all your managed nodes. There are several ways to do this. One way is to connect with your browser to the blade or rack chassis management controller and read the MAC addresses for all the nodes in that chassis. Another way is to read the MAC address from each node's BIOS, but this is not practical for a large number of nodes. Finally, it is always a good idea to ask your hardware vendor for a list of MAC addresses for your cluster nodes.
- Create the MAC address file(s). Depending on the Node Naming Format (hereinafter NNF) of your PCM cluster, you may need to create one or more files. If you are using the default NNF (such as compute-#RR-#NN), you need to create X files, where X is the number of blade chassis (if you are using blade servers) or the number of racks (if you are using rack servers). If your NNF is like node-#NN, you need to create only one MAC address file.
The file must have the following format, with one MAC address per line:
=== Beginning of File ==========
MAC_address_1
MAC_address_2
MAC_address_3
...
=== EOF ========================
- Import the MAC addresses by using the kusu-addhost command. For example, suppose your NNF is compute-#RR-#NN, you wish to import the nodes from the first blade chassis, and the MAC addresses are in the file macfile1.txt. Use the kusu-addhost command like this:
# kusu-addhost -f macfile1.txt -n <node_group> -j <interface> -r <rack> -c <rank>
where:
-n is the node group you wish to add the nodes to (such as compute-rhel-5-x86_64)
-j is the network interface of the nodes being installed that is to be associated with the MAC address. This is not the installer node's interface.
-r is the rack number
-c is the rank number (starting value for this rack)
- Repeat the import step for each remaining MAC address file until you have imported all of them. Keep the files; they may be useful later.
- Double-check that the cluster /etc/hosts file looks correct. You should now see all of your compute nodes in this file, with an IP address correctly assigned to each compute node.
- Power on your nodes to provision them for the first time. You do not need to worry about the order in which you power on the nodes, but you do need to pay attention to the number of nodes you power on at the same time. If you power on 100 or more nodes at once, you will likely overload the installer node and some installations might fail. Best practice is to power on one chassis at a time if you have blade servers, or 20 nodes at a time if you have rack servers. Wait approximately 10 minutes before powering on the next batch of nodes.
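Before importing, it can help to check each MAC address file and to preview the commands for the import step. The following shell sketch is one possible way to do that and is not part of PCM. The helper names check_mac_file and print_import_commands are hypothetical, the colon-separated MAC notation (for example 00:1A:2B:3C:4D:5E) is an assumption about how your addresses are written, and rack numbers are assumed to increase by one per file. The script only prints kusu-addhost commands built from the options described above; it does not run them.

```shell
# Sketch only: helper names and the MAC notation are assumptions, not PCM features.

# check_mac_file FILE
# Succeeds only if FILE is non-empty, every line is a colon-separated
# MAC address, and no address is listed twice (case-insensitive).
check_mac_file() {
    local file="$1" status=0 bad dups
    if [ ! -s "$file" ]; then
        echo "$file: missing or empty" >&2
        return 1
    fi
    # Lines that are not six colon-separated hex pairs (blank lines included).
    bad=$(grep -nEv '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$' "$file" || true)
    if [ -n "$bad" ]; then
        echo "$file: malformed lines:" >&2
        echo "$bad" >&2
        status=1
    fi
    # Addresses that appear more than once, compared in lower case.
    dups=$(tr 'A-F' 'a-f' < "$file" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "$file: duplicate addresses:" >&2
        echo "$dups" >&2
        status=1
    fi
    return "$status"
}

# print_import_commands NODEGROUP INTERFACE FIRST_RACK START_RANK FILE...
# Validates each file, then prints (does not run) one kusu-addhost command
# per file. The rack number increases by one for each file, in argument order.
print_import_commands() {
    local nodegroup="$1" ifname="$2" rack="$3" rank="$4" file
    shift 4
    for file in "$@"; do
        check_mac_file "$file" || return 1
        echo "kusu-addhost -f $file -n $nodegroup -j $ifname -r $rack -c $rank"
        rack=$((rack + 1))
    done
}
```

For example, after loading these functions into your shell, running

print_import_commands compute-rhel-5-x86_64 eth0 1 0 macfile1.txt macfile2.txt

would print one command per file, with consecutive rack numbers starting at 1 and each rack starting at rank 0. The interface name eth0 and the starting values are illustrative only; review the printed commands before running them on the installer node.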
Document Information
Modified date:
05 September 2018
UID
isg3T1016107