Installing a large Linux cluster, Part 2

Management server configuration and node installation

Getting started with a large Linux cluster

This is the second of several articles that cover the installation and setup of a large Linux computer cluster. The purpose of the series is to bring together in one place up-to-date information from various sources in the public domain about the process required to create a working Linux cluster from many separate pieces of hardware and software. These articles are not intended to provide the basis for the complete design of a new large Linux cluster; refer to the relevant reference materials and Redbooks™ mentioned throughout for general architecture pointers.

This series is intended for systems architects and systems engineers who plan and implement a Linux cluster using the IBM eServer™ Cluster 1350 framework (see Related topics for more information about the framework). Some parts might also be relevant to cluster administrators, both for educational purposes and during normal cluster operation.

Part 1 of the series provides detailed instructions on setting up the hardware for the cluster. This second part takes you through the next steps after hardware configuration: software installation using the IBM systems management software, Cluster Systems Management (CSM), and node installation.

Additional parts of the series deal with the storage back-end of the cluster. The articles will cover the storage hardware configuration and the installation and configuration of the IBM shared file system, General Parallel File System (GPFS).

Configuring the management server

The software side of setting up a cluster is a two-stage process: first, install the cluster management server, as described in the first part of this article; and second, install the rest of the cluster, as described beginning in section "Installing nodes". Following this process enables you to use the management server to help configure the rest of the cluster and to prepare for post-installation maintenance and operation.

Installing Linux

Start with a fresh operating system installation on the management server, and determine the specifics of your own management server first. This example uses a System x 346 machine, a typical IBM management server, running Red Hat Enterprise Linux (RHEL) 3. However, you could use another type of computer running a different Linux distribution, such as Suse Linux Enterprise Server (SLES). The System x 346 machine is 64-bit capable, so the operating system is the x86_64 architecture version of RHEL 3, installed across a two-disk mirror using a ServeRAID 7k card. Your environment might be slightly different, but the basics of installing the CSM management server should be roughly the same.

Boot the server with the latest IBM ServeRAID support CD to configure the on-board disks for RAID 1 (mirroring). This assumes you have at least two disks in the server and you require protection from disk failure for your operating system.

With the disks configured as a single mirror, boot the server with the first RHEL CD to install the RHEL operating system. Depending on your console, you might need to alter the appearance of the installation. For example, for low-resolution consoles, you might need to boot the CD by typing linux vga=normal at the boot prompt. After you see the Linux installation GUI, install as normal, following these instructions:

  1. Select your language, keyboard map, mouse type, and so on.
  2. Configure disk partitions, as follows:
    • 128 MB /boot primary partition.
    • 2 GB swap partition.
    • Allocate the remaining space to an LVM partition without formatting.
  3. Perform logical volume (LVM) setup, as follows:
    • Name the volume group system.
    • Add logical volumes as shown in Table 1.
  4. Set up the network interfaces, as follows:
    • Activate eth0 on boot with fixed IP address according to our example hosts file above.
    • Set the hostname to
    • No gateway/DNS is required at this stage, but if you have external IP information, you can configure it during installation.
  5. Set the firewall option to no firewall to allow all connections. If you have a requirement for iptables rules, you can configure them later.
  6. Apply your local settings and select the appropriate time zone.
  7. Set the root password; our example password is cluster.
  8. Customize the package installation to include the following:
    • X Window system
    • KDE (that is, K desktop environment)
    • Graphical internet
    • Server configuration tools
    • FTP server
    • Network servers
    • Legacy software development
    • Administration tools
  9. Start the installation.

Table 1. Logical volume layout
Logical volume   Mount point    Size
Root             /              8192 MB
Var              /var           8192 MB
Usr              /usr           8192 MB
Opt              /opt           4096 MB
Tmp              /tmp           2048 MB
Csminstall       /csminstall    10240 MB
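
As a quick check before partitioning, the logical volume sizes in Table 1 can be totalled to see how much space the system volume group needs. The following is only a small sketch in shell arithmetic; the variable names are illustrative and are not used by the installer.

```shell
#!/bin/sh
# Logical volume sizes from Table 1, in MB.
root=8192 var=8192 usr=8192 opt=4096 tmp=2048 csminstall=10240

# Total space the "system" volume group must provide.
total_mb=$((root + var + usr + opt + tmp + csminstall))
total_gb=$((total_mb / 1024))
echo "Logical volumes need ${total_mb} MB (${total_gb} GB)"
```

Together with the 128 MB /boot partition and the 2 GB swap partition, the mirrored disk pair in this example therefore needs a little over 42 GB of usable space.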

Once you complete installation, you need to navigate through any post-installation setup screens. Make any custom post-installation changes to the management server for your environment. For example, you might need to configure the X server to work comfortably with your KVM (keyboard, video, and mouse) setup.

Installing CSM

Installing the Cluster Systems Management (CSM) software is generally trivial on a supported system. Good documentation is available in HTML and PDF formats in the IBM Linux Cluster documentation library (see Related topics).

The first step is to copy software onto the management server. Because you must perform the installation as the root user, you can store all this in the root home directory. Table 2 shows an appropriate directory structure.

Table 2. CSM software
Directory               Contents
/root/manuals/csm/      The CSM documentation PDFs
/root/manuals/gpfs/     The GPFS documentation PDFs
/root/manuals/rsct/     The RSCT documentation PDFs
/root/csm/              The CSM software (contents of the CSM tar package)
/root/csm/downloads/    Your open source download RPMs for CSM (for example, autorpm)

To install CSM, install the csm.core i386 RPM package. This package works for the x86_64 architecture too. After you install that package, you have the command available to install the CSM management server. First, source the /etc/profile.d/ into your current shell to pick up the new path setting. Then, run the installms command and apply the CSM license to the system. Here are the commands you need to enter:

rpm -ivh /root/csm/csm.core*.i386.rpm
. /etc/profile.d/
installms -p /root/csm/downloads:/root/csm
csmconfig -L <Your License File>

Note: If you do not have a CSM license file, you can run the same csmconfig -L command without the license file to accept the 60-day CSM try-and-buy license. You must then apply a full CSM license to continue CSM function after the 60-day period expires.

Optimizing for large cluster usage

CSM is designed to be scalable. Also, Red Hat Linux works well in most standard situations. However, you can make some tweaks to the management server in order to make a large cluster environment run a little smoother. Here are some examples of ways to optimize the system:

Listen for DHCP requests on a specific interface.
Edit the /etc/sysconfig/dhcpd DHCPD configuration file so that DHCPDARGS is set to the appropriate interface. The Red Hat Linux /etc/init.d/dhcpd start-up script uses the DHCPDARGS variable to start the DHCP daemon with the specified arguments. Enclose multiple arguments in a single pair of quotes. To listen on eth0, set the variable like this:
Increase ARP table size and timeouts.
If you have a single large network with much of or the entire cluster on the same subnet, the ARP table can become overloaded, giving the impression that CSM and network requests are slow. To avoid this, make the following changes to the running system, and add them as entries to the /etc/sysctl.conf file, too, in order to make the changes persistent:
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.neigh.default.gc_thresh1 = 512
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.neigh.default.gc_stale_time = 240
Increase the number of NFS daemons.
By default, the CSM fanout value is 16. This means that commands run across the cluster, including node installation, are run on 16 nodes at a time. The standard Red Hat Linux setting for NFS is eight running daemons. You can make NFS scale better by increasing the number of NFSD threads to 16 to match the default CSM fanout value. If you increase the fanout value, you might also want to increase the number of NFS threads. Typically, a fanout of 32 with 32 NFS threads gives good speed and reliability and allows a single rack of 32 nodes to be installed concurrently. To do this, create the configuration file /etc/sysconfig/nfs, and add the following line:
Set up an NTP server.
The default Red Hat Linux configuration should work for the NTP server. Add a configuration line to the /etc/ntp.conf NTP configuration file to allow nodes on your cluster network to synchronize their time to the management server clock, as shown here:
restrict mask notrust nomodify notrap

If your management server can reach an outside timeserver, add this line to synchronize the management server clock to it:

Ensure the NTP server is running and starts automatically at boot time by using the following commands:
chkconfig ntpd on
service ntpd start
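
The DHCP and NFS settings described above can be sketched as a small script. This is an illustration under stated assumptions rather than part of CSM: the stage_tuning function and its ROOT argument are hypothetical, and the files are written under a scratch directory so they can be reviewed before being copied into /etc/sysconfig. RPCNFSDCOUNT is the variable the Red Hat NFS start-up script is assumed to read for the thread count; check /etc/init.d/nfs on your release.

```shell
#!/bin/sh
# stage_tuning ROOT IFACE NFSD_COUNT
# Writes the DHCP and NFS settings under ROOT instead of / so the result
# can be inspected first (ROOT is an assumption of this sketch).
stage_tuning() {
    root=$1 iface=$2 nfsd=$3
    mkdir -p "$root/etc/sysconfig"
    # Arguments passed to the DHCP daemon by the Red Hat start-up script.
    printf 'DHCPDARGS="%s"\n' "$iface" > "$root/etc/sysconfig/dhcpd"
    # Number of NFS daemon threads to start.
    printf 'RPCNFSDCOUNT=%s\n' "$nfsd" > "$root/etc/sysconfig/nfs"
}

# Example: listen for DHCP on eth0 and run 32 NFS threads.
staging=$(mktemp -d)
stage_tuning "$staging" eth0 32
cat "$staging/etc/sysconfig/dhcpd" "$staging/etc/sysconfig/nfs"
```

After the ARP entries have been added to /etc/sysctl.conf, running sysctl -p as root loads them into the running kernel.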

Installing nodes

The CSM management server is now installed with all setup and configuration steps completed. Before you can install any nodes, however, you need to complete additional configuration on the CSM management server to define how the nodes are installed. Do the installation steps in this section on the CSM management server.

Defining nodes

You can define nodes using any method described in the definenode manual page. However, an easy way to define large numbers of nodes is to use the node definition file. With this method, you create a stanza file and pass it as an argument to CSM to define all the nodes listed. It is easy to script the creation of the stanza file.

Listing 1 shows a short example node definition file. If you have similar properties for different nodes, you can define them at the top of the file in a default stanza. After this, each stanza should represent a node name with node-specific attributes listed under it. The example shows how to define three machines in the example cluster -- two compute nodes and one storage server.

Listing 1. Example node definition file
  ConsoleMethod = mrv
  ConsoleSerialDevice = ttyS0
  ConsoleSerialSpeed = 9600
  InstallAdapterName = eth0
  InstallCSMVersion = 1.4.1
  InstallMethod = kickstart
  InstallOSName = Linux
  InstallPkgArchitecture = x86_64
  ManagementServer =
  PowerMethod = bmc
  ConsolePortNum = 1
  ConsoleServerName = term002
  HWControlNodeId = node001
  HWControlPoint =
  InstallDistributionName = RedHatEL-WS
  InstallDistributionVersion = 4
  InstallServiceLevel = QU1
  ConsolePortNum = 2
  ConsoleServerName = term002
  HWControlNodeId = node002
  HWControlPoint =
  InstallDistributionName = RedHatEL-WS
  InstallDistributionVersion = 4
  InstallServiceLevel = QU1
  ConsolePortNum = 2
  ConsoleServerName = term001
  HWControlNodeId = stor001
  HWControlPoint =
  InstallDistributionName = RedHatEL-AS
  InstallDistributionVersion = 3
  InstallServiceLevel = QU5

A scripted node definition file for a large-scale cluster would be far longer than this example. However, when you pass the file to CSM with the following command, the nodes are created quickly:

definenode -f <node-def-filename>

Change <node-def-filename> to the name of the file in which you stored the node definitions just described, for example, definenode -f /tmp/my_nodes.def.
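
Scripting the stanza file, as suggested above, might look like the following minimal sketch. Several things here are assumptions: the gen_nodedef function is hypothetical, each stanza is assumed to begin with the node name followed by a colon, and node N is assumed to be wired to console port N. Attributes whose values depend on your site, such as HWControlPoint, are left out.

```shell
#!/bin/sh
# gen_nodedef PREFIX COUNT CONSOLE_SERVER
# Emit one stanza per node, named PREFIX001, PREFIX002, and so on.
gen_nodedef() {
    prefix=$1 count=$2 console=$3
    i=1
    while [ "$i" -le "$count" ]; do
        node=$(printf '%s%03d' "$prefix" "$i")
        printf '%s:\n' "$node"                          # stanza header (assumed format)
        printf '  ConsolePortNum = %d\n' "$i"           # assumes port N serves node N
        printf '  ConsoleServerName = %s\n' "$console"
        printf '  HWControlNodeId = %s\n' "$node"
        i=$((i + 1))
    done
}

# Example: three compute nodes attached to console server term002.
gen_nodedef node 3 term002
```

Redirect the output to a file, add a default stanza for the shared attributes, and pass the file to definenode -f.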

The CSM node database should now contain a list of all your nodes. For the small example cluster, this would consist of 16 compute nodes, a user node, a scheduler node, and a storage server. The CSM management server does not appear in the CSM database. You can see a list of the nodes with the lsnode command. You can use the lsnode -F command to see a more detailed list, which you can also use to back up your CSM node definitions. Redirect the output from this command to a file, and you can redefine the nodes by using the definenode -f command again.

Defining node groups

CSM allows nodes to be grouped together using some arbitrary conditions that later allow you to use other CSM commands against a particular group of nodes. This can be particularly useful for referring to certain types of nodes with similar attributes.

CSM allows the use of both dynamic and static node groups. Static node groups contain a set list of node names that the administrator maintains manually. For example, when using static node groups, you must manually add any newly defined nodes to any relevant node groups. Dynamic node groups are of more use in a large cluster and, with a little thought into careful setup, can save significant time and minimize typing on the command line. Dynamic node groups define a list of nodes. Members of the list are defined by a condition such that if a node meets the defined condition, it is automatically placed in that node group, including newly defined nodes. Table 3 shows some example dynamic node group definitions.

Table 3: Dynamic node groups
Definition command
    Comments

nodegrp -w "Hostname like 'node%'" ComputeNodes
    Create a ComputeNodes node group
nodegrp -w "Hostname like 'schd%'" SchedulerNodes
    Create a SchedulerNodes node group
nodegrp -w "Hostname like 'stor%'" StorageNodes
    Create a StorageNodes node group
nodegrp -w "Hostname like 'user%'" UserNodes
    Create a UserNodes node group
nodegrp -w "Hostname like 'node%' && ConsoleServerName=='term002'" Rack02
nodegrp -w "Hostname like 'node%' && ConsoleServerName=='term003'" Rack03
nodegrp -w "Hostname like 'node%' && ConsoleServerName=='term...'" Rack...
    Create node groups for each rack based on Hostname and ConsoleServerName. Assumes one console server for each rack.
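
The per-rack definitions in Table 3 follow a regular pattern, so they lend themselves to a loop. The sketch below only prints the nodegrp commands rather than running them. The rack_group_cmds function is hypothetical, and it assumes the naming convention seen in the table, where console server term0NN serves rack NN.

```shell
#!/bin/sh
# rack_group_cmds FIRST LAST
# Print one nodegrp command per rack number from FIRST to LAST.
rack_group_cmds() {
    r=$1
    while [ "$r" -le "$2" ]; do
        rack=$(printf '%02d' "$r")
        # %% yields a literal % for the "like" pattern.
        printf "nodegrp -w \"Hostname like 'node%%' && ConsoleServerName=='term0%s'\" Rack%s\n" \
            "$rack" "$rack"
        r=$((r + 1))
    done
}

# Example: racks 2 through 4.
rack_group_cmds 2 4
```

Review the printed commands first, then pipe the output to sh on the management server to create the groups.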

Preparing Linux distributions

The CSM management server should contain the CD contents of all the Linux distributions you will install across the cluster. It should also be prepared for CSM installation on client machines, which you should do before any installations. CSM provides two commands to help, which must be run for each Linux distribution you are going to install.

To prepare the /csminstall/Linux tree with the required CSM data, run the copycsmpkgs command. For example:

copycsmpkgs -p /path/to/csm:/path/to/downloads InstallDistributionName=RedHatEL-WS \
	InstallDistributionVersion=4 InstallServiceLevel=QU1
copycsmpkgs -p /path/to/csm:/path/to/downloads InstallDistributionName=RedHatEL-AS \
	InstallDistributionVersion=3 InstallServiceLevel=QU5

To prepare the /csminstall/Linux tree with the required Linux distributions CDs, run the copycds command. For example:

copycds InstallDistributionName=RedHatEL-WS InstallDistributionVersion=4 
copycds InstallDistributionName=RedHatEL-AS InstallDistributionVersion=3 
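
Because both commands must be run once for each distribution, a loop over a list of distributions can help keep the two in step. This sketch only prints the commands, using the attribute names from the examples above. The distros list and the prep_cmds function are hypothetical, and the /path/to placeholders still need to be replaced with your own directories.

```shell
#!/bin/sh
# Hypothetical list: one "name version service-level" triple per line.
distros='RedHatEL-WS 4 QU1
RedHatEL-AS 3 QU5'

# Print the copycsmpkgs and copycds commands for every listed distribution.
prep_cmds() {
    printf '%s\n' "$distros" | while read -r name version level; do
        echo "copycsmpkgs -p /path/to/csm:/path/to/downloads InstallDistributionName=$name InstallDistributionVersion=$version InstallServiceLevel=$level"
        echo "copycds InstallDistributionName=$name InstallDistributionVersion=$version"
    done
}

prep_cmds
```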

Once you have the directory structure set up for the CDs, you can add any customized packages to install or upgrade during system installation, as follows:

  • Copy packages to /csminstall/Linux/.../x86_64/install to ensure they are installed.
  • Copy packages to /csminstall/Linux/.../x86_64/updates to install them only if an existing version is present.

You can create subdirectories with the name of a node group to install or update RPMs only on a particular group of nodes, if required.

Setting up CFM

CSM provides a mechanism called Configuration File Manager (CFM) that you can use to distribute similar files across the cluster. If you set this up before node installation, the files are distributed during the installation process.

CFM can contain links to files in other directories on the management server. The links are followed, rather than copied, when sent to the nodes. This is particularly useful for files such as the hosts file, as shown here:

mkdir /cfmroot/etc
ln -s /etc/hosts /cfmroot/etc/hosts

Instead of linking files, you can copy them into CFM. For example:

  • Copy the default NTP configuration file to /cfmroot/etc/ntp.conf
  • Add a server line for your management server using the following:
    echo "" > /cfmroot/etc/ntp/step-tickers

The files will be distributed across the cluster.
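
The difference between linking and copying can be tried out safely in a scratch directory. In this sketch the stage_cfm function and its CFMROOT argument are hypothetical; on the management server the directory would be /cfmroot.

```shell
#!/bin/sh
# stage_cfm CFMROOT
# Link the hosts file and copy ntp.conf into a CFM-style tree.
stage_cfm() {
    cfmroot=$1
    mkdir -p "$cfmroot/etc"
    # A link: the nodes receive whatever /etc/hosts currently contains.
    ln -sf /etc/hosts "$cfmroot/etc/hosts"
    # A copy: it can be edited for the nodes without changing the
    # management server's own file.
    if [ -f /etc/ntp.conf ]; then
        cp /etc/ntp.conf "$cfmroot/etc/ntp.conf"
    fi
}

scratch=$(mktemp -d)
stage_cfm "$scratch"
ls -l "$scratch/etc"
```

Linking suits files that must stay identical to the management server's copy, while copying suits files that differ between the server and the nodes.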

Use CFM when you need to transfer a few files to specific locations on the cluster. However, it is generally not a good idea to overload CFM with large numbers of files on a large cluster. For example, do not use CFM to install extra software from a tar archive. When used in this way across a large cluster, CFM takes a long time to run, making it painful to use. Stick to getting software installed using the supported installation mechanisms. For example, use RPMs instead of tar files to install software, and copy just the configuration files (that is, files that are likely to change over time) into CFM.

Customizing node build

CSM interfaces with the standard network installation mechanism for the operating system you plan to install on each node. For example, this might be NIM on AIX®, autoYaST on Suse Linux, or kickstart on Red Hat Linux. As before, Red Hat is used here as the example: a node is installed with kickstart and a kickstart configuration file.

Before starting kickstart setup, check that you have rpower control over all the nodes. This helps CSM get the UUID for each computer in later CSM versions. If the UUID is not available, or you have a CSM version older than, CSM tries to get the MAC address from the first Ethernet device of the nodes. In order for CSM MAC address collection to work, the terminal server configuration must match the settings in the BIOS of the nodes. Check the terminal server connection using the rconsole command. When you are satisfied you have established rpower control and a terminal server connection (if appropriate), continue kickstart configuration.

CSM provides default kickstart templates in the files /opt/csm/install/kscfg.tmpl.*. You can copy these templates to a different filename and customize them to better suit your environment, if required. The templates are a good starting point, and you should generally customize a template file rather than any other standard kickstart file. This is because the templates contain macros for a variety of CSM functions, such as running a post-installation script. CSM contributes to the kickstart process by analyzing your kickstart template file before producing the final kickstart file for each node to use. The final file contains all the parsed macros and includes full scripts for everything defined in the template.

Generally, you might want to change the template in the following ways:

  • Alter the disk partitioning, perhaps to include LVM
  • Change the default password
  • Edit the package list to be installed

Once you have edited the kickstart template, run the CSM setup command to generate the final kickstart file and do the initial UUID or MAC address collection, as follows:

csmsetupks -n node001 -k /opt/csm/install/your.kickstart.file -x

Note: Use the -x switch because the copycds command was run earlier.

Updating drivers

If you have hardware to be installed in the cluster that the operating system does not directly support, there might still be drivers available. This procedure also works for driver upgrades, when required. CSM can include additional or replacement drivers automatically for both the final installation and the RAM disk used when installing the operating system.

When using the System x hardware example, you might want to gain the performance and stability that the Broadcom network driver for the on-board Broadcom Ethernet adapters gives. To do this, follow these steps, which show how to use the Broadcom bcm5700 driver instead of the standard tg3 network driver Red Hat Linux provides:

  1. Because you are building a kernel module, ensure the kernel source installed for your target system matches the kernel level and type (UP or SMP).
  2. Download the latest bcm57xx driver from Broadcom (see Related topics), and unpack the driver source code.
  3. Run make from the src directory of the bcm driver you have unpacked to build against the running kernel.
  4. Copy the built driver (bcm5700.ko for 2.6 kernels or bcm5700.o for 2.4 kernels) to /csminstall/csm/drivers/<kernel version>/x86_64 on the management server.
  5. If you want to build against other kernel versions, you can run make clean to wipe the current build and then run make LINUX=/path/to/your/kernel/source.

CSM uses the drivers from the directory structure under /csminstall/csm/drivers/<kernel version>/<architecture> when building RAM disk images. These images are used to boot the system during installation if the kernel version matches the RAM disk kernel version. Be careful when creating drivers for installation images: some kernel version numbers are different for the installation kernel. For example, Red Hat typically appends the word BOOT to the end of the version string. If the kernel version matches the running kernel of the installed system, the driver is made available to the running operating system, as well. If you are unsure about the kernel versions, investigate inside the RAM disk images as described in the following section.
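
Because the destination directory and the module file name both depend on the kernel version, a small helper can reduce copy mistakes. The following is a sketch only: the driver_module and driver_dest functions are hypothetical, and the kernel version shown is just an example value.

```shell
#!/bin/sh
# driver_module KERNEL_VERSION
# Name of the built bcm5700 module file for that kernel series.
driver_module() {
    case $1 in
        2.6.*) echo bcm5700.ko ;;
        2.4.*) echo bcm5700.o ;;
        *) echo "unsupported kernel version: $1" >&2; return 1 ;;
    esac
}

# driver_dest KERNEL_VERSION ARCH
# Directory in which CSM looks for drivers for that kernel and architecture.
driver_dest() {
    printf '/csminstall/csm/drivers/%s/%s\n' "$1" "$2"
}

# Example: print the copy command for a 2.4 kernel on x86_64.
kver=2.4.21-32.EL
echo "cp $(driver_module "$kver") $(driver_dest "$kver" x86_64)/"
```

For the installation RAM disk, pass the version string of the installation kernel instead, which can differ from that of the installed system as noted above.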

Modifying the RAM disk

This step is not typically recommended. However, some situations require it, such as when you are unsure about kernel versions. The command list below can also be helpful when investigating the RAM disk images when making new drivers and in other circumstances.

When storage is directly attached to a Red Hat Linux system using a host bus adapter (HBA) to the installation target, the storage driver (such as the qlogic qla2300 drivers) might be loaded before the ServeRAID drivers (used for the internal system disk that is the operating system disk). If this happens, the installation takes place on the wrong disk. /dev/sda represents an LUN on the attached storage rather than the local disk. In this situation, beware of overwriting data on your SAN instead of on a local disk when installing a new operating system. To avoid this, remove the qlogic drivers from the default Red Hat RAM disk image CSM uses to create the boot image for installation. Of course, you need drivers when the system is running, so use another mechanism, such as a post installation script, to install the drivers for the running operating system. This is recommended because the default Red Hat qlogic drivers are generally not failover drivers.

For example, remove the qla2300 drivers from the default RAM disk image for Red Hat Enterprise Linux Advanced Server Version 3. Table 4 shows the commands to do this.

Table 4: RAM disk commands
Command
    Purpose

cd /csminstall/Linux/RedHatEL-AS/3/x86_64/RedHatEL-AS3-QU5/images/pxeboot
    Change to the directory containing the RAM disk image you need to modify.
cp initrd.img initrd.img.orig
    Back up the original image.
mkdir mnt
    Create a mount point.
gunzip -S .img initrd.img
    Unpack the image. The unpacked file is named initrd.
mount -o loop initrd mnt
    Mount the image on the mount point.
manual step
    Manually remove all references to qla[23]00 in mnt/modules/*.
cp mnt/modules/modules.cgz .
    Copy the modules archive from the image to the current directory.
gunzip -c modules.cgz | cpio -ivd
    Unpack the modules archive.
rm modules.cgz
    Delete the modules archive.
rm 2.4.21-32.EL/ia32e/qla2*
    Delete the qlogic modules from the unpacked modules archive.
find 2.4.21-32.EL -type f | cpio -o -H crc | gzip -c -9 > modules.cgz
    Pack up the modules archive with the qlogic modules removed.
rm -rf 2.4.21-32.EL
    Delete the unpacked modules archive.
mv -f modules.cgz mnt/modules
    Replace the old modules archive with the new one.
umount mnt
    Unmount the RAM disk image.
rmdir mnt
    Remove the mount point.
gzip -S .img initrd
    Pack up the RAM disk image again.

Note: To modify the RAM disk for Suse or SLES, make sure ips (the ServeRAID driver) appears before any HBA drivers in the /etc/sysconfig/kernel file under the INITRD_MODULES stanza. The Suse or SLES mechanism for creating RAM disk images ensures that the drivers are loaded in order.

Installing pre- and post-reboot scripts

Because each environment and cluster is different, you might need to apply some post-installation scripting to customize the operating system installation to your specific requirements. You can do this either before or after the first reboot into a newly installed system. This can be particularly useful for configuring secondary network adapters, and CSM provides an example script for this purpose. A secondary adapter configuration is required for the example cluster because of the dual network setup: one compute network and one storage network to each node. Follow these steps for secondary adapter configuration:

  1. Copy the default adapter configuration script CSM provides into the installprereboot script execution directory to enable the script to run at installation time, ensuring it can run, as follows:
    cp /csminstall/csm/scripts/adaptor_config_Linux /csminstall/csm/scripts/
    chmod 755 /csminstall/csm/scripts/installprereboot/100_adaptor_config._LinuxNodes
  2. Generate the adapter stanza file /csminstall/csm/scripts/data/Linux_adapter_stanza_file by writing a header such as the following:
  3. This configures all secondary (eth1) adapters to start on boot, and it takes the default settings for the broadcast address, network mask, and MTU size. You can configure the computer-specific network details in additional stanza lines in a similar way to the node definition files, as shown here:
    for node in $(lsnode)
    do
      ip=$(grep "$node" /etc/hosts | head -n 1 | awk '{print $1}')
      echo -e "$node:\n  IPADDR=$ip" >> Linux_adaptor_stanza_file
    done
  4. This appends a stanza to the adapter stanza file for each computer, so that each one is configured with a different IP address.
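
The stanza loop above matches node names with a plain grep, which can pick up the wrong line when one hostname is contained in another. A slightly more defensive variant is sketched below. The adapter_stanzas function, the demo hosts file, and the addresses in it are all hypothetical. The hosts file and node list are passed as arguments so the function can be tried without a CSM installation.

```shell
#!/bin/sh
# adapter_stanzas HOSTS_FILE NODE...
# Print an adapter stanza for each NODE, taking the IP address from the
# first HOSTS_FILE entry that lists NODE as an exact name or alias.
adapter_stanzas() {
    hosts=$1
    shift
    for node in "$@"; do
        ip=$(awk -v n="$node" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
        ' "$hosts")
        if [ -n "$ip" ]; then
            printf '%s:\n  IPADDR=%s\n' "$node" "$ip"
        else
            echo "no address found for $node" >&2
        fi
    done
}

# Demo with a made-up hosts file; a plain grep for node001 would have
# returned the node001-bmc line first.
demo_hosts=$(mktemp)
printf '%s\n' '10.1.0.200 node001-bmc' '10.1.0.1 node001' > "$demo_hosts"
adapter_stanzas "$demo_hosts" node001
```

On the management server, the node list could come from the CSM node listing command, with the output appended to the stanza file by using >>.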


There are two main shell environment variables applicable during node installation: CSM_FANOUT and CSM_FANOUT_DELAY. The former variable controls how many nodes are sent CSM instructions simultaneously, such as how many nodes are rebooted from the management server. The latter variable controls how long (in seconds) CSM waits before rebooting the next set of nodes to be installed. By default, the fanout is 16 nodes and the delay before rebooting the next set of nodes is 20 minutes. These default values are acceptable for most installations, but the fanout can be increased for large clusters.
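
To get a feel for how these two values interact, the number of batches and the time before the last batch is rebooted can be estimated with shell arithmetic. This is a rough model under a stated assumption: CSM is taken to wait one full delay between consecutive batches, and installation time itself is ignored. The function names are hypothetical.

```shell
#!/bin/sh
# install_batches NODES FANOUT
# Number of batches needed, rounded up.
install_batches() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# last_batch_delay NODES FANOUT DELAY_MINUTES
# Minutes until the final batch is rebooted, assuming one full delay
# between consecutive batches.
last_batch_delay() {
    batches=$(install_batches "$1" "$2")
    echo $(( (batches - 1) * $3 ))
}

install_batches 128 16        # prints 8
last_batch_delay 128 16 20    # prints 140
last_batch_delay 128 32 20    # prints 60
```

Under this model, doubling the fanout from 16 to 32 on a 128-node cluster cuts the wait before the last batch starts from 140 minutes to 60 minutes.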

To install the cluster in the classic way, complete the following steps:

  1. Configure the installation and install the compute nodes as follows:
    csmsetupks -N ComputeNodes -k /opt/csm/install/your.kickstart.file.nodes -x
    installnode -N ComputeNodes
  2. Configure the installation and install the user nodes as follows:
    csmsetupks -N UserNodes -k /opt/csm/install/your.kickstart.file.user -x
    installnode -N UserNodes
  3. Configure the installation and install the scheduler nodes as follows:
    csmsetupks -N SchedulerNodes -k /opt/csm/install/your.kickstart.file.schd -x
    installnode -N SchedulerNodes
  4. Configure the installation and install the storage nodes as follows:
    csmsetupks -N StorageNodes -k /opt/csm/install/your.kickstart.file.stor -x
    installnode -N StorageNodes

For large cluster installations, use installation servers to stage the installation process, and parallelize the installation process as follows:

  1. Set the InstallServer attribute in CSM. For each node you want to install from an installation server, set the InstallServer attribute to the hostname of the installation server to use for that node. Any node without this attribute set defaults to installing from the central management server. In a large cluster environment where, for example, you have 32 nodes per rack, you could select the bottom node in each rack to be an installation server for the cluster. In this case, to configure node002 through node032 in rack 1 to install from node001 and have node001 install from the management server, use this command:
    chnode -n node002-node032 InstallServer=node001
  2. Create a dynamic node group containing all installation servers and another containing the clients as follows:
    nodegrp -w "InstallServer like '_%'" InstallServers
    nodegrp -w "InstallServer not like '_%'" InstallClients
  3. Configure the installation, and install the installation servers as follows:
    csmsetupks -N InstallServers -x
    installnode -N InstallServers
  4. Increase the CSM fanout value to reboot more nodes concurrently in order to take advantage of the increased bandwidth that the installation servers provide. In the 32-nodes-per-rack example, the most efficient value for the CSM fanout is 32 multiplied by the number of installation servers (or racks, if there is one installation server per rack). In the example, you could also increase the number of NFS threads on each installation server to 32 to scale NFS a little better with each rack. Using this method, you can install hundreds or thousands of machines concurrently.
  5. Configure the installation and install the installation clients as follows:
    csmsetupks -N InstallClients -x
    installnode -N InstallClients
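
The fanout rule from step 4 is simple enough to compute in the shell before starting the client installation. In this sketch the recommended_fanout function is hypothetical, and the figure of four racks is only an example.

```shell
#!/bin/sh
# recommended_fanout NODES_PER_RACK INSTALL_SERVERS
# Fanout suggested in step 4: nodes per rack times installation servers.
recommended_fanout() {
    echo $(( $1 * $2 ))
}

# Example: 32 nodes per rack and 4 racks, each with one installation server.
CSM_FANOUT=$(recommended_fanout 32 4)
export CSM_FANOUT
echo "CSM_FANOUT=$CSM_FANOUT"
```

Export the variable in the same shell session from which you then run the installation commands so that they inherit the new value.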


After completing all the steps detailed in the first two parts of this series, you have completed the hardware and software setup for your cluster, including setting up the systems management software and completing the node installation. The concluding parts of the series will guide you through setting up the storage back-end; specifically, performing storage hardware configuration and installing and configuring the IBM shared file system, General Parallel File System (GPFS).


Related topics
