Installing a large Linux cluster, Part 1: Introduction and hardware configuration

Getting started installing a large Linux cluster

Create a working Linux® cluster from many separate pieces of hardware and software, including IBM® System x™ and IBM TotalStorage® systems. This first part of the multipart series covers hardware configuration, including understanding architecture, planning logical network design, setting up terminal servers, and updating firmware.


Graham White (gwhite@uk.ibm.com), Systems Management Specialist, IBM, Software Group

Graham White is a systems management specialist in the Linux Integration Centre within Emerging Technology Services at the IBM Hursley Park office in the United Kingdom. He is a Red Hat Certified Engineer, and he specializes in a wide range of open-source, open-standard, and IBM technologies. Graham's areas of expertise include LAMP, Linux, security, clustering, and all IBM Systems hardware platforms. He received a BSc with honors in Computer Science with Management Science from Exeter University in 2000.



Mandie Quartly (mandie_quartly@uk.ibm.com), IT Specialist, IBM

Mandie Quartly is an IT specialist with the IBM UK Global Technology Services team. Mandie performs a cross-brand role, with current experience in both Intel and POWER™ platform implementations as well as AIX and Linux (Red Hat and Suse). She specializes in the IBM product General Parallel File System (GPFS). She received a PhD in astrophysics from the University of Leicester in 2001.



06 December 2006


Introduction to the large Linux cluster series

This is the first of multiple articles that cover the installation and setup of a large Linux computer cluster. The aim of the series is to bring together in one place up-to-date information from various places in the public domain on the process required to create a working Linux cluster from many separate pieces of hardware and software. These articles are not intended, however, to provide the basis for the complete design of a new large Linux cluster. Refer to the reference materials and Redbooks under Resources for general architecture pointers.

The first two parts of this series address the base installation of the cluster and include an overview of the hardware configuration and installation using the IBM systems management software, Cluster Systems Management (CSM). The first article introduces you to the topic and takes you through hardware configuration. The second article covers management server configuration and node installation. Subsequent parts of the series deal with the storage back end of the cluster. They cover the storage hardware configuration and the installation and configuration of the IBM shared file system, General Parallel File System (GPFS).

This series is intended for systems architects and systems engineers to use when they plan and implement a Linux cluster using the IBM eServer Cluster 1350 framework (see Resources). Some parts might also be relevant to cluster administrators for educational purposes and during normal cluster operation.

Part 1: General cluster architecture

A good design is critically important before you undertake any configuration steps. The design has two parts:

  • Physical design
    • Rack layout for each rack type (for example, management racks and compute racks)
    • Floor plan for how the racks will be laid out during both the installation and production use, if the two are different
    • Inter-rack connection diagrams for network, power, console access, and so on
    • Intra-rack cabling for storage, terminal servers and so on
  • Logical design
    • Network design including IP address ranges, subnet configuration, computer naming conventions, and so on
    • CSM configuration for custom script locations, hardware settings, and monitoring requirements
    • Operating system requirements, custom package lists, and system configuration options
    • Storage layout, including file system layout, partitioning, replication, and so on

The example cluster (see Figure 1) consists entirely of Intel® or AMD-based IBM Systems computers with attached TotalStorage subsystems (see Resources for more information about these systems). For simplicity, copper gigabit Ethernet cable provides cluster interconnection. This cable provides good speed in most circumstances, with bandwidth increases available between racks using bonded links (also described as port-channel, EtherChannel, or trunked links, depending on your preferred terminology).

The network topology takes a star shape, with all racks connecting back to a main switch in the management rack. The example cluster uses three networks: one for management/data (the compute network), one for the clustered file system (the storage network), and one for administrative device management. The first two networks are normal IP networks. The compute network is used for most tasks, including inter-process communications (such as MPI) and cluster management. The storage network is used exclusively for clustered file system communication and access.

Figure 1. Cluster architecture diagram

Some additional design and layout details for the example cluster include:

  • Management server -- Management server function can reside on a single server or multiple servers. In a single server environment, the management server operates in standalone mode. You can also set up highly available management servers. You can use CSM high-availability (HA) software to "heartbeat" between two servers and manage dynamic failover between them if a failure condition occurs. Another possible method of introducing extra management servers is to use a replication setup if HA is not important in your environment. In this situation, you can back up the management server data to another live system, which you can bring online manually to take over management if necessary. In Figure 1, the management network connections are shown in red. The management server is the CSM server, which is used exclusively to control the cluster using CSM functions: taking care of system installation, monitoring, maintenance, and other tasks. In this cluster, there is one management server.
  • Storage servers and disks -- You can connect several storage servers to a disk-based backend using various mechanisms. Connecting storage to the cluster can be direct or through a storage area network (SAN) switch, either by fiber, copper, or a mixture of the two (see Figure 1). These servers provide shared storage access to the other servers within the cluster. If data backup is required, connect the backup device to the storage server using an extra copper or fiber link. For the example cluster, the storage back end is a single entity, providing shared file system access across the cluster. The next article in the series goes into detail about the setup, configuration, and implementation of the storage hardware and clustered file system.
  • User nodes -- Ideally, the compute nodes of a cluster should not accept external connections and should only be accessible to system administrators through the management server. System users can log in to user nodes (or login nodes) in order to run their workloads on the cluster. Each user node consists of an image with full editing capabilities, the required development libraries, compilers, and everything else required to produce a cluster-enabled application and retrieve results.
  • Scheduler nodes -- In order to run a workload on the cluster, users should submit their work to a scheduler node. A scheduler daemon, which runs on one or more scheduler nodes, uses a predetermined policy to run the workloads on the cluster. Like compute nodes, scheduler nodes should not accept external connections from users. The system administrator should manage them from the management server.
  • Compute nodes -- These nodes run the cluster workload, accepting jobs from the scheduler. The compute nodes are the most disposable part of the cluster. The system administrator can easily reinstall or reconfigure them using the management server.
  • External connections -- Example external connections are shown in green in Figure 1. These connections are considered to be outside of the cluster, and therefore, they are not described in this article.

Hardware configuration

After you assemble the racks and put them into place with all cabling completed, a large amount of hardware configuration remains. Specific cabling details of any particular cluster are not covered in this article. The hardware configuration steps required before cluster installation are described below, with some specific examples drawn from the example cluster design outlined above.

Logical network design

One of the most commonly overlooked tasks when installing a cluster is the logical network design. Ideally, the logical design should be on paper before cluster implementation. Once you have the logical network design, use it to create a hosts file. In a small cluster, you can write out the hosts file manually if there are not many devices on the network. However, it is usually best to produce a naming convention and write a custom script to produce the file.

Ensure all the devices on the network are represented in the hosts file. Some examples include the following (with example names):

  • Management servers (mgmt001 - mgmtXXX)
  • Storage servers (stor001 - storXXX)
  • Compute nodes (node001 - nodeXXX)
  • Scheduler nodes (schd001 - schdXXX)
  • User nodes (user001 - userXXX)

This naming convention covers only the five types of computer systems in the network and only one network, which is not nearly good enough. There are also the storage network and compute networks to factor in, plus a device management network. So this file needs to be expanded. Each node requiring access to the clustered file system needs an address on the storage network. Each node requires two addresses on the compute network: one for the compute address and another for the Baseboard Management Controller (BMC), which is used for hardware monitoring and power control. Table 1 outlines a much more comprehensive naming convention with example IP address ranges.

Table 1. Host file naming convention
Device | Compute (192.168.0.0/24) | BMC (192.168.0.0/24) | Storage (192.168.1.0/24) | Device mgmt (192.168.2.0/24) | External (ext n/w)
Management server | mgmt001 | mgmt001_d | mgmt001_s | mgmt001_m | mgmt001_e
Storage server | stor001 | stor001_d | stor001_s | stor001_m | stor001_e
User nodes | user001 | user001_d | user001_s | none | none
Scheduler nodes | schd001 | schd001_d | schd001_s | none | none
Compute nodes | node001 | node001_d | node001_s | none | none
Compute switches | none | none | none | gigb01a | none
Storage switches | none | none | none | gigb01b | none
Terminal servers | none | none | none | term001 | none
Storage controller A/B | none | none | none | disk01a/b | none
LCM/KVM/RCM | none | none | none | cons001 | none

When implemented, this scheme produces a hosts file like the example you can access under Downloads. This is a small example cluster consisting of sixteen compute nodes, one management server, one storage server, one user node, and one scheduler node in two racks with the relevant devices attached. While not representing a large cluster, this is sufficient for this example cluster, and you can easily extend it to represent far larger clusters if required.
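
To illustrate how such a hosts file can be generated rather than written by hand, here is a minimal sketch of a generation script for the compute nodes. The suffixes follow the convention in Table 1; the node count, domain name, and the offset used to place BMC addresses in the upper half of the compute range are assumptions you should adjust to your own logical design.

#!/bin/bash
# Minimal sketch: emit hosts entries for the compute nodes following Table 1.
# NODES, DOMAIN, and the BMC offset (+100) are assumptions; adjust to your design.

NODES=16
DOMAIN=cluster.com

for i in $(seq 1 $NODES); do
    name=$(printf "node%03d" "$i")
    # Compute network (192.168.0.0/24): node address plus its BMC address
    echo "192.168.0.$i        $name.$DOMAIN        $name"
    echo "192.168.0.$((i + 100))   ${name}_d.$DOMAIN   ${name}_d"
    # Storage network (192.168.1.0/24)
    echo "192.168.1.$i        ${name}_s.$DOMAIN    ${name}_s"
done

Redirect the output into your hosts file (for example, ./gen-hosts.sh >> /etc/hosts) and add the management, storage, user, scheduler, and device entries in the same way.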

Ethernet switches

There are two physical networks: one for compute traffic and one for storage. With a standard 32 nodes per rack, each rack requires two 48-port switches, one for each network. In smaller clusters, the management rack also requires two of the same switches. For larger clusters, 48 ports might not be enough, so a larger central switch might be required.

Each switch for the two main networks (ignoring the device management network) requires a slightly different configuration because, as in the example, the Gigabit Ethernet interconnects use jumbo frames for the storage network and a standard frame size for the compute network. The device management network setup is usually very simple: a flat Layer 2 network on a 10/100 switch is acceptable for device management purposes, so no further explanation is needed.

Example A: Extreme Networks switch

Here are the configuration steps for an Extreme Networks Summit 400-48t 48-port Gigabit Ethernet switch.

First, connect to each switch using the serial console port with a straight serial cable (9600, 8-N-1, no flow control) with a default user ID admin and no password. (Just press the Enter key at the prompt.)
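
For example, from a Linux laptop or the management server, you can make this serial connection with stty and screen; the device name /dev/ttyS0 is an assumption and depends on which serial port or USB adapter you use.

# Set 9600 baud, 8 data bits, no parity, one stop bit, no flow control
stty -F /dev/ttyS0 9600 cs8 -parenb -cstopb -ixon -ixoff -crtscts

# Attach to the console (press Enter at the prompt to log in as admin)
screen /dev/ttyS0 9600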

For all switches, follow these steps:

  1. Enter unconfig switch all -- Wipes any existing configuration, if required.
  2. Enter configure vlan mgmt ipaddress 192.168.2.XXX/24 -- Sets the management IP address.
  3. Enter configure snmp sysname gigbXXX.cluster.com -- Sets the switch name.
  4. Enter configure sntp-client primary server 192.168.2.XXX -- Sets the NTP server to the management server.
  5. Enter configure sntp-client update-interval 3600 -- Sets time synchronization to hourly.
  6. Enter configure timezone 0 -- Sets the time zone.
  7. Enter enable sntp-client -- Turns on NTP.
  8. Enter configure ports 1-4 preferred-medium copper -- Changes default preferred medium from fiber to copper on ports 1-4, if required.

Now, to configure jumbo frames on the storage network switches, follow these steps:

  1. Enter create vlan jumbo -- Creates the jumbo frames vlan.
  2. Enter configure "mgmt" delete ports 1-48 -- Removes ports from the mgmt vlan.
  3. Enter configure "jumbo" add ports 1-48 -- Adds ports to the jumbo vlan.
  4. Enter configure jumbo-frame size 9216 -- Sets the maximum transmission unit (MTU) size.
  5. Enter enable jumbo-frame ports 1-48 -- Turns on jumbo frame support.

To enable trunking on a 2-port link, use enable sharing 47 grouping 47-48 (group ports 47 and 48, with 47 as the primary).

To complete the configuration, follow these steps:

  1. Enter save configuration primary -- Writes switch configuration to flash in order to survive reboots.
  2. Enter use configuration primary -- Selects the primary configuration for use at the next reboot.

Example B: Force 10 Networks switch

Here are the configuration steps for a Force 10 Networks e600 multi-blade Gigabit Ethernet switch (with two 48-port blades) for routed networks where a central 48-port switch is not big enough.

Configure the chassis, line cards, and ports for an initial layer two configuration by doing the following:

  1. Connect to the switch using the serial console port with a straight serial cable (9600, 8-N-1, no flow control) with default no user ID or password required.
  2. Enter enable -- Enters super-user mode, no password required by default.
  3. Enter chassis chassis-mode TeraScale -- Initializes the switch to tera-scale mode.
  4. Reboot the switch when prompted. This will take a few minutes.
  5. After reboot, connect to the switch and enter super-user mode again by entering enable.
  6. Enter configure -- Enters configuration mode. The prompt looks like Force10(conf)#.
  7. Enter Interface Range GigabitEthernet 0/0 - 47 -- Configures line card 0, ports 0 through 47. The prompt looks like Force10(conf-if-range-ge0/1-47)#.
  8. Enter mtu 9252 -- Sets jumbo frames, if required.
  9. Enter no shutdown -- Allows the port to activate.
  10. Enter exit -- Goes back to configuration mode.
  11. Enter Interface Range GigabitEthernet 1/0 - 47 -- Configures line card 1, ports 0 through 47. (The prompt looks like Force10(conf-if-range-ge0/1-47)#.)
  12. Repeat steps 7-10 for each line card.

Configure the line cards and ports for layer 3 (VLan routing) by doing the following:

  1. Connect to the switch and enter super-user configuration mode by typing enable.
  2. Enter int port channel 1 -- Configures port channel 1.
  3. Enter channel-member gig 0/46-47 -- Adds line card 0, ports 46 and 47, to the port channel.
  4. Enter no shutdown -- Allows the port channel to activate; this option overrides port configuration for inactive/active ports.
  5. Enter ip add 192.168.x.x/24 -- Sets the IP address for the port channel; this is the gateway for your subnet.
  6. Enter mtu 9252 -- Sets jumbo frames, if they are required.

Now, turn on the DHCP helper to forward DHCP broadcasts across subnet boundaries by doing the following:

  1. Enter int range po 1-X -- Applies configuration to all the port channels you have configured.
  2. Enter ip helper 192.168.0.253 -- Forwards DHCP to your management server IP address.

Next, configure the switch for remote management (using telnet or SSH) by doing the following:

  1. Enter interface managementethernet 0 -- Configures the management port from the configure prompt.
  2. Enter ip add 192.168.2.x/24 -- Sets an IP address on the device management network and connects the admin port to the device management switch.
  3. Set a user ID and password in order to allow remote connections.

Finally, save the switch configuration, by entering write mem.

After the switch configuration is complete, you can run a few sanity checks on your configuration. Plug in a device, such as a laptop, at various points on the network to check connectivity. Most switches have the capability to export their configuration. Consider making a backup copy of your running switch configuration once you have the network setup correctly.
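
For example, from a host plugged into the storage network you can check both basic reachability and that jumbo frames pass end to end. The target address below is just an example from the storage range, and the test assumes the host interfaces are configured with a 9000-byte MTU.

# Basic reachability on the storage network
ping -c 3 192.168.1.1

# Jumbo-frame check: 8972 bytes of payload plus 28 bytes of ICMP/IP headers makes a
# 9000-byte packet; -M do forbids fragmentation, so this fails if any switch or
# interface in the path is not passing jumbo frames
ping -c 3 -M do -s 8972 192.168.1.1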

These two switches are used as examples because they are proven, 100-percent non-blocking, high-performance Gigabit Ethernet switches. Cisco Systems switches do not provide 100-percent non-blocking throughput, but they can be used nonetheless.

Terminal servers

Terminal servers play an important role in large cluster installations that use versions of CSM earlier than CSM 1.4. Clusters using those early versions relied on terminal servers to gather MAC addresses for installation. Now that CSM can work with system UUIDs, terminal servers are not as important for the installation of a more modern IBM cluster. However, if you have slightly older hardware or software in a large cluster, terminal servers are still vital during system setup. Ensuring the correct setup of the terminal server itself can save a great deal of time later in the installation process. In addition to collecting MAC addresses, terminal servers can also be used to view system consoles from a single point, from POST onward into the operating system.

Ensure that the terminal server baud speed for each port matches that of the connecting computer. Most computers are set to a default of 9600 baud, so this might not be an issue. Also ensure the connection settings and flow control between the terminal server and each connecting system are the same. If the terminal server expects an authenticated connection, set this up in CSM or turn off authentication altogether.

Example C: MRV switch

Here is an example configuration for the MRV InReach LX series switch (see Resources for more information about this switch). Configure the MRV card by doing the following:

  1. Connect to the switch using the serial console port with a straight serial cable (9600, 8-N-1, no flow control).
  2. Log in at the prompt. The default username is InReach with password access.
  3. Enter enable -- Enters super user mode with default password system. You see a configuration screen the first time the device is configured. Otherwise, enter setup to get to the same screen.
  4. Enter and save the various network parameters as required.
  5. Enter config -- Enters configuration mode.
  6. Enter port async 1 48 -- Configures ports 1 through 48.
  7. Enter no authentication outbound -- Turns off internal authentication.
  8. Enter no authentication inbound -- Turns off external authentication.
  9. Enter no autobaud -- Fixes the baud rate.
  10. Enter access remote -- Allows remote connection.
  11. Enter flowcontrol cts -- Sets hardware flow control to CTS, which is the default on most IBM computers.
  12. Enter exit -- Goes back to configuration mode.
  13. Enter exit -- Goes back to default super user mode.
  14. Enter save config flash -- Saves the configuration and makes it persistent across reboots.

After this initial configuration, you should have little else to do. Again, make sure the settings you made here match the settings on the connecting computers. You should now be able to telnet to your terminal servers in order to manage them in the future. As with the Ethernet switches, you can view the running configuration in order to do some sanity checking of the configuration on the terminal servers, if required. For example, the command show port async all char returns detailed information about each port on the terminal server.

Firmware updates and setting BMC addresses

If it is appropriate, check and update the firmware across your entire cluster. Consider the following elements:

  • Computer BIOS
  • Baseboard Management Controller (BMC) firmware
  • Device firmware
  • Network adapter firmware

You can obtain IBM system updates on the IBM support Web site, and vendor-specific hardware updates are usually available directly from the vendors' Web sites (see Resources).

Updating firmware on IBM systems

Note: The following method of firmware update might not be supported in your area or for your hardware. You are advised to check with your local IBM representative before proceeding. This information is offered for example purposes only.

CSM code for remotely flashing firmware is still under development. Currently, if you need to flash many computers for BIOS, BMC, or other firmware updates, you are presented with a large problem. It is not reasonable to flash a large cluster with current methods, which involve writing a floppy disk or CD image and attending to each computer individually; an alternative is required. If you have no hardware power control (no BMC IP address is set), start by flashing the BMC firmware, which enables you to set the IP address at the same time. You only need to press all the power buttons once. For other firmware flashes, you can remotely power the systems on and off.

The following example is for IBM Systems 325 or 326 AMD processor-based systems. However, only small alterations are required to apply it to System x computers. The idea is to take a default firmware update image and modify it so that you can use it as a PXE boot image. Then you can boot a system over the network and have it unpack and flash the relevant firmware. Once the system is set to PXE boot, you only need to turn it on for the flash to take place.

Setting up a PXE boot server

A computer on the network running DHCP and TFTP servers is required. A CSM management node installed and running with CSM is a suitable candidate. However, if there are currently no installed computers on the network, use a laptop running Linux connected to the network. Make sure the PXE server is on the correct part of the network (in the same subnet), or that your switches are forwarding DHCP requests to the correct server across subnet boundaries. Then, complete the following steps:

  1. Bring up your PXE server with an IP address of 192.168.0.1.
  2. Install, configure, and start a simple DHCP server on the same computer. Here is a sample configuration:
    ddns-update-style ad-hoc;
    subnet 192.168.0.0 netmask 255.255.255.0 {
      range 192.168.0.2 192.168.0.254;    # dynamic range within the compute subnet
      filename "/pxelinux.0";             # PXE boot loader served over TFTP
      next-server 192.168.0.1;            # TFTP server address (this PXE server)
    }
  3. Install, configure, and start a TFTP server to run out of /tftpboot/. Install syslinux, which is provided as an RPM package for both Suse and Red Hat Linux. (A minimal setup sketch for steps 3 through 5 follows this list.)
  4. Copy the memdisk and pxelinux.0 files installed with the syslinux package into /tftpboot/.
  5. Create the directories /tftpboot/pxelinux.cfg/ to hold the configuration files and /tftpboot/firmware/ to hold the firmware images.
  6. Write a default PXE configuration containing entries for the firmware to upgrade to /tftpboot/pxelinux.cfg/default, such as the following:
    # Console on serial port 0 at 9600 baud; change the default label below
    # (local, bmc, bios, or broadcom) to select which image PXE clients boot.
    serial 0 9600
    default local
    #default bmc
    #default bios
    #default broadcom
    
    label local
    	localboot 0
    
    label bmc
    	kernel memdisk
    	append initrd=firmware/bmc.img
    
    label bios
    	kernel memdisk
    	append initrd=firmware/bios.img
    
    label broadcom
    	kernel memdisk
    	append initrd=firmware/broadcom.img
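
The following is a minimal sketch of steps 3 through 5 on a Red Hat-style system where the TFTP server is managed by xinetd. The package names and the syslinux file locations are assumptions and vary by distribution.

# Install a TFTP server and syslinux (package names assume a Red Hat-style system)
yum install -y tftp-server syslinux

# Create the TFTP directory layout used in the steps above
mkdir -p /tftpboot/pxelinux.cfg /tftpboot/firmware

# Enable the xinetd-managed tftp service; check that server_args in
# /etc/xinetd.d/tftp reads "-s /tftpboot", then restart xinetd
sed -i 's/disable[[:space:]]*= yes/disable = no/' /etc/xinetd.d/tftp
service xinetd restart

# Copy the PXE boot loader and the memdisk kernel into the TFTP root
cp /usr/lib/syslinux/pxelinux.0 /usr/lib/syslinux/memdisk /tftpboot/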

For reference, when a computer receives a DHCP address during PXE boot, the configuration files in /tftpboot/pxelinux.cfg are searched in a specific order, and the first file found is used as the boot configuration for the requesting computer. The requesting IP address is converted into eight hexadecimal digits, and the search starts with the filename that matches all eight digits; on each pass, one digit is removed from the right, widening the match to progressively larger subnets.

As an example, consider a client computer getting the address 192.168.0.2 from the server during PXE boot. The first file searched for is the hexadecimal version of this IP address, /tftpboot/pxelinux.cfg/C0A80002. If this configuration file is not present, the next file searched for is C0A8000, and so on. If no match is found, the file named default is used. Therefore, putting the above PXE configuration in a file named default works for all computers, regardless of your DHCP configuration. However, for the example, writing the configuration to a file named C0A800 (the 192.168.0.0/24 subnet) reduces the amount of searching.
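
To see which filename a given client requests first, you can convert its IP address to hexadecimal from the shell; the second form assumes your syslinux package ships the gethostip utility.

# Convert a client IP address into its hexadecimal PXE configuration filename
printf '%02X%02X%02X%02X\n' 192 168 0 2    # prints C0A80002

# Equivalent, if gethostip is installed as part of syslinux
gethostip -x 192.168.0.2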

Updating the Baseboard Management Controller (BMC) firmware and setting an IP address

Note: the procedure described here is for the AMD-based cluster nodes. However, you can use a similar procedure for the Intel-based nodes. Intel BMC updates are provided with the bmc_cfg.exe program (instead of lancfg.exe) to set the BMC address. You can drive this using the terminal servers with a script such as the sample script available under Downloads. Also, for Intel-based computers, you can usually set the BMC address in the system BIOS.

After you set the BMC address on a node, you have remote power control, which makes life easier when configuring the cluster. However, this method of updating the BMC relies on network boot, so if your computers are not set to PXE boot in the BIOS yet, you can update the BIOS first and return to the BMC update afterwards.

Download the latest BMC firmware update DOS image and follow the instructions to create a floppy disk boot image. This image contains a program called lancfg.exe that allows you to set an IP address on the BMC. The usual process is to insert the floppy disk and boot from it in order to apply the update. However, first create a PXE boot image from the floppy disk on your PXE boot server computer with the following command:

dd if=/dev/fd0 of=/tftpboot/firmware/bmc.img bs=1024

Now you can edit the DOS image as needed. For the BMC update, no modifications are required to the base image itself, except to copy a DOS power-off program into the image. At a high level, you power on the computer, it PXE boots to flash the BMC firmware, and it is left running in the DOS image. Using a script, you can then set the BMC address through the terminal server and power the computer off. In this way, you know that any computer still powered on is either flashing its BMC firmware or waiting for its IP address to be set, and that any computer that has powered off has completed the process. Download a suitable DOS-based power-off utility, such as atxoff.com. Once you have a power-off utility, copy it to the image as follows:

mount -o loop /tftpboot/firmware/bmc.img /mnt
cp /path/to/poweroff.exe /mnt
umount /mnt

Now ensure your PXE boot configuration file can send the correct image by changing the appropriate comment to set the default to BMC in the /tftpboot/pxelinux.cfg/default file previously created. After testing on a single node, boot all computers from the power off state so the flash takes place across all the required nodes. When all the nodes have booted the PXE image, change the configuration back to localboot in order to minimize the chance of accidentally flashing a computer if one were to be rebooted.
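
One way to switch the default label without editing the file by hand is a small sed substitution against the configuration created earlier; remember to set it back to local when the flash run is finished.

# Serve the BMC firmware image on the next round of PXE boots
sed -i 's/^default .*/default bmc/' /tftpboot/pxelinux.cfg/default

# ...power on the nodes and let them flash, then return to a safe local boot
sed -i 's/^default .*/default local/' /tftpboot/pxelinux.cfg/default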

You can now call the lancfg program and operate it through the terminal server (assuming the BIOS settings export the terminal over serial with the same settings as configured on the terminal server). The BMC IP address can be set using lancfg in a Perl script, such as the unsupported sample script available under Downloads. For example, to set the BMC address of all computers in a node group called Rack1 with gateway address 192.168.10.254 and netmask 255.255.255.0, run the following from the PXE boot server computer:

perl set-bmc-address.pl -N Rack1 -g 192.168.10.254 -m 255.255.255.0

You can customize this script based on your setup. When the script completes, each computer turns off automatically after having its BMC IP address set, using the DOS power-off program you copied to the boot image.

Updating the BIOS

If you have the default BIOS settings applied on all computers, you can do this step before the BMC update above. Flashing the BIOS is a two-stage process; performed without changes, it leaves the factory default settings applied. Therefore, you need to flash the BIOS and also apply an appropriate configuration with any changes required for your cluster. Download the latest BIOS update DOS image, and follow the instructions to create a floppy disk boot image.

You need a saved configuration for the appropriate BIOS level and settings you require. In order to do this, manually update one computer. Boot a computer with the floppy disk image (use a USB floppy drive if the computer does not have one). Apply the update according to the readme file, and wait for this to finish as normal. Reboot the computer and make all the changes to settings you require in the BIOS. Options to consider are turning Numlock to off (if you don't have a number keypad on your keyboard), enabling the serial port, setting the console redirection through the serial port with the appropriate settings configured to match the terminal servers, and setting the boot order to ensure Network appears before Hard Disk. When the changes are complete, save them, and turn off the computer.

On another computer (such as the one you have set up for PXE booting), mount the floppy disk containing the BIOS update. Rename the autoexec.bat file to keep it as a backup on the floppy for later. This prevents the system from flashing the BIOS if this disk were booted again. Insert the disk back into the computer where the updated and configured BIOS options are set, and boot from your modified floppy disk image.
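
A minimal sketch of that rename, assuming the floppy is at /dev/fd0 and mounts as a FAT file system:

# Mount the BIOS update floppy, keep autoexec.bat as a backup so the flash does
# not run if this disk is booted again, then unmount
mount /dev/fd0 /mnt
mv /mnt/autoexec.bat /mnt/autoexec.bak
umount /mnt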

When the DOS prompt appears, ensure your current working directory is the A: drive. A program on the floppy called cmosram.exe allows you to save the configuration of the BIOS to disk. Run this program to save the BIOS settings to the floppy disk as follows (check the readme file on the update disk for the exact cmosram syntax):

cmosram /load:cmos.dat

Restore the autoexec.bat file you renamed earlier, and add a line that applies the saved settings (cmos.dat) after the flash completes. Once the autoexec.bat file both flashes the BIOS and applies the saved settings, you are ready to apply the update. As a sanity check, boot the modified floppy image in a test computer to confirm that the flash happens automatically and the correct settings are applied. You will also notice that the system remains on after flashing the BIOS. You can have the system turn off automatically after the BIOS update in the same way as described in the BMC update section, by copying a DOS power-off utility to the image and calling it from the autoexec.bat file.

Once you are satisfied with your modified BIOS update image, you can create a PXE boot image from the floppy disk with the following command:

dd if=/dev/fd0 of=/tftpboot/firmware/bios.img bs=1024

Change the default PXE boot configuration file /tftpboot/pxelinux.cfg/default so that it serves the BIOS image when the systems PXE boot. Now, when you power on a system connected to the network, it automatically flashes the BIOS without any user input, applies the correct BIOS settings, and powers off again. When all updates are complete, return the default PXE boot configuration to boot from local disk to avoid any accidents if a computer were to make a PXE request.


Updating the Broadcom firmware

After updating the BMC firmware and BIOS, updating the Broadcom firmware is a simple repeat of the same ideas. Follow these steps:

  1. Download the Broadcom firmware (see Resources), and follow the instructions to create a floppy disk boot image.
  2. Create a PXE boot image from the floppy disk using the following command: dd if=/dev/fd0 of=/tftpboot/firmware/broadcom.img bs=1024
  3. Loop mount the image file using the following command: mount -o loop /tftpboot/firmware/broadcom.img /mnt
  4. Copy a DOS-based power-off program into the image directory.
  5. Change the autoexec.bat file to automatically update the Broadcom firmware in unattended mode, and turn off the computer when it finishes. For example, for an IBM Systems 326, machine type 8848, the autoexec.bat might look like the following:
    @echo off
    call sramdrv.bat
    echo.
    echo Extracting files...
    call a:\bin.exe -d -o %ramdrv%\update >NULL
    copy a:\command.com %ramdrv%\command.com
    copy a:\atxoff.com %ramdrv%\atxoff.com
    set COMSPEC=%ramdrv%\command.com
    if exist NULL del NULL
    %ramdrv%
    cd \update
    call update.bat 8848
    cd \
    atxoff
  6. Unmount the image.
  7. Check the default configuration /tftpboot/pxelinux.cfg/default to ensure the computer can boot the firmware update for the Broadcom adapter.
  8. Boot any computers that require the update.
  9. Return the configuration back to the local disk PXE configuration.

After you have updated the firmware across the cluster, you can continue the hardware setup knowing that fewer problems will arise later as a result of having the latest firmware code. You can repeat the process at any time should you need another update, and the principles behind this type of firmware update can be applied to any other type of firmware you might need to flash, as long as you can obtain a firmware update image to PXE boot.


Conclusion

That concludes the instructions for hardware configuration of a large Linux cluster. Subsequent articles in the Installing a large Linux cluster series cover the software side of the cluster; the next article describes management server configuration and node installation procedures.


Download

Description | Name | Size
Sample concept file and BMC IP address Perl script [1] | es-linuxclusterintro.zip | 3KB

Note

  1. An implementation of the scheme depicted in Table 1, and an unsupported sample Perl script used when setting the BMC IP address

Resources

