IBM LoadLeveler is a workload management system that allows users to submit, schedule, and execute jobs to a pool of resources. It provides dynamic scheduling and workload balancing for optimum utilization of cluster resources. It also provides a single point of control for effective job and workload management.
LoadLeveler runs as a set of daemons on each of its client machines. A group of client machines reporting to a central manager machine is referred to as a LoadLeveler cluster, and a LoadLeveler cluster is defined by a configuration file.
To get the most out of this article, you should have a basic understanding of IBM LoadLeveler and high-availability clusters. You should also be familiar with the first article in this series, High-availability middleware on Linux, Part 1: Heartbeat and Apache Web server.
Implementing LoadLeveler for high availability
Each machine in a LoadLeveler cluster performs one or more roles in scheduling jobs. These roles and the implications of their failure are as follows:
Scheduling machine: A job submission results in the submission being placed in a queue managed by a scheduling machine. The scheduling machine asks the central manager (this role is described next) to find a machine that can run the job, and the scheduling machine also keeps persistent information about the job on the disk.
The job information on one scheduling machine is not normally shared or accessed by other scheduling machines in the LoadLeveler cluster. The scheduling machines operate independently, and in the event of a scheduling machine failure, the job information that resides on the failed scheduling machine will be temporarily unavailable (but not lost). Jobs waiting to be scheduled will not be considered for execution during this time. It is critically important to have the scheduling machine re-established as quickly as possible. In the example of a high-availability configuration for a scheduling machine, the necessary local files and directories are placed on external shared disk storage, which makes them available to a backup scheduling machine in the event of scheduler node failure.
Central manager machine: The role of the central manager is to find one or more machines in the LoadLeveler cluster, based on the job's requirements, that will run the job. Upon finding the machine(s), it notifies the scheduling machine.
The central manager is the central point of control for LoadLeveler. LoadLeveler lets you define an alternate central manager that can take over the role of the primary central manager in case of failure. A later section discusses this in detail.
In the event of a central manager failure without an alternate central manager defined, jobs that have started will run to completion without loss of information, but their status will not be reported to the users. New jobs will be accepted by scheduling nodes and later forwarded to the central manager when it returns, or when an alternate takes over.
Executing machine: The machine that runs the job is known as the executing machine.
When an execution node fails, the jobs running on the node fail and require a restart when the node is restored. Jobs will start either from the beginning or from the last checkpoint. Job checkpointing can be selected as an option or coded in the application. The establishment of a backup execution node would immediately provide the capability to restart the job in a more timely fashion. With or without a backup node, jobs and their job information and checkpoints are not lost in the event of node failure as long as their disks are still accessible using the appropriate cabling and disk techniques.
Both the scheduling node and the execution nodes maintain persistent information about the state of their jobs. The scheduling node and the execution node use a protocol that ensures that the state information is kept on disk by at least one of them. This ensures that the state can be recovered from either the scheduling node or the execution node in the event of failure. Neither the scheduling node nor the execution node discards the job information until it is passed on to, and accepted by, the other node. In the configuration illustrated in this article, the job information is stored on a shared disk.
- Submitting machine: This machine is used to submit, query, and cancel jobs and has no persistent data about a job. Lack of any persistent data makes high availability non-critical for these machines.
The role a client machine plays depends on which LoadLeveler daemons are configured on it. As a whole, the cluster provides the capability to build, submit, execute, and manage serial and parallel batch jobs.
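As a rough illustration of how configured daemons map to roles, the sketch below reads a LoadL_config-style file and reports the roles implied by the SCHEDD_RUNS_HERE and STARTD_RUNS_HERE keywords used in this article's configuration examples. The function name ll_roles is hypothetical and is not a LoadLeveler command. The central manager role is set in the machine stanza, so this sketch does not cover it.

```shell
#!/bin/bash
# Sketch only: infer roles from the *_RUNS_HERE keywords in a
# LoadL_config-style file. ll_roles is a hypothetical helper name.
ll_roles() {
    local config="$1"
    # Match lines such as "SCHEDD_RUNS_HERE = True" (case-insensitive).
    if grep -Eiq '^[[:space:]]*SCHEDD_RUNS_HERE[[:space:]]*=[[:space:]]*true' "$config"; then
        echo "scheduling"
    fi
    if grep -Eiq '^[[:space:]]*STARTD_RUNS_HERE[[:space:]]*=[[:space:]]*true' "$config"; then
        echo "executing"
    fi
}

# Example:
#   ll_roles /home/loadl/LoadL_config
```

A machine with both keywords set to True would report both roles, one per line.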
The next sections discuss some of the built-in capabilities of LoadLeveler for high availability and how you can enhance the overall system availability by using heartbeat.
This section shows how to install IBM LoadLeveler 3.2 for Linux™ on three machines: ha1, ha2, and ha3. These steps are based on the installation steps outlined in Chapter 4 of the LoadLeveler Installation Memo document (see Resources for a link).
- Create directories for LoadLeveler as follows:
- Local directory: /var/loadl
- Home directory: /home/loadl
- Release directory: /opt/ibmll/LoadL/full
- Name of central manager machine: ha
- Log in as root
- Create the loadl group name. The command below creates a group loadl with group ID of 1000 on all the nodes:
groupadd -g 1000 loadl
- Create the loadl user ID. The command below creates a userid loadl on all the nodes:
useradd -c loadleveler_user -d /home/loadl -s /bin/bash -u 1000 -g 1000 -m loadl
- Install the LoadLeveler RPMs
- Install license RPM (where LLIMAGEDIR contains the installation RPMs)
cd $LLIMAGEDIR
rpm -Uvh LoadL-full-license-3.2.0.0-0.i386.rpm
- Install the other RPM
cd /opt/ibmll/LoadL/sbin
./install_ll -y -d "$LLIMAGEDIR"
- Run the initialization script llinit
- Create the local directory:
mkdir /var/loadl
- Set ownership
chown loadl.loadl /var/loadl
- Switch to the loadl ID
su - loadl
- Change the current directory to the bin subdirectory in the release directory by entering the following:
cd /opt/ibmll/LoadL/full/bin
- Ensure that you have write privileges in the LoadLeveler home, local, and /tmp directories
- Enter the llinit command as shown below:
./llinit -local /var/loadl -release /opt/ibmll/LoadL/full -cm ha
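The write-privilege check in the llinit steps above can be scripted. The following is a minimal sketch, assuming you run it as user loadl before llinit. The helper name unwritable_dirs is hypothetical, and the directories in the example are the ones chosen earlier in this section.

```shell
#!/bin/bash
# Sketch: list the directories that the current user cannot write to.
# unwritable_dirs is a hypothetical helper name, not part of LoadLeveler.
unwritable_dirs() {
    local d
    for d in "$@"; do
        # -d: exists and is a directory; -w: writable by the current user.
        if [ ! -d "$d" ] || [ ! -w "$d" ]; then
            echo "$d"
        fi
    done
}

# Example, run as user loadl before llinit:
#   unwritable_dirs /home/loadl /var/loadl /tmp
```

If the function prints nothing, all of the listed directories exist and are writable.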
Create a highly available scheduling machine configuration
In this setup:
- The machine ha1 will act as the primary scheduling machine
- The machine ha2 will act as a standby scheduling machine
- The machine ha3 will be used as a job execution machine
In a high-availability configuration, the necessary local files and directories for ha1 are placed on external shared disk storage, which makes them available to ha2 in the event of scheduler node failure. Here's how to set this up:
- Log in as root on the nodes ha1 and ha2.
- Create the following directories on the shared disk (/ha):
- /ha/loadl/execute
- /ha/loadl/spool
- Set ownership by running the following commands on the node ha1:
chown loadl.loadl /ha/loadl/
chown loadl.loadl /ha/loadl/execute
chown loadl.loadl /ha/loadl/spool
- Switch to the loadl ID on the nodes ha1 and ha2:
su - loadl
- Set appropriate permissions on the shared directories using commands shown below on the node ha1:
chmod 0777 /ha/loadl/execute
chmod 775 /ha/loadl/spool
- On the nodes ha1 and ha2, delete the execute and spool directories under /var/loadl.
rm -rf /var/loadl/execute
rm -rf /var/loadl/spool
- Create symbolic links to the shared directories using the following commands on the nodes ha1 and ha2:
ln -s /ha/loadl/execute /var/loadl/execute
ln -s /ha/loadl/spool /var/loadl/spool
- Edit the machine's stanza. Shown below are the relevant portions of the LoadL_admin (under /home/loadl) file on the different nodes:
- ha1 acts as a primary scheduling machine and the central manager. In a production environment, I recommend putting the central manager and scheduling daemon on separate machines.
...
ha1: type = machine
central_manager = true
schedd_host = true
ha3: type = machine
...
- ha2 acts as a backup scheduling machine and central manager
...
ha2: type = machine
central_manager = true
schedd_host = true
ha3: type = machine
...
- ha3 acts as an execution machine
...
ha: type = machine
central_manager = true
schedd_host = true
ha3: type = machine
...
- Change the RUNS_HERE flags in the LoadL_config (under /home/loadl) file on the different machines, as follows:
- ha1 and ha2
...
SCHEDD_RUNS_HERE = True
STARTD_RUNS_HERE = False
...
This will ensure that a job doesn't get scheduled to be executed on the scheduling machines. We want ha1 and ha2 to act as scheduling machines only.
- ha3
...
SCHEDD_RUNS_HERE = False
STARTD_RUNS_HERE = True
...
- Edit the /etc/hosts files on the three nodes as follows:
- ha1
...
9.22.7.46 ha.haw2.ibm.com ha
9.22.7.46 ha2.haw2.ibm.com ha2
...
- ha2
...
9.22.7.46 ha.haw2.ibm.com ha
9.22.7.46 ha1.haw2.ibm.com ha1
...
- ha3
...
9.22.7.46 ha.haw2.ibm.com ha
9.22.7.46 ha1.haw2.ibm.com ha1
9.22.7.46 ha2.haw2.ibm.com ha2
...
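Before moving on, it may be worth confirming that the execute and spool directories on ha1 and ha2 really are symbolic links to the shared disk, as set up in the steps above. The sketch below is one way to check this. The helper name link_points_to is hypothetical, and the paths in the example are the ones used in this article.

```shell
#!/bin/bash
# Sketch: confirm that a path is a symbolic link to the expected target.
# link_points_to is a hypothetical helper name.
link_points_to() {
    local link="$1" target="$2"
    # -L: the path is a symbolic link; readlink prints the link's target.
    [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]
}

# Example, run on ha1 and ha2:
#   link_points_to /var/loadl/execute /ha/loadl/execute || echo "execute is not on the shared disk"
#   link_points_to /var/loadl/spool   /ha/loadl/spool   || echo "spool is not on the shared disk"
```

The function compares the link target as stored, so it expects the same absolute paths that were passed to ln -s.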
Configure heartbeat to manage the scheduling machine
To configure heartbeat to manage LoadLeveler:
- Create a script to start and stop the LoadLeveler processes. A very basic script is shown in Listing 1. You can further customize it to suit your setup. Place this script in the /etc/rc.d/init.d directory.
Listing 1. loadl script
#!/bin/bash
#
# /etc/rc.d/init.d/loadl
#
# Starts the loadleveler processes
#
# chkconfig: 345 89 56
# description: Runs loadleveler

# Source function library.
. /etc/init.d/functions

PATH=/usr/bin:/bin:/opt/ibmll/LoadL/full/bin

#======================================================================
# Run llctl through a plain shell, or as user loadl when invoked by root.
SU="sh"
if [ "`whoami`" = "root" ]; then
    SU="su - loadl"
fi

#======================================================================
start() {
    echo "$0: starting loadleveler"
    $SU -c "llctl start"
}

#======================================================================
stop() {
    echo "$0: Stopping loadleveler"
    $SU -c "llctl stop"
}

case "$1" in
    'start')
        start
        ;;
    'stop')
        stop
        ;;
    'restart')
        stop
        start
        ;;
    *)
        echo "usage: $0 {start|stop|restart}"
        ;;
esac
- Now, configure the /etc/ha.d/haresources file (on both the scheduling machine nodes) to include the above loadl script. The relevant portion of the modified file is:
ha1.haw2.ibm.com 9.22.7.46 Filesystem::hanfs.haw2.ibm.com:/ha::/ha::nfs::rw,hard loadl
This line dictates that, on startup, heartbeat has ha1 serve the cluster IP address, mount the shared filesystem, and then start the LoadLeveler daemons. On shutdown, heartbeat will first stop LoadLeveler, then unmount the shared filesystem, and finally give up the IP.
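The ordering described above follows from the position of each resource on the haresources line: resources start from left to right and stop from right to left. The sketch below only illustrates that ordering for a single line. It does not call heartbeat, and resource_order is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the resources named on one haresources line in the order
# they would be started (left to right) or stopped (right to left).
# resource_order is a hypothetical helper, not part of heartbeat.
resource_order() {
    local action="$1" line="$2"
    local fields resources i
    # Split on whitespace; the first field is the preferred node name.
    read -r -a fields <<< "$line"
    resources=("${fields[@]:1}")
    if [ "$action" = "stop" ]; then
        for (( i = ${#resources[@]} - 1; i >= 0; i-- )); do
            echo "${resources[i]}"
        done
    else
        for (( i = 0; i < ${#resources[@]}; i++ )); do
            echo "${resources[i]}"
        done
    fi
}

# Example:
#   resource_order stop "ha1.haw2.ibm.com 9.22.7.46 Filesystem::hanfs.haw2.ibm.com:/ha::/ha::nfs::rw,hard loadl"
```

For the line used in this article, the stop order lists loadl first and the cluster IP address last, which matches the shutdown sequence described above.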
Test scheduling machine failover
This section shows how to test the high availability of the scheduling daemon.
- Start the heartbeat service on the primary and then on the backup node with this command:
/etc/rc.d/init.d/heartbeat start
After heartbeat starts successfully, you should see a new interface with the IP address that you configured in the haresources file. Once you've started heartbeat, take a peek at your log file (default is /var/log/ha-log) on the primary and make sure that it is doing the IP takeover and then starting LoadLeveler. Use the ps command to make sure LoadLeveler daemons are running on the primary node. Heartbeat will not start any of the above processes on the backup. This happens only after the primary fails.
- Start the LoadLeveler daemons on ha3, as user loadl, using the following command:
llctl start
- Make ha3 unavailable, as an executing machine, by draining the startd daemon. This is needed to provide enough time, after submitting a job, to test failover. Use the following command as user loadl on node ha3:
llctl drain startd
- Check to see what jobs are scheduled on the LoadLeveler cluster using the command below on the ha1 machine as user loadl:
llq
The output of the command for our setup is shown in Figure 1. This command will list any old incomplete jobs on the LoadLeveler cluster.
Figure 1. Output of the command llq on node ha1
Check to see the status of machines on the LoadLeveler cluster by running the command below on the ha1 machine as user loadl:
llstatus
Figure 2. Output of the command llstatus on node ha1
In Figure 2 you can see that Schedd is available on the ha1 machine, and Startd is drained on the ha3 machine because of step 3, above.
- Set ownership of the samples directory. Use the command shown below on the node ha3 (where the job will run):
chown loadl.loadl /opt/ibmll/LoadL/full/samples
- Submit a job. Submit one of the sample jobs that come with LoadLeveler by executing the commands below on machine ha1 as user loadl:
cd /home/loadl/samples
llsubmit job1.cmd
If successful, you'll see a message similar to:
llsubmit: The job "ha1.haw2.ibm.com.23" with 2 job steps has been submitted.
The sample, job1, is a two-step job that should result in two new job steps being created. These two new job steps will go to an idle state (I) because no executing machine is available. See Figure 3.
Figure 3. Output of the command llq on node ha1 after job1 is submitted
Now, fail the primary scheduling machine.
- Simulate failover. You can do this simply by stopping heartbeat on the primary system, using this command:
/etc/rc.d/init.d/heartbeat stop
You should see all the services come up on the backup machine in under a minute. You can verify that LoadLeveler is running on the backup by checking the /var/log/ha-log file and using the ps command on the backup machine. Once the backup has taken over control, run the llstatus and llq commands on the backup as user loadl. Figures 4 and 5 show the output for these runs.
Figure 4. Output of the command llq, on node ha2, after failover
Figure 5. Output of the command llstatus, on node ha2, after failover
In Figures 4 and 5, notice:
- All the old incomplete jobs, including the two corresponding to the job submitted in step 6, above. This proves that the job information has survived a machine failover.
- Schedd is now available on the ha2 machine
- Startd is still down on the ha3 machine.
- Resume the startd daemon on the ha3 machine by executing the following command on ha3 as user loadl:
llctl resume startd
The ha3 machine is now available for executing jobs. The two jobs should go to a running state and should finally complete on the ha3 machine. You can verify job completion by taking a look at the .out files, created by the two steps of the job, in the /home/loadl/samples directory on the ha3 machine. For this run, you should see two files, job1.ha1.23.0.out and job1.ha1.23.1.out.
- Start the heartbeat service back on the primary. This should stop the LoadLeveler processes on the secondary and start them on the primary. The primary should also take over the cluster IP. Start the heartbeat service with this command:
/etc/rc.d/init.d/heartbeat start
You can see how, using shared disk, the jobs submitted before the failure of the primary scheduling node can be recovered after the backup has taken control.
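When repeating the failover test, you may prefer to poll for the scheduling daemon on the backup instead of checking with ps by hand. The following is a minimal sketch of a generic retry helper. The name wait_until is hypothetical, the 60-second limit reflects the "under a minute" expectation above, and the example assumes the scheduling daemon's process is named LoadL_schedd.

```shell
#!/bin/bash
# Sketch: retry a command once per second until it succeeds or the
# timeout (in seconds) expires. wait_until is a hypothetical helper name.
wait_until() {
    local timeout="$1"
    shift
    local waited=0
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example, run on ha2 after stopping heartbeat on ha1
# (assumes the daemon process is named LoadL_schedd):
#   wait_until 60 pgrep -x LoadL_schedd && echo "Schedd is running on the backup"
```

The helper returns a non-zero status if the command never succeeds within the timeout, so it can be used directly in a test script.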
Set up high availability for the central manager
If you decide to keep the central manager on a node (ha4) separate from the scheduling machine, you can leverage the built-in high availability features of the central manager. To try this configuration, define the ha4 machine as the central manager in the configuration files of all machines.
Problems with network communication or software or hardware failures can cause the central manager to be unusable. In such cases, the other machines in the LoadLeveler cluster will believe that the central manager machine is no longer operating. To remedy this situation, you can assign one or more alternate central managers in the machine stanza to take control.
The following machine stanza example defines the machine ha5 as an alternate central manager:
ha5: type = machine
central_manager = alt
If the primary central manager fails, the alternate central manager then becomes the central manager. When an alternate becomes the central manager, jobs will not be lost, but it may take a few minutes for all of the machines in the cluster to check in with the new central manager. As a result, job status queries may be incorrect for a short time.
When you define alternate central managers, you should set the following keywords in the configuration file:
CENTRAL_MANAGER_HEARTBEAT_INTERVAL = <amount of time in seconds>
CENTRAL_MANAGER_TIMEOUT = <the number of heartbeat intervals>
In the example below, the alternate central manager will wait for two intervals, where each interval is 30 seconds:
CENTRAL_MANAGER_HEARTBEAT_INTERVAL = 30
CENTRAL_MANAGER_TIMEOUT = 2
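With these values, the alternate waits roughly 30 x 2 = 60 seconds before taking over. The sketch below computes the same product from a LoadL_config-style file. The helper names get_keyword and cm_takeover_delay are hypothetical, and the result is only the configured wait, not a measured failover time.

```shell
#!/bin/bash
# Sketch: read the two central manager keywords from a LoadL_config-style
# file and print the configured wait in seconds (interval x timeout).
# get_keyword and cm_takeover_delay are hypothetical helper names.
get_keyword() {
    local key="$1" file="$2"
    # Print the value of the last "KEY = VALUE" line for this key.
    awk -F= -v k="$key" '
        { name = $1; gsub(/[ \t]/, "", name) }
        name == k { val = $2; gsub(/[ \t]/, "", val); found = val }
        END { print found }
    ' "$file"
}

cm_takeover_delay() {
    local file="$1"
    local interval timeout
    interval=$(get_keyword CENTRAL_MANAGER_HEARTBEAT_INTERVAL "$file")
    timeout=$(get_keyword CENTRAL_MANAGER_TIMEOUT "$file")
    echo $(( interval * timeout ))
}

# Example:
#   cm_takeover_delay /home/loadl/LoadL_config
```

The sketch assumes both keywords are present in the file and have integer values.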
Test failover of the central manager machine
You can test failover by killing the central manager process, LoadL_negotiator, on the ha4 machine. To prevent the central manager daemon from being restarted on the same node, set the RESTARTS_PER_HOUR keyword to 0. The central manager should then come up on the alternate, ha5, in about a minute.
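Setting the keyword can be done by hand in an editor. As an alternative, the sketch below updates or appends a KEY = VALUE line in a LoadL_config-style file. The helper name set_keyword is hypothetical, and the example path assumes the configuration file lives under /home/loadl as in the earlier sections. Back up the file before trying it.

```shell
#!/bin/bash
# Sketch: set "KEY = VALUE" in a LoadL_config-style file, replacing an
# existing line for KEY or appending one if none exists.
# set_keyword is a hypothetical helper name.
set_keyword() {
    local key="$1" value="$2" file="$3"
    local tmp
    tmp=$(mktemp) || return 1
    awk -F= -v k="$key" -v v="$value" '
        { name = $1; gsub(/[ \t]/, "", name) }
        name == k { print k " = " v; seen = 1; next }
        { print }
        END { if (!seen) print k " = " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example, on ha4 before killing the central manager process:
#   set_keyword RESTARTS_PER_HOUR 0 /home/loadl/LoadL_config
```

The file is rewritten in place through cat so that its ownership and permissions are left unchanged.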
Set up high availability for machines executing LoadLeveler
This setup is similar to setting up high availability for scheduling machines.
In this installment, you have seen how to implement high availability for a LoadLeveler cluster using the built-in capabilities of LoadLeveler, and how to further enhance availability by using open source software. Part 4 of this series will examine a high-availability implementation of IBM WebSphere® Application Server.
I would like to thank Waiman Chan for reviewing the high-availability implementation for IBM LoadLeveler, and Liana L. Fong for providing technical guidance on IBM LoadLeveler.
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample code package for this series of articles | hahbcode.tar.gz | 25 KB | HTTP |
- Read the other articles in this series:
- "High-availability middleware on Linux, Part 1: Heartbeat and Apache Web server"
- "High-availability middleware on Linux, Part 2: IBM WebSphere MQ"
- "High-availability middleware on Linux, Part 4: IBM WebSphere Application Server"
- For more details on installing and running IBM LoadLeveler, read the LoadLeveler documentation. In particular, you might want to read the following:
- "LoadLeveler for AIX 5L and Linux: Installation Memo" (GI11-2819-02)
- "LoadLeveler for AIX 5L and Linux: Diagnosis and Messages Guide" (GA22-7882-02)
- "LoadLeveler for AIX 5L and Linux: Using and Administering" (SA22-7881-02)
- Read the IBM Software Announcement for information on ordering IBM LoadLeveler.
- Check out the High-Availability Linux project Web site for more information on heartbeat, including Heartbeat success stories.
- You can download most of the software needed for this series of articles at these locations (note that not all of the downloads are free):
- Red Hat Enterprise Linux 3.0 (2.4.21-15.EL)
- Heartbeat 1.2.3
- IBM Java™ 2 SDK 1.4.2
- IBM WebSphere® MQ for Linux 5.3.0.2 with Fix Pack 7
- IBM WebSphere Base Edition 5.1.1 for Linux with Cumulative Fix 1
- IBM DB2® Universal Database™ Enterprise Server Edition V8.1 for Linux
- André Bonhôte shows how to build an HA NFS server in his article "Inner Pulse" (in PDF format) in the August 2003 issue of the European publication Linux Magazine.
- Check out the IBM Redbook, "Implementing High Availability on RISC/6000 SP" for implementing LoadLeveler high availability using HACMP.
- Learn about the features in DB2 Universal Database that provide high-availability capabilities in "An Overview of High Availability and Disaster Recovery for DB2 UDB" (developerWorks, April 2003).
- For a detailed discussion of availability and how to plan for and maintain it in an enterprise middleware environment, read "Planning for Availability in the Enterprise" (developerWorks, December 2003).
- Get more information on load balancing and failover support for Linux on POWER in the article "Creating a WebSphere Application Server V5 cluster" (developerWorks, January 2004).
- Find more resources for Linux developers in the developerWorks Linux zone.
- Get involved in the developerWorks community by participating in developerWorks blogs.
- Browse for books on these and other technical topics.
- Order the SEK for Linux, a two-DVD set containing the latest IBM trial software for Linux from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
- Innovate your next Linux development project with IBM trial software, available for download directly from developerWorks.
Hidayatullah H. Shaikh is a Senior Software Engineer on the IBM T.J. Watson Research Center's On-Demand Architecture and Development Team. His areas of interest and expertise include business process modeling and integration, service-oriented architecture, grid computing, e-commerce, enterprise Java, database management systems, and high-availability clusters. You can contact Hidayatullah at hshaikh@us.ibm.com.