Skip to main content

High-availability middleware on Linux, Part 3: IBM LoadLeveler

Matching high availability with server load balancing

Hidayatullah Shaikh (hshaikh@us.ibm.com), Senior Software Engineer, IBM
Hidayatullah H. Shaikh is a Senior Software Engineer on the IBM T.J. Watson Research Center's On-Demand Architecture and Development Team. His areas of interest and expertise include business process modeling and integration, service-oriented architecture, grid computing, e-commerce, enterprise Java, database management systems, and high-availability clusters. You can contact Hidayatullah at hshaikh@us.ibm.com.

Summary:  Workload management is critically important for an on-demand business. IBM® LoadLeveler® is a job-management system that allows users to run more jobs in less time by matching the jobs' processing needs with the available resources. Maintaining maximum system uptime of the job management system is increasingly important. Learn how you can achieve high availability for a LoadLeveler cluster using the built-in high-availability capabilities of LoadLeveler and further enhancing it using open source high-availability software.

Date:  28 Feb 2005
Level:  Intermediate
Activity:  2017 views

IBM LoadLeveler is a workload management system that allows users to submit, schedule, and execute jobs to a pool of resources. It provides dynamic scheduling and workload balancing for optimum utilization of cluster resources. It also provides a single point of control for effective job and workload management.

LoadLeveler runs as a set of daemons on each of its client machines. A group of client machines reporting to a central manager machine is referred to as a LoadLeveler cluster, and a LoadLeveler cluster is defined by a configuration file.

To get the most out of this article, you should have a basic understanding of IBM LoadLeveler and high-availability clusters. You should also be familiar with the first article in this series, High-availability middleware on Linux, Part 1: Heartbeat and Apache Web server.

Implementing LoadLeveler for high availability

Each machine in a LoadLeveler cluster performs one or more roles in scheduling jobs. These roles and the implications of their failure are as follows:

  • Scheduling machine: A job submission results in the submission being placed in a queue managed by a scheduling machine. The scheduling machine asks the central manager (this role is described next) to find a machine that can run the job, and the scheduling machine also keeps persistent information about the job on the disk.

    The job information on one scheduling machine is not normally shared or accessed by other scheduling machines in the LoadLeveler cluster. The scheduling machines operate independently, and in the event of a scheduling machine failure, the job information that resides on the failed scheduling machine will be temporarily unavailable (but not lost). Jobs waiting to be scheduled will not be considered for execution during this time. It is critically important to have the scheduling machine re-established as quickly as possible. In the example of a high-availability configuration for a scheduling machine, the necessary local files and directories are placed on external shared disk storage, which makes them available to a backup scheduling machine in the event of scheduler node failure.

  • Central manager machine: The role of the central manager is to find one or more machines in the LoadLeveler cluster, based on the job's requirements, that will run the job. Upon finding the machine(s), it notifies the scheduling machine.

    The central manager is the central point of control for LoadLeveler. LoadLeveler lets you define an alternate central manager that can take over the role of the primary central manager in case of failure. A later section discusses this in detail.

    In the event of a central manager failure without an alternate central manager defined, jobs that have started will run to completion without loss of information, but their status will not be reported to the users. New jobs will be accepted by scheduling nodes and later forwarded to the central manager when it returns, or when an alternate takes over.

  • Executing machine: The machine that runs the job is known as the executing machine.

    When an execution node fails, the jobs running on the node fail and require a restart when the node is restored. Jobs will start either from the beginning or from the last checkpoint. Job checkpointing can be selected as an option or coded in the application. The establishment of a backup execution node would immediately provide the capability to restart the job in a more timely fashion. With or without a backup node, jobs and their job information and checkpoints are not lost in the event of node failure as long as their disks are still accessible using the appropriate cabling and disk techniques.

    Both the scheduling node and the execution nodes maintain persistent information about the state of their jobs. The scheduling node and the execution node use a protocol that ensures that the state information is kept on disk by at least one of them. This ensures that the state can be recovered from either the scheduling node or the execution node in event of failure. Neither the scheduling node nor the execution node discard the job information until it is passed on to, and accepted by, the other node. In the configuration illustrated in this article, the job information is stored on a shared disk.

  • Submitting machine: This machine is used to submit, query, and cancel jobs and have no persistent data about a job. Lack of any persistent data makes high availability non-critical for these machines.

The role a client machine plays depends on which LoadLeveler daemons are configured on it. As a whole, the cluster provides the capability to build, submit, execute, and manage serial and parallel batch jobs.

The next sections discuss some of the built-in capabilities of LoadLeveler for high availability and how you can enhance the overall system availability by using heartbeat.


Install IBM LoadLeveler

This section shows how to install IBM LoadLeveler 3.2 for Linux™ on three machines: ha1, ha2, and ha3. These steps are based on the installation steps outlined in Chapter 4 of the LoadLeveler Installation Memo document (see Resources for a link).

  1. Create directories for LoadLeveler as follows:
    • Local directory: /var/loadl
    • Home directory: /home/loadl
    • Release directory: /opt/ibmll/LoadL/full
    • Name of central manager machine: ha

  2. Log in as root

  3. Create the loadl group name. The command below creates a group loadl with group ID of 1000 on all the nodes:

    groupadd g 1000 loadl
    



  4. Create the loadl user ID. The command below creates a userid loadl on all the nodes:

    useradd c loadleveler_user -d /home/loadl -s /bin/bash \
            -u 1000 g 1000 -m loadl
    



  5. Install the LoadLeveler RPMs
    1. Install license RPM (where LLIMAGEDIR contains the installation RPMS)

      cd $LLIMAGEDIR
      
      rpm -Uvh LoadL-full-license-3.2.0.0-0.i386.rpm
      



    2. Install the other RPM

      cd /opt/ibmll/LoadL/sbin
      
      ./install_ll -y -d "$LLIMAGEDIR"
      



  6. Run the initialization script llinit

    1. Create the local directory:

      mkdir /var/loadl
      



    2. Set ownership

      chown loadl.loadl /var/loadl
      



    3. Switch to the
      loadl

      ID

      su - loadl
      



    4. Change the current directory to the bin subdirectory in the release directory by entering the following:

      cd /opt/ibmll/LoadL/full/bin
      



    5. Ensure that you have write privileges in the LoadLeveler home, local, and /tmp directories

    6. Enter the llinit command as shown below:

      ./llinit -local /var/loadl -release /opt/ibmll/LoadL/full -cm ha
      


Create a highly available scheduling machine configuration

In this setup:

  • The machine ha1 will act as the primary scheduling machine
  • The machine ha2 will act as a standby scheduling machine
  • The machine ha3 will be used as a job execution machine

In a high-availability configuration, the necessary local files and directories for ha1 are placed on external shared disk storage, which makes them available to ha2 in the event of scheduler node failure. Here's how to set this up:

  1. Log in as root on the nodes ha1 and ha2.

  2. Create the following directories on the shared disk (/ha):

    • /ha/loadl/execute
    • /ha/loadl/spool
  3. Set ownership by running the following commands on the node ha1:

    chown loadl.loadl /ha/loadl/
    chown loadl.loadl /ha/loadl/execute
    chown loadl.loadl /ha/loadl/spool
    



  4. Switch to the loadl ID on the nodes ha1 and ha2:

    su - loadl
    



  5. Set appropriate permissions on the shared directories using commands shown below on the node ha1:

    chmod 0777 /ha/loadl/execute
    chmod 775 /ha/loadl/spool
    



  6. On the nodes ha1 and ha2, delete the execute and spool directories under /var/loadl.

    rm -rf /var/loadl/execute
    rm rf /var/loadl/spool
    



  7. Create symbolic links to the shared directories using the following commands on the nodes ha1 and ha2:

    ln -s /ha/loadl/execute /var/loadl/execute
    ln -s /ha/loadl/spool /var/loadl/spool
    



  8. Edit the machine's stanza. Shown below are the relevant portions of the LoadL_admin (under /home/loadl) file on the different nodes:

    1. ha1 acts as a primary scheduling machine and the central manager. In a production environment, I recommend putting the central manager and scheduling daemon on separate machines.

      ...
      ha1:   type = machine
             central_manager = true
             schedd_host = true
      ha3:   type = machine
      ...
      



    2. ha2 acts as a backup scheduling machine and central manager

      ...
      ha2:   type = machine
             central_manager = true
             schedd_host = true
      ha3:   type = machine
      ...
      



    3. ha3 acts as an execution machine

      ...
      ha:    type = machine
             central_manager = true
             schedd_host = true
      ha3:   type = machine
      ...
      



  9. Change the RUNS_HERE flags in the LoadL_config (under /home/loadl) file on the different machines, as follows:

    1. ha1 and ha2

      ...
      SCHEDD_RUNS_HERE        =       True
      STARTD_RUNS_HERE        =       False
      ...
      

      This will ensure that a job doesn't get scheduled to be executed on the scheduling machines. We want ha1 and ha2 to act as scheduling machines only.

    2. ha3

      ...
      SCHEDD_RUNS_HERE        =       False
      STARTD_RUNS_HERE        =       True
      ...
      



  10. Edit the /etc/hosts files on the three nodes as follows:

    1. ha1

      ...
      9.22.7.46    ha.haw2.ibm.com ha
      9.22.7.46    ha2.haw2.ibm.com ha2
      ...
      



    2. ha2

      ...
      9.22.7.46    ha.haw2.ibm.com ha
      9.22.7.46    ha1.haw2.ibm.com ha1
      ...
      



    3. ha3

      ...
      9.22.7.46    ha.haw2.ibm.com ha
      9.22.7.46    ha1.haw2.ibm.com ha1
      9.22.7.46    ha2.haw2.ibm.com ha2
      ...
      


Configure heartbeat to manage the scheduling machine

To configure heartbeat to manage LoadLeveler:

  1. Create a script to start and stop the LoadLeveler processes. A very basic script is shown in Listing 1. You can further customize it to suit your setup. Place this script in the /etc/rc.d/init.d directory.


    Listing 1. loadl script
    
    #!/bin/bash
    #
    #	/etc/rc.d/init.d/loadl
    #
    # Starts the loadleveler processes
    #
    # chkconfig: 345 89 56
    # description: Runs loadleveler
    
    . /etc/init.d/functions
    # Source function library.
    
    PATH=/usr/bin:/bin:/opt/ibmll/LoadL/full/bin
    #======================================================================
    SU="sh"
    if [ "`whoami`" = "root" ]; then
        SU="su - loadl"
    fi
    #======================================================================
    start() {
    	echo "$0: starting loadleveler"
    	$SU -c "llctl start"
    }
    #======================================================================
    stop() {
    	echo "$0: Stoping loadleveler"
    	$SU -c "llctl stop"
    }
    
    
    case $1 in
    'start')
        start
        ;;
    'stop')
        stop
        ;;
    'restart')
        stop
        start
        ;;
    *)
        echo "usage: $0 {start|stop|restart}"
        ;;
    esac
    



  2. Now, configure the /etc/ha.d/haresources file (on both the scheduling machine nodes) to include the above loadl script. The relevant portion of the modified file is:

    ha1.haw2.ibm.com 9.22.7.46 Filesystem::hanfs.haw2.ibm.com:/ha::/ha::nfs::rw,hard loadl
    

    This line dictates that, on startup of heartbeat, have ha1 serve the cluster IP address, mount the shared filesystem, and then start the LoadLeveler daemons. On shutdown, heartbeat will first stop LoadLeveler, then un-mount the shared filesystem, and finally give up the IP.


Test scheduling machine failover

This section shows how to test the high availability of the scheduling daemon.

  1. Start the heartbeat service on the primary and then on the backup node with this command:

    /etc/rc.d/init.d/heartbeat start
    

    After heartbeat starts successfully, you should see a new interface with the IP address that you configured in the ha.cf file. Once you've started heartbeat, take a peek at your log file (default is /var/log/ha-log) on the primary and make sure that it is doing the IP takeover and then starting LoadLeveler. Use the ps command to make sure LoadLeveler daemons are running on the primary node. Heartbeat will not start any of the above processes on the backup. This happens only after the primary fails.

  2. Start the loadleveler daemons on ha3, as user loadl, using the following command:

    llctl start
    



  3. Make ha3 unavailable, as an executing machine, by draining the startd daemon. This is needed to provide enough time, after submitting a job, to test failover. Use the following command as user loadl on node ha3:

    llctl drain startd
    



  4. Check to see what jobs are scheduled on the LoadLeveler cluster using the command below on the ha1 machine as user loadl:

    llq
    

    The output of the command for our setup is shown in Figure 1. This command will list any old incomplete jobs on the LoadLeveler cluster.


    Figure 1. Output of the command llq on node ha1
    Output of the command llq on node ha1

    Check to see the status of machines on the LoadLeveler cluster by running the command below on the ha1 machine as user loadl:

    llstatus
    


    Figure 2. Output of the command llstatus on node ha1
    Output of the command llstatus on node ha1

    In Figure 2 you can see that Scehdd is available on the ha1 machine, and Startd is drained on the ha3 machine because of step 3, above.

  5. Set ownership of the samples directory. Use the command shown below on the node ha3 (where the job will run):

    chown loadl.loadl /opt/ibmll/LoadL/full/samples
    



  6. Submit a job. Submit one of the sample jobs that come with LoadLeveler by executing the commands below on machine ha1 as user loadl:

    cd /home/loadl/samples
    
    llsubmit job1.cmd
    

    If successful, you'll see a message similar to:

    llsubmit: The job "ha1.haw2.ibm.com.23" with 2 job steps has been submitted.
    

    The sample, job1, is a two-step job that should result in two new job steps being created. These two new jobs steps will go to an idle state (I) for the lack of availability of an executing machine. See Figure 3.


    Figure 3. Output of the command llq on node ha1 after job1 is submitted
    Output of the command llq on node ha1 after job1 is submitted

    Now, fail the primary scheduling machine.

  7. Simulating failover. You can do this simply by stopping heartbeat on the primary system, using this command:

    /etc/rc.d/init.d/heartbeat stop
    

    You should see all the services come up on the backup machine in under a minute. You can verify that LoadLeveler is running on the backup by checking the /var/log/ha-log file and using the ps command on the backup machine. Once the backup has taken over control, run the llstatus and llq command on the backup as user loadl. Figures 4 and 5 show the output for these runs.


    Figure 4. Output of the command llq, on node ha2, after failover
    Output of the command llq, on node ha2, after failover

    Figure 5. Output of the command llq on node ha1 after job1 is submitted
    Output of the command llstatus, on node ha2, after failover

    In Figures 4 and 5, notice:

    1. All the old incomplete jobs, including the two corresponding to the job submitted in step 6, above. This proves that the job information has survived a machine failover.
    2. Schedd is now available on the ha2 machine
    3. Startd is still down on the ha3 machine.

  8. Resume the startd daemon on the ha3 machine by executing the following command on ha3 as user loadl:

    llctl resume startd
    

    The ha3 machine is now available for executing jobs. The two jobs should go to a running state and should finally complete on the ha3 machine. You can verify job completion by taking a look at the .out files, created by the two steps of the job, in the /home/loadl/samples directory on the ha3 machine. For this run, you should see two files, job1.ha1.23.0.out and job1.ha1.23.1.out.

  9. Start the heartbeat service back on the primary. This should stop the LoadLeveler processes on the secondary and start them on the primary. The primary should also take over the cluster IP as well. Start the heartbeat service with this command:

    /etc/rc.d/init.d/heartbeat start
    

    You can see how, using shared disk, the jobs submitted before the failure of the primary scheduling node can be recovered after the backup has taken control.


Set up high availability for the central manager

If you decide to keep the central manager on a separate node (ha4) as the scheduling machine, you can leverage the built-in high availability features of the central manager. To try this configuration, define the ha4 machine as the central manager in the configuration files of all machines.

Problems with network communication or software or hardware failures can cause the central manager to be unusable. In such cases, the other machines in the LoadLeveler cluster will believe that the central manager machine is no longer operating. To remedy this situation, you can assign one or more alternate central managers in the machine stanza to take control.

The following machine stanza example defines the machine ha5 as an alternate central manager:

ha5: type = machine
     central_manager = alt

If the primary central manager fails, the alternate central manager then becomes the central manager. When an alternate becomes the central manager, jobs will not be lost, but it may take a few minutes for all of the machines in the cluster to check in with the new central manager. As a result, job status queries may be incorrect for a short time.

When you define alternate central managers, you should set the following keywords in the configuration file:

CENTRAL_MANAGER_HEARTBEAT_INTERVAL = <amount of time in seconds>
CENTRAL_MANAGER_TIMEOUT = < the number of heartbeat intervals>

In the example below, the alternate central manager will wait for two intervals, where each interval is 30 seconds:

CENTRAL_MANAGER_HEARTBEAT_INTERVAL = 30
CENTRAL_MANAGER_TIMEOUT = 2


Test failover of the central manager machine

You can test failover by killing the central manager process called Loadl_negotiator on the ha4 machine. To prevent the central manager daemon from being restarted on the same node again, you should set the RESTARTS_PER_HOUR keyword to 0. This will bring up the central manager on the alternate, ha5, in about a minute.


Set up high availability for machines executing LoadLeveler

This setup is similar to setting up high availability for scheduling machines.


Conclusion

In this installment, you have seen how to implement high availability for a LoadLeveler cluster using the built-in capabilities of LoadLeveler, and how to further enhance availability by using open source software. Part 4 of this series will examine a high-availability implementation of IBM WebSphere® Application Server.


Acknowledgments

I would like to thank Waiman Chan for reviewing the high-availability implementation for IBM LoadLeveler, and Liana L. Fong for providing technical guidance on IBM LoadLeveler.



Download

DescriptionNameSizeDownload method
Sample code package for this series of articleshahbcode.tar.gz25 KB HTTP

Information about download methods


Resources

About the author

Hidayatullah H. Shaikh is a Senior Software Engineer on the IBM T.J. Watson Research Center's On-Demand Architecture and Development Team. His areas of interest and expertise include business process modeling and integration, service-oriented architecture, grid computing, e-commerce, enterprise Java, database management systems, and high-availability clusters. You can contact Hidayatullah at hshaikh@us.ibm.com.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Linux, Open source
ArticleID=49044
ArticleTitle=High-availability middleware on Linux, Part 3: IBM LoadLeveler
publish-date=02282005
author1-email=hshaikh@us.ibm.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Rate a product. Write a review.

Special offers