High-availability middleware on Linux, Part 1

Heartbeat and Apache Web server

Open source software offers a low-cost solution

Maintaining maximum system uptime is increasingly critical to the success of on demand computing. Unfortunately, many off-the-shelf solutions for high availability (HA) are expensive and require expertise. This series of five articles offers a lower-cost alternative to achieving HA services using publicly available software.

The step-by-step procedures in this series show how to build a highly available Apache Web server, WebSphere® MQ queue manager, LoadLeveler cluster, WebSphere Application Server cluster, and DB2® Universal Database on Linux™. A systems administrator can learn to use and maintain this system with minimal time investment. The techniques described in this series also apply to any number of services on Linux.

To get the most out of this series, you should have a basic understanding of WebSphere MQ, WebSphere Application Server, IBM LoadLeveler, DB2 Universal Database, and high-availability clusters.

Using any software product in a business-critical or mission-critical environment requires that you consider availability, a measure of the ability of a system to do what it is supposed to do, even in the presence of crashes, equipment failures, and environmental mishaps. As more and more critical commercial applications move onto the Internet, providing highly available services becomes increasingly important.

This article highlights issues that you may encounter when implementing HA solutions. We'll review HA concepts, available HA software, the hardware to use, and the installation and configuration of heartbeat (open source HA software for Linux) -- and we'll see how a Web server can be made highly available using heartbeat.

Hardware requirements

The test scenarios described in this series require the following hardware:

  • Four systems that support Linux, with Ethernet network adapters
  • One shared external SCSI hard drive (twin tail disk)
  • One IBM serial null modem cable

In my setup, I used IBM eServer™ xSeries® 335 machines with 1 GB of RAM. For shared disk, I used one of these machines as an NFS server. The software requirements for the complete setup are as follows, although for this article you need only Red Hat Enterprise Linux and heartbeat:

  • Red Hat Enterprise Linux 3.0 (2.4.21-15.EL)
  • heartbeat 1.2.2
  • IBM Java 2 SDK 1.4.2
  • WebSphere MQ for Linux with Fix Pack 7
  • LoadLeveler for Linux 3.2
  • WebSphere Base Edition 5.1.1 for Linux with Cumulative Fix 1
  • WebSphere ND 5.1 for Linux with Fixpack 1
  • DB2 Universal Database Enterprise Server Edition 8.1 Linux

You can get the test scenarios by downloading the code package listed in the Download section below. Table 1 describes the directories in hahbcode.tar.gz.

Table 1. What's in the sample code package
Directory      Contents
heartbeat      Sample configuration files for heartbeat
www            HTML files for testing HA for Apache Web Server
mq             Scripts and code for WebSphere MQ HA:
  • mqseries: Script to start and stop WebSphere MQ queue manager and other processes as a Linux service
  • hascript: Scripts for creating HA queue manager
  • send (sh/bat): Script to put data on a queue
  • receive (sh/bat): Script to browse/get data from queue
loadl          The loadl file to start and stop LoadLeveler as a Linux service
was            Scripts and code for WebSphere Application Server HA:
  • wasdmgr: Script to start and stop WebSphere ND Deployment Manager as a Linux service
  • wasnode: Script to start and stop WebSphere Node Agent as a Linux service
  • wasserver: Script to start and stop WebSphere Application Server as a Linux service
  • sample_ver_(1/2/3): Directories containing different versions of a sample enterprise application for testing WebSphere HA
db2            Scripts to check database availability, create a table, insert rows in table, and select rows from a table

High availability concepts

High availability is the system management strategy of quickly restoring essential services in the event of system, component, or application failure. The goal is minimal service interruption rather than fault tolerance. The most common solution for a failure of a system performing critical business operations is to have another system waiting to assume the failed system's workload and continue business operations.

The term "cluster" has different meanings within the computing industry. Throughout this article, unless noted otherwise, cluster describes a heartbeat cluster, which is a collection of nodes and resources (such as disks and networks) that cooperate to provide high availability of services running within the cluster. If one of those machines should fail, the resources required to maintain business operations are transferred to another available machine in the cluster.

The two main cluster configurations are:

  • Standby configuration: The most basic cluster configuration, in which one node performs work while the other node acts only as standby. The standby node does not perform work and is referred to as idle; this configuration is sometimes called cold standby. Such a configuration requires a high degree of hardware redundancy. This series of articles focuses on cold standby configuration.
  • Takeover configuration: A more advanced configuration in which all nodes perform some kind of work, and critical work can be taken over in the event of a node failure. In a one-sided takeover configuration, a standby node performs some additional, non-critical, non-movable work. In a mutual takeover configuration, all nodes are performing highly available (movable) work. This series of articles does not address takeover configuration.

You must plan for several key items when setting up an HA cluster:

  • The disks used to store the data must be connected by a private interconnect (serial cable) or LAN to the servers that make up the cluster.
  • There must be a method for automatic detection of a failed resource. This is done by a software component referred to as a heartbeat monitor.
  • There must be automatic transfer of resource ownership to one or more surviving cluster members upon failure.

Available HA software

Many software packages available today provide heartbeat monitoring and resource takeover. Here is a list of available software for building high-availability clusters on various operating systems (see Related topics for links):

  • heartbeat (Linux)
  • High Availability Cluster Multiprocessing - HACMP (AIX)
  • IBM Tivoli System Automation for Multiplatforms (AIX, Linux)
  • Legato AAM 5.1 (AIX, HP-UX, Solaris, Linux, Windows)
  • SteelEye LifeKeeper (Linux, Windows)
  • Veritas Cluster Server (AIX, HP-UX, Solaris, Linux, Windows)

This series describes the open source HA software heartbeat. However, you can apply the concepts you learn here to any of the above software systems.

High-Availability Linux project and heartbeat

The goal of the open source project called High-Availability Linux is to provide a clustering solution for Linux that promotes reliability, availability, and serviceability (RAS) through a community development effort. The Linux-HA project is widely used and is an important component in many interesting high-availability solutions.

Heartbeat is one of the publicly available packages at the Linux-HA project Web site. It provides the basic functions required by any HA system such as starting and stopping resources, monitoring the availability of the systems in the cluster, and transferring ownership of a shared IP address between nodes in the cluster. It monitors the health of a particular service (or services) through either a serial line or Ethernet interface or both. The current version supports a two-node configuration where special heartbeat "pings" are used to check the status and availability of a service. Heartbeat provides the foundations for far more complex scenarios than the ones described in this series of articles, such as active/active configurations, where both nodes work in parallel and perform load balancing.

For more information on heartbeat and projects where it is being used, visit the Linux-HA project Web site (see Related topics for a link).

Cluster configuration

The test cluster configuration for these articles is shown in Figure 1. The setup consists of a pair of clustered servers (ha1 and ha2), both of which have access to a shared disk enclosure containing multiple physical disks; the servers are in cold standby mode. The application data needs to be on a shared device that both nodes can access. It can be a shared disk or a network file system. The device itself should be mirrored or have data protection to avoid data corruption. Such a configuration is frequently referred to as a shared disk cluster, but it is actually a shared-nothing architecture, as no disk is accessed by more than one node at a time.

Figure 1. Heartbeat cluster configuration in a production environment

For the test setup, I use NFS as the shared disk mechanism as shown in Figure 2, although I recommend using the option shown in Figure 1, especially in a production environment. A null modem cable connected between the serial ports of the two systems is used to transmit heartbeats between the two nodes.

Figure 2. Heartbeat cluster configuration using NFS for shared file system

Table 2 shows the configuration I used for both nodes. In your case, the host names and IP addresses should be known to either the DNS or the /etc/hosts files on both nodes.

Table 2. Test cluster configuration
Role                   Hostname    IP address
Shared (cluster) IP
Node1 (master)
Node2 (backup)
Node 3 (not shown)
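
If you rely on /etc/hosts rather than DNS, a quick check such as the following can confirm that each node name is listed. This is only a sketch: the function name hosts_has is hypothetical, and the short names ha1 and ha2 are the ones used in this article, so substitute whatever names your nodes actually use.

```shell
#!/bin/sh
# hosts_has FILE NAME: succeed if NAME appears as a host name or alias
# (any field after the address) on a non-comment line of FILE.
hosts_has() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # stop at a trailing comment
                if ($i == name) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example: check both node names against /etc/hosts.
for node in ha1 ha2; do
    if hosts_has /etc/hosts "$node"; then
        echo "$node: found in /etc/hosts"
    else
        echo "$node: not in /etc/hosts (check DNS instead)"
    fi
done
```

Run it on both nodes; a name that is missing from /etc/hosts is not necessarily a problem if DNS resolves it.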

Set up the serial connection

Use a null modem cable to connect the two nodes through their serial ports. Now test the serial connection, as follows:

On ha1 (receiver), type:

cat < /dev/ttyS0

On ha2 (sender), type:

echo "Serial Connection test" > /dev/ttyS0

You should see the text on the receiver node (ha1). If it works, swap the roles of the two nodes and try again.
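
If the test prints nothing, it can help to first confirm that the serial device exists and is accessible. The following is a minimal sketch, assuming the port is /dev/ttyS0 as in this article; the function name check_serial is hypothetical.

```shell
#!/bin/sh
# check_serial DEVICE: report whether DEVICE is a character device that the
# current user can read and write.
check_serial() {
    dev=$1
    if [ ! -c "$dev" ]; then
        echo "$dev: not a character device"
        return 1
    fi
    if [ ! -r "$dev" ] || [ ! -w "$dev" ]; then
        echo "$dev: not readable and writable by $(id -un)"
        return 1
    fi
    echo "$dev: ok"
}
```

For example, run check_serial /dev/ttyS0 on each node before repeating the cat and echo test. A passing check only shows that the device node is usable; it does not prove that the cable is wired correctly.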

Set up NFS for a shared file system

As mentioned, I used NFS for shared data between nodes for the test setup.

  • The node nfsha is used as an NFS server.
  • The file system /ha is shared.

To get NFS up and running:

  1. Create a directory /ha on nfsha node.
  2. Edit the /etc/exports file. This file contains a list of entries; each entry indicates a volume that is shared and how it is shared. Listing 1 shows the relevant portion of the exports file for my setup.

    Listing 1. exports file
  3. Start the NFS services. If NFS is already running, you should run the command /usr/sbin/exportfs -ra to force nfsd to re-read the /etc/exports file.
  4. Add the file system /ha to your /etc/fstab file, on both the HA nodes -- ha1 and ha2 -- the same way as local file systems. Listing 2 shows the relevant portion of the fstab file for my setup:

    Listing 2. fstab file
    ...    /ha    nfs    noauto,rw,hard 0 0

    Later on, we will configure heartbeat to mount this file system.

  5. Extract the code sample, hahbcode.tar.gz, on this file system using the commands shown in Listing 3. (First download the code sample from the Download section below.)

    Listing 3. Extract sample code
    cd /ha
    tar  xvfz  hahbcode.tar.gz
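
Because heartbeat, not the boot process, is meant to mount /ha, the fstab entry needs the noauto option. The sketch below checks for that; the function name fstab_noauto is hypothetical, and the mount point /ha is the one used in this article.

```shell
#!/bin/sh
# fstab_noauto FILE MOUNTPOINT: succeed if FILE has an nfs entry for
# MOUNTPOINT whose option list includes "noauto".
fstab_noauto() {
    awk -v mp="$2" '
        /^[ \t]*#/ { next }                       # skip comments
        $2 == mp && $3 == "nfs" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noauto") ok = 1
        }
        END { exit(ok ? 0 : 1) }
    ' "$1"
}

if fstab_noauto /etc/fstab /ha; then
    echo "/ha: nfs entry with noauto found"
else
    echo "/ha: no nfs entry with noauto in /etc/fstab"
fi
```

Run it on both ha1 and ha2. Without noauto, the file system would be mounted at boot on both nodes instead of only on the node that currently owns the resources.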

Download and install heartbeat

Download heartbeat using the link in Related topics, then install it on both ha1 and ha2 machines by entering the commands in Listing 4 (in the order given).

Listing 4. Commands for installing heartbeat
rpm -ivh heartbeat-pils-1.2.2-8.rh.el.3.0.i386.rpm
rpm -ivh heartbeat-stonith-1.2.2-8.rh.el.3.0.i386.rpm
rpm -ivh heartbeat-1.2.2-8.rh.el.3.0.i386.rpm

Configure heartbeat

You must configure three files to get heartbeat to work: authkeys, ha.cf, and haresources. I'll show you the specific configuration I used for this implementation; if you need more information, please refer to the heartbeat Web site and read their documentation (see Related topics).

1. Configure /etc/ha.d/authkeys

This file determines your authentication keys for the cluster; the keys must be the same on both nodes. You can choose from three authentication schemes: crc, md5, or sha1. If heartbeat runs over a secure network, such as the dedicated null modem cable in this setup, use crc; it is the cheapest method in terms of resources. If the network is insecure, but you are either not overly worried about attacks or want to minimize CPU usage, use md5. Finally, if you want the strongest authentication regardless of CPU cost, use sha1, as it is the hardest to crack.

The format of the file is as follows:

auth <number>
<number> <authmethod> [<authkey>]

For the test setup I chose the crc scheme. Listing 5 shows the /etc/ha.d/authkeys file. Make sure its permissions are safe, such as 600.

Listing 5. authkeys file
auth 2
2 crc
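
One way to create the file with safe permissions from the start is sketched below. The function name write_authkeys is hypothetical; the file contents are those of Listing 5, and the target directory is passed as an argument so you can try it in a scratch directory first.

```shell
#!/bin/sh
# write_authkeys DIR: create DIR/authkeys with the crc scheme from Listing 5
# and restrict it to owner read/write (mode 600).
write_authkeys() {
    dir=$1
    (
        umask 077                      # never create the file world-readable
        printf 'auth 2\n2 crc\n' > "$dir/authkeys"
    )
    chmod 600 "$dir/authkeys"          # also covers a pre-existing file
    ls -l "$dir/authkeys"
}
```

As root, write_authkeys /etc/ha.d would create the file used in this article; run it on both nodes so that the keys match.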

2. Configure /etc/ha.d/ha.cf

This file is placed in the /etc/ha.d directory, which is created during installation. It tells heartbeat what types of media paths to use and how to configure them. This file also defines the nodes in the cluster and the interfaces that heartbeat uses to verify whether or not a system is up. Listing 6 shows the relevant portion of the /etc/ha.d/ha.cf file for my setup.

Listing 6. ha.cf file
#	File to write debug messages to
debugfile /var/log/ha-debug
# 	File to write other messages to
logfile	/var/log/ha-log
#	Facility to use for syslog()/logger
logfacility	local0
#	keepalive: how long between heartbeats?
keepalive 2
#	deadtime: how long-to-declare-host-dead?
deadtime 60
#	warntime: how long before issuing "late heartbeat" warning?
warntime 10
#	Very first dead time (initdead)
initdead 120
#	Baud rate for serial ports...
baud	19200
#	serial	serialportname ...
serial	/dev/ttyS0
#	auto_failback:  determines whether a resource will
#	automatically fail back to its "primary" node, or remain
#	on whatever node is serving it until that node fails, or
#	an administrator intervenes.
auto_failback on
#	Tell what machines are in the cluster
#	node	nodename ...	-- must match uname -n
#	Less common options...
#	Treats as a pseudo-cluster-member
#	Used together with ipfail below...
#	Processes started and stopped with heartbeat.  Restarted unless
#		they exit with rc=100
respawn hacluster /usr/lib/heartbeat/ipfail
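
The keepalive and deadtime values interact: with a heartbeat every 2 seconds and a deadtime of 60 seconds, roughly 30 heartbeat intervals must pass in silence before the peer is declared dead. The sketch below does that arithmetic for any file in the Listing 6 format; the function name ha_timing is hypothetical, and it assumes both values are given in whole seconds.

```shell
#!/bin/sh
# ha_timing FILE: read keepalive and deadtime (whole seconds, as in Listing 6)
# from a heartbeat configuration file and print how many heartbeat
# intervals fit into deadtime.
ha_timing() {
    awk '
        $1 == "keepalive" { keepalive = $2 }
        $1 == "deadtime"  { deadtime  = $2 }
        END {
            if (keepalive <= 0 || deadtime <= 0) exit 1
            printf "keepalive=%ds deadtime=%ds intervals=%d\n",
                   keepalive, deadtime, int(deadtime / keepalive)
        }
    ' "$1"
}
```

A shorter deadtime gives faster failover but raises the risk of declaring a busy node dead, so treat the number as a tuning aid rather than a rule.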

3. Configure /etc/ha.d/haresources

This file describes the resources that are managed by heartbeat. The resources are basically just start/stop scripts much like the ones used for starting and stopping resources in /etc/rc.d/init.d. Note that heartbeat will look in /etc/rc.d/init.d and /etc/ha.d/resource.d for scripts. The script file httpd comes with heartbeat. Listing 7 shows my /etc/ha.d/haresources file:

Listing 7. haresources file
...,hard httpd

This file must be the same on both the nodes.

This line dictates that on startup:

  • Have ha1 serve the IP
  • Mount the NFS shared file system /ha
  • Start Apache Web server

I will be adding more resources to this file in later articles. On shutdown, heartbeat will:

  • Stop the Apache server
  • Unmount the shared file system
  • Give up the IP

This assumes that the command uname -n displays the node's fully qualified host name; yours may well produce ha1, and if it does, use that instead.
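
The ordering described above, where resources are started in the order listed and stopped in reverse, can be illustrated with a small sketch. The function names start_order and stop_order are hypothetical, and the resource labels you pass in are only placeholders for the entries in your own haresources line.

```shell
#!/bin/sh
# start_order / stop_order: print the resources of one haresources-style
# line in the order they would be acquired and released. The first
# argument (the preferred node name) is skipped.
start_order() {
    [ "$#" -gt 0 ] || return 1
    shift
    for res in "$@"; do
        echo "start $res"
    done
}

stop_order() {
    [ "$#" -gt 0 ] || return 1
    shift
    rev=""
    for res in "$@"; do
        # Prepend each resource so the list ends up reversed.
        rev="stop $res
$rev"
    done
    printf '%s' "$rev"
}
```

For example, start_order ha1 IP Filesystem httpd lists the IP address first and Apache last, while stop_order with the same arguments lists Apache first and the IP address last. This is why Apache is always stopped before the file system that holds its pages is unmounted.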

Configure the Apache HTTP server for HA

In this step I will make a few changes to the Apache Web server setup so that it serves files both from the shared file system and from file systems local to the two machines ha1 and ha2. The index.html file (included with the code samples) will be served from the shared disk, and the hostname.html file will be served from a local file system on each of the machines ha1 and ha2. To implement HA for the Apache Web server:

  1. Log in as root.
  2. Create the following directories on the shared disk (/ha): /ha/www and /ha/www/html.

  3. Set appropriate permissions on the shared directories using commands shown below on the node ha1:

    chmod 775 /ha/www
    chmod 775 /ha/www/html
  4. On both the primary and backup machines, rename the html directory of the Apache Web server:

    mv /var/www/html /var/www/htmllocal
  5. Create symbolic links to the shared directories using the following commands on both the machines:

    ln -s /ha/www/html /var/www/html
  6. Copy the index.html file to the /ha/www/html directory on the node ha1:

    cp /ha/hahbcode/www/index.html /var/www/html

    You will have to change the cluster name in this file.

  7. Copy the hostname.html file to the local /var/www/htmllocal directory on both the machines:

    cp /ha/hahbcode/www/hostname.html /var/www/htmllocal

    Change the cluster name and the node name in this file.

  8. Create symbolic links to the hostname.html file on both the machines:

    ln -s /var/www/htmllocal/hostname.html /ha/www/html/hostname.html
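
The resulting chain of symbolic links can be confusing, so here is a sketch that rebuilds the same layout under a scratch directory where it can be inspected safely. The function name build_layout and the page contents are hypothetical; only the directory names come from the steps above.

```shell
#!/bin/sh
# build_layout ROOT: recreate the document-root layout under a scratch
# directory ROOT so the symbolic links can be inspected safely.
#   ROOT/ha/www/html            stands in for the shared file system
#   ROOT/var/www/htmllocal      stands in for the node-local directory
build_layout() {
    root=$1
    mkdir -p "$root/ha/www/html" "$root/var/www/htmllocal"
    echo "shared page" > "$root/ha/www/html/index.html"
    echo "local page"  > "$root/var/www/htmllocal/hostname.html"
    # The Apache document root points at the shared directory...
    ln -s "$root/ha/www/html" "$root/var/www/html"
    # ...and the shared directory links back to the node-local file.
    ln -s "$root/var/www/htmllocal/hostname.html" \
          "$root/ha/www/html/hostname.html"
}
```

After running build_layout "$(mktemp -d)", reading var/www/html/index.html under the scratch directory returns the shared page, and var/www/html/hostname.html returns the local page. On the real nodes the second link resolves to whichever machine currently has /ha mounted, which is what makes hostname.html differ between ha1 and ha2.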

Now you are ready to test the HA implementation.

Test HA for the Apache HTTP server

To test the high availability of the Web server:

  1. Start the heartbeat service on the primary and then on the backup node using this command:

    /etc/rc.d/init.d/heartbeat start

    If it fails, look in /var/log/messages to determine the reason and then correct it. After heartbeat starts successfully, you should see a new network interface with the cluster IP address that you configured in the haresources file. Next, check the heartbeat log file (default is /var/log/ha-log) on the primary and make sure that heartbeat performs the IP takeover and then starts the Apache Web server. Use the ps command to make sure the Web server daemons are running on the primary node. Heartbeat does not start any Web server processes on the backup; that happens only after the primary fails.

  2. Verify that the two Web pages are being served correctly on the ha1 node by pointing the browser at the following URLs (yours will differ if you use a different host name):

    Note that I am using the cluster address in the above URLs and not the address of the primary node.

    The browser should display the following text for the first URL:

    Hello!!! I am being served from a High Availability Cluster

    The browser should display the following text for the second URL:

    Hello!!! I am being served from a node in a High Availability Cluster
  3. Simulate failover by simply stopping heartbeat on the primary system using the command shown below:

    /etc/rc.d/init.d/heartbeat stop

    You should see all the Web server processes come up on the second machine in under a minute. If you do not, look in /var/log/messages to determine the problem and correct it.

  4. Verify that the two Web pages are being served correctly on the ha2 node by pointing the browser at the following URLs:

    The browser should display the following text for the first URL:

    Hello!!! I am being served from a High Availability Cluster

    The browser should display the following text for the second URL:

    Hello!!! I am being served from a node in a High Availability Cluster

    Note that the node serving this page now is ha2.

  5. Restart the heartbeat service back on the primary. This should stop the Apache server processes on the secondary and start them on the primary. The primary should also take over the cluster IP.
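
To get a rough figure for how long a failover takes, you can poll the cluster address until the page is served again. The following sketch is one way to do that; the function name wait_until is hypothetical, and the probe command is passed in so that you can use whichever client you prefer.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL COMMAND...: run COMMAND every INTERVAL seconds
# until it succeeds or TIMEOUT seconds have passed. Prints the elapsed time
# (approximate: it counts sleeps, not wall-clock time).
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "gave up after ${elapsed}s"
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "succeeded after ${elapsed}s"
}
```

For example, right after stopping heartbeat on the primary, you might run wait_until 90 2 curl -fs "$CLUSTER_URL" from a third machine, where CLUSTER_URL is a placeholder for the URL of index.html at your cluster address. The reported time should be in line with the under-a-minute takeover described in step 3.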

Thus, by putting the Web pages on the shared disk, a secondary machine can serve them to a client in the event of failure of the primary machine. The failover is transparent to the client accessing the Web pages. This technique can be applied to serving CGI scripts as well.


I hope you will try this technique for setting up a very highly available Web server using inexpensive hardware and readily available software. In the next article in this series, you'll see how to build a highly available messaging queue manager using WebSphere MQ.

Downloadable resources

Related topics
