High-availability middleware on Linux, Part 1: Heartbeat and Apache Web server

Open source software offers a low-cost solution

In this first of five articles, learn what it means for software to be highly available and how to install and set up heartbeat software from the High-Availability Linux project on a two-node system. You'll also learn how to configure the Apache Web server to run as a highly available service.

Hidayatullah Shaikh (hshaikh@us.ibm.com), Senior Software Engineer, IBM, Software Group

Hidayatullah H. Shaikh is a Senior Software Engineer on the IBM T.J. Watson Research Center's On-Demand Architecture and Development Team. His areas of interest and expertise include business process modeling and integration, service-oriented architecture, grid computing, e-commerce, enterprise Java, database management systems, and high-availability clusters. You can contact Hidayatullah at hshaikh@us.ibm.com.



12 October 2004


Maintaining maximum system uptime is increasingly critical to the success of on demand computing. Unfortunately, many off-the-shelf solutions for high availability (HA) are expensive and require expertise. This series of five articles offers a lower-cost alternative to achieving HA services using publicly available software.

The step-by-step procedures in this series show how to build a highly available Apache Web server, WebSphere® MQ queue manager, LoadLeveler cluster, WebSphere Application Server cluster, and DB2® Universal Database on Linux™. A systems administrator can learn to use and maintain this system with minimal time investment. The techniques described in this series also apply to any number of services on Linux.

To get the most out of this series, you should have a basic understanding of WebSphere MQ, WebSphere Application Server, IBM LoadLeveler, DB2 Universal Database, and high-availability clusters.

Introduction

Using any software product in a business-critical or mission-critical environment requires that you consider availability, a measure of the ability of a system to do what it is supposed to do, even in the presence of crashes, equipment failures, and environmental mishaps. As more and more critical commercial applications move onto the Internet, providing highly available services becomes increasingly important.

This article highlights issues that you may encounter when implementing HA solutions. We'll review HA concepts, available HA software, hardware to use, and installation and configuration details for heartbeat (open source HA software for Linux) -- and we'll see how a Web server can be made highly available using heartbeat.


Hardware requirements

The test scenarios described in this series require the following hardware:

  • Four systems that support Linux, with Ethernet network adapters
  • One shared external SCSI hard drive (twin-tailed disk)
  • One IBM serial null modem cable

In my setup, I used IBM eServer™ xSeries® 335 machines with 1 GB of RAM. For the shared disk, I used one of these machines as an NFS server. The software requirements for the complete setup are as follows, although for this article you need only Red Hat Enterprise Linux and heartbeat:

  • Red Hat Enterprise Linux 3.0 (2.4.21-15.EL)
  • heartbeat 1.2.2
  • IBM Java 2 SDK 1.4.2
  • WebSphere MQ for Linux 5.3.0.2 with Fix Pack 7
  • LoadLeveler for Linux 3.2
  • WebSphere Base Edition 5.1.1 for Linux with Cumulative Fix 1
  • WebSphere ND 5.1 for Linux with Fixpack 1
  • DB2 Universal Database Enterprise Server Edition 8.1 Linux

You can get the test scenarios by downloading the code package listed in the Download section below. Table 1 describes the directories in hahbcode.tar.gz.

Table 1. What's in the sample code package
Directory    Contents
heartbeat    Sample configuration files for heartbeat
www          HTML files for testing HA for the Apache Web server
mq           Scripts and code for WebSphere MQ HA:
               • mqseries: Script to start and stop the WebSphere MQ queue manager and other processes as a Linux service
               • hascript: Scripts for creating the HA queue manager
               • send (sh/bat): Script to put data on a queue
               • receive (sh/bat): Script to browse/get data from a queue
loadl        The loadl script to start and stop LoadLeveler as a Linux service
was          Scripts and code for WebSphere Application Server HA:
               • wasdmgr: Script to start and stop the WebSphere ND Deployment Manager as a Linux service
               • wasnode: Script to start and stop the WebSphere Node Agent as a Linux service
               • wasserver: Script to start and stop WebSphere Application Server as a Linux service
               • sample_ver_(1/2/3): Directories containing different versions of a sample enterprise application for testing WebSphere HA
db2          Scripts to check database availability, create a table, insert rows into a table, and select rows from a table
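
You can verify that the downloaded archive contains these directories before extracting it. This is a quick check, assuming hahbcode.tar.gz is in your current directory:

# List the contents of the sample code package without extracting it
tar tzf hahbcode.tar.gz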

High availability concepts

High availability is the system management strategy of quickly restoring essential services in the event of system, component, or application failure. The goal is minimal service interruption rather than fault tolerance. The most common solution for a failure of a system performing critical business operations is to have another system waiting to assume the failed system's workload and continue business operations.

The term "cluster" has different meanings within the computing industry. Throughout this article, unless noted otherwise, cluster describes a heartbeat cluster, which is a collection of nodes and resources (such as disks and networks) that cooperate to provide high availability of services running within the cluster. If one of those machines should fail, the resources required to maintain business operations are transferred to another available machine in the cluster.

The two main cluster configurations are:

  • Standby configuration: The most basic cluster configuration, in which one node performs work while the other node acts only as standby. The standby node does not perform work and is referred to as idle; this configuration is sometimes called cold standby. Such a configuration requires a high degree of hardware redundancy. This series of articles focuses on cold standby configuration.
  • Takeover configuration: A more advanced configuration in which all nodes perform some kind of work, and critical work can be taken over in the event of a node failure. In a one-sided takeover configuration, a standby node performs some additional, non-critical, non-movable work. In a mutual takeover configuration, all nodes are performing highly available (movable) work. This series of articles does not address takeover configuration.

You must plan for several key items when setting up an HA cluster:

  • The disks used to store the data must be accessible to the servers that make up the cluster, for example over a shared bus or the LAN.
  • There must be a method for automatic detection of a failed resource. This is done by a software component referred to as a heartbeat monitor.
  • There must be automatic transfer of resource ownership to one or more surviving cluster members upon failure.

Available HA software

A variety of currently available software performs heartbeat monitoring and resource takeover. Here is a list of software for building high-availability clusters on various operating systems (see Resources for links):

  • heartbeat (Linux)
  • High Availability Cluster Multiprocessing - HACMP (AIX)
  • IBM Tivoli System Automation for Multiplatforms (AIX, Linux)
  • Legato AAM 5.1 (AIX, HP-UX, Solaris, Linux, Windows)
  • SteelEye LifeKeeper (Linux, Windows)
  • Veritas Cluster Server (AIX, HP-UX, Solaris, Linux, Windows)

This series describes the open source HA software heartbeat. However, you can apply the concepts you learn here to any of the above software systems.


High-Availability Linux project and heartbeat

The goal of the open source project called High-Availability Linux is to provide a clustering solution for Linux that promotes reliability, availability, and serviceability (RAS) through a community development effort. The Linux-HA project is widely used and is an important component in many interesting high-availability solutions.

Heartbeat is one of the publicly available packages at the Linux-HA project Web site. It provides the basic functions required by any HA system such as starting and stopping resources, monitoring the availability of the systems in the cluster, and transferring ownership of a shared IP address between nodes in the cluster. It monitors the health of a particular service (or services) through either a serial line or Ethernet interface or both. The current version supports a two-node configuration where special heartbeat "pings" are used to check the status and availability of a service. Heartbeat provides the foundations for far more complex scenarios than the ones described in this series of articles, such as active/active configurations, where both nodes work in parallel and perform load balancing.

For more information on heartbeat and projects where it is being used, visit the Linux-HA project Web site (see Resources for a link).


Cluster configuration

The test cluster configuration for these articles is shown in Figure 1. The setup consists of a pair of clustered servers (ha1 and ha2), both of which have access to a shared disk enclosure containing multiple physical disks; the servers are in cold standby mode. The application data needs to be on a shared device that both nodes can access. It can be a shared disk or a network file system. The device itself should be mirrored or have data protection to avoid data corruption. Such a configuration is frequently referred to as a shared disk cluster, but it is actually a shared-nothing architecture, as no disk is accessed by more than one node at a time.

Figure 1. Heartbeat cluster configuration in a production environment
Heartbeat cluster configuration in a production environment

For the test setup, I use NFS as the shared disk mechanism as shown in Figure 2, although I recommend using the option shown in Figure 1, especially in a production environment. A null modem cable connected between the serial ports of the two systems is used to transmit heartbeats between the two nodes.

Figure 2. Heartbeat cluster configuration using NFS for shared file system
Heartbeat cluster configuration using NFS for shared file system

Table 2 shows the configuration I used for both nodes. The host names and IP addresses must be resolvable on both nodes, either through DNS or through entries in the /etc/hosts file (a sample /etc/hosts snippet follows Table 2).

Table 2. Test cluster configuration
Role                   Hostname              IP address
Shared (cluster) IP    ha.haw2.ibm.com       9.22.7.46
Node 1 (master)        ha1.haw2.ibm.com      9.22.7.48
Node 2 (backup)        ha2.haw2.ibm.com      9.22.7.49
Node 3 (not shown)     ha3.haw2.ibm.com      9.22.7.50
NFS server             nfsha.haw2.ibm.com    9.2.14.175
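
If these names are not registered in DNS, add them to /etc/hosts on both ha1 and ha2. A minimal sketch using the addresses from Table 2 (adjust to match your own network):

# /etc/hosts entries for the test cluster (identical on ha1 and ha2)
9.22.7.46    ha.haw2.ibm.com      ha
9.22.7.48    ha1.haw2.ibm.com     ha1
9.22.7.49    ha2.haw2.ibm.com     ha2
9.2.14.175   nfsha.haw2.ibm.com   nfsha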

Set up the serial connection

Use a null modem cable to connect the two nodes through their serial ports. Now test the serial connection, as follows:

On ha1 (receiver), type:

cat < /dev/ttyS0

On ha2 (sender) type:

echo "Serial Connection test" > /dev/ttyS0

You should see the text on the receiver node (ha1). If it works, swap the roles and try again, as shown below.
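
For the reverse test, the commands are simply swapped between the two nodes.

On ha2 (receiver), type:

cat < /dev/ttyS0

On ha1 (sender), type:

echo "Serial Connection test" > /dev/ttyS0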


Set up NFS for a shared file system

As mentioned, I used NFS for shared data between nodes for the test setup.

  • The node nfsha.haw2.ibm.com is used as an NFS server.
  • The file system /ha is shared.

To get NFS up and running:

  1. Create a directory /ha on the nfsha node.
  2. Edit the /etc/exports file. This file contains a list of entries; each entry indicates a volume that is shared and how it is shared. Listing 1 shows the relevant portion of the exports file for my setup.

    Listing 1. exports file
    ...
    /ha 9.22.7.48(rw,no_root_squash)
    /ha 9.22.7.46(rw,no_root_squash)
    /ha 9.22.7.35(rw,no_root_squash)
    /ha 9.22.7.49(rw,no_root_squash)
    /ha 9.22.7.50(rw,no_root_squash)
    ...
  3. Start the NFS services (a command sketch for starting NFS and verifying the mount follows this list). If NFS is already running, run the command /usr/sbin/exportfs -ra to force nfsd to re-read the /etc/exports file.
  4. Add the file system /ha to your /etc/fstab file, on both the HA nodes -- ha1 and ha2 -- the same way as local file systems. Listing 2 shows the relevant portion of the fstab file for my setup:

    Listing 2. fstab file
    ...
    nfsha.haw2.ibm.com:/ha    /ha    nfs    noauto,rw,hard 0 0
    ...

    Later on, we will configure heartbeat to mount this file system.

  5. Extract the code sample, hahbcode.tar.gz, on this file system using the commands shown in Listing 3. (First download the code sample from the Download section below.)

    Listing 3. Extract sample code
    cd /ha
    tar  xvfz  hahbcode.tar.gz
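
A minimal command sketch for steps 3 and 4, assuming the stock Red Hat service scripts. Mount the file system once by hand to confirm the fstab entry, then unmount it again, because heartbeat will be responsible for mounting /ha:

# On the NFS server (nfsha): start NFS and (re)export /ha
/sbin/service nfs start
/usr/sbin/exportfs -ra

# On ha1 and ha2: verify that the fstab entry works, then unmount
mount /ha
df /ha
umount /ha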

Download and install heartbeat

Download heartbeat using the link in Resources, then install it on both ha1 and ha2 machines by entering the commands in Listing 4 (in the order given).

Listing 4. Commands for installing heartbeat
rpm -ivh heartbeat-pils-1.2.2-8.rh.el.3.0.i386.rpm
rpm -ivh heartbeat-stonith-1.2.2-8.rh.el.3.0.i386.rpm
rpm -ivh heartbeat-1.2.2-8.rh.el.3.0.i386.rpm
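
To confirm that all three packages installed cleanly, run a quick query on both nodes:

# Verify that the heartbeat packages are installed
rpm -qa | grep heartbeat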

Configure heartbeat

You must configure three files to get heartbeat to work: authkeys, ha.cf, and haresources. I'll show you the specific configuration I used for this implementation; if you need more information, please refer to the heartbeat Web site and read their documentation (see Resources).

1. Configure /etc/ha.d/authkeys

This file determines your authentication keys for the cluster; the keys must be the same on both nodes. You can choose from three authentication schemes: crc, md5, or sha1. If your heartbeat runs over a secure network, such as the dedicated serial link in this example, use crc; it is the cheapest method from a resources perspective. If the network is insecure but you want to minimize CPU overhead, use md5. Finally, if you want the strongest authentication regardless of CPU cost, use sha1, as it is the hardest to crack.

The format of the file is as follows:

auth <number>
<number> <authmethod> [<authkey>]

For the test setup I chose the crc scheme. Listing 5 shows the /etc/ha.d/authkeys file. Make sure its permissions are safe, such as 600.

Listing 5. authkeys file
auth 2
2 crc
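
Tighten the permissions on this file on both nodes; heartbeat may refuse to start if authkeys is readable by other users. If you later run heartbeat over an insecure network and switch to sha1, the file would look like the second sketch below (the key string is only a placeholder):

# Restrict access to the authentication keys (run on both nodes)
chmod 600 /etc/ha.d/authkeys

# Example authkeys content for the sha1 scheme (placeholder key)
auth 1
1 sha1 ReplaceWithYourOwnSecretKey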

2. Configure /etc/ha.d/ha.cf

This file lives in the /etc/ha.d directory, which is created during installation. It tells heartbeat what types of media paths to use and how to configure them. The file also defines the nodes in the cluster and the interfaces that heartbeat uses to verify whether or not a system is up. Listing 6 shows the relevant portion of the /etc/ha.d/ha.cf file for my setup.

Listing 6. ha.cf file
...
#	File to write debug messages to
debugfile /var/log/ha-debug
#
#
# 	File to write other messages to
#
logfile	/var/log/ha-log
#
#
#	Facility to use for syslog()/logger
#
logfacility	local0
#
#
#	keepalive: how long between heartbeats?
#
keepalive 2
#
#	deadtime: how long-to-declare-host-dead?
#
deadtime 60
#
#	warntime: how long before issuing "late heartbeat" warning?
#
warntime 10
#
#
#	Very first dead time (initdead)
#
initdead 120
#
...
#	Baud rate for serial ports...
#
baud	19200
#
#	serial	serialportname ...
serial	/dev/ttyS0
#	auto_failback:  determines whether a resource will
#	automatically fail back to its "primary" node, or remain
#	on whatever node is serving it until that node fails, or
#	an administrator intervenes.
#
auto_failback on
#
...
#
#	Tell what machines are in the cluster
#	node	nodename ...	-- must match uname -n
node	ha1.haw2.ibm.com
node	ha2.haw2.ibm.com
#
#	Less common options...
#
#	Treats 9.22.7.1 as a pseudo-cluster-member
#	Used together with ipfail below...
#
ping 9.22.7.1
#	Processes started and stopped with heartbeat.  Restarted unless
#		they exit with rc=100
#
respawn hacluster /usr/lib/heartbeat/ipfail
...
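
The file above uses only the serial link for heartbeats. Heartbeat can also send heartbeats over one or more Ethernet interfaces, and using both paths protects the cluster against a single cable failure. A minimal addition to ha.cf, assuming eth0 is the interface shared by both nodes:

#	Also send heartbeats as UDP broadcasts on eth0, in addition to the serial link
bcast	eth0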

3. Configure /etc/ha.d/haresources

This file describes the resources that are managed by heartbeat. The resources are basically just start/stop scripts much like the ones used for starting and stopping resources in /etc/rc.d/init.d. Note that heartbeat will look in /etc/rc.d/init.d and /etc/ha.d/resource.d for scripts. The script file httpd comes with heartbeat. Listing 7 shows my /etc/ha.d/haresources file:

Listing 7. haresources file
ha1.haw2.ibm.com 9.22.7.46 Filesystem::nfsha.haw2.ibm.com:/ha::/ha::nfs::rw,hard httpd

This file must be the same on both the nodes.

This line dictates that on startup:

  • Have ha1 serve the IP 9.22.7.46
  • Mount the NFS shared file system /ha
  • Start Apache Web server

I will be adding more resources to this file in later articles. On shutdown, heartbeat will:

  • Stop the Apache server
  • Unmount the shared file system
  • Give up the IP

This assumes that the command uname -n displays ha1.haw2.ibm.com; yours may well produce ha1, and if it does, use that instead.
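
To see which form your nodes report, run uname -n on each machine. If it prints the short name, the first field of the haresources line changes accordingly (the rest of the line stays the same), and the node entries in ha.cf must match as well:

# Check the node name that heartbeat expects
uname -n

# haresources entry when uname -n returns the short host name
ha1 9.22.7.46 Filesystem::nfsha.haw2.ibm.com:/ha::/ha::nfs::rw,hard httpd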


Configure the Apache HTTP server for HA

In this step I will make a few changes to the Apache Web server setup so that it serves files from the shared file system and from file systems local to the two machines ha1 and ha2. The index.html file (included with the code samples) will be served from the shared disk, and the hostname.html file will be served from a local file system on each of the machines ha1 and ha2. To implement HA for the Apache Web server:

  1. Log in as root.
  2. Create the following directories on the shared disk (/ha):

    /ha/www
    /ha/www/html
  3. Set appropriate permissions on the shared directories using commands shown below on the node ha1:

    chmod 775 /ha/www
    chmod 775 /ha/www/html
  4. On both the primary and backup machines, rename the html directory of the Apache Web server:

    mv /var/www/html /var/www/htmllocal
  5. Create symbolic links to the shared directories using the following commands on both the machines:

    ln -s /ha/www/html /var/www/html
  6. Copy the index.html file to the /ha/www/html directory on the node ha1:

    cp /ha/hahbcode/www/index.html /var/www/html

    You will have to change the cluster name in this file.

  7. Copy the hostname.html file to the /var/www/htmllocal directory on both the machines:

    cp /ha/hahbcode/www/hostname.html /var/www/htmllocal

    Change the cluster name and the node name in this file.

  8. Create symbolic links to the hostname.html file on both the machines:

    ln -s /var/www/htmllocal/hostname.html /ha/www/html/hostname.html
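
After completing these steps, a quick sanity check on each node confirms that the symbolic links resolve and that the Apache configuration is still valid. This sketch assumes the shared file system is mounted at /ha while you check:

# Confirm the document root now points at the shared directory
ls -ld /var/www/html
ls -l /ha/www/html

# Check the Apache configuration syntax
apachectl configtest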

Now you are ready to test the HA implementation.


Test HA for the Apache HTTP server

To test the high availability of the Web server:

  1. Start the heartbeat service on the primary and then on the backup node using this command:

    /etc/rc.d/init.d/heartbeat start

    If it fails, look in /var/log/messages to determine the reason and then correct it. After heartbeat starts successfully, you should see a new network interface with the IP address that you configured in the ha.cf file. Once you've started heartbeat, take a peek at your log file (the default is /var/log/ha-log) on the primary and make sure that it is doing the IP takeover and then starting the Apache Web server. Use the ps command to make sure the Web server daemons are running on the primary node (the command sketch after this list collects these checks). Heartbeat will not start any Web server processes on the backup; that happens only after the primary fails.

  2. Verify that the two Web pages are being served correctly on the ha1 node by pointing the browser at the following URLs (yours will differ if you use a different host name):

    http://ha.haw2.ibm.com/index.html
    http://ha.haw2.ibm.com/hostname.html

    Note that I am using the cluster address in the above URLs and not the address of the primary node.

    The browser should display the following text for the first URL:

    Hello!!! I am being served from a High Availability Cluster ha.haw2.ibm.com

    The browser should display the following text for the second URL:

    Hello!!! I am being served from a node ha1.haw2.ibm.com in a High Availability Cluster ha.haw2.ibm.com
  3. Simulate failover by simply stopping heartbeat on the primary system using the command shown below:

    /etc/rc.d/init.d/heartbeat stop

    You should see all the Web server processes come up on the second machine in under a minute. If you do not, look in /var/log/messages to determine the problem and correct it.

  4. Verify that the two Web pages are being served correctly on the ha2 node by pointing the browser at the following URLs:

    http://ha.haw2.ibm.com/index.html
    http://ha.haw2.ibm.com/hostname.html

    The browser should display the following text for the first URL:

    Hello!!! I am being served from a High Availability Cluster ha.haw2.ibm.com

    The browser should display the following text for the second URL:

    Hello!!! I am being served from a node ha2.haw2.ibm.com in a High Availability Cluster ha.haw2.ibm.com

    Note that the node serving this page now is ha2.

  5. Restart the heartbeat service on the primary node. This should stop the Apache server processes on the secondary and start them on the primary. The primary should also take over the cluster IP.
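
The following commands, run on whichever node is currently active, are a sketch of the checks described in steps 1 and 3; the log file location matches the ha.cf file shown earlier:

# Confirm the cluster IP (9.22.7.46) has been taken over
/sbin/ifconfig

# Confirm the Apache processes are running on the active node
ps -ef | grep httpd

# Watch heartbeat's log for IP takeover and resource start/stop messages
tail -f /var/log/ha-log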

Thus, by putting the Web pages on the shared disk, a secondary machine can serve them to a client in the event of failure of the primary machine. The failover is transparent to the client accessing the Web pages. This technique can be applied to serving CGI scripts as well.


Conclusion

I hope you will try this technique for setting up a highly available Web server using inexpensive hardware and readily available software. In the next article in this series, you'll see how to build a highly available message queue manager using WebSphere MQ.


Download

Description                              Name               Size
Sample code package for this article     hahbcode.tar.gz    25 KB

Resources
