
A highly available and scalable LDAP cluster in an AIX environment

Veronika Megler (vmegler@us.ibm.com), Certified Consulting IT Architect, IBM
Veronika Megler is a Certified Consulting IT Architect in IBM's Solutions Integration Technology Center, pSeries Solutions Development organization. She has worked in operations, application development, systems programming, systems management disciplines, and IT management consulting. She enjoys architecting solutions to solve real business problems, then proving they work in practice. You can contact her at vmegler@us.ibm.com.
Jay Bockelman (dwinfo@us.ibm.com), developerWorks, IBM
Jay Bockelman specializes in cluster technologies, system management tools, and application integration to provide highly available e-business solutions.

Summary:  How do you plan and deploy a high-availability LDAP server with fail-over? Read about one team's approach and results.

Date:  01 Jun 2001
Level:  Introductory

We wanted to create a high-availability LDAP server with known fail-over characteristics, using IBM products, that could selectively be used as a building block in a larger, more complex infrastructure. We wanted to find the minimum time in which we could achieve fail-over, and to understand the impact on applications interacting with the LDAP server during a fail-over.

This paper describes the design and implementation of the clustered environment we built. It includes:

  • A description of our overall cluster design, including how fail-over and fail-back work in this configuration
  • A description of our test configuration
  • Details on how we set up the software to implement the design
  • A summary of our test results
  • Some gotchas we discovered during the implementation and testing.

Summary of our results

Using HACMP and IBM SecureWay Directory configured with an LDAP master and an LDAP replica, we were consistently able to fail over the IP address to the backup server in six seconds. Clients performing LDAP searches experienced total outages of 9-11 seconds during fail-over and during fail-back. The total outage seen by clients attempting to update the directory during a fail-over was 28 seconds. This could be reduced by a few more seconds by tuning the HACMP heartbeat settings, but this needs to be balanced against the risk of performing a takeover when the system is merely running slowly.

Because the LDAP directories must be re-synchronized before a fail-back can be performed, the fail-back causes an outage of the LDAP master's update capability. The outage lasts for the total time taken to export the database, transfer it to the primary server, and import it. Unlike a fail-over, this fail-back can be scheduled for a convenient time.


Introduction

Figure 1 shows, at the pattern level, our target solution.


Figure 1: Architecture for Replicated Directories

In our environment, the LDAP directory service software was provided by IBM SecureWay Directory, which can be downloaded free of charge. This software provides both an LDAP server and an LDAP client. IBM SecureWay Directory uses two prerequisite products, both of which can also be downloaded free of charge:

  • IBM HTTP Server -- an Apache-based HTTP server, used to provide a GUI-based administration console for SecureWay Directory
  • IBM DB2 -- provided free of charge for the purposes of running SecureWay Directory, and used to provide the LDAP data storage, thereby providing a highly scalable directory infrastructure

Clustering is provided in the AIX environment by using IBM HACMP.

Our hardware consisted of two pSeries machines with no shared components. This configuration gives the fastest possible cluster take-over times, because it avoids the delays that shared-disk clusters incur while waiting for I/O to be flushed and file system definitions to be rebuilt before fail-over can complete.


LDAP cluster design

IBM SecureWay Directory implements standard LDAP features and design principles. LDAP provides two primary scaling and availability methods as part of the standard:

  • Ability to partition a directory structure, if that is appropriate for the directory in question
  • Ability to define "masters" and their "replicas"

These methods can be used together -- that is, a partitioned directory structure may use masters and replicas. For any directory partition, you can increase availability by using the master and replica strategy.

There can only be one master server for a particular directory structure. All updates are made on the master.

One or more replicas may be defined for a master and may share its load. In normal operations, the replica is a read-only copy and accepts changes only from the master. The replica can be searched by anyone running an LDAP client. If a client sends the replica a request to update or add an entry, the replica returns a "referral," which gives the LDAP client the address of the master server. The LDAP client is then expected to re-send the change to the master.
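A client that receives a referral must work out where to re-send its request. As a minimal sketch, assuming a referral in the simple ldap://host[:port] form, such as the ldap://el19a28 master referral configured for our replica later in this paper, the hypothetical shell function below splits the URL into a host and a port, using the standard LDAP port 389 when none is given. It does not handle IPv6 literals, LDAPS, or the other parts of a full LDAP URL; in practice, the SecureWay Directory client follows referrals itself.

```shell
# Hypothetical helper (not part of SecureWay Directory): split an LDAP
# referral URL into "host port". Handles only ldap://host or
# ldap://host:port, optionally followed by /dn.
parse_referral() {
    url=$1
    case $url in
        ldap://*) ;;
        *) echo "unsupported referral: $url" >&2; return 1 ;;
    esac
    hostport=${url#ldap://}      # strip the scheme
    hostport=${hostport%%/*}     # drop any DN after the first slash
    case $hostport in
        *:*) host=${hostport%%:*}; port=${hostport##*:} ;;
        *)   host=$hostport;       port=389 ;;   # default LDAP port
    esac
    echo "$host $port"
}
```

For example, parse_referral ldap://el19a28 prints "el19a28 389".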

A replica can be promoted to become a master. This is desirable, for example, during an outage where the existing master is no longer running. This is the feature that we use to build our high availability cluster. Later, the original master can take over again as master. At that point, however, the original master's directory will be out of date and must be synchronized with the current master. If this synchronization is not performed, changes made on the current master will be lost.

Our design allows for a master LDAP server, running on what we will call the primary server in our HACMP cluster, and a single replica LDAP server, running on a backup server. The primary and backup servers are running within a single HACMP cluster. In our environment, there are no shared hardware components between the two AIX systems, that is, no shared disk.


Figure 2: HACMP Cluster with LDAP Master and Replica

Both LDAP servers are in active use from LDAP clients. LDAP client applications that are expected to have a high update component are configured to point to the master. LDAP client applications that are primarily read-only are pointed to the replica. From the clients' perspective, these are two separate LDAP servers; the clients are not aware that the two LDAP servers are running within a single clustered environment.
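The split between update-heavy and read-mostly clients can be expressed as a small lookup on the client side. This is a hypothetical sketch: the host names el19a28 (master) and el19a29 (replica) come from our test cluster, and the LDAP_MASTER and LDAP_REPLICA variables are placeholders, not SecureWay Directory settings. Because the clients see two independent servers, nothing in this logic needs to know about the cluster.

```shell
# Placeholder host names; in our cluster these resolve to the service
# addresses of the LDAP master (el19a28) and the LDAP replica (el19a29).
LDAP_MASTER=${LDAP_MASTER:-el19a28}
LDAP_REPLICA=${LDAP_REPLICA:-el19a29}

# Hypothetical helper: print the server a client should contact for a
# given LDAP operation. Updates go to the master, reads to the replica.
ldap_server_for() {
    case $1 in
        add|modify|delete|modrdn) echo "$LDAP_MASTER" ;;
        search|compare)           echo "$LDAP_REPLICA" ;;
        *) echo "unknown operation: $1" >&2; return 1 ;;
    esac
}
```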

Fail-over and fail-back

When the primary server fails, the backup server takes over the IP address of the primary server, and the replica LDAP server is promoted to master. (These functions are provided by HACMP and a custom LDAP startup script, described further below.) At this point, the backup server responds to the IP addresses of both the primary and backup servers.

After the fail-over, changes are accepted on the backup server, now running as LDAP master. The LDAP clients are not aware of the change, apart from a momentary outage of service. The only visible difference is that when a client that was originally pointing to the replica sends an update, the update is now accepted rather than referred to another LDAP server.


Figure 3: Real Configuration after Fail-Over

Now the primary server no longer has the most up-to-date version of the LDAP data in its LDAP database. Therefore, an automatic return to the original configuration (fail-back) is undesirable: the changes made to the backup's LDAP database would be inaccessible from the primary, and the two systems would have out-of-sync directories. An orderly fail-back should be planned.

Uses for this configuration

This configuration is useful in any of the following environments:

  • Where high availability of LDAP services is considered important
  • Where the LDAP load is sufficient that multiple LDAP servers are advantageous
  • Where some applications are expected to only read LDAP entries
  • Where some applications make few changes and are capable of accepting an LDAP redirect. These applications can be pointed to the replica for normal operations and redirect changes to the master. For example, because the SecureWay Directory client handles redirects and referrals without the knowledge of the calling application, any application that uses the SecureWay Directory client automatically satisfies this condition.

There are a few disadvantages with this scenario:

  1. In this configuration, only one unplanned fail-over can be handled.

    This could be resolved by extending the design with a further member of the HACMP cluster ("backup_2") that only accepts fail-over from the backup server. However, this would require more complex recovery actions from the administrator.

  2. Directory synchronization is required on failback. This can be complex and should be thought through, tested and documented before a failure occurs.
  3. There is some downtime for the master during the fail-over time, and while the fail-back synchronization is occurring.
  4. There is a short period of downtime on the backup server, while the LDAP server is promoted from replica to master, and after the fail-back, demoted to replica again.

This downtime could be further hidden from the LDAP clients by placing an IP sprayer, such as IBM eNetwork Dispatcher, in front of a collection of replicas. While one replica is down, the other replicas will still accept searches. This configuration (which we have not tested) is shown in Figure 4.


Figure 4: HACMP Even Higher Availability Configuration

While our implementation is product-specific, the design up to this point is generic: any cluster software and any LDAP server could be used, and would be subject to the same constraints and issues that are described in the product-specific sections of this paper.


Test configuration

For testing, we set up two pSeries servers with no shared physical components. We installed the same software on each server with the same versions and fix pack levels of AIX, IBM SecureWay Directory, IBM HTTP Server, HACMP, and DB2.

The machines were then placed in an HACMP cluster. Each server has two connections to our Ethernet network.

When in an HACMP cluster, the two servers monitor each other via a heartbeat. If one server is inaccessible, a takeover is performed. The surviving server will replace its standby address with the address of the other server, thus taking on its identity in addition to its own.

In the example shown below, if the primary server, EL19a28, becomes unavailable, EL19a29 (158.88.228.29) takes over EL19a28's IP address, 158.88.228.28, replacing its standby address of 10.10.10.29. This is in addition to its own regular IP address. Thus, EL19a29 can be reached via both service IP addresses.


Figure 5: Physical Configuration of HACMP Cluster

HACMP provides support for several takeover configurations (rotating, cascading, and concurrent), with multiple options within each configuration. We configured our resource takeover as cascading because we wanted to ensure that our resource (LDAP) would consistently be started on our primary system, would fail over to our backup, and would then remain there until we repaired our primary. This is important when certain synchronization activities, such as directory synchronization, must take place before fail-back is allowed to occur. (Note that a rotating configuration could also work for our setup but would require a different fail-back solution from the one presented in this paper.)

We created a test rig on a third server, using LDAP Client software. This test rig consisted of two components:

  • A script that performed successive ldapadd commands, pointed to the LDAP master
  • A second script that performed repeated ldapsearch commands, pointed to the LDAP replica

These scripts were executed by multiple "users" on the same test server.
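Our test scripts are not reproduced here. As a hedged sketch of how such a rig can turn a stream of client results into an outage figure, the function below assumes each client loop writes one line per request in the form "<epoch-seconds> OK" or "<epoch-seconds> FAIL" (a hypothetical log format), and reports the longest interval between consecutive successful requests. That interval approximates the outage a client sees during fail-over or fail-back, plus the loop's own pacing.

```shell
# Hypothetical log format, one line per request: "<epoch-seconds> OK|FAIL".
# Prints the longest gap, in seconds, between two consecutive OK lines.
# Reads the named files, or standard input if none are given.
longest_outage() {
    awk '$2 == "OK" {
             if (seen && $1 - last > max) max = $1 - last
             last = $1; seen = 1
         }
         END { print max + 0 }' "$@"
}

# Example client loop (not run here). The host, base DN and filter are
# placeholders, and "date +%s" is assumed to be supported by the system.
# while :; do
#     if ldapsearch -h el19a29 -b "o=ibm" "cn=test" >/dev/null 2>&1
#     then echo "$(date +%s) OK"; else echo "$(date +%s) FAIL"; fi
#     sleep 1
# done >> search.log
```

Running longest_outage search.log after a fail-over test then gives a single number per client to compare across runs.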

We then simulated a networking failure by yanking the communication cables out of the primary server, and we observed the results. There were no additional heartbeat paths between the two nodes, so when both Ethernet connections were severed, all heartbeat communication between the two cluster nodes ended. This caused an HACMP fail-over.

After confirming that the fail-over was successful, we synchronized the data back to the primary server and performed a "fail-back," while running the same test rig.

We collected and analyzed logs from HACMP on both servers and from the LDAP client and server processes.

Software product          Command                         Software level
AIX                       oslevel                         4.3.3.0
AIX                       lslpp -l bos.rte                4.3.3.50
AIX                       lslpp -l bos.up                 4.3.3.50
IBM SecureWay Directory   lslpp -l ldap*                  3.2.1.0
DB2                       lslpp -l | grep db2*            7.1.0.0
IBM HTTP Server           /usr/HTTPServer/bin/httpd -v    1.3.12.3
HACMP                     lslpp -l | grep cluster         4.4.0.3

Software setup

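This section describes how we set up SecureWay Directory and HACMP, and how we combined them into the HACMP/LDAP configuration, including fail-back.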

SecureWay Directory setup

To set up IBM SecureWay Directory on AIX, you must:

  1. Install IBM SecureWay Directory.
  2. Install and configure a Web server to run the Web-based configuration tool. We installed IBM HTTP Server.
  3. Install IBM DB2 and create a DB2 database for LDAP data storage.

HTTP Server is used only for SecureWay Directory administration. After you install it, you can start the administration interface from a browser (we used Netscape Communicator) by pointing to the following web page: http://(your_hostname):80/ldap/index.html

The most common uses of the administration interface are to manipulate the slapd32.conf configuration file and to start and stop the slapd daemon. You could dispense with the interface if you are sufficiently familiar with editing the slapd configuration file and with using the LDAP client tools. For simplicity, we recommend that you use the administration interface for initial system setup and then switch to manual editing for selected activities, as described later.

The user and administrator do not generally see DB2; it is hidden behind the administration panels and the slapd daemon. DB2 tables are automatically created based on the schemas defined to the server. An initial set of indexes is also automatically created on the tables. Some DB2 optimization facilities are provided from the LDAP administration console, without requiring knowledge of DB2. This results in reasonable DB2 performance. Under a high volume of users or a heavy workload, you might need to tune DB2 to improve overall IBM SecureWay Directory performance.

In addition, IBM SecureWay Directory is shipped with the DMT tool, which provides a full-screen LDAP client interface. This tool can be used to manage directory entries across multiple LDAP servers. During the tests, we found it useful to visually validate the directory trees across the master and replica servers and to edit the directory contents themselves. To start the DMT tool, enter "dmt" on the AIX command line.

Setting up an LDAP master and replica

To implement a master and a replica, we installed and configured SecureWay Directory software on both servers. For simplicity, the installation was identical on both servers: the same user IDs, passwords, and file locations.

Initially, both LDAP directories considered themselves a master. We configured them as master and replica using the steps in Section 7.6 of the IBM Redbook LDAP Implementation Cookbook as a guide. The book describes two processes: one for when the master server's database is not yet populated, and another for when the replica is being created for a master that already contains data. The two processes differ in whether the data from the master is pre-loaded into the replica to give a consistent starting point.

In summary, the steps are:

  1. If the master is an existing LDAP directory, stop read-write activity on the master. To do this, make the database read only, using the administration GUI (under "Database->Settings"), and then restart the master. Then, back up the directory, copy the backup file over to the replica, and import it. You can do this via an LDIF export and import, or via a DB2 backup and restore. Both of these methods are accessed from the administration GUI, under "Database", or via the equivalent command line utilities.
  2. Configure the LDAP server that is to be a replica. This is performed under the panels "Server->Master/Replica Config". The panel requests the distinguished name of the master server, which must be the same on both servers (in our case, we called our master cn=el19a28m). This configuration action adds a stanza to the /etc/slapd32.conf configuration file on the replica server. The stanza is shown below:
    dn: cn=Master Server, cn=Configuration
    cn: Master Server
    ibm-slapdMasterDN: cn=el19a28m
    ibm-slapdMasterPW: >1d18VA9ceh6FovBb9jzZneMyEzl98zy5zaumFxJKJvR1upG2V0Pqh/
    CLLatoTRHghb+eslja9zMbflckMx6u3gUXRyDY+MUz3ecJlUzpfcVHOdiWsQ9WKiG3H5bCJUzd0Cuua9PzT9m+lJmCvdBXH+FQDFh<
    ibm-slapdMasterReferral: ldap://el19a28
    objectClass: ibm-slapdReplication

    When the server is restarted with this change, it runs in replica mode.

  3. Add a definition for the replica on the master server, while the master server is running. This is done under "Replicas->Add a Replica". This panel asks for the common name (we chose cn=el19a29m), the host name (network address), and LDAP port (defaulted to 389) of the replica, and the master distinguished name and password. The master distinguished name must match the master dn that was specified when setting up the server above as a replica. For the replication update interval, we chose "immediate".

    Adding the replica does not change the configuration files. Instead, the replica definition is stored in the LDAP database, under the distinguished name we defined. The following shows the LDIF extract of the entry created in our LDAP database.

    dn: cn=el19a29m, cn=localhost
    cn: el19a29m
    replicahost: el19a29.ent.sequent.com
    replicabinddn: cn=el19a28m
    replicacredentials: ldapssl
    replicaport: 389
    replicabindmethod: SIMPLE
    replicausessl: FALSE
    replicaupdatetimeinterval: 0
    seealso::
    description::
    objectclass: replicaObject
    objectclass: top

    Keep an up-to-date LDIF extract of all replica definitions, so that the replicas can be redefined quickly and accurately during fail-back directory synchronization.

  4. Then, on the master, set the database back to "read/write" if required. Restart the master. It now starts propagating any updates to the defined replica.
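One way to keep the extract of replica definitions current is to pull those entries out of a full LDIF export of the master. The hypothetical function below is a minimal sketch, assuming the export separates entries with empty lines (as LDIF does) and that replica definitions carry "objectclass: replicaObject", as in the entry shown above. It uses awk's paragraph mode, in which each blank-line-separated entry is read as one record.

```shell
# Hypothetical helper: print only the replica definition entries
# (objectclass: replicaObject) from the LDIF export file $1.
# Entries must be separated by empty lines; matching ignores case
# and trailing blanks.
extract_replicas() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" }    # one LDIF entry per record
         {
             n = split($0, lines, "\n")
             for (i = 1; i <= n; i++) {
                 line = tolower(lines[i])
                 sub(/[ \t]+$/, "", line)
                 if (line == "objectclass: replicaobject") { print; next }
             }
         }' "$1"
}
```

For example, extract_replicas master_export.ldif > replicas.ldif (file names hypothetical) writes just the replica entries, which can then be reviewed and kept with the recovery documentation.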

We set up both servers to auto-start the LDAP daemon, slapd, with the default configuration for that server at boot time; i.e., the primary was auto-started with a "master" slapd32.conf, and the backup server with a "replica" slapd32.conf.

At this point, we started testing additions and deletions from our LDAP master and validated that they were propagated to our replica.

Note that when we used the "Database->Import File" function from the administration console panels to add records in LDIF format after the master and replica were up and running, the changes were not forwarded to the replica until after the next restart of the master. However, the records were sent to the replica immediately when we loaded the same file with the LDAP client, using the following command:

ldapadd -D "cn=ldapadmin" -w password -f filename.LDIF

Change log for the SecureWay Directory

IBM SecureWay Directory provides an additional mechanism to capture changes that are made to the directory. These changes are kept in a separate change log, which can be queried by the LDAP client search tools and can be extracted into LDIF format.

The change log is a separate DB2 database, defined and enabled via the administration GUI, under "Logs-> change log."

You use this capability to ensure that, if the master directory goes down without having fully propagated all changes to the replica, the changes are still captured and can be applied to bring the new master up-to-date.

If the rate of change of the directory is high and you are concerned that no directory updates should be lost during a failure, you should seriously consider using the change log. Extracting and applying change log entries must then be integrated into the LDAP recovery scenarios.
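Applying the change log during recovery means selecting only the changes that the replica had not yet received. As a hedged sketch, assuming an LDIF extract of the change log in which every entry has a numeric changenumber attribute (the layout commonly used for LDAP change logs; check what your server actually writes), the hypothetical function below prints the entries whose change number is greater than a given value. Determining that value, the last change known to have reached the replica, is left to the recovery procedure.

```shell
# Hypothetical helper: from the change log LDIF extract $1, print the
# entries whose changenumber is greater than $2. Entries must be
# separated by empty lines; the attribute name is matched ignoring case.
changes_after() {
    awk -v last="$2" 'BEGIN { RS = ""; ORS = "\n\n" }
         {
             n = split($0, lines, "\n")
             for (i = 1; i <= n; i++)
                 if (tolower(lines[i]) ~ /^changenumber: /) {
                     num = substr(lines[i], index(lines[i], ":") + 2)
                     if (num + 0 > last + 0) print
                     next    # only the first changenumber line counts
                 }
         }' "$1"
}
```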

HACMP setup

Conceptually, setting up HACMP is straightforward. You:

  1. Install the software.
  2. Configure the HACMP environment.

However, this ignores a whole host of messy details!

An HACMP cluster is defined via a hierarchy of linked definitions, which provide for a wide variety of capabilities and configurations. The following diagram represents the HACMP definitions we used for our cluster. Each box in the diagram represents an HACMP definition and does not necessarily map to any physical component. For example, there were only two Ethernet adapters on each server, but three HACMP definitions, as shown below. We set up all of these definitions by using the smitty hacmp panels.


Figure 6: HACMP Definitions for Our Cluster

The cluster is the highest level entry within HACMP and provides an anchor point for all the cluster resource definitions. The cluster topology consists of the following components: the cluster definition, the cluster nodes, the network adapters, and the network modules. In our case, there is one cluster defined: WES1.

Within the cluster, each server is defined as a node. In our case, there are two nodes defined: el19a28 and el19a29. HACMP sorts the node names in ASCII order to decide which nodes are considered to be neighbors for heartbeat purposes. To build a logical ring, each node talks to its upstream and downstream neighbors in that order, and the first and last nodes are also considered neighbors. This becomes important in unexpected ways; see the "Gotchas" section.
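To see which nodes end up as heartbeat neighbors, the hypothetical function below sorts node names in ASCII (C locale) order, as described above, and prints each node followed by its previous and next neighbor, wrapping from the last node back to the first. It models only the ordering rule, not HACMP itself.

```shell
# Hypothetical helper: print "node previous next" for each node name given
# as an argument, with the ring built from the names in ASCII order.
ring_neighbors() {
    printf '%s\n' "$@" | LC_ALL=C sort | awk '
        { node[NR] = $0 }
        END {
            for (i = 1; i <= NR; i++) {
                prev = (i == 1)  ? node[NR] : node[i - 1]   # wrap to the last node
                nxt  = (i == NR) ? node[1]  : node[i + 1]   # wrap to the first node
                print node[i], prev, nxt
            }
        }'
}
```

With only two nodes, as in our cluster, each node is both neighbors of the other: the first line of ring_neighbors el19a29 el19a28 is "el19a28 el19a29 el19a29".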

Each of the network adapters used within the cluster must be defined and linked to the node name via the function it will perform. When each server is booted, it comes up with its boot IP address on one adapter and the standby address on the other. When the cluster software is started, it performs initialization and then takes on the service address in place of the boot address. After a fail-over, the standby address on the surviving server is replaced by the service address of the failing server.

There are six adapters defined here: boot_28, srvc_28, and stby_28 on EL19a28, and boot_29, srvc_29, and stby_29 on EL19a29. The following diagram shows the "boot_28" definition. The "adapter function" field setting defines to HACMP which function (service, standby or boot) this adapter and IP address should be used for. It also links the adapter to the node name. All the IP addresses used by all the adapters must also have been defined in /etc/hosts; the service addresses are defined to DNS.
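Because a missing /etc/hosts entry is an easy mistake to make, a quick check is worth running on both nodes. The sketch below assumes that the HACMP adapter labels (boot_28, srvc_28, and so on) are also the host names used in /etc/hosts, which is common practice but an assumption here, and prints every label it cannot find.

```shell
# Hypothetical check: print each adapter label (remaining arguments) that
# has no entry in the hosts file $1. Text after '#' is ignored, and a
# label must match a host name field exactly.
missing_labels() {
    hosts_file=$1; shift
    for label in "$@"; do
        if ! sed 's/#.*//' "$hosts_file" |
             awk -v l="$label" '{ for (i = 2; i <= NF; i++) if ($i == l) found = 1 }
                                END { exit !found }'
        then
            echo "$label"
        fi
    done
}
```

For example, missing_labels /etc/hosts boot_28 srvc_28 stby_28 boot_29 srvc_29 stby_29 prints nothing when every label is defined.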


Figure 7: HACMP: Change/Show an Adapter

Each application running within the cluster is considered a resource (along with volume groups, disks, file systems, service IP addresses, etc). Each resource is part of a resource group.

A resource group is a combination of related resources that need to be together to provide a particular service to the world outside the cluster. The resource group also defines the actions to be taken by the cluster management software when it detects a heartbeat failure. A resource group also defines the list of nodes that can acquire those resources and serve them to the cluster's clients.

There are two resource groups defined here: ldap28 (which will define our LDAP master) and ldap29 (which will define our LDAP replica). Our resource groups are both defined as cascading and define the nodes included in the cluster. In our case, there are no shared file systems or volume groups, thereby allowing for the fastest possible recovery with no danger of corrupted file systems.


Figure 8: HACMP Change/Show a Resource Group

A resource group is associated with a service adapter or network address by which it provides service (no matter which server in the cluster currently "owns" that address). In our case, the resource will be the LDAP master, and the address applications use to access the master is the IP address of srvc_28. A resource group is assigned resources and attributes, such as how the shared file systems should be recovered and checked for consistency.


Figure 9: HACMP Change/Show an Application Server

Next, the resource group identifies the application servers associated with this resource group.


Figure 10: Resource Group Definition, Part 1

Figure 11: Resource Group Definition, Part 2

The application server defines the actions that must be taken to manage orderly transition of the application from one server to the next. These are essentially scripts written by your friendly sysprog to perform any appropriate actions, such as starting daemons, checking data integrity, or notifying Operations, when a failure of a cluster node occurs. Each application server defines a start script and a stop script:

  • start_script - Script to start the application server when a node joins the cluster
  • stop_script - Script to stop the application server when a node leaves the cluster gracefully

There are two application servers defined in our configuration: hacmp_ldap28 for the ldap28 resource group, and hacmp_ldap29 for the ldap29 resource group. Each has a start and a stop script associated with it; predictably, we called these hacmp_ldap_start and hacmp_ldap_stop.

Another important component that drives the responsiveness of the cluster in fail-over situations is the HACMP heartbeat. It is defined in the "Topology and Group Services Configuration." The interval between heartbeats is defined in seconds. The fibrillate count specifies the number of successive heartbeats that can be missed before the interface is considered to have failed. Together, the fibrillate count and the heartbeat interval determine how soon a failure can be detected. The time needed to detect a failure can be calculated using the formula:

detection time (seconds) = (heartbeat interval in seconds) * (fibrillate count) * 2


Figure 12: HACMP Heartbeat Configuration

Adjusting the fibrillate count down could speed up the fail-over times we experienced. However, it also opens the cluster further to the possibility of performing a take-over when the other server is merely a little slow in responding.
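To see the trade-off in numbers, the sketch below applies the detection-time formula above for one heartbeat interval and several candidate fibrillate counts. It uses integer shell arithmetic, so it assumes whole-second intervals; the values in the example are illustrative, not the settings we used.

```shell
# detection time = heartbeat interval * fibrillate count * 2
# Arguments: $1 = heartbeat interval (whole seconds), $2 = fibrillate count.
detection_time() {
    echo $(( $1 * $2 * 2 ))
}

# Print one line per candidate fibrillate count (remaining arguments)
# for a single heartbeat interval ($1).
detection_table() {
    interval=$1; shift
    for count in "$@"; do
        printf 'interval=%ss count=%s -> %ss\n' "$interval" "$count" \
            "$(detection_time "$interval" "$count")"
    done
}
```

For example, detection_table 1 4 10 prints "interval=1s count=4 -> 8s" and "interval=1s count=10 -> 20s": halving the fibrillate count halves the detection time, and with it the margin for a slow but healthy node.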

Note that whenever the HACMP definitions are changed, HACMP must be synchronized before the cluster can successfully be initialized. This option can be found under "Cluster Configuration->Cluster Topology."

Once all definitions have been set up and synchronized, the cluster software can be started on both servers using a smitty fast-path, "smitty clstart." The cluster software will not start successfully until the start and stop scripts exist and are executable. Similarly, "smitty clstop" can be used to stop the cluster software.

To check whether HACMP is running, use "ps -ef | grep cluster". There should be two cluster processes running (besides the grep command itself). In addition, use "netstat -in" to check the IP addresses. Note that, after the cluster management software has started, there is often a delay (less than a minute, but long enough to make us panic) before the boot IP addresses change to the service configuration.
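Because that delay is easy to mistake for a failure, it can help to poll rather than re-run netstat by hand. The sketch below waits until a given address shows up in the "netstat -in" output or a timeout expires. It assumes the address appears as a whitespace-separated field of that output, as in the usual AIX layout (Name, Mtu, Network, Address, and so on); the command is held in a hypothetical NETSTAT_CMD variable so it can be swapped out for testing.

```shell
# Hypothetical helper: wait up to $2 seconds (default 60) for IP address $1
# to appear as a field in the output of $NETSTAT_CMD (default "netstat -in").
wait_for_address() {
    addr=$1; limit=${2:-60}; waited=0
    while [ "$waited" -lt "$limit" ]; do
        if ${NETSTAT_CMD:-netstat -in} |
           awk -v a="$addr" '{ for (i = 1; i <= NF; i++) if ($i == a) f = 1 }
                             END { exit !f }'
        then
            echo "$addr is up after ${waited}s"
            return 0
        fi
        sleep 5; waited=$((waited + 5))    # poll every five seconds
    done
    echo "$addr not present after ${limit}s" >&2
    return 1
}
```

For example, wait_for_address 158.88.228.28 120 on EL19a28 waits up to two minutes for its service address to appear.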

Setting up the HACMP/LDAP configuration

This is where the real action finally happens! Since the primary server, EL19a28, will only be in the HACMP cluster when the slapd daemon is running as a master, no specific action is required on this server. All recovery for the slapd daemon is done manually. Therefore, the start and stop scripts on EL19a28 take no action.

The start and stop scripts on the backup server, EL19a29, however, have some work to do. When a takeover occurs, the start_script on EL19a29 must stop the LDAP daemon and restart it in master mode. We did this by keeping two .conf files:

  • One for normal (i.e. replica) operations, which is used during the automatic startup of the slapd daemon
  • The second, with the master configuration (i.e., without the replica definition described above). This master configuration file is created manually by editing the .conf file and removing the replica definition, and is then saved in a convenient directory under a new name, such as /etc/slapd32.conf.master.
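Rather than editing the file by hand each time the replica configuration changes, the master file can be derived from the replica file. The sketch below is a hypothetical helper, assuming the replica definition is exactly the one entry whose first line is "dn: cn=Master Server, cn=Configuration", as in the stanza shown earlier, and that the stanza ends at the next empty line. It copies everything else unchanged; compare its output with a hand-edited copy before relying on it.

```shell
# Hypothetical helper: copy slapd32.conf ($1) to $2 without the replica
# stanza, i.e. the entry whose first line is
# "dn: cn=Master Server, cn=Configuration" (matched ignoring case).
make_master_conf() {
    awk '
        tolower($0) ~ /^dn: *cn=master server, *cn=configuration *$/ { skip = 1; next }
        skip && /^[ \t]*$/ { skip = 0; next }   # empty line ends the stanza
        !skip { print }
    ' "$1" > "$2"
}
```

For example, make_master_conf /etc/slapd32.conf /etc/slapd32.conf.master followed by diff /etc/slapd32.conf /etc/slapd32.conf.master should show only the replica stanza removed.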

The hacmp_ldap_start script on EL19a29 kills the slapd daemon currently running as a replica, and starts it in master mode by using the master configuration file.

On EL19a29, /home/ldap/hacmp_ldap_start contains the following:

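#!/bin/ksh
# Takeover on EL19a29: stop the replica slapd and restart it as master.
echo "`date` Starting slapd startup as Master" >> /tmp/hacmp.out
PID=`cat /etc/slapd.pid`                  # PID of the slapd running as replica
kill -9 "$PID"                            # stop it immediately
/bin/slapd -f /etc/slapd32.conf.master    # restart with the master configuration
echo "`date` Completed slapd startup as Master" >> /tmp/hacmp.out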

The hacmp_ldap_stop script is executed when the HACMP cluster management software on EL19a29 detects a fail-back. When a fail-back occurs, the reverse actions are taken: the slapd daemon is killed, and slapd is restarted with the replica configuration file.

On EL19a29, /home/ldap/hacmp_ldap_stop contains the following:

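#!/bin/ksh
# Fail-back on EL19a29: stop the master slapd and restart it as replica.
echo "`date` Starting slapd on el19a29 startup as Replica" >> /tmp/hacmp.out
PID=`cat /etc/slapd.pid`                  # PID of the slapd running as master
kill -9 "$PID"                            # stop it immediately
/bin/slapd -f /etc/slapd32.conf           # restart with the replica configuration
echo "`date` Completed slapd on el19a29 startup as Replica" >> /tmp/hacmp.out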
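Both scripts follow the same pattern, so they could share one function. The sketch below is a hypothetical, somewhat more defensive variant, not the scripts we tested: it checks that the PID file exists, waits (up to a limit) for the old slapd process to exit before starting the new one, so that the new daemon does not race the old one for the LDAP port, and logs the result. SLAPD, PIDFILE and LOG default to the paths used above and can be overridden.

```shell
SLAPD=${SLAPD:-/bin/slapd}
PIDFILE=${PIDFILE:-/etc/slapd.pid}
LOG=${LOG:-/tmp/hacmp.out}

# Hypothetical helper: restart slapd with configuration file $1.
# $2 is a label ("Master" or "Replica") used only for logging.
restart_slapd() {
    conf=$1; role=$2
    echo "$(date) Starting slapd as $role" >> "$LOG"
    if [ -s "$PIDFILE" ]; then
        pid=$(cat "$PIDFILE")
        kill -9 "$pid" 2>/dev/null
        tries=0
        # Wait up to 10 seconds for the old process to disappear.
        while kill -0 "$pid" 2>/dev/null && [ "$tries" -lt 10 ]; do
            sleep 1; tries=$((tries + 1))
        done
    else
        echo "$(date) No PID file $PIDFILE; starting anyway" >> "$LOG"
    fi
    "$SLAPD" -f "$conf"
    rc=$?
    echo "$(date) slapd start as $role returned $rc" >> "$LOG"
    return $rc
}
```

hacmp_ldap_start would then reduce to restart_slapd /etc/slapd32.conf.master Master, and hacmp_ldap_stop to restart_slapd /etc/slapd32.conf Replica. Whether a 10-second wait is appropriate depends on how quickly slapd exits on your system.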

To leave this as a long-term production environment, we would add a step to each of these scripts to notify the system administrator that a fail-over had occurred. The administrator should then review the current configuration and the cause of the event, and start planning recovery actions.
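A notification step can be as simple as mailing the operations team. The hypothetical function below composes a short message with the host name, time, and event, and hands it to a mail command. It assumes "mail -s" is available and configured on the cluster nodes; the recipient address and the ADMIN and MAILER variables are placeholders.

```shell
# Hypothetical notification hook for hacmp_ldap_start / hacmp_ldap_stop.
# ADMIN and MAILER are placeholders: MAILER is called with the subject as
# $1 and the recipient as $2, and reads the message body on stdin.
ADMIN=${ADMIN:-ldapadmin@example.com}

send_mail() { mail -s "$1" "$2"; }
MAILER=${MAILER:-send_mail}

notify_admin() {
    event=$1
    subject="LDAP cluster event on $(uname -n): $event"
    {
        echo "Time:  $(date)"
        echo "Host:  $(uname -n)"
        echo "Event: $event"
        echo "Review the cluster state and plan the fail-back."
    } | "$MAILER" "$subject" "$ADMIN"
}
```

For example, hacmp_ldap_start could end with notify_admin "fail-over: slapd restarted as Master".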

Fail-back

To perform fail-back, follow these general steps:

  • First, identify what failed and what repairs are required.
  • Re-IPL the primary, but do not start the cluster software.
  • Synchronize the data in the two LDAP directories.
  • Restart the primary's cluster software, thus causing the backup server to fail back the primary's IP address.

Each of these steps is now described in more detail.

  1. Identify what happened. If a hardware or software failure has occurred, repairs may be required. More insidious are communications failures (see "Gotchas" below). Make any required repairs and recovery, short of bringing the primary server back into the cluster. Choose a suitable time, such as scheduled maintenance or application downtime, to perform the remaining recovery steps.

  2. Re-IPL the primary server, but do not start the cluster management software. If the primary server is brought into the cluster at this point, it will take over its service address again and will begin accepting LDAP requests. It will, however, still have its pre-repair directory and will not have been synchronized.

    The primary server will come up using its boot IP address. This will allow access via telnet, ftp, and other protocols, using the boot IP address (since the service address is still in use by the backup server).

  3. At a planned fail-back time, synchronize the data in the two directories.

    If the change log described above was implemented, this is the time to extract the collected entries from the primary server's directory and apply them to the backup server's LDAP directory, thus capturing any changes not propagated just prior to the fail-over.

    Then, back up the LDAP directory on the backup server and restore it on the primary server. This will bring the two directories back into synchronization. There are two SecureWay Directory-supplied backup and restore options:

    • LDIF export followed by LDIF import (or bulkload, which performs the same function but is faster for large directories). LDIF export can be performed while the LDAP daemon is running. However, the directory should be made read only before this process begins and should remain that way until the fail-back process is complete (see "Changing the database to read only"). This prevents further changes from being made to the directory and then lost during the fail-back process.

      LDIF export creates one file, which can be copied to the primary server for importing. Note, however, that the LDIF import does not delete records that should no longer exist. Therefore, the master directory must be emptied before this import is performed. This can be achieved by choosing "Database->Configure" in the administration GUI on the primary server. This gives the option to delete the existing database and create a new one. Create a new database, then perform the LDIF import.

    • DB2 backup and DB2 restore. The DB2 backup must be performed without the LDAP server running. This greatly increases the visible downtime, since LDAP searches cannot be performed against the directory during this time. Once the backup has completed, the LDAP daemon on the backup server can be restarted but should be changed to read only mode while the remainder of the restore is performed, so that the directories do not immediately become unsynchronized again.

      The DB2 backup creates a number of files (by default in /var/ldap/backup) and a copy of the slapd32.conf file in the same directory. All files in this directory should be copied to the primary server in binary format. The DB2 restore can then be performed. When performing the restore via the administration GUI panels, be sure to select the option that restores the data only, not the configuration files. This prevents the replica's slapd32.conf file from overwriting the master's configuration file.

    Both of these can be performed under the administration GUI "Database" option. An additional option is to use native DB2 backup and restore facilities. We did not fully explore this option.

    In all cases, the replica definitions will not exist in this newly created or over-written directory, and therefore the replicas must be redefined. If an LDIF extract of the replica entry was made after replica definition, as described above, the file can be imported via the LDIF import, or via the LDAP client command, ldapadd.

  4. Restart the master LDAP server to ensure that it recognizes any configuration or replica settings. Now, start the cluster software on the primary server, using "smitty clstart". This will bring the primary server back into the HACMP cluster, and will cause it to take over its original service address from the backup server.

    This event will cause the hacmp_ldap_stop script described above to run, which will stop the LDAP daemon on the backup server and restart it in replica mode. Note, however, that the master .conf file on the backup server was changed to read only while synchronizing the databases. Change this back in preparation for the next fail-over.

We're now back to square one and ready for the next failure!
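Because an LDIF import does not delete entries that should no longer exist, it can be worth confirming, before step 4 brings the primary back into the cluster, that the two directories hold the same set of entries. The sketch below is one hypothetical way to do that: the function names `ldif_dns` and `ldif_dn_diff` are illustrative, it assumes LDIF exports have already been taken from both servers, and it compares only distinguished names, not attribute values. DN normalization is deliberately rough (lower-casing and removing blanks after commas), and base64-encoded "dn::" lines are skipped. It does unfold RFC 2849 continuation lines, since long DNs are often split across lines in LDIF files.

```shell
#!/bin/sh
# Hypothetical helpers: compare the entries (by DN) in two LDIF exports.

# ldif_dns FILE: print one roughly normalized DN per entry, sorted, unique.
#  1. awk unfolds RFC 2849 continuation lines (a line starting with a
#     single space continues the previous line).
#  2. grep keeps plain-text "dn:" lines (base64 "dn::" lines are skipped).
#  3. sed strips the "dn:" prefix and blanks after commas; tr lower-cases,
#     a rough stand-in for case-insensitive DN matching.
ldif_dns() {
    awk '
        /^ / { line = line substr($0, 2); next }
        { if (line != "") print line; line = $0 }
        END { if (line != "") print line }
    ' "$1" | grep -i '^dn:[^:]' |
        sed -e 's/^[dD][nN]:[ ]*//' -e 's/,[ ]*/,/g' |
        tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u
}

# ldif_dn_diff A B: print DNs found in only one file (DNs only in B are
# tab-indented, as comm(1) shows them); return 1 if any differ.
ldif_dn_diff() {
    a="${TMPDIR:-/tmp}/ldif_dn_a.$$"
    b="${TMPDIR:-/tmp}/ldif_dn_b.$$"
    ldif_dns "$1" > "$a"
    ldif_dns "$2" > "$b"
    diffs=$(LC_ALL=C comm -3 "$a" "$b")
    rm -f "$a" "$b"
    [ -z "$diffs" ] && return 0
    printf '%s\n' "$diffs"
    return 1
}
```

For example, `ldif_dn_diff backup_export.ldif primary_export.ldif` (file names illustrative) prints nothing and returns 0 when both exports contain the same DNs. A DN in the first column exists only in the backup's export, such as an entry added during the outage; a tab-indented DN exists only in the primary's, typically a stale entry left behind by importing into a database that was not emptied first.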

Changing the database to read only

The SecureWay Directory can be changed to read only via one of two methods:

  1. Using the administration GUI, go to "Database->Settings->Permissions", and choose "Read Only". Note that the "Permissions" option appears on the administration panels only when this server is configured as a master. The server will need to be restarted for this to take effect; this can also be performed via the administration panels ("Current State->Start/Stop").

    Note that the administration panels point to the default conf file name and location, /etc/slapd32.conf. This method can only be used if the master, when running on the backup, is using this configuration file. If so, however, the administration panels cannot be used to manage the replica as a replica, since the replica configuration file cannot also be /etc/slapd32.conf. So, this method is not recommended.

  2. The other option is to edit the .conf file manually. Change the "ibm-slapdReadOnly" attribute in the "dn: cn=Directory, …" entry from FALSE to TRUE:
    dn: cn=Directory, cn=RDBM Backends, cn=IBM SecureWay, cn=Schemas, cn=Configuration
    cn: Directory
    …
    # ibm-slapdReadOnly: FALSE
    ibm-slapdReadOnly: TRUE
    … 
    

    As with the first method, the server must then be restarted. This can be done manually using the following command:

    /bin/slapd -f /etc/slapd32.conf.master

    During the time that the LDAP master is read-only, any request to perform an update will be rejected, and a message returned to the requesting client:

    ldap_add: dsa is unwilling to perform
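Because the same edit has to be reversed at the end of fail-back, it can help to script it. The following is a minimal sketch, not a SecureWay-supplied tool: the function name `set_readonly` is hypothetical, and it assumes the attribute appears exactly as in the fragment above, as an `ibm-slapdReadOnly:` line inside the `dn: cn=Directory, cn=RDBM Backends, ...` entry. It saves the previous version as a `.bak` file and leaves commented-out lines alone.

```shell
#!/bin/sh
# Hypothetical helper: set_readonly CONF_FILE TRUE|FALSE
# Rewrites the uncommented ibm-slapdReadOnly line inside the
# "dn: cn=Directory, cn=RDBM Backends, ..." entry; slapd must be
# restarted afterwards for the change to take effect.
set_readonly() {
    conf="$1"; value="$2"
    case "$value" in
        TRUE|FALSE) ;;
        *) echo "usage: set_readonly conf_file TRUE|FALSE" >&2; return 2 ;;
    esac
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 2; }

    # Keep a copy of the previous version in case the edit must be undone.
    cp -p "$conf" "$conf.bak" || return 1

    if awk -v val="$value" '
        /^$/   { in_backend = 0 }     # a blank line ends an LDIF entry
        /^dn:/ { in_backend = ($0 ~ /^dn: *cn=Directory, *cn=RDBM Backends/) }
        in_backend && /^ibm-slapdReadOnly:/ {
            print "ibm-slapdReadOnly: " val; found = 1; next
        }
        { print }
        END { exit(found ? 0 : 1) }
    ' "$conf.bak" > "$conf.new"; then
        # Overwrite in place so the file keeps its owner and permissions.
        cat "$conf.new" > "$conf" && rm -f "$conf.new"
    else
        rm -f "$conf.new"
        echo "ibm-slapdReadOnly not found in $conf" >&2
        return 1
    fi
}
```

For example, `set_readonly /etc/slapd32.conf.master TRUE` followed by a restart of slapd makes the master read only, and `set_readonly /etc/slapd32.conf.master FALSE` (again followed by a restart) restores it once fail-back is complete.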


Test results

The following table summarizes the downtime durations, as captured from the HACMP logs and from the logs of our LDAP client test rigs. These durations were consistent across all runs performed. While these durations are representative and the tasks involved will not change, the durations experienced in other server configurations may differ substantially from those documented here.

| Phase | Activity (start and end events) | Seconds |
|---|---|---|
| Take over | Time taken for node_down to be discovered and confirmed, i.e., from "rsct,connect" to "EVENT START: node_down" | 2-3 |
| Take over | Time taken from "EVENT START" to completion of IP address takeover | 6 |
| Take over | Time taken from completion of IP address takeover to completion of hacmp_ldap28 script execution | 2 |
| Take over | Time taken to restart the backup server's LDAP daemon as master | 9 |
| Take over | Total outage time as seen from ldap_search -h=el19a28 | 28 |
| Take over | Total outage time as seen from ldap_search -h=el19a29 | 9 |
| Fail back | Time taken from EL19a29's completion of "release takeover address" to completion of EL19a28's "acquire service address" | 48 |
| Fail back | Time taken from "EVENT START: node_up" to "EVENT COMPLETED: release_takeover_addr srvc_28" | 14 |
| Fail back | Time taken to restart the backup server's LDAP daemon as replica | 9 |
| Fail back | Total outage time as seen from ldap_search -h=el19a28 | 31 |
| Fail back | Total outage time as seen from ldap_search -h=el19a29 | 11 |
| Fail back | Total time that the LDAP update function is unavailable | Time to back up, copy and restore the database, plus fail-back time |

Notes

  • IP address takeover on a fail-over was completed 6 seconds after the primary server was identified as "down." At this point, requests sent to the primary IP address were in fact being received and handled by the backup server.
  • Restarting the LDAP daemon to switch between master and replica modes took 9 seconds. This outage is seen both by requests to the master and by requests to what was the replica, and it occurs during both fail-over and fail-back.
  • During fail-back, the outage seen by the LDAP client for the primary's IP address is longer than during fail-over (31 seconds versus 28 seconds).
  • During fail-back, update functions are unavailable to the LDAP client for the duration of the backup, copy and restore of the LDAP database, in addition to the actual HACMP fail-back. Especially as the LDAP database grows, it is imperative that this outage be scheduled appropriately.
  • It is not clear why the outage seen by the client is, in the case of fail-back, shorter than what appears to be the IP takeover outage. Our best guess is that the IP address is already configured and responding while HACMP is still checking completion of associated processes. This does not appear to affect fail-over, where the outage seen by the client is more or less in line with the outages reported by HACMP logs.
  • We attempted to measure the difference between the arrival times of updates at the master and at the replica. Any differences were below the resolution of our measurements, whether taken from within the two servers or from LDAP clients running on a separate machine.
  • In our testing, we were also unable to create a scenario during normal operations in which changes were not propagated to the replica. If we could raise the update rate high enough, we could presumably create a situation where the master becomes backlogged and cannot push the updates through quickly enough.
  • No scenarios were found where it was impossible to recover changes. However, in some cases ensuring all changes are captured does increase administrative overhead and complexity; for example, implementing the change log and extracting and applying change log updates as part of the fail-back.
  • The downtime for the replica search function could be further hidden from the LDAP clients by placing an IP sprayer such as IBM's eNetwork Dispatcher in front of a collection of replicas, including the replica that will perform the takeover. This replica would be identified by Network Dispatcher as down and would receive no new traffic during the duration of the daemon restart. The promotion of the replica to master would also need to create replica definitions for the remainder of the replicas, and the demotion to replica would require equivalent clean-up actions.
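The HACMP-side durations above were read from the cluster logs. The sketch below shows one hypothetical way to extract event durations from cluster.log automatically: the function name `event_durations` is illustrative, and it assumes one log entry per line with the syslog-style timestamp shown in the Appendix (some entries are wrapped in this paper's listings, but not in the actual file) and that no event runs across midnight. It pairs each "EVENT START:" line with the next "EVENT COMPLETED:" line carrying the same event text.

```shell
#!/bin/sh
# Hypothetical helper: event_durations [LOGFILE]
# Prints "<seconds>  <event text>" for each START/COMPLETED pair in an
# HACMP cluster.log (default path as given in the Appendix).
event_durations() {
    awk '
        # "HH:MM:SS" -> seconds since midnight
        function secs(t,    p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        # Event text = everything after the marker, trailing blanks removed.
        function event_of(s, marker) {
            sub(".*" marker, "", s)
            sub(/[ \t]+$/, "", s)
            return s
        }
        /EVENT START: / { start[event_of($0, "EVENT START: ")] = secs($3); next }
        /EVENT COMPLETED: / {
            ev = event_of($0, "EVENT COMPLETED: ")
            if (ev in start) {
                printf "%4d  %s\n", secs($3) - start[ev], ev
                delete start[ev]
            }
        }
    ' "${1:-/usr/es/adm/cluster.log}"
}
```

Run against the backup server's fail-back log in the Appendix, for example, it would report 14 seconds for node_up el19a28 (16:28:07 to 16:28:21). Figures such as the total client-visible outage still have to come from the client test logs.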

Gotchas

During our testing, we stumbled over some "gotchas". Some of these were obvious in retrospect; however, they were rather less obvious in advance!

  1. If the configurations are used as described in this document, there are three copies of the slapd32.conf configuration file:
    • Master configuration file on the primary machine
    • Replica configuration file on the backup machine
    • Master configuration file on the backup machine

    These three versions of the configuration file must be kept synchronized manually. Generally, the difference between the master and the replica configurations on the backup machine would only be the addition of the "dn: cn=Master Server, cn=Configuration" stanza in the replica conf file. If items such as the userids and location of the DB2 subsystem, the LDAP administration password and so on were kept the same, the master conf files could be kept the same on the primary and backup servers. Depending on the installation, it might be possible to maintain the three files by copying over the master to the backup server, and then adding a single stanza to the master conf file to create the replica conf file.

  2. The IBM SecureWay Directory console assumes that the configuration file is '/etc/slapd32.conf', and will report and update the entries contained in that configuration file, even if this is not the desired configuration file. Using the console, the fail-over mechanism can still be implemented, but must be performed manually; that is, the administrator can manually choose the option 'Promote replica to master', and restart the server. This, obviously, does not allow automated fail-over.

    In general, if the configuration file in use is not /etc/slapd32.conf, the administration GUI for SecureWay Directory should not be used.

  3. When fail-over occurs, depending on the nature of the failure, the primary server may be left in a variety of states. If, as in our case, the failure is a network communications failure, each of the members of the HACMP cluster may believe that they are the surviving member, and therefore take over the other's service address. While the failure is still in effect, only one server is accessible from the network, and so there is no conflict. The problems begin when network connectivity is re-established for the "missing" cluster member.

    The easiest and most secure way to bring the primary server back online without disrupting operations is the fail-back procedure described earlier in this paper.

    However, if the cables are simply plugged back in, the primary reconnects to the network with its original IP address and still retains the backup's service IP address as well. There are now two servers on the network with the primary IP address, and two servers with the backup IP address. Very shortly afterwards, in our case, the backup server shut itself down and powered itself off.

    This happens because a "DGSP" (Diagnostic Group Shutdown Partition) message is sent when a node loses communication with the cluster and then tries to re-establish communication. It is intended to ensure that, if disks and IP addresses are in the process of being taken over, the cluster does not become corrupted. There are rules governing which server will be shut down; in our case, with two machines in the cluster, the server with the higher alphabetic node name (our backup server, EL19a29) will be shut down.

    Given that we have a cascading cluster, we probably want our backup server to in fact have the lower alphabetic node name, while defining our cluster so that the highest priority server (the first one in the cascading cluster definition) has the higher alphabetic node name…

    Here are the messages we received on EL19a29 when we plugged the cables back in again.

    Aug 16 16:06:20 el19a29 topsvcs[10050]: rsct,connect.C,         1.56,1555           
    Aug 16 16:06:21 el19a29 grpsvcs[10542]: RSCT,NS.C,        1.107,4021                
    Aug 16 16:06:24 el19a29 clstrmgr[16800]: Thu Aug 16 16:06:24 announcementCb: 
                 GS announcment code=512; exiting
    Aug 16 16:06:24 el19a29 clstrmgr[16800]: Thu Aug 16 16:06:24 handleQuit: Called
    Aug 16 16:06:24 el19a29 clstrmgr[16800]: Thu Aug 16 16:06:24 clstrmgr on node 2 
                 is exiting with code 0
    Aug 16 16:06:24 el19a29 grpglsm[16270]: RSCT,hagsglam.C,           1.33,1724          
    Aug 16 16:06:24 el19a29 topsvcs[10050]: rsct,connect.C,           1.56,1475           
    Aug 16 16:06:24 el19a29 haemd[16540]: LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.30,L#=1348,                       
                  haemd: 2521-032 Cannot dispatch group services (1).
    Aug 16 16:06:24 el19a29 clsmuxpd[12046]: clRGInfoGetRGHandle() failed, error: : The system
                  call does not exist on this system.
    Aug 16 16:06:24 el19a29 clsmuxpd[12046]: Error from ha_em_receive_response():  EMAPI error number
                  10 EMAPI error message 2521-649 An attempt to receive a command response was
                  unsuccessful; read() detected end-of-file; connection with Event Manager lost. :
                  The system call does not exist on this system.
    Aug 16 16:06:24 el19a29 clsmuxpd[12046]: Event Manager API Disconnected:: The system call does not
                  exist on this system.
    Aug 16 16:06:24 el19a29 HACMP for AIX: clexit.rc : Unexpected termination of clstrmgrES.
    Aug 16 16:06:25 el19a29 HACMP for AIX: clexit.rc : Halting system immediately!!!

    Meanwhile, on EL19a28, our primary server, the cluster.log shows the following messages.

    Aug 16 16:04:44 el19a28 HACMP for AIX: EVENT START: network_up el19a28 ether1 
    Aug 16 16:04:44 el19a28 HACMP for AIX: EVENT COMPLETED: network_up el19a28 ether1 
    Aug 16 16:04:44 el19a28 HACMP for AIX: EVENT START: network_up_complete el19a28 ether1 
    Aug 16 16:04:44 el19a28 HACMP for AIX: EVENT COMPLETED: network_up_complete el19a28 ether1 
    Aug 16 16:04:51 el19a28 HACMP for AIX: EVENT START: join_standby el19a28 10.10.10.28 
    Aug 16 16:04:51 el19a28 HACMP for AIX: EVENT COMPLETED: join_standby el19a28 10.10.10.28
    Aug 16 16:06:03 el19a28 HACMP for AIX: 
    EVENT START: swap_adapter el19a28 ether1 10.10.10.28 158.88.229.28 
    Aug 16 16:06:11 el19a28 snmpd[16034]: NOTICE: SMUX packet from (127.0.0.1+32770+1)
    Aug 16 16:06:11 el19a28 snmpd[16034]: NOTICE: SMUX trap: (6 74) (127.0.0.1+32770+1)
    Aug 16 16:06:20 el19a28 topsvcs[15752]: rsct,connect.C,           1.56,1555           
    Aug 16 16:06:22 el19a28 HACMP for AIX: EVENT START: swap_aconn_protocols en1 en0 
    Aug 16 16:06:22 el19a28 HACMP for AIX: EVENT COMPLETED: swap_aconn_protocols en1 en0 
    Aug 16 16:06:22 el19a28 HACMP for AIX: 
    EVENT COMPLETED: swap_adapter el19a28 ether1 10.10.10.28 158.88.229.28 
    Aug 16 16:06:23 el19a28 HACMP for AIX: 
    EVENT START: swap_adapter_complete el19a28 ether1 10.10.10.28 158.88.229.28 
    Aug 16 16:06:23 el19a28 HACMP for AIX: 
    EVENT COMPLETED: swap_adapter_complete el19a28 ether1 10.10.10.28 158.88.229.28 
    Aug 16 16:06:25 el19a28 topsvcs[15752]: rsct,connect.C,           1.56,1475
    …
    

    This is obviously a matter of operational concern: it is quite conceivable that someone could trip over the cables and disconnect the system, then notice this and plug them back in. This generally takes long enough for the cluster to fail over (allowing some updates to occur on the backup), and plugging the primary server back in then causes the backup to shut down, as described above. The backup server is now down but holds additional changes, while the primary server is up again.

    What to do next depends on exactly what happened during the fail-over. Here, again, the change log capability described in "Change log for the SecureWay Directory" could be used to capture the changes that occurred on both systems and apply them to the current "master" directory, creating a single, up-to-date directory with no missing changes. This master can then be used to synchronize the other directories.

  4. Initially, at HACMP cluster management start-up time, the cluster software would not reliably take over the service address, but would remain on the boot address. This occurred after we had modified the start and stop scripts referenced by the cluster software, and they were no longer executable or no longer existed. Once we realized this, repaired the scripts, and resynchronized the cluster, the problem did not recur.
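Gotchas 1 and 4 both lend themselves to small scripted checks. The sketch below is hypothetical: both function names and all file names are illustrative. `build_replica_conf` assumes, as gotcha 1 suggests, that the replica configuration on the backup server is exactly the master configuration plus the "dn: cn=Master Server, cn=Configuration" stanza, kept in a file of its own; regenerating the replica file from those two pieces after every change keeps the files from drifting apart. `check_event_scripts` confirms that the start and stop scripts named in the HACMP application server definition exist and are executable, the condition behind gotcha 4.

```shell
#!/bin/sh
# Hypothetical helpers for gotchas 1 and 4.

# build_replica_conf MASTER_CONF MASTER_SERVER_STANZA REPLICA_CONF
# Regenerates the replica conf as "master conf + Master Server stanza".
build_replica_conf() {
    master="$1"; stanza="$2"; replica="$3"
    if [ ! -f "$master" ] || [ ! -f "$stanza" ]; then
        echo "build_replica_conf: missing input file" >&2
        return 2
    fi
    # Master conf, a blank line (LDIF entry separator), then the extra stanza.
    # Assumes the master file ends with a newline.
    { cat "$master"; echo ""; cat "$stanza"; } > "$replica.new" || return 1
    # Overwrite in place so an existing replica file keeps its permissions.
    cat "$replica.new" > "$replica" && rm -f "$replica.new"
}

# check_event_scripts SCRIPT...
# Fails if any cluster start/stop script is missing or not executable.
check_event_scripts() {
    rc=0
    for s in "$@"; do
        if [ ! -f "$s" ]; then
            echo "missing: $s" >&2; rc=1
        elif [ ! -x "$s" ]; then
            echo "not executable: $s" >&2; rc=1
        fi
    done
    return $rc
}
```

For example, `check_event_scripts` could be run with the paths of the start and stop scripts after every edit to them, and before starting cluster services with "smitty clstart".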

Appendix: HACMP log contents during fail-over and fail-back

There are two cluster-manager-specific logs that we found useful:

  • /usr/es/adm/cluster.log
  • /tmp/hacmp.out

The hacmp.out file contains extensive messages about every action that the cluster management software takes, including the parameters used. The cluster.log file provides a higher-level summary, but sometimes also contains error messages not found in hacmp.out. The messages shown in this paper are all quoted from cluster.log.

The following is an example of the messages in the cluster.log of the backup server, EL19a29, during a fail-back, when the primary server, EL19a28, rejoins the cluster. Notice that it is easy to identify the high-level events that occur, including when the "application server" events start and complete.

Aug 17 16:27:29 el19a29 topsvcs[22054]: rsct,connect.C,           1.56,1555           
Aug 17 16:28:07 el19a29 HACMP for AIX: EVENT START: node_up el19a28 
Aug 17 16:28:08 el19a29 HACMP for AIX: EVENT START: node_up_remote el19a28 
Aug 17 16:28:08 el19a29 HACMP for AIX: EVENT START: stop_server hacmp_ldap28 
Aug 17 16:28:15 el19a29 HACMP for AIX: EVENT COMPLETED: stop_server hacmp_ldap28 
Aug 17 16:28:15 el19a29 HACMP for AIX: EVENT START: release_vg_fs 
Aug 17 16:28:16 el19a29 HACMP for AIX: EVENT COMPLETED: release_vg_fs 
Aug 17 16:28:16 el19a29 HACMP for AIX: EVENT START: release_takeover_addr srvc_28 
Aug 17 16:28:21 el19a29 HACMP for AIX: EVENT COMPLETED: release_takeover_addr srvc_28 
Aug 17 16:28:21 el19a29 HACMP for AIX: EVENT COMPLETED: node_up_remote el19a28 
Aug 17 16:28:21 el19a29 HACMP for AIX: EVENT COMPLETED: node_up el19a28 
Aug 17 16:29:14 el19a29 HACMP for AIX: EVENT START: node_up_complete el19a28 
Aug 17 16:29:14 el19a29 HACMP for AIX: EVENT START: node_up_remote_complete el19a28 
Aug 17 16:29:15 el19a29 HACMP for AIX: EVENT COMPLETED: node_up_remote_complete el19a28 
Aug 17 16:29:15 el19a29 HACMP for AIX: EVENT COMPLETED: node_up_complete el19a28

On the primary server, EL19a28, which is taking back control, the following messages are seen.

Aug 17 16:27:23 el19a28 topsvcs[20712]: rsct,bootstrp.C,          1.135,3265          
Aug 17 16:27:25 el19a28 grpsvcs[21552]: RSCT,pgsd.C,           1.36, 518              
Aug 17 16:27:27 el19a28 grpglsm[21948]: RSCT,hagsglam.C,           1.33,1468          
Aug 17 16:27:38 el19a28 clstrmgr[22484]: Fri Aug 17 16:27:38 HACMP/ES Cluster Manager Started
AIX Cluster SNMP Multiplexing Peer Daemon (clsmuxpd)" (11/ 127.0.0.1+32773+1)
Aug 17 16:28:22 el19a28 HACMP for AIX: EVENT START: node_up el19a28 
Aug 17 16:28:23 el19a28 HACMP for AIX: EVENT START: node_up_local 
Aug 17 16:28:24 el19a28 HACMP for AIX: EVENT START: acquire_service_addr srvc_28 
Aug 17 16:28:31 el19a28 HACMP for AIX: EVENT START: acquire_aconn_service en0 ether1 
Aug 17 16:28:31 el19a28 HACMP for AIX: EVENT START: swap_aconn_protocols en0 en1 
Aug 17 16:28:32 el19a28 HACMP for AIX: EVENT COMPLETED: swap_aconn_protocols en0 en1 
Aug 17 16:28:32 el19a28 HACMP for AIX: EVENT COMPLETED: acquire_aconn_service en0 ether1 
Aug 17 16:29:12 el19a28 HACMP for AIX: EVENT COMPLETED: acquire_service_addr srvc_28 
Aug 17 16:29:12 el19a28 HACMP for AIX: EVENT START: get_disk_vg_fs 
Aug 17 16:29:12 el19a28 HACMP for AIX: EVENT COMPLETED: get_disk_vg_fs 
Aug 17 16:29:12 el19a28 HACMP for AIX: EVENT COMPLETED: node_up_local 
Aug 17 16:29:13 el19a28 HACMP for AIX: EVENT START: node_up_local 
Aug 17 16:29:13 el19a28 HACMP for AIX: EVENT START: get_disk_vg_fs 
Aug 17 16:29:13 el19a28 HACMP for AIX: EVENT COMPLETED: get_disk_vg_fs 
Aug 17 16:29:13 el19a28 HACMP for AIX: EVENT COMPLETED: node_up_local 
Aug 17 16:29:13 el19a28 HACMP for AIX: EVENT COMPLETED: node_up el19a28 
Aug 17 16:29:13 el19a28 HACMP for AIX: EVENT START: node_up_complete el19a28 
Aug 17 16:29:14 el19a28 HACMP for AIX: EVENT START: node_up_local_complete 
Aug 17 16:29:14 el19a28 HACMP for AIX: EVENT START: start_server hacmp_ldap28 
Aug 17 16:29:14 el19a28 HACMP for AIX: EVENT COMPLETED: start_server hacmp_ldap28 
Aug 17 16:29:15 el19a28 HACMP for AIX: EVENT COMPLETED: node_up_local_complete 
Aug 17 16:29:15 el19a28 HACMP for AIX: EVENT START: node_up_local_complete 
Aug 17 16:29:15 el19a28 HACMP for AIX: EVENT COMPLETED: node_up_local_complete 
Aug 17 16:29:15 el19a28 HACMP for AIX: EVENT COMPLETED: node_up_complete el19a28



About the authors

Veronika Megler is a Certified Consulting IT Architect in IBM's Solutions Integration Technology Center, pSeries Solutions Development organization. She has worked in operations, application development, systems programming, systems management disciplines, and IT management consulting. She enjoys architecting solutions to solve real business problems, then proving they work in practice. You can contact her at vmegler@us.ibm.com.

Jay Bockelman specializes in cluster technologies, system management tools, and application integration to provide highly available e-business solutions.


