Approaches for solving problems with Tivoli Directory Server synchronization
How to fix inconsistent data issues in Tivoli Directory Servers
IBM® Tivoli® Directory Server (TDS) is a powerful and authoritative enterprise directory infrastructure that is a critical enabler for enterprise security.
It plays a key role in building the enterprise identity data infrastructure for applications such as identity management, portals, and Web services.
TDS acts as a data repository that enables users or applications to find resources that have the characteristics needed for a particular task. For example, TDS can be leveraged by products for authentication operations to verify a user's identity.
TDS is therefore a critical IT infrastructure component and is usually implemented as a cluster for high availability and performance. Any error in its implementation can have a serious impact on a live production environment. Various errors can result in data inconsistency between the directory server cluster members, leaving directory entries out of sync across the cluster.
A typical symptom of a synchronization problem is a user who experiences inconsistent authentication behavior because their log-in credentials differ across TDS cluster members. This article highlights some of the strategies that can be used to deal with directory synchronization problems.
Tivoli Directory Server environments for high availability
A highly available environment has a minimum of two servers: primary and secondary. If anything happens to the primary server, the secondary server takes over and performs operations seamlessly without any external intervention.
Tivoli Directory Server is implemented as a cluster in a variety of replication topologies for high availability and performance reasons. Two typical scenarios are outlined below.
Replication Scenario 1: Peer-Peer replication topology (master-master)
In this scenario, the Tivoli Directory Server cluster is implemented with a peer-to-peer replication topology. A load balancer is typically configured to balance both read and write requests between directory server instances. There can be several servers acting as masters for directory information. Peer master servers replicate all client updates to the other peer masters, but do not replicate updates received from other master servers. Peer replication can improve performance, availability, and reliability. Performance is improved by providing a local server to handle updates in a widely distributed network. Availability and reliability are improved by providing a backup master server ready to take over immediately if the primary master fails.
A load balancer, such as IBM WebSphere® Edge Server, has a virtual host name that applications use when sending updates to the directory. The load balancer is configured to send those updates to either server.
This topology is mainly used for load balancing when the frequency of both reads and writes is too high for a single server.
Figure 1. Peer-Peer replication topology.
Replication Scenario 2: Master-replica replication topology
TDS servers are implemented with a master-replica topology when the volume of write requests is low and high availability for write operations is not a major concern. A load balancer sends write requests to the master server and read requests to any of the servers.
The master server can contain a directory or a subtree of a directory. The master is writable, which means it can receive updates from clients for a given subtree. The replica server contains a copy of the directory or a copy of part of the directory of the master server. The replica is read only; it cannot be directly updated by clients. Instead, it refers client requests to the master server, which performs the updates and then replicates them to the replica server. A master server can have several replicas.
Figure 2. Master-replica topology.
It is good practice to configure the environment to route write updates to only one server and read requests to both servers. If the first server becomes unavailable, write requests should go to the replica server. When the primary server is available again, write requests should be routed back to the primary server.
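The routing rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not load balancer configuration; the `route` function, its parameters, and the host names are hypothetical.

```python
def route(operation, servers, available, read_index=0):
    """Choose a directory server for a request.

    servers    -- host names in priority order; servers[0] is the primary
                  (names are hypothetical)
    available  -- set of host names that are currently reachable
    read_index -- counter used to rotate read requests
    """
    candidates = [s for s in servers if s in available]
    if not candidates:
        raise RuntimeError("no directory server available")
    if operation == "write":
        # Writes go to the first available server in priority order, so
        # they return to the primary as soon as it is reachable again.
        return candidates[0]
    # Reads are spread over every available server.
    return candidates[read_index % len(candidates)]


servers = ["tds-primary", "tds-replica"]
print(route("write", servers, {"tds-primary", "tds-replica"}))
print(route("write", servers, {"tds-replica"}))  # primary is down
```

Keeping the servers in a fixed priority order is what makes writes fall back to the primary automatically once it is available again.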
Common replication problems
Replication is a technique used by directory servers to improve performance, availability, and reliability. The replication process keeps data in multiple directory servers synchronized.
Data inconsistency in directory servers arises mainly due to replication issues. Keeping the directory servers synchronized requires a diligent approach, including monitoring and maintenance. Any negligence in directory administration can result in significant differences in directory data across the cluster.
In a clustered TDS environment, data needs to be kept consistent among the cluster members. Any change made to the master server needs to be propagated to the peer or replica servers through replication.
For various reasons, changes made to one server are sometimes not replicated to the other servers. This results in data inconsistency among them.
Data inconsistency among TDS cluster members can occur when:
1) One server contains entries that do not exist on another TDS cluster member.
2) An entry exists on both servers, but its attributes differ.
Causes and resolution of directory server synchronization problems
In a multi-master replication environment, the same entry can be modified by multiple servers.
For example, if one server receives a modify request while a second server receives a rename request for the same entry, a race condition can occur.
Updates to the same entry made by multiple servers might cause inconsistencies in directory data because conflict resolution is based on the timestamp of the entries. An entry may receive an update on one server while it is being updated on another. When the replicated update arrives at the replica, the entry stored there may already have a later timestamp than the replicated entry. To resolve replication conflicts, a regular database entry that has a later timestamp is not replaced by a replicated entry that has an earlier timestamp, and the replication fails. The replication queue is then blocked and other replication requests are not processed. The queue keeps growing, which results in inconsistent data.
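The timestamp rule can be illustrated with a small Python sketch. It models only the comparison described above and is not the server's actual implementation; the `Entry` class, the `apply_replicated_update` function, and the timestamp format are assumptions made for the example, and treating equal timestamps as "apply" is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    dn: str
    attrs: dict
    timestamp: str  # fixed-width UTC string, e.g. "20240101120000Z" (assumed format)


def apply_replicated_update(database, update):
    """Apply a replicated entry unless the stored copy is newer.

    Returns True if the update was applied and False if it was rejected
    as a conflict (the case that blocks the replication queue).
    """
    current = database.get(update.dn)
    if current is not None and current.timestamp > update.timestamp:
        # The stored entry has a later timestamp: keep it and report failure.
        return False
    database[update.dn] = update
    return True


db = {}
dn = "cn=user1,o=ibm,c=us"
apply_replicated_update(db, Entry(dn, {"mail": "new@example.com"}, "20240101120500Z"))
stale = Entry(dn, {"mail": "old@example.com"}, "20240101120000Z")
print(apply_replicated_update(db, stale))  # the older update is rejected
```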
To disable timestamp-based conflict resolution on the server, define the IBMSLAPD_REPL_NO_CONFLICT_RESOLUTION environment variable. If the variable is defined before a server is started, the server operates in a no replication conflict resolution mode. In this mode, the server does not compare the timestamps of replicated entries in an attempt to resolve conflicts between them. The environment variable is checked during server startup, so changing it while the server is running has no effect.
Figure 3. Setting the parameter in ibmslapd.conf
When a replication conflict is detected, the replaced entry is archived for recovery purposes in the lost and found log. In such cases, manual intervention is necessary to keep the data in a consistent state. System administrators need to check the lost and found log periodically and, if any entries are present, update them manually on the server. The lost and found log is located in the default logs directory.
Setting up a load balancer is one way to avoid replication conflicts. The load balancer is configured to send updates to only one server. If that server is down or unavailable because of a network failure, the load balancer sends the updates to the next available peer server until the first server is back online and available.
If the replication queue gets stuck, the blocking entry can be removed through the Web Administration Tool.
Figure 4. Skipping blocking entry
Sometimes a replication queue can become too large and contain too many blocking entries. It becomes a very tedious task to remove each blocking entry manually. In that case, a replication error table can be used.
Configure the value of the ibm-slapdReplMaxErrors parameter through either the Web Administration Tool or the directory server configuration file.
Whenever an entry blocks the replication queue, it is logged in the ibmslapd.log file and skipped so that other entries can proceed. The server keeps skipping blocking entries until their number reaches the value defined by ibm-slapdReplMaxErrors.
Restart the server for the configuration changes to take effect.
Analyze the log file for the blocking entries.
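The skipping behavior described above can be modeled with a short Python simulation. It is a sketch of the idea only, not server code; `drain_queue`, its parameters, and the sample update names are hypothetical, and `max_errors` merely stands in for ibm-slapdReplMaxErrors.

```python
from collections import deque


def drain_queue(queue, apply, max_errors):
    """Process a replication queue, skipping failed updates up to a limit.

    queue      -- iterable of pending updates, oldest first
    apply      -- callable that returns True if the update replicated
    max_errors -- stands in for ibm-slapdReplMaxErrors
    Returns (applied, skipped, still_pending).
    """
    pending = deque(queue)
    applied, skipped = [], []
    while pending:
        update = pending[0]
        if apply(update):
            applied.append(pending.popleft())
        elif len(skipped) < max_errors:
            # Record the failure (the server logs it) and move on.
            skipped.append(pending.popleft())
        else:
            # Error limit reached: the queue stays blocked at this update.
            break
    return applied, skipped, list(pending)


updates = ["u1", "bad1", "u2", "bad2", "u3"]
result = drain_queue(updates, lambda u: not u.startswith("bad"), max_errors=1)
print(result)
```

In this run the first failing update is skipped and the queue blocks at the second one, which is why the logged entries still need to be analyzed and corrected.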
Figure 5. Setting the ibm-slapdReplMaxErrors parameter through the Web Administration Tool.
It is recommended to monitor the replication queue through monitoring tools or to check it periodically using the following command.
idsldapsearch -h hostname -p port -b "o=ibm,c=us" -s "sub" \
  "objectclass=ibm-replicationAgreement" ibm-replicationState
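For periodic checks, the output of this search can be post-processed by a small script. The Python sketch below assumes the output consists of blank-line-separated entries, each with the DN on its first line followed by attribute=value lines; if your output is in LDIF form, the parsing needs to be adjusted. The function names, the sample DNs, and the state values "ready" and "retrying" are illustrative assumptions.

```python
import re


def parse_replication_states(output):
    """Map each replication agreement DN to its ibm-replicationState value.

    Assumes blank-line-separated entries with the DN on the first line
    and attribute=value lines after it.
    """
    states = {}
    for block in re.split(r"\n\s*\n", output.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        dn = lines[0]
        for line in lines[1:]:
            name, sep, value = line.partition("=")
            if sep and name.strip().lower() == "ibm-replicationstate":
                states[dn] = value.strip()
    return states


def agreements_needing_attention(states, healthy):
    """Return the agreements whose state is not in the caller's healthy set."""
    return {dn: s for dn, s in states.items() if s.lower() not in healthy}


# Hypothetical sample output; DNs and state values are illustrative only.
sample = """cn=peer2,o=ibm,c=us
ibm-replicationState=ready

cn=replica1,o=ibm,c=us
ibm-replicationState=retrying
"""
states = parse_replication_states(sample)
print(agreements_needing_attention(states, healthy={"ready"}))
```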
In some cases, idsbulkload or idsldif2db may not complete successfully when loading data from one TDS server to another. When running idsbulkload, inspect the output messages carefully. If errors occur during execution, the directory might be incompletely populated. You may need to drop all the LDAP tables, or drop the database (recreate an empty database), and start over. If this happens, no data was added to the directory, and the idsbulkload must be attempted again.
Try to minimize human errors, and avoid any changes related to configuration and replication without confirmation of expected results. First test any changes in QA and then replicate those changes in production.
To synchronize TDS cluster members and bring them back to a consistent state, the following approaches can be adopted.
1) Exporting the data from one TDS server using idsdb2ldif and importing it into the other server using idsldif2db or idsbulkload.
2) Using the idsldapdiff utility.
3) Using Tivoli Directory Integrator (TDI) between the directory servers with the LDAP connector in update mode.
Refer to the TDI product documentation for more information.
The idsdb2ldif and idsldif2db tools are used to synchronize directory entries when entries are present on one server but not on the replica.
The idsldapdiff utility and Tivoli Directory Integrator can be used to synchronize entries that exist on both servers but have different attributes. They can also synchronize data in cases where entries are present on one TDS cluster member and not on a replica cluster member.
The idsldapdiff command line utility is designed to compare two directory subtrees on two different directory servers to determine if their contents match. It identifies differences between a replica server and its master and can be used to synchronize replicas.
The tool traverses each entry in the directory subtree on the supplier server and compares its contents with the corresponding entry on the consumer server. Because information about each entry needs to be read, running the utility can take a long time and can generate lots of read requests to the supplier and consumer servers. Depending on how many differences are found and whether the fix operation is specified, the utility can also generate an equal amount of write requests to the consumer server.
idsldapdiff performs two passes to make sure the servers are in sync. In the first pass, idsldapdiff traverses the supplier server and does the following: it adds any extra entries on the supplier to the consumer, and it compares and fixes entries that exist on both servers. In the second pass, idsldapdiff traverses the consumer to check for any extra entries on the consumer.
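The two passes can be pictured with the following Python sketch, which works on in-memory dictionaries rather than live directories. It illustrates the traversal order only and is not how idsldapdiff is implemented; the function name, the data layout, and the choice to merely report extra consumer entries are assumptions.

```python
def two_pass_compare(supplier, consumer, fix=False):
    """Compare two directories held as {dn: {attribute: value}} dicts.

    Returns (missing, differing, extra) lists of DNs. With fix=True the
    consumer dict is updated during the first pass; extra consumer
    entries are only reported.
    """
    missing, differing = [], []
    # Pass 1: walk the supplier.
    for dn, attrs in supplier.items():
        if dn not in consumer:
            missing.append(dn)
            if fix:
                consumer[dn] = dict(attrs)
        elif consumer[dn] != attrs:
            differing.append(dn)
            if fix:
                consumer[dn] = dict(attrs)
    # Pass 2: walk the consumer for entries the supplier does not have.
    extra = [dn for dn in consumer if dn not in supplier]
    return missing, differing, extra


supplier = {"cn=a,o=ibm,c=us": {"sn": "A"}, "cn=b,o=ibm,c=us": {"sn": "B"}}
consumer = {"cn=b,o=ibm,c=us": {"sn": "old"}, "cn=c,o=ibm,c=us": {"sn": "C"}}
print(two_pass_compare(supplier, consumer, fix=True))
```

Every entry on both sides is read at least once, which is why the real utility can generate a large number of read requests on big subtrees.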
Requirements for running the utility
Run the utility when no updates are being made to either of the directory servers.
The administrator needs to suspend all update activity to the two subtrees being compared.
This must be done manually before invoking the compare tool.
If the tool is run while updates are being made, it cannot be guaranteed that all discrepancies are accurately reported or fixed.
Use the tool with the server administration control (a flag) if the fix operation is requested. The server administration control allows the tool to write to a read-only replica and it also allows it to modify operational attributes such as ibm-entryUuid.
The idsldapdiff utility can be used to bring a master and replica server in sync before starting replication. The tool requires that the base DN which is being compared exists on both servers. If the base DN does not exist on either of the two servers, the utility gives an error and exits.
Ideally, the tool is used only once between servers, when replication is initially set up. For example, if the topology has two peer masters and two replica servers, you might want to run idsldapdiff between peer 1 and peer 2. Then, if replication is suspended, run idsldapdiff concurrently between peer 1 and replica 1 and between peer 2 and replica 2.
If replication is set up correctly, every change to the directory on the master servers is propagated to the replicas. However, if a problem occurs, the tool can be run to identify and correct replication problems.
This utility is a diagnostic and corrective tool; it is not designed to run as routine maintenance. Depending on the replication-related errors observed in the log files, an administrator might decide to run the utility.
idsldapdiff -sh hostname -sp 389 -sD cn=root -sw password -ch consumerhostname -cp 389 \
  -cD cn=root -cw password -b o=ibm,c=us -a -F
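-sh Specifies the host name of the supplier server.
-sp Specifies the TCP port on which the supplier LDAP server is listening, if other than the default.
-sD Uses dn to bind to the supplier LDAP directory. dn is a string-represented DN.
-sw Supplier password.
-ch Specifies the host name of the consumer server.
-cp Specifies the TCP port on which the consumer LDAP server is listening, if other than the default.
-cD Uses dn to bind to the consumer LDAP directory. dn is a string-represented DN.
-cw Consumer password.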
-b Use searchbase as the starting point for the search.
-a Specifies inclusion of the server administration control for writing to a read-only replica.
-F This is the fix option. If specified, content on the consumer replica is modified to match the content of the supplier server. This cannot be used if -S is also specified.
-L filename If the -F option is not specified, use this option to generate an LDIF file for output.
-C countnumber Counts the number of non-matching entries.
-S Specifies to compare the schema on both of the servers.
-O Displays DNs only for non-matching entries.
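When the utility is driven from a script, it helps to validate the option combination before running it. The Python sketch below builds an argument list from the options listed above and rejects combining -F with -S. The function name and parameter layout are hypothetical, and rejecting -L together with -F is an assumption based on the option description rather than documented behavior.

```python
import shlex


def build_idsldapdiff_args(supplier, consumer, base_dn, fix=False,
                           admin_control=False, ldif_file=None, schema=False):
    """Build an idsldapdiff argument list.

    supplier, consumer -- dicts with keys host, port, dn, password
    """
    if fix and schema:
        raise ValueError("-F cannot be used together with -S")
    if fix and ldif_file:
        # Assumption: -L is meant for runs without the fix option.
        raise ValueError("-L is intended for runs without -F")
    args = [
        "idsldapdiff",
        "-sh", supplier["host"], "-sp", str(supplier["port"]),
        "-sD", supplier["dn"], "-sw", supplier["password"],
        "-ch", consumer["host"], "-cp", str(consumer["port"]),
        "-cD", consumer["dn"], "-cw", consumer["password"],
        "-b", base_dn,
    ]
    if admin_control:
        args.append("-a")
    if fix:
        args.append("-F")
    if ldif_file:
        args += ["-L", ldif_file]
    if schema:
        args.append("-S")
    return args


supplier = {"host": "hostname", "port": 389, "dn": "cn=root", "password": "password"}
consumer = {"host": "consumerhostname", "port": 389, "dn": "cn=root", "password": "password"}
args = build_idsldapdiff_args(supplier, consumer, "o=ibm,c=us", fix=True, admin_control=True)
print(shlex.join(args))
```

Returning a list rather than a single string lets the caller pass the arguments to a process launcher without shell quoting problems.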
This article described various approaches that can be adopted to solve problems related to data synchronization. They can be used when TDS servers go live in production and data synchronization issues start to build up.