Monitoring and diagnosing advanced replication problems

An LDAP administrator can monitor the state of advanced replication processing and troubleshoot problems by using LDAP search requests to retrieve operational attributes available for the roots of the replication contexts (entries with an objectclass of ibm-replicationContext) and replication agreements (entries with an objectclass of ibm-replicationAgreement). Because these are operational attributes, either the + attribute or each individual attribute must be requested on a search request in order to be returned. Also, operational attributes cannot be used in search filters.

The following tables describe the operational attributes for the replication context and replication agreement entries. Replication context entries use the auxiliary objectclass of ibm-replicationContext and replication agreement entries use the structural objectclass ibm-replicationAgreement. See Table 1 for the operational attributes for the ibm-replicationContext objectclass. See Table 2 for the operational attributes for the ibm-replicationAgreement objectclass.

When retrieved for a replication context or replication agreement entry, the operational attributes provide information concerning that entry. Take note of attributes whose values contain failureId or changeId values. The failureId and changeId numbers increase sequentially, although the server might skip some numbers for various reasons. For example, if Db2® is restarted while the server is running, the changeId might skip numbers. These IDs are often required when working with the Control replication error log and the Control replication queue extended operations with the ldapexop utility. See ldapexop utility for more information about the ldapexop utility.

Table 1. ibm-replicationContext operational attributes
Attribute and description
ibm-replicationThisServerIsMaster

A boolean (true or false) indicating whether the server is the master of the replication context. If set to true, the server is the master of the replication context. If set to false, the server is not the master of the replication context.

ibm-replicationIsQuiesced

A boolean (true or false) indicating whether the replication context is quiesced. If set to true, the replication context is quiesced. If set to false, the replication context is not quiesced.

Updates under a quiesced replication context are restricted to an LDAP root administrator if using the Server Administration control (OID 1.3.18.0.2.10.15), and any replication master DNs with authority under this context. Advanced replication continues for a quiesced context. If the server is restarted, all replication contexts are then unquiesced.

See Table 1 for the optional non-operational attribute for the ibm-replicationContext objectclass.

Table 2. ibm-replicationAgreement operational attributes
Attribute and description
ibm-replicationChangeLDIF

The LDIF representation of the next pending change that has not yet been replicated and has resulted in advanced replication being stalled to the consumer server. If there is not a stalled replication change, the value is N/A.

Examples of when an advanced replication queue might be stalled include:
  1. A replication change failed because of an LDAP_TIMEOUT return code.
  2. The backend replication table has reached the maximum number of errors allowed on the supplier server within this backend while attempting to replicate a change to a consumer server. See Table 2 for more information about the ibm-slapdReplMaxErrors attribute value.
ibm-replicationFailedChangeCount

Specifies the number of advanced replication operations that have failed in this replication agreement. The maximum number of failures is set by the ibm-slapdReplMaxErrors attribute in the CDBM backend configuration entry cn=Replication, cn=Configuration, and that limit is shared among all replication agreement entries at the backend level. See Table 2 for more information about the ibm-slapdReplMaxErrors attribute value.

ibm-replicationFailedChanges

A multi-valued attribute that lists all the logged replication operations that have failed. The number of attribute values is limited by the ibm-slapdReplMaxErrors attribute in the CDBM backend configuration entry cn=Replication, cn=Configuration, and that limit is shared among all replication agreement entries at the backend level. See Table 2 for more information about the ibm-slapdReplMaxErrors attribute value.

A string value of the form: failureId timestamp returnCode numOfAttempts changeId operation entryDn

The failureId identifies the update that has failed to replicate to the consumer server. The failureId is used with the Control replication error log extended operation to display, delete, or retry the failing replication update. The ldapexop utility supports the Control replication error log extended operation. See ldapexop utility for more information about the ldapexop utility.

The timestamp is the time in Zulu format when this operation was last attempted to be replicated to the consumer server.

The returnCode is the LDAP return code from the consumer server.

The numOfAttempts is the number of times the supplier server has attempted to replicate the failed change to the consumer server.

The changeId is the ID that the failed change had when it was in the pending replication queue.

The operation indicates the update operation that encountered the failure. It has one of the following values: add, delete, modify, or modifydn

The entryDn indicates the distinguished name of the entry that caused the failure.

Example:
ibm-replicationfailedchanges: 1 20050407202221Z 68 1 170814 add cn=entry-85,o=IBM,c=US

failureId: 1
timestamp: 20050407202221Z
returnCode: 68
numOfAttempts: 1
changeId: 170814
operation: add
entryDn: cn=entry-85,o=IBM,c=US
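A script that reads this attribute has to split the value into its fields without breaking the DN apart. The following Python sketch shows one way to do that. The FailedChange type and the parse_failed_change function are illustrative names, not part of the directory server, and the sketch assumes that entryDn is the only field that can contain spaces.

```python
from typing import NamedTuple


class FailedChange(NamedTuple):
    """One value of ibm-replicationFailedChanges (hypothetical helper type)."""
    failure_id: int
    timestamp: str
    return_code: int
    num_attempts: int
    change_id: int
    operation: str
    entry_dn: str


def parse_failed_change(value: str) -> FailedChange:
    """Split 'failureId timestamp returnCode numOfAttempts changeId operation entryDn'."""
    # Assumption: only the trailing DN may contain spaces, so split at most 6 times.
    parts = value.strip().split(None, 6)
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {value!r}")
    failure_id, timestamp, return_code, attempts, change_id, operation, entry_dn = parts
    return FailedChange(int(failure_id), timestamp, int(return_code),
                        int(attempts), int(change_id), operation, entry_dn)


change = parse_failed_change(
    "1 20050407202221Z 68 1 170814 add cn=entry-85,o=IBM,c=US")
print(change.failure_id, change.return_code, change.entry_dn)
```

The failure_id field is the value that the Control replication error log extended operation expects when a single failure is shown, retried, or deleted.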
ibm-replicationLastActivationTime

Specifies the Zulu format timestamp when advanced replication actively began replicating queued updates.

ibm-replicationLastChangeID

Specifies the replication change ID of the last successfully completed advanced replication update.

ibm-replicationLastFinishTime

Specifies the Zulu format timestamp when advanced replication updates in the queue were all attempted and the server awaits a new scheduled start time or more operations to appear in the advanced replication queue. See Schedule entries for more information about replication schedule entries.

ibm-replicationLastResult

A description of the result from the last advanced replication operation or connection attempt to a consumer server.

A string value of the form: timestamp changeId returnCode operation entryDn

The timestamp is the time in Zulu format when this operation was last attempted to be replicated to the consumer server.

The changeId is the ID of the last replication update.

The returnCode is the LDAP return code from the consumer server.

The operation indicates the last LDAP operation. It has one of the following values: add, connect, delete, modify, or modifydn

The entryDn indicates the distinguished name of the entry that was last added, deleted, modified, or renamed. If operation is connect, entryDn is set to NULL.

Example:
ibm-replicationLastResult: 20050412140436Z 19 81 add cn=testpendingchange,o=ibm,c=us

timestamp: 20050412140436Z
changeId: 19
returnCode: 81
operation: add
entryDn: cn=testpendingchange,o=ibm,c=us
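The same kind of splitting applies to this attribute, with the extra case of a connect operation. The following Python sketch is illustrative only: LastResult and parse_last_result are hypothetical names, and the sketch assumes that a connect result carries the literal text NULL in the entryDn position, which is how the description above reads.

```python
from typing import NamedTuple, Optional


class LastResult(NamedTuple):
    """One value of ibm-replicationLastResult (hypothetical helper type)."""
    timestamp: str
    change_id: int
    return_code: int
    operation: str
    entry_dn: Optional[str]


def parse_last_result(value: str) -> LastResult:
    """Split 'timestamp changeId returnCode operation entryDn'."""
    # The DN is the last field and may contain spaces, so split at most 4 times.
    parts = value.strip().split(None, 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {value!r}")
    timestamp, change_id, return_code, operation, entry_dn = parts
    # Assumption: for a connect attempt the DN field holds the text NULL.
    if operation == "connect" and entry_dn.upper() == "NULL":
        dn = None
    else:
        dn = entry_dn
    return LastResult(timestamp, int(change_id), int(return_code), operation, dn)


result = parse_last_result(
    "20050412140436Z 19 81 add cn=testpendingchange,o=ibm,c=us")
print(result.operation, result.return_code, result.entry_dn)
```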
ibm-replicationLastResultAdditional

The descriptive reason code message text that supplements the return code message with the purpose of providing additional information from the last replication attempt.

ibm-replicationNextTime

Specifies the Zulu format timestamp of the next time advanced replication would begin if pending changes existed. When this value is set to 19000101000000z, replication begins immediately when a change is ready to be replicated if the ibm-replicationState operational attribute is set to active.
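A monitoring script can turn these timestamps into date values and recognize the special 1900 value. The following Python sketch is one possible approach; parse_zulu and next_replication_time are hypothetical names. Because the example output in this document shows the value without a trailing Z, the sketch treats the trailing Z or z as optional, which is an assumption rather than documented server behavior.

```python
from datetime import datetime, timezone
from typing import Optional

# Value described above as meaning "replicate as soon as a change is ready".
IMMEDIATE = "19000101000000"


def parse_zulu(value: str) -> datetime:
    """Parse a YYYYMMDDhhmmss timestamp, with or without a trailing Z, as UTC."""
    text = value.strip().rstrip("Zz")
    return datetime.strptime(text, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def next_replication_time(value: str) -> Optional[datetime]:
    """Return None for the immediate-replication marker, else the scheduled time."""
    if value.strip().rstrip("Zz") == IMMEDIATE:
        return None
    return parse_zulu(value)


print(next_replication_time("19000101000000z"))
print(parse_zulu("20050407202221Z").isoformat())
```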

ibm-replicationPendingChangeCount

The number of replication operations that are waiting to be replicated to a consumer server.

ibm-replicationPendingChanges

A multi-valued attribute that lists all changes waiting to be replicated to a consumer server.

A string value of the form: changeId operation entryDn

The changeId is the ID of the pending replication update.

The operation indicates the LDAP operation that is pending. It has one of the following values: add, delete, modify, or modifydn

The entryDn indicates the distinguished name of the entry that is to be added, deleted, modified, or renamed.

Example:
ibm-replicationpendingchanges: 19 add cn=test1,o=ibm,c=us

changeId: 19
operation: add
entryDn: cn=test1,o=ibm,c=us
ibm-replicationState

Identifies the current state of the advanced replication queue. It has one of the following values:
  • active - Indicates that advanced replication is occurring from this replication agreement.
  • binding - Indicates that the replication agreement is in the process of authenticating with the consumer server.
  • connecting - Indicates that the replication agreement is attempting to contact the consumer server.
  • on hold - Indicates that the replication agreement is on hold. Replication updates to the consumer server are queued until the replication agreement is resumed.
  • ready - Indicates immediate replication mode, ready to send updates as they occur.
  • retrying - Indicates that the server retries the current change every 60 seconds until it succeeds. The retrying state occurs when a consumer server is restarted, the replication backend table is full, the current replicated update is failing, or when there is an LDAP_TIMEOUT return code from the consumer server. Retrying is a likely symptom that advanced replication might be stalled and LDAP administrator intervention is required to get it running again. See Recovering from advanced replication errors for the steps on how to recover from out of sync conditions between supplier and consumer servers.
  • suspended - Indicates that the replication agreement is suspended. No additional replication updates are sent to the consumer server by this agreement (until it returns to the ready state).
  • waiting - Indicates that the replication agreement is currently waiting for the next scheduled replication to occur. See Schedule entries for more information about replication schedule entries.
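A monitoring script can use the state value as a first filter for agreements that need attention. The following Python sketch groups the states listed above into three categories. The function name classify_state and the category labels are hypothetical, and treating on hold and suspended together as "paused" is a simplification made for this sketch.

```python
# The eight state values listed above.
KNOWN_STATES = {
    "active", "binding", "connecting", "on hold",
    "ready", "retrying", "suspended", "waiting",
}


def classify_state(state: str) -> str:
    """Map an ibm-replicationState value to a coarse monitoring category."""
    normalized = state.strip().lower()
    if normalized not in KNOWN_STATES:
        raise ValueError(f"unknown replication state: {state!r}")
    if normalized == "retrying":
        # Described above as a likely symptom of stalled replication.
        return "possibly stalled"
    if normalized in ("on hold", "suspended"):
        # Updates are not being sent until the agreement is resumed.
        return "paused"
    return "normal"


for value in ("ready", "retrying", "suspended"):
    print(value, "->", classify_state(value))
```

An agreement reported as "possibly stalled" still needs the other operational attributes, such as ibm-replicationLastResult, to show why the change is failing.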

See Table 4 for the required non-operational attributes for the ibm-replicationAgreement objectclass. See Table 5 for the optional non-operational attributes for the ibm-replicationAgreement objectclass.

Recovering from advanced replication errors

Replication errors can be handled proactively, before they are allowed to accumulate, or reactively, after replication has already stalled. Replication stalls occur when the number of failures reaches the limit as specified by the ibm-slapdReplMaxErrors attribute value in the cn=Replication,cn=configuration entry. See Table 2 for more information about the cn=Replication,cn=configuration entry.

When replication is stalled, the latest failed change occupies the beginning of the pending changes queue. The latest failed change gets retried every minute until it succeeds or the failed change is removed from the queue by an LDAP administrator with the appropriate authority. See Administrative group and roles for more information about administrative role authority. When this failed change occupies the lead position in the pending replication queue, all other replication updates are blocked and replication is stalled.

The options for handling stalled replication are:
  1. Increase the size of the ibm-slapdReplMaxErrors attribute in the cn=Replication,cn=configuration entry. This allows more replication failures to be stored in the backend where the replication agreement entry exists.
  2. Delete or retry one or more failed replication changes.
  3. Skip the latest failed replication change.
  4. If the stalled replication problem is severe enough, the entire replication context where the replication agreement entry exists might need to be resynchronized. In order to do this, you must:
    1. Quiesce the replication context
    2. Suspend replication for all replication agreements
    3. Delete all failed replication changes for all replication agreements
    4. Skip all pending changes for all replication agreements
    5. Resynchronize the replication context
    6. Resume replication for the suspended replication agreements
    7. Unquiesce the replication context
The following operational attributes in the replication agreement entry can be queried to determine what to do:
  1. The ibm-replicationChangeLdif operational attribute in the replication agreement entry shows the LDIF representation of the latest failure. The ibm-replicationLastResult and ibm-replicationLastResultAdditional operational attributes in the replication agreement have further detail for the reason the change failed.
  2. The ibm-replicationPendingChanges operational attribute in the replication agreement shows the change ID, the operation type, and the target DN of the next changes to be replicated. The number of pending changes that are displayed is limited by the ibm-slapdMaxPendingChangesDisplayed attribute in the cn=Replication,cn=configuration entry. See Table 2 for more information about the ibm-slapdMaxPendingChangesDisplayed attribute. See Table 2 for more information about the ibm-replicationPendingChanges operational attribute.
  3. The ibm-replicationFailedChanges operational attribute in the replication agreement shows each of the failed changes, including the failure ID. See Table 2 for more information about the ibm-replicationFailedChanges operational attribute.
  4. The Control replication error log extended operation can be used to display information about a failure by providing the failureId obtained from the ibm-replicationFailedChanges operational attribute. The controlreplerr extended operation -show option in the ldapexop utility can be used to display the latest failure. See ldapexop utility for more information about the ldapexop utility.
When the latest and all previous failures are understood, an LDAP administrator must decide whether to fix the replication failures individually or resynchronize the entire replication context. The options are:
  1. Increase the size of the ibm-slapdReplMaxErrors attribute in the cn=Replication,cn=configuration entry. This allows more replication failures to be stored in the backend where the replication agreement entry exists. See Table 2 for more information about the ibm-slapdReplMaxErrors attribute.
  2. Delete or retry one or more failed changes for the replication agreement by using the Control replication error log extended operation with the ldapexop utility. The -retry option on the controlreplerr extended operation in the ldapexop utility allows a single failure (identified by its failureId) to be retried or all failures to be retried. The ability to retry all failures is especially useful when you have corrected the problem that caused a change to fail the first time. When a failed change is retried successfully, it is removed from the list of failed changes and there is space for a new one. The -delete option on the controlreplerr extended operation in the ldapexop utility allows a single failure (identified by its failureId) to be deleted or all failures to be deleted. This delete option is especially useful when a change is deemed to be unnecessary, the problem has been fixed manually, or a synchronization tool such as the ldapdiff utility has been used to resynchronize the directories. Deleting a failed change frees space in the list of failed changes so that a new failure can be added. See ldapexop utility for more information about the ldapexop utility. See ldapdiff utility for more information about the ldapdiff utility.
  3. Skip the latest failure for the replication agreement by using the Control replication queue extended operation. The ldapexop utility supports the Control replication queue extended operation that allows the next pending change (identified by its changeId) or all pending changes to be skipped. This extended operation is useful when the ibm-slapdReplMaxErrors attribute in the cn=Replication,cn=configuration entry is set to 0 in which case the replication failure is not allowed and replication stalls on the first failure. Also, the Control replication queue extended operation is useful when replication failures are not deleted, the ibm-slapdReplMaxErrors attribute value is increased, or after using the ldapdiff utility to resynchronize the replication context. See ldapexop utility for more information about the ldapexop utility.
  4. If there are multiple failed and pending replication changes, the entire replication context where the replication agreement entry exists might need to be resynchronized. In order to do this, you must:
    1. Quiesce the replication context on all servers in the replication topology by using the Cascading control replication extended operation on the ldapexop utility. The Cascading control replication extended operation is targeted against the master server which in turn quiesces the replication context on all consumer servers. A quiesced replication context only accepts updates from an LDAP root administrator when using the Server Administration control and any replication master server DNs with authority under this context. See Cascading control replication for more information about the Cascading control replication extended operation. See ldapexop utility for more information about the ldapexop utility.
    2. Suspend replication for all replication agreements in the replication context by using the Control replication extended operation on the ldapexop utility. A suspended replication agreement queues all replication updates until it is resumed. See Control replication for more information about the Control replication extended operation.
    3. Use the ldapdiff utility with the -L option to compare the replication contexts on each of the servers within the replication context. The -L option allows the entry differences to be written to an output LDIF file. See ldapdiff utility for more information about the ldapdiff utility.
    4. Delete all failed replication changes for all replication agreements by using the Control replication error log extended operation on the ldapexop utility. See Control replication error log for more information about the Control replication error log extended operation.
    5. Skip all pending replication changes by using the Control replication queue extended operation on the ldapexop utility. See Control replication queue for more information about the Control replication queue extended operation.
    6. Resynchronize the replication context by using the -F (fix) option on the ldapdiff utility.
    7. Resume replication for all suspended replication agreements by using the Control replication extended operation on the ldapexop utility.
    8. Unquiesce the replication context on all servers in the replication topology by using the Cascading control replication extended operation on the ldapexop utility.
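
The order of the resynchronization steps above matters, so it can help to generate the command sequence rather than type it by hand. The following Python sketch builds the ldapexop argument lists for the steps, using only the options that appear in the example commands in this document. The resync_plan function and its parameter names are hypothetical. The two ldapdiff steps are listed as manual steps (with no command) because their options depend on both servers, and nothing in the sketch runs a command.

```python
import shlex
from typing import List, Optional, Tuple

Step = Tuple[str, Optional[List[str]]]


def resync_plan(host: str, port: int, bind_dn: str, bind_pw: str,
                context_dn: str, agreement_dns: List[str]) -> List[Step]:
    """Return the ordered steps as (description, argv); argv is None for manual steps."""
    base = ["ldapexop", "-p", str(port), "-h", host, "-D", bind_dn, "-w", bind_pw]
    plan: List[Step] = [
        ("quiesce context",
         base + ["-op", "cascrepl", "-action", "quiesce", "-rc", context_dn]),
        ("suspend agreements",
         base + ["-op", "controlrepl", "-action", "suspend", "-rc", context_dn]),
        ("compare servers with ldapdiff -L (manual step)", None),
    ]
    for dn in agreement_dns:
        plan.append(("delete failed changes",
                     base + ["-op", "controlreplerr", "-delete", "all", "-ra", dn]))
    for dn in agreement_dns:
        plan.append(("skip pending changes",
                     base + ["-op", "controlqueue", "-skip", "all", "-ra", dn]))
    plan.append(("fix differences with ldapdiff (manual step)", None))
    for dn in agreement_dns:
        plan.append(("resume agreement",
                     base + ["-op", "controlrepl", "-action", "resume", "-ra", dn]))
    plan.append(("unquiesce context",
                 base + ["-op", "cascrepl", "-action", "unquiesce", "-rc", context_dn]))
    return plan


agreement = ("cn=Replica, ibm-replicaServerId=Master, "
             "ibm-replicaGroup=default, o=ibm, c=us")
for description, argv in resync_plan("server1.us.ibm.com", 389, "adminDn",
                                     "adminPw", "o=ibm,c=us", [agreement]):
    print(description + ":", shlex.join(argv) if argv else "(run by hand)")
```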

The other methodology for handling replication failures is to take a proactive, preventive approach. An LDAP administrator monitors the replication failure queue and resolves problems before the queue reaches capacity and replication stalls. An LDAP administrator with the appropriate authority can use the Control replication error log extended operation and the ibm-replicationFailedChanges and ibm-replicationState operational attributes in the replication agreement entry to monitor the current replication status. See Administrative group and roles for information about administrative authority.
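
One simple form of proactive monitoring is to compare the number of logged failures with the ibm-slapdReplMaxErrors limit. The following Python sketch illustrates the idea. The function name failure_queue_status and the 80 percent warning threshold are assumptions made for this example. Because the limit is shared at the backend level, the failed count passed in is assumed to be the total across all replication agreements in that backend.

```python
def failure_queue_status(failed_count: int, max_errors: int,
                         warn_fraction: float = 0.8) -> str:
    """Classify how close the logged failures are to the configured limit."""
    if failed_count < 0 or max_errors < 0:
        raise ValueError("counts must not be negative")
    if failed_count >= max_errors:
        # No room is left, so the next failure stalls replication.
        return "at limit"
    if failed_count >= warn_fraction * max_errors:
        # Hypothetical early-warning threshold; tune for the installation.
        return "warning"
    return "ok"


print(failure_queue_status(1, 1))
print(failure_queue_status(8, 10))
print(failure_queue_status(2, 10))
```

A limit of 0 always reports "at limit", which matches the behavior described earlier: with ibm-slapdReplMaxErrors set to 0, replication stalls on the first failure.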

Advanced replication error recovery example

This advanced replication error recovery example uses the master-replica topology that has been configured in Creating a master-replica topology. This example assumes the ibm-slapdReplMaxErrors attribute value in the cn=Replication,cn=configuration entry is set to one.

An LDAP administrator periodically monitors the replication status of the replication agreement in the o=ibm,c=us replication context by querying the replication agreement operational attribute values. See Table 2 for more information about the replication agreement operational attributes.
Note: Operational attributes are only returned on search requests when either the + attribute is specified or each operational attribute is requested.
The current replication status from the master to the replica can be determined by using the following ldapsearch command to retrieve the replication agreement entry. See z/OS IBM Tivoli Directory Server Client Programming for z/OS for more information about the ldapsearch utility.
ldapsearch -p 389 -h server1.us.ibm.com -D adminDn -w adminPw -b o=ibm,c=us
 "(objectclass=ibm-replicationAgreement)" "*" ibm-replicationChangeLdif 
 ibm-replicationFailedChangeCount ibm-replicationFailedChanges ibm-replicationLastActivationTime
 ibm-replicationLastChangeID ibm-replicationLastFinishTime ibm-replicationLastResult
 ibm-replicationLastResultAdditional ibm-replicationNextTime ibm-replicationPendingChangeCount
 ibm-replicationPendingChanges ibm-replicationState
The ldapsearch command returns the following entry:
cn=Replica, ibm-replicaServerId=Master, ibm-replicaGroup=default, o=ibm, c=us
objectclass=top
objectclass=ibm-replicationAgreement
ibm-replicaconsumerid=Replica
ibm-replicaurl=ldap://server2.us.ibm.com:389
ibm-replicacredentialsdn=cn=ReplicaBindCredentials,o=ibm, c=us
description=Replication agreement from master to replica
cn=Replica
ibm-replicationonhold=FALSE
ibm-replicationstate=retrying
ibm-replicationpendingchanges=46 modify OU=SUB,O=IBM,C=US
ibm-replicationpendingchangecount=1
ibm-replicationnexttime=19000101000000
ibm-replicationlastresultadditional=R004071 DN 'OU=SUB,O=IBM,C=US' does not exist 
 (ldbm_process_request:406)
ibm-replicationlastresult=20090206145054Z 46 32 modify OU=SUB,O=IBM,C=US
ibm-replicationlastfinishtime=20090206144954Z
ibm-replicationlastchangeid=45
ibm-replicationlastactivationtime=20090206144354Z
ibm-replicationfailedchanges=12 20090206144954Z 32 1 45 add cn=entry,ou=sub,o=ibm,c=us
ibm-replicationfailedchangecount=1
ibm-replicationchangeldif=
dn: ou=sub,o=ibm,c=us
control: 2.16.840.1.113730.3.4.2 true
control: 1.3.18.0.2.10.19 false:: MIGPMCAKAQIwGwQNbW9kaWZpZXJzTmFtZTEKBAhjbj1
 hZG1pbjAwCgECMCsED21vZGlmeVRpbWVzdGFtcDEYBBYyMDA5MDIwNjE0NDg1My41ODM4MjVaMDk
 KAQIwNAQYUmVwbGljYXRpb25CYXNlVGltZXN0YW1wMRgEFjIwMDkwMjA2MTM0ODQ2Ljc4Njg4NFo
 =
changetype: modify
add: description
description: A small division
The following analysis of the replication agreement entry can be performed:
  1. The ibm-replicationState operational attribute value is set to retrying, which indicates that replication is currently stalled. Replication is stalled because the number of replication failures has reached the limit of one that is set by the ibm-slapdReplMaxErrors attribute in the cn=Replication,cn=configuration entry. One failed change is already logged, so the next failing change cannot be logged and is retried instead.
  2. The ibm-replicationChangeLdif operational attribute in the replication agreement shows the LDIF representation of the latest failure. The LDIF shows that the last failure is a modify of the ou=sub,o=ibm,c=us entry on the consumer server. The ibm-replicationLastResult and ibm-replicationLastResultAdditional operational attributes in the replication agreement indicate that the modify failed on the consumer server because the ou=sub,o=ibm,c=us entry does not exist.
  3. The ibm-replicationPendingChanges operational attribute in the replication agreement shows the changeId of the next pending update is 46. The next pending change is also the same modify operation of the ou=sub,o=ibm,c=us entry. It will be replicated to the consumer server after the add failure in the ibm-replicationFailedChanges operational attribute is resolved.
  4. The ibm-replicationFailedChanges operational attribute in the replication agreement shows one failed replication update. The attribute value indicates that the failureId is 12, the LDAP return code from the consumer server is 32, it is an add operation of the cn=entry,ou=sub,o=ibm,c=us entry, and the supplier server has tried once to replicate the update.

To determine why the addition of the cn=entry,ou=sub,o=ibm,c=us entry failed, the ldapexop utility can be used to perform a Control replication error log extended operation to show the failed replication change. See ldapexop utility for more information about the ldapexop utility.

The following ldapexop command can be used to show the LDIF representation of the failed replication change that has a failureId of 12.
ldapexop -p 389 -h server1.us.ibm.com -D adminDn -w adminPw -op controlreplerr 
 -ra "cn=Replica, ibm-replicaServerId=Master, ibm-replicaGroup=default, 
 o=ibm,c=us" -show 12
The ldapexop command returns the following:
dn: cn=entry,ou=sub,o=ibm,c=us
control: 2.16.840.1.113730.3.4.2 true
control: 1.3.18.0.2.10.19 false:: MIGnMDAKAQAwKwQPbW9kaWZ5dGltZXN0YW1wMRgEFjI
   wMDkwMjA2MTQ0MzU0LjY1NzcwMFowIAoBADAbBA1tb2RpZmllcnNuYW1lMQoECGNuPWFkbWluMDA
   KAQAwKwQPY3JlYXRldGltZXN0YW1wMRgEFjIwMDkwMjA2MTQ0MzU0LjY1NzcwMFowHwoBADAaBAx
   jcmVhdG9yc25hbWUxCgQIY249YWRtaW4=
changetype: add
cn: entry
ibm-entryuuid: A091A000-4CAA-198C-8D7D-402084027431
sn: entry
objectclass: person
objectclass: top

An LDAP administrator can either fix the replication differences manually or use the ldapdiff utility to resynchronize the replication contexts on all servers in the replication topology. The ldapdiff utility is a useful tool for comparing and verifying that the entries within a replication context on the supplier and consumer servers are synchronized. For the purposes of this example, an LDAP administrator has chosen to resynchronize the replication context by using the ldapdiff utility. See ldapdiff utility for more information about the ldapdiff utility.

Before you use the ldapdiff utility to compare or fix entries within a replication context, quiesce the replication context on all servers within the replication topology by using the Cascading control replication extended operation quiesce option on the ldapexop utility. See ldapexop utility for more information about the ldapexop utility.

The following ldapexop command quiesces the o=ibm,c=us replication context on the master and replica servers in the replication topology:
ldapexop -p 389 -h server1.us.ibm.com -D adminDn -w adminPw -op cascrepl -action 
 quiesce -rc "o=ibm,c=us"

After the replication context is quiesced on all servers, the Control replication extended operation can be used to suspend replication for all replication agreements within the replication context.

The following ldapexop command suspends replication for all replication agreements in the replication context o=ibm,c=us. The cn=Replica, ibm-replicaServerId=Master, ibm-replicaGroup=default, o=ibm, c=us entry is the only replication agreement within the o=ibm,c=us replication context, so it is the only agreement that is suspended.
ldapexop -p 389 -h server1.us.ibm.com -D adminDn -w adminPw -op controlrepl 
 -action suspend -rc "o=ibm,c=us" 
The following ldapdiff command is run to compare the entries within the replication context o=ibm,c=us on the master server on server1.us.ibm.com and the replica server on server2.us.ibm.com. If there are any differences between the two servers, they are written to the output LDIF file called differences.ldif. The ldapdiff -a option is specified to write the Server Administration control to the output LDIF for each entry that is different between the two servers. See Server Administration for more information about the Server Administration control.
ldapdiff -a -b "o=ibm,c=us" -L differences.ldif -sh server1.us.ibm.com -sp 389 
 -sD adminDn -sw adminPw
 -ch server2.us.ibm.com -cp 389 -cD adminDn -cw adminPw
where differences.ldif contains:
dn: ou=sub,o=ibm,c=us
control: 1.3.18.0.2.10.15 true
control: 1.3.18.0.2.10.19 false::
 MIGnMB8KAQAwGgQMY3JlYXRvcnNOYW1lMQoECGNuPWFkbWluMDAKAQAwKwQP
 Y3JlYXRlVGltZVN0YW1wMRgEFjIwMDkwMjA2MTM0ODQ2Ljc4Njg4NFowIAoB
 ADAbBA1tb2RpZmllcnNOYW1lMQoECGNuPWFkbWluMDAKAQAwKwQPbW9kaWZ5
 VGltZVN0YW1wMRgEFjIwMDkwMjA2MTQ0ODUzLjU4MzgyNVo=
changeType: add
ibm-entryuuid: C01B9000-3FBE-198C-98A7-402084027431
ou: sub
description: A small division
objectclass: organizationalUnit
objectclass: top

dn: cn=entry,ou=sub,o=ibm,c=us
control: 1.3.18.0.2.10.15 true
control: 1.3.18.0.2.10.19 false::
 MIGnMB8KAQAwGgQMY3JlYXRvcnNOYW1lMQoECGNuPWFkbWluMDAKAQAwKwQP
 Y3JlYXRlVGltZVN0YW1wMRgEFjIwMDkwMjA2MTQ0MzU0LjY1NzcwMFowIAoB
 ADAbBA1tb2RpZmllcnNOYW1lMQoECGNuPWFkbWluMDAKAQAwKwQPbW9kaWZ5
 VGltZVN0YW1wMRgEFjIwMDkwMjA2MTQ0MzU0LjY1NzcwMFo=
changeType: add
ibm-entryuuid: A091A000-4CAA-198C-8D7D-402084027431
objectclass: person
objectclass: top
sn: entry
cn: entry

The contents of the differences.ldif file indicate that the ou=sub,o=ibm,c=us entry does not exist on the consumer server. This explains why the addition of the child entry cn=entry,ou=sub,o=ibm,c=us failed on the consumer server.
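
When the ldapdiff output file is large, a short script can list which entries differ before anything is fixed. The following Python sketch is a minimal illustration and not a complete LDIF parser: the summarize_ldif function is a hypothetical name, base64-encoded DNs (dn::) are not decoded, and the sample text is an abbreviated copy of the differences.ldif content shown above.

```python
from typing import List, Optional, Tuple


def summarize_ldif(text: str) -> List[Tuple[str, str]]:
    """Return (dn, changetype) pairs for each record in an LDIF change file."""
    # Unfold continuation lines: a line that starts with a space
    # continues the previous line.
    unfolded: List[str] = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded and unfolded[-1] != "":
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line.rstrip())

    records: List[Tuple[str, str]] = []
    dn: Optional[str] = None
    changetype: Optional[str] = None
    # A trailing empty line flushes the last record.
    for line in unfolded + [""]:
        if line == "":
            if dn is not None:
                records.append((dn, changetype or "unknown"))
            dn, changetype = None, None
            continue
        name, _, value = line.partition(":")
        if name.lower() == "dn":
            dn = value.strip()
        elif name.lower() == "changetype":
            changetype = value.strip()
    return records


sample = """dn: ou=sub,o=ibm,c=us
control: 1.3.18.0.2.10.15 true
changeType: add
ou: sub

dn: cn=entry,ou=sub,o=ibm,c=us
control: 1.3.18.0.2.10.15 true
changeType: add
cn: entry
"""
for entry_dn, change in summarize_ldif(sample):
    print(change, entry_dn)
```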

Before synchronizing entries within a replication context on the master and replica servers, all replication failures are deleted and all pending replication changes are skipped. Replication failures are deleted by using the Control replication error log extended operation on the ldapexop utility. Pending replication changes are skipped by using the Control replication queue extended operation on the ldapexop utility.

The following ldapexop command deletes all replication failures that are logged in the backend replication table for the replication agreement cn=Replica, ibm-replicaServerId=Master, ibm-replicaGroup=default, o=ibm, c=us:
ldapexop -p 389 -h server1.us.ibm.com -D adminDn -w adminPw -op controlreplerr
 -delete all -ra "cn=Replica, ibm-replicaServerId=Master, ibm-replicaGroup=default, 
 o=ibm, c=us" 
The following ldapexop command skips (deletes) all pending replication changes from the replication queue:
ldapexop -p 389 -h server1.us.ibm.com -D adminDn -w adminPw -op controlqueue 
 -skip all -ra "cn=Replica, ibm-replicaServerId=Master, ibm-replicaGroup=default, 
 o=ibm, c=us"

To synchronize the o=ibm,c=us replication context on the master and replica servers, run the ldapdiff utility again with the -F (Fix) option specified or use the ldapmodify command to add the entries in the differences.ldif file to the consumer server.

Because the master and replica servers are now synchronized, the replication agreement can be resumed and the replication context unquiesced. The replication agreement is resumed by using the Control replication extended operation on the ldapexop utility. The replication context is unquiesced on all servers in the replication topology by using the Cascading control replication extended operation on the ldapexop utility.

The following ldapexop command resumes replication for the replication agreement, cn=Replica, ibm-replicaServerId=Master, ibm-replicaGroup=default, o=ibm, c=us:
ldapexop -p 389 -h server1.us.ibm.com -D adminDn -w adminPw -op controlrepl 
 -action resume -ra "cn=Replica, ibm-replicaServerId=Master, ibm-replicaGroup=default, 
 o=ibm, c=us" 
The following ldapexop command unquiesces the replication context o=ibm,c=us on all servers in the replication topology:
ldapexop -p 389 -h server1.us.ibm.com -D adminDn -w adminPw -op cascrepl 
 -action unquiesce -rc "o=ibm,c=us"
The current replication status from the master to the replica can be determined by using the following ldapsearch command to retrieve the replication agreement entry:
ldapsearch -p 389 -h server1.us.ibm.com -D adminDn -w adminPw -b "o=ibm,c=us"
 "(objectclass=ibm-replicationAgreement)" "*" ibm-replicationChangeLdif 
 ibm-replicationFailedChangeCount ibm-replicationFailedChanges ibm-replicationLastActivationTime
 ibm-replicationLastChangeID ibm-replicationLastFinishTime ibm-replicationLastResult
 ibm-replicationLastResultAdditional ibm-replicationNextTime ibm-replicationPendingChangeCount
 ibm-replicationPendingChanges ibm-replicationState
The ldapsearch command returns the following entry:
cn=Replica, ibm-replicaServerId=Master, ibm-replicaGroup=default, o=ibm, c=us
objectclass=top
objectclass=ibm-replicationAgreement
ibm-replicaconsumerid=Replica
ibm-replicaurl=ldap://server2.us.ibm.com:389
ibm-replicacredentialsdn=cn=ReplicaBindCredentials,o=ibm, c=us
description=Replication agreement from master to replica
cn=Replica
ibm-replicationonhold=FALSE
ibm-replicationstate=ready
ibm-replicationpendingchangecount=0
ibm-replicationnexttime=19000101000000
ibm-replicationlastfinishtime=20090206165454Z
ibm-replicationlastchangeid=46
ibm-replicationlastactivationtime=20090206144354Z
ibm-replicationfailedchangecount=0
ibm-replicationchangeldif=N/A

Because the ibm-replicationState operational attribute value in the replication agreement entry is set to ready, replication from the master to the replica is no longer stalled.
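
The same check can be scripted against the ldapsearch output. The following Python sketch reads output in the attribute=value layout shown above and reports whether the agreement looks recovered. The function names parse_entry and looks_recovered are hypothetical. Requiring zero failed and zero pending changes in addition to the ready state is a stricter test chosen for this sketch, and it assumes an agreement that uses immediate replication. Multi-line values such as ibm-replicationchangeldif are not handled.

```python
from typing import Dict, List


def parse_entry(output: str) -> Dict[str, List[str]]:
    """Collect attribute=value lines into lists keyed by lowercased attribute name."""
    lines = [line for line in output.splitlines() if line.strip()]
    entry: Dict[str, List[str]] = {}
    # The first non-empty line is the entry DN, so skip it.
    for line in lines[1:]:
        name, sep, value = line.partition("=")
        if sep:
            entry.setdefault(name.strip().lower(), []).append(value.strip())
    return entry


def looks_recovered(entry: Dict[str, List[str]]) -> bool:
    """True when the state is ready and no failed or pending changes remain."""
    state = entry.get("ibm-replicationstate", [""])[0].lower()
    failed = int(entry.get("ibm-replicationfailedchangecount", ["0"])[0])
    pending = int(entry.get("ibm-replicationpendingchangecount", ["0"])[0])
    return state == "ready" and failed == 0 and pending == 0


output = """cn=Replica, ibm-replicaServerId=Master, ibm-replicaGroup=default, o=ibm, c=us
objectclass=top
objectclass=ibm-replicationAgreement
ibm-replicationstate=ready
ibm-replicationpendingchangecount=0
ibm-replicationfailedchangecount=0
ibm-replicationchangeldif=N/A
"""
print(looks_recovered(parse_entry(output)))
```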