4040 Replication suspended and a full resynchronization is required for one or more volume groups.
Explanation
Replication between the production and the recovery system is suspended. The suspended state occurs when errors exist in the replication configuration or replication is stopped intentionally. A full resynchronization of one or more volume groups is required.
User response
This event is logged when a system that uses policy-based replication is recovered after a severe system outage. After recovery, the system suspends replication to maintain the existing recovery point of replicated volume groups and prevent potential data loss from being replicated.
When replication is restarted after a system outage, a full resynchronization of the copies is required. The resynchronization process temporarily uses more capacity to maintain the current recovery point until a full resynchronization completes. Replication must be restarted for each volume group individually. You can choose to limit the number of volume groups that are resynchronizing at one time to manage the additional capacity required during the synchronization. Before you restart resynchronization, prioritize the order of the volume groups, monitor system capacity, and provide additional capacity, if necessary.
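To determine which copy is suspended, run the following command:
lsvolumegroupreplication <volumegroup_id/name>
If the production copy is suspended, the lsvolumegroupreplication command displays the following results: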
local_location 1
location1_system_name system1
location1_replication_mode production
location1_volumegroup_id 1
location1_status suspended
In this case, volume group 1 is reporting the suspended status on the production system.
If the recovery copy is suspended, the lsvolumegroupreplication command displays the following results:
local_location 2
location2_system_name system2
location2_replication_mode recovery
location2_volumegroup_id 5
location2_status suspended
In this case, volume group 5 on the recovery system is suspended.
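The check described above can be scripted when many volume groups need to be inspected. The following is a minimal sketch, not a supported tool: it assumes the detailed view consists of plain "key value" lines exactly as in the excerpts above, and the function names parse_detail_view and suspended_locations are hypothetical. Real output might contain additional fields, which this sketch ignores.

```python
def parse_detail_view(text):
    """Parse 'key value' lines into a dict (value = everything after the first space)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    return fields


def suspended_locations(fields):
    """Return (location, system_name, replication_mode, volumegroup_id) per suspended location.

    Field names (locationN_status and so on) are taken from the excerpts above.
    """
    found = []
    for n in (1, 2):
        prefix = f"location{n}_"
        if fields.get(prefix + "status") == "suspended":
            found.append((
                n,
                fields.get(prefix + "system_name"),
                fields.get(prefix + "replication_mode"),
                fields.get(prefix + "volumegroup_id"),
            ))
    return found


# Sample text copied from the recovery-copy excerpt above.
SAMPLE = """\
local_location 2
location2_system_name system2
location2_replication_mode recovery
location2_volumegroup_id 5
location2_status suspended
"""

if __name__ == "__main__":
    for location in suspended_locations(parse_detail_view(SAMPLE)):
        print(location)
```

The replication mode in each returned tuple (production or recovery) indicates which of the following procedures applies.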
Depending on the system that displays the suspended status, different user actions are required. Use the following information to resynchronize data:
If the recovery copy is suspended, run the following command:
chvolumegroupreplication -unsuspend <volumegroup_id/name>
This command starts a resynchronization of the specified volume group. If multiple volume groups need resynchronization, you can limit the number of volume groups that resynchronize at one time to manage the additional capacity that is required during the synchronization.
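One way to limit concurrent resynchronizations is to split a prioritized list of volume groups into fixed-size batches and restart one batch at a time. The following sketch only prints the commands for each batch and does not run them. The volume group names and the helper names plan_resync_batches and unsuspend_commands are hypothetical, the command string is copied from the text above, and the batch size is an assumption that you would choose based on available capacity.

```python
from itertools import islice


def plan_resync_batches(volume_groups, max_concurrent):
    """Yield lists of at most max_concurrent volume groups, preserving priority order."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    remaining = iter(volume_groups)
    while True:
        batch = list(islice(remaining, max_concurrent))
        if not batch:
            return
        yield batch


def unsuspend_commands(batch):
    """Build the restart command shown above for each volume group in a batch."""
    return [f"chvolumegroupreplication -unsuspend {name}" for name in batch]


if __name__ == "__main__":
    # Hypothetical volume group names, listed in priority order.
    prioritized = ["vg_payroll", "vg_orders", "vg_archive"]
    for number, batch in enumerate(plan_resync_batches(prioritized, 2), start=1):
        print(f"Batch {number}:")
        for command in unsuspend_commands(batch):
            print("  " + command)
```

Waiting for a batch to finish resynchronizing and checking system capacity before starting the next batch remain manual steps in this sketch.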
If the production copy is suspended, it indicates either a planned or an unplanned outage at the production location. Depending on when the outage occurred and the state of the data, you might have data loss if the outage lasted longer than your recovery point objective (RPO).
If the data is consistent, run the following command:
chvolumegroupreplication -unsuspend <volumegroup_id/name>
If the data is not consistent, complete the following step:
For volume groups where replication is managed using external software:
If the replication for a volume group is managed by external orchestration software, such as VMware Site Recovery Manager (SRM), use the appropriate workflow in that application to fail over to the recovery copy.
chvolumegroupreplication -unsuspended <volume group ID | name>
Use the appropriate application workflow to restart replication.
After the data is resynchronized on the original production system, you can change the direction of the replication back to the original configuration by using the appropriate application workflow.
chvolumegroupreplication -mode independent <volumegroup_id/name>
This command fails over to the recovery system and the recovery volume groups. Hosts are able to access the volumes while the volume group is in independent mode.
If you are satisfied with the data, run the following command on the production system:
chvolumegroupreplication -unsuspended <volume group ID | name>
Run the following command on the recovery system to restart replication using this copy as the production copy:
chvolumegroupreplication -mode production <volume group ID | name>
chvolumegroupreplication -mode independent <volume group ID | name>
chvolumegroupreplication -mode production <volume group ID | name>