Recovering from lost or damaged FILE volumes in a deduplicated storage pool

Troubleshooting

Problem

Recovering from a lost or damaged FILE volume in a Tivoli Storage Manager deduplicated storage pool.

Environment

All supported V6/V7 Tivoli Storage Manager server environments using deduplicated file storage pools. This document does not apply to 7.1.3.000+ deduplicated directory-container pools.

Resolving The Problem

Removing a deduplicated volume can potentially affect other deduplicated volumes and introduce problems into the deduplication engine.

If a primary file volume in a deduplicated file storage pool has been lost from the underlying operating system, or is damaged to the point that it must be removed from the Tivoli Storage Manager inventory, follow the procedure below. The procedure ensures that any data that can be recovered from a copy storage pool or a replication target server is recovered before the volume is removed. It also describes how to determine whether deleting the volume has introduced problems into the deduplication engine that require further assistance from IBM support.

If you are not using copy storage pools or replication to protect your primary deduplicated file storage pool, then contact IBM support to investigate other clean-up options.

The high-level summary of the process is as follows:

Part 1: Recover as much of the data on the damaged/lost volume(s) as possible.
Part 2: Remove any remaining irrecoverable data from the damaged/lost volume(s).
Part 3: Recover any data that was referencing irrecoverable data that was removed.
Part 4: Remove any remaining data that cannot be recovered.

Part 1: Recovering data using a copy storage pool (all supported levels) and/or a node replication target (7.1.1.0+ only):

1. For any file volume that no longer exists and is not mountable (DESTROYED), update the missing volume(s) to DESTROYED status (for example, the file volume was unexpectedly removed from the underlying filesystem):

UPDATE VOLUME <volume name> ACCESS=DESTROYED

2. For any file volume that still exists and is mountable (READONLY), but may contain damaged objects, update the volume(s) to read-only and audit them (for example, the file volume still exists but some objects on that volume cannot be accessed during read operations):

UPDATE VOLUME <volume name> ACCESS=READONLY

AUDIT VOLUME <volume name>

3. Manually initiate client backups, or wait for all normally scheduled clients to run a complete backup cycle, in an attempt to recover any damaged data that may still exist on the client filesystems.

4. Regardless of whether a copy storage pool exists or not, issue the RESTORE STGPOOL command against the storage pool containing the damaged or lost volumes (and wait for the process to end).

RESTORE STGPOOL <stgpool name> PREVIEW=NO MAXPROCESS=<n>

5. If the data is (or might be) replicated, attempt to use node replication to recover the affected data on the missing or damaged volume(s) by issuing the following command and waiting for the process to end (monitor the process on the source and target servers):

REPLICATE NODE * RECOVERDAMAGED=ONLY WAIT=YES

6. For any file volume(s) identified in step 2 above (READONLY), attempt to move any existing valid data from those volumes to other new volumes in the same storage pool (and wait for the process to end). Do not move the data to a different storage pool, and do not issue this command for missing volume(s) identified in step 1:

MOVE DATA <volume name>

7. Issue the following commands to determine if there are any objects or referenced deduplicated base chunks remaining on any of the volumes identified in steps 1 (DESTROYED) or 2 (READONLY) above:

QUERY CONTENT <volume name> FOLLOWLINKS=NO

QUERY CONTENT <volume name> FOLLOWLINKS=JUSTLINKS
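The commands in Part 1 depend on whether a volume was classified as DESTROYED (step 1) or READONLY (step 2). As a rough illustration only, the Python sketch below assembles the Part 1 command strings in step order. The function name part1_commands, its arguments, and the volume and pool names in the example are hypothetical, and the MAXPROCESS value is a placeholder. The sketch only builds text: it does not connect to a server or run anything, and it leaves out step 3 because client backups are not a server command.

```python
def part1_commands(volume, state, stgpool, maxprocess=1, replicated=True):
    """Return the Part 1 commands, in step order, for one volume.

    state is "DESTROYED" (step 1) or "READONLY" (step 2).
    Hypothetical helper: it only builds strings and executes nothing.
    """
    if state not in ("DESTROYED", "READONLY"):
        raise ValueError("state must be DESTROYED or READONLY")

    cmds = [f"UPDATE VOLUME {volume} ACCESS={state}"]  # step 1 or step 2
    if state == "READONLY":
        cmds.append(f"AUDIT VOLUME {volume}")  # step 2 only
    # Step 4 applies to the whole storage pool, not to a single volume.
    cmds.append(f"RESTORE STGPOOL {stgpool} PREVIEW=NO MAXPROCESS={maxprocess}")
    if replicated:
        # Step 5 applies only if the data is (or might be) replicated.
        cmds.append("REPLICATE NODE * RECOVERDAMAGED=ONLY WAIT=YES")
    if state == "READONLY":
        # Step 6 is never issued for missing (DESTROYED) volumes.
        cmds.append(f"MOVE DATA {volume}")
    # Step 7: check what remains on the volume.
    cmds.append(f"QUERY CONTENT {volume} FOLLOWLINKS=NO")
    cmds.append(f"QUERY CONTENT {volume} FOLLOWLINKS=JUSTLINKS")
    return cmds


if __name__ == "__main__":
    # Hypothetical volume and pool names, for illustration only.
    for line in part1_commands("/tsm/file/00000A1B.BFS", "READONLY", "DEDUPPOOL", 2):
        print(line)
```

Each command in the returned list still has to be issued one at a time, waiting for the corresponding server process to end before moving on, as the steps above require.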

Part 2: Removing data that cannot be recovered from the missing or damaged volumes:

8. For any file volume(s) identified in step 2 (READONLY) above, ensure that all unreadable data remains marked as damaged by initiating an audit (and wait for the process to end):

AUDIT VOLUME <volume name> FIX=YES

9. For any file volume(s) identified in step 2 (READONLY) above, attempt to move any remaining valid data from those volumes to other volumes in the same storage pool and wait for the process to end (do not move the data to a different storage pool):

MOVE DATA <volume name>


10. For any file volume(s) identified in either step 1 (DESTROYED) or 2 (READONLY) above, remove the volume(s) and their remaining irrecoverable data from the Tivoli Storage Manager inventory using the following command (and wait for the process to end):

WARNING: ANR4895E (TechNote: Auditing and repairing a deduplicated file storage pool)

DELETE VOLUME <volume name> DISCARDDATA=YES

ANR2401E
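Part 2 treats the two volume classes differently: steps 8 and 9 are listed only for READONLY volumes, while step 10 applies to both. The following Python sketch shows one way to lay out that ordering for a batch of volumes. The function name part2_commands and the volume names are hypothetical, DISCARDDATA is the parameter of DELETE VOLUME written out in full, and the sketch only produces command text without contacting a server.

```python
def part2_commands(volumes):
    """Build the Part 2 commands for a mapping of volume name -> state.

    READONLY volumes get steps 8, 9 and 10, in that order.
    DESTROYED volumes get step 10 only.
    Hypothetical helper: it only builds strings and executes nothing.
    """
    cmds = []
    for volume, state in volumes.items():
        if state == "READONLY":
            cmds.append(f"AUDIT VOLUME {volume} FIX=YES")  # step 8
            cmds.append(f"MOVE DATA {volume}")             # step 9
        elif state != "DESTROYED":
            raise ValueError(f"unexpected state for {volume}: {state}")
        # Step 10: discard whatever could not be recovered or moved.
        cmds.append(f"DELETE VOLUME {volume} DISCARDDATA=YES")
    return cmds


if __name__ == "__main__":
    # Hypothetical volume names, for illustration only.
    plan = part2_commands({"VOL_MISSING": "DESTROYED", "VOL_DAMAGED": "READONLY"})
    for line in plan:
        print(line)
```

Keeping DELETE VOLUME last for each volume mirrors the procedure: the audit and the move must have finished before the remaining irrecoverable data is discarded.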

Part 3: Recovering objects referencing damaged data ("invalid links"):

11. Scan and validate the deduplicated storage pool to determine if deleting the volume invalidated any links to base data:

VALIDATE EXTENTS <deduplicated stgpool> ACTION=MARKDAMAGED PREVIEW=NO

12. Review the activity log to determine the results of the above step. The results will look similar to the following:

07/09/2015 09:18:12 **** VALIDATE EXTENTS CURRENT TOTALS FOR dedup ****

07/09/2015 09:18:12 Validate Extents: Total invalid : 0

07/09/2015 09:18:12 Validate Extents: Total deleted : 0

07/09/2015 09:18:12 Validate Extents: Total damaged (in pool): 0

07/09/2015 09:18:12 Validate Extents: Total damaged (not in pool): 0

07/09/2015 09:18:12 ANR0985I Process 5 for VALIDATE EXTENTS running in the BACKGROUND completed with completion state SUCCESS at 09:18:12 AM.

13. If a copy storage pool exists, attempt to restore any affected data on the missing or damaged volume(s) by issuing the following command (and wait for the process to end):

RESTORE STGPOOL <stgpool name> PREVIEW=NO MAXPROCESS=<n>

14. If the data is (or might be) replicated, attempt to use node replication to recover the affected data on the missing or damaged volume(s) by issuing the following command and waiting for the process to end (monitor the process on the source and target servers):

REPLICATE NODE * RECOVERDAMAGED=ONLY WAIT=YES

15. Repeat steps 11 and 12 to verify that no further issues are reported after the RESTORE STGPOOL or REPLICATE NODE RECOVERDAMAGED recovery attempt. If no objects are reported as invalid or damaged, recovery is complete. If problems are still reported, continue with step 16 in Part 4.
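The check in steps 12 and 15 comes down to reading the VALIDATE EXTENTS totals from the activity log and seeing whether anything is still reported. The Python sketch below parses lines in the format of the sample shown in step 12 and reports whether all totals are zero. The function names are hypothetical, the regular expression is derived only from that one sample and may not match other server levels, and treating any non-zero total (including "deleted") as a reason to continue is a conservative assumption; the procedure itself only names invalid and damaged objects.

```python
import re

# Matches lines such as "Validate Extents: Total damaged (in pool): 0".
# Assumption: the format follows the sample in step 12.
TOTAL_RE = re.compile(r"Validate Extents: Total (.+?)\s*:\s*(\d+)")


def parse_validate_totals(log_text):
    """Return a dict of counter name -> value from activity log text."""
    totals = {}
    for match in TOTAL_RE.finditer(log_text):
        totals[match.group(1).strip()] = int(match.group(2))
    return totals


def recovery_complete(totals):
    """True if every parsed total is zero; raise if no totals were found."""
    if not totals:
        raise ValueError("no VALIDATE EXTENTS totals found in the log text")
    return all(value == 0 for value in totals.values())


if __name__ == "__main__":
    sample = (
        "07/09/2015 09:18:12 **** VALIDATE EXTENTS CURRENT TOTALS FOR dedup ****\n"
        "07/09/2015 09:18:12 Validate Extents: Total invalid : 0\n"
        "07/09/2015 09:18:12 Validate Extents: Total deleted : 0\n"
        "07/09/2015 09:18:12 Validate Extents: Total damaged (in pool): 0\n"
        "07/09/2015 09:18:12 Validate Extents: Total damaged (not in pool): 0\n"
    )
    totals = parse_validate_totals(sample)
    if recovery_complete(totals):
        print("No invalid or damaged extents reported: recovery is complete.")
    else:
        print("Problems still reported: continue with Part 4.")
```

Raising an error when no totals are found avoids mistaking an empty or truncated log extract for a clean result.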

Part 4: Removing any remaining deduplicated data that cannot be recovered:

16. Review the TechNote "Auditing and repairing a deduplicated file storage pool" to download the dedupAuditTool.pl script and start a scan using the INVALIDATED_LINKS symptom code. Once the script has started, contact IBM support for further assistance in resolving the remaining damage that could not be recovered. After the script completes, the support team reviews its output and provides further instructions.


Product Synonym

ITSM ADSM TSM Spectrum protect


Document Information

Modified date:
17 June 2018

UID

swg21883611
