IBM Support

Boundary Condition Can Occur Causing Files in a Container Storage Pool to Be Left Unrecoverable



APAR IT27050 may affect directory-container and cloud-container storage pools, resulting in damaged deduplicated extents (chunks). Any files that reference the damaged extents may become unrecoverable. This potential loss of data may affect future data restore, retrieve, or space management recall operations.

This APAR represents a boundary condition that may result in incorrect data being written to a container storage pool. The issue can occur when client-side deduplication is performed on a file larger than 10 GB, in which case incorrect deduplicated extents could be written to the storage pool. Subsequent data read operations, such as a client-based restore or a server-based REPLICATE NODE, will fail. Additionally, subsequently ingested files could link to these incorrect extents, making those files unrecoverable as well. In a directory-container pool, AUDIT CONTAINER will detect and report that there are damaged deduplicated extents in the respective containers. Note that the incorrect deduplicated extents may be propagated from the source to the target in a server-to-server replication pair where PROTECT STGPOOL is being used.


Problem Summary:
Client data stored in directory or cloud container storage pools may encounter an error in processing which results in portions of data being stored incorrectly. The affected data may not be readable by any IBM Spectrum Protect client or IBM Tivoli Data Protection client in a restore, retrieve, or recall scenario.



Who is affected:

This APAR affects users of deduplication-based directory-container or cloud-container storage pools where the IBM Spectrum Protect server is version version or version and higher.


Problem Resolution:

A fix for the IBM Spectrum Protect server which resolves APAR IT27050 will be delivered in the following server levels. This is subject to change at the sole discretion of IBM:

If you require a fix before these levels become available, contact IBM Software Support.


Recommended Actions:

The following are the recommended cleanup actions to perform after the fixing level is applied. Please review the entire set of instructions first. Some steps must be repeated, often once per affected storage pool, before proceeding to the next step. Some steps may not apply to you; for example, step 3 may be skipped if you do not have a storage pool of storage type “CLOUD”. The recommended steps are:

  1. Issue the command:  QUERY STGPOOL * FORMAT=DETAILED

    For each storage pool of storage type “DIRECTORY” or “CLOUD”, proceed through the following steps. If all storage pools for a given server are storage type “DEVCLASS”, then that server is not affected by the APAR discussed in this advisory.

    This needs to be performed for all IBM Spectrum Protect servers. 

  2. For each storage pool of storage type “DIRECTORY” identified in step 1, perform the following:

    Steps 2-1 and 2-2 need to be performed on all servers before performing step 2-3.
    1. Minimize the defragmentation processing (which is done through background MOVE CONTAINER operations) for the directory container storage pools. This is done by setting these server options to a value of 99. The options to set to 99 are:  DEFRAGFSTRIGGER and DEFRAGCNTRTRIGGER. Before changing these values, issue QUERY OPTION and make note of the current values for these options so that they can be returned to their originally set values after these remediation actions are completed. The default values are 90 for the DEFRAGFSTRIGGER and 95 for the DEFRAGCNTRTRIGGER.

      These options can be set using the SETOPT command. An example of how to set this is to issue the following commands to the IBM Spectrum Protect server:
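
      A minimal sketch of those commands, built from the option names and the value of 99 given above. It assumes that SETOPT takes the option name followed by the new value, and that QUERY OPTION accepts an option name; confirm both against the command reference for your server level. The QUERY OPTION commands come first so that the current values can be recorded:

      ```text
      QUERY OPTION DEFRAGFSTRIGGER
      QUERY OPTION DEFRAGCNTRTRIGGER
      SETOPT DEFRAGFSTRIGGER 99
      SETOPT DEFRAGCNTRTRIGGER 99
      ```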


      Do not run MOVE CONTAINER processes manually until the entire set of cleanup instructions is complete. If a MOVE CONTAINER process is started before the pool is fully audited (whether manually or automatically), then the audit must be performed again.

    2. Doing the following ensures the deduplication catalogs are synchronized so that the AUDIT CONTAINER operations in the following steps are complete:

      For each server that runs the PROTECT STGPOOL command, ensure there is a complete and successful run of PROTECT STGPOOL before proceeding to the next step.

      If REPLICATE NODE is used in the environment (without the use of PROTECT STGPOOL), ensure there is a complete and successful run of REPLICATE NODE before proceeding.  
    3. Perform AUDIT CONTAINER on all container files in the storage pool. The AUDIT CONTAINER command will audit any container that has a “Last Audit Date” older than the day these AUDIT CONTAINER operations are started. 

      The AUDIT CONTAINER processing is constrained by disk I/O, CPU, and memory. The larger the pool is, the longer it will take to complete the entire AUDIT CONTAINER operation. In a large blueprint server, AUDIT CONTAINER may be required for tens of thousands of containers, depending on how many exist in the system, and could take many days to perform.

      This audit can be performed manually or automatically. To perform it automatically, use storage rules, which are available in 8.1.5 and higher. This is documented in the following link:

      Audit Storage Rule

      To do this manually, make note of the date when this action is started. For example, if the fixing level of the server was installed and the AUDIT CONTAINER operations were started on April 17, 2019, then that is the date to use. The recommended syntax of the command is:
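
      A sketch of such a command, using the MAXPR=20 value and the April 17, 2019 start date from this step. ACTION=SCANALL and the ENDDATE parameter, including its MM/DD/YYYY format and the choice of the preceding day as the cutoff, are assumptions here and should be checked against the AUDIT CONTAINER command reference for your server level:

      ```text
      AUDIT CONTAINER STGPOOL=<stgpool name goes here> ACTION=SCANALL MAXPR=20 ENDDATE=04/16/2019
      ```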


      This will cause all containers in the referenced pool with a last audit date prior to April 17, 2019 to be audited. 

      This command can be scheduled to run every day for some number of hours. Be aware that the number of processes used on the command is set to 20 (MAXPR=20). Choose this setting with care, because AUDIT CONTAINER adds I/O load on the system by reading from the container storage pool directories. It also adds load on the server while it calculates the cryptographic digest (SHA-1) used to validate the deduplicated extents during the audit processing. If the AUDIT CONTAINER processing is impacting the performance of the IBM Spectrum Protect server, such as slowing down client backups, consider cancelling the command and invoking it again with an adjusted MAXPR value, depending on the impact of the audit in the environment. There may also be an opportunity to move the AUDIT CONTAINER processing to a window with lighter load, which might allow a higher MAXPR value and a shorter runtime.

    4. To monitor the progress of the AUDIT CONTAINER processing for this pool, consider one of the following:

      1. Issue the command: “QUERY CONTAINER F=D”.

        Review those containers with an “Approx. Last Audit Date” prior to the date the audit processing was started.

      2. Issue the command: “SELECT COUNT(*) FROM CONTAINERS WHERE STGPOOL_NAME='stgpool_name_goes_here' AND LASTAUDIT_DATE<='04/17/2019'”.

        This will provide a count of the number of containers that still need to be audited for the referenced storage pool.

      Once all the containers in the referenced directory container storage pool have been audited, proceed to the next step.

      AUDIT CONTAINER processing will mark any deduplicated extents (chunks) that fail to validate as damaged. These can be viewed with the QUERY DAMAGED command.

      Repeat step 2-3 above for other storage type “DIRECTORY” storage pools. For multiple pools, it is possible to do this concurrently if there is sufficient CPU resource to handle the operations. Ensure that no more than 40 AUDIT CONTAINER processes are running at a given time on a single server.
    5. Once the AUDIT CONTAINER processing has been completed, revert the defragmentation options to their original settings. The following illustrates how to set these options back to their default settings:
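
      Using the default values stated in step 2-1 (90 for DEFRAGFSTRIGGER and 95 for DEFRAGCNTRTRIGGER), the commands would take the following form, assuming that SETOPT takes the option name followed by the value. Substitute the values recorded with QUERY OPTION if they differed from the defaults:

      ```text
      SETOPT DEFRAGFSTRIGGER 90
      SETOPT DEFRAGCNTRTRIGGER 95
      ```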


  3. For each storage pool of storage type “CLOUD” identified in step 1, perform the following:

    Contact IBM support. An audit tool will be provided along with instructions on how to use it. The audit tool will scan the container objects stored to the object storage (cloud). It will provide a summary of the containers that have been evaluated.

    For any container objects in the object storage that have deduplicated extents affected by the APAR discussed in this advisory, the audit tool will create a shell script containing a list of the affected deduplicated extents. This shell script must be run against the IBM Spectrum Protect server to mark those deduplicated extents (chunks) as damaged. Once the script has been run, the affected extents will be reported in QUERY DAMAGED.

  4. At this point, all storage type “DIRECTORY” and “CLOUD” storage pools have been through the audit identification processing. As a result, all the affected deduplicated extents (chunks) have been marked as damaged in the IBM Spectrum Protect server database. To review the client nodes and a count of damaged data extents, issue the command:  QUERY DAMAGED TYPE=NODE.
  5. It may be possible for the IBM Spectrum Protect server to replace the damaged deduplicated data extent during normal ingest processing. This is done by having the client where the data originated resend the affected data. When the server receives an extent that matches an existing extent that is damaged, the IBM Spectrum Protect server will automatically store a new copy of that extent from the ingest stream and then update the metadata for all objects referencing the damaged extent.

    When following sub-steps (5-1, 5-2, and 5-3 below), periodically review the QUERY DAMAGED results. As affected deduplicated extents are replaced, other data reported as damaged may also be corrected because the deduplicated extent may be shared by many different files. 

    In order to utilize this damaged extent replacement capability of the IBM Spectrum Protect server, consider the following actions for each client reported in step 4 above:

    1. For backup data, have the client perform a full backup. The effectiveness of this full backup may vary depending on the type of client in use. 
    2. For archive data, have the client perform a new archive of the affected data. This is only possible if the archive data is still available on the client or from some other location outside of IBM Spectrum Protect. For data where there is no other copy available, the data is lost and not recoverable.
    3. For space managed (HSM) data, if another copy of the affected file is available from some other location, copy and replace the damaged file in the HSM filesystem using this other copy of the data. If no other copy of the affected HSM migrated file exists, then delete the affected file from the HSM filesystem. In the event that a copy is not available for the affected HSM file, the data is lost and not recoverable. 
  6. Once all data that can be re-ingested in step 5 has been resent, the final step is to remove any remaining damaged deduplicated extents. Those remaining damaged extents can be reviewed using the command QUERY DAMAGED TYPE=INVENTORY. This command may be long-running depending on the amount of damage remaining in the environment. It provides a final list of any objects that were lost as a result of the APAR identified above.

    To remove the remaining damaged deduplicated extents from the IBM Spectrum Protect server, issue the command “AUDIT CONTAINER STGPOOL=<stgpool name goes here> ACTION=REMOVEDAMAGED”.

    This command should be performed for each pool of either storage type “DIRECTORY” or “CLOUD” in step 1 above. 
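
    As a sketch, the review and removal for a single pool might look like the following. The placement of the pool name as the first argument of QUERY DAMAGED is an assumption to verify against the command reference for your server level; the AUDIT CONTAINER line is the command given above:

    ```text
    QUERY DAMAGED <stgpool name goes here> TYPE=INVENTORY
    AUDIT CONTAINER STGPOOL=<stgpool name goes here> ACTION=REMOVEDAMAGED
    ```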


For any questions or other assistance regarding this advisory, please contact IBM support.


Document Information

Modified date:
17 February 2020