Direct links to fixes
APAR status
Closed as program error.
Error description
A replication storage rule can hang on the target replication server when the SdReplicateUnresolvedChunksThread exits due to a critical error. Once at least one replication storage rule has failed, the replication storage rule hangs and cannot be cancelled from the target server.

The hang is caused by replication storage rule transactions that run slower and slower because an enormous number of extents need to be updated in the database table SD_REFCOUNT_UPDATES. This table is used to manage the reference counts for the deduplication catalogue. The failed replication storage rule leaves a backlog of entries in SD_REFCOUNT_UPDATES, and at startup the thread that is responsible for managing that table enters a single-threaded mode while it tries to resolve the in-flight entries from the failed replication. The queries that the replication storage rule process runs against this table become progressively slower, and eventually the replication storage rule hangs on the IBM Spectrum Protect Server.

This problem occurs only on the target server side of IBM Spectrum Protect Server.

This APAR can be identified by the following steps:

1) On the target server SERVER2, the query process output shows the following process:

Protect: SERVER2>query process

Process  Process Description  Job Id     Process Status                                    Parent
Number                                                                                     Process
-------- -------------------- ---------- ------------------------------------------------- --------
2        Inbound replication  XX         Inbound Replication Storage Rule REPLICATION
         storage rule                    from source server SERVER1, source process 2,
         REPLICATION from                source job XY.
         SERVER1

When trying to cancel the replication storage rule process, the following is shown:

Protect: SERVER2>cancel process 2
ANR0943E CANCEL PROCESS: Process 2 could not be cancelled.
ANS8001I Return code 14.
2) In the gstack output from the source server, check for the following threads:

SdReplTcrPhaseCheckThread
SdReplicateUnresolvedChunks

If either of these threads is not seen in the gstack output, this APAR applies.

The gstack output can be gathered on a Linux system with the commands below, run from a root session:

ps -ef | grep dsmserv

Note the PID of the dsmserv process and use it in the following command:

gstack <dsmserv-pid>

IBM Spectrum Protect Versions Affected:
IBM Spectrum Protect Server 8.1.13 and above on all supported platforms

Additional Keywords:
Spectrum Protect; TSM; stgrule; replication; hang; SdReplTcrPhaseCheckThread; SdReplicateUnresolvedChunks; TS008112620
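The thread check in step 2 could be scripted along the following lines. This is only a minimal sketch and not an IBM-supplied tool: the helper names missing_threads and capture_gstack are hypothetical, and it assumes that the two thread names from step 2 appear verbatim somewhere in the gstack text, that gstack is on the PATH, and that the script is run as root on Linux.

```python
import subprocess
import sys

# Thread names quoted in step 2 of the error description.
EXPECTED_THREADS = ("SdReplTcrPhaseCheckThread", "SdReplicateUnresolvedChunks")


def missing_threads(gstack_text, expected=EXPECTED_THREADS):
    """Return the expected thread names that do not occur in gstack_text.

    Assumption: a plain substring match is enough to detect a thread.
    """
    return [name for name in expected if name not in gstack_text]


def capture_gstack(pid):
    """Run `gstack <pid>` and return its standard output as text."""
    result = subprocess.run(
        ["gstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # The argument is the dsmserv PID noted from `ps -ef | grep dsmserv`.
        missing = missing_threads(capture_gstack(int(sys.argv[1])))
        if missing:
            print("Not seen in gstack output:", ", ".join(missing))
            print("APAR IT42566 may apply.")
        else:
            print("Both threads are present in the gstack output.")
    else:
        print("usage: python check_threads.py <dsmserv-pid>")
```

An empty list from missing_threads means both thread names were found. Any name in the list corresponds to the "not seen in gstack output" condition from step 2.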
Local fix
The target server must be recycled (halted and restarted) before it can receive another replication process.
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* All Spectrum Protect users of replication storage rules      *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See error description.                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in level 8.1.18. Note that this is     *
* subject to change at the discretion of IBM.                  *
****************************************************************
Problem conclusion
This problem was fixed. Affected platforms: AIX, Linux, and Windows.
Temporary fix
Comments
APAR Information
APAR number
IT42566
Reported component name
TSM SERVER
Reported component ID
5698ISMSV
Reported release
81L
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-11-24
Closed date
2023-02-02
Last modified date
2023-02-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
TSM SERVER
Fixed component ID
5698ISMSV
Applicable component levels
Document Information
Modified date:
17 March 2023