APAR status
Closed as program error.
Error description
When a MOVE CONTAINER command is issued against a directory-container storage pool, either manually or by internal defragmentation processing, the container can be left in a read-only state. There might be no obvious error or message explaining why the container is read-only, and the loss of writable containers can lead to out-of-space conditions in the container pool.

Be aware that there are valid cases in which a MOVE CONTAINER leaves a container read-only; these should be visible in the activity log. Also note that MOVE CONTAINER first changes the container state to read-only. Once the move completes, the state should change appropriately.

Two scenarios are currently known:

1) When MOVE CONTAINER is run on a container while chunks in that container are expiring, the location update can fail because a chunk entry no longer exists. The MOVE CONTAINER process then ends with a warning and leaves the container read-only, preventing further writes to it. The following error is logged in the activity log:

ANR0103E sddefrag.c(4485): Error 1114 updating row in table "SD.Chunk.Locations"

2) If extents cannot be moved during the move container process, the container is left read-only. No errors are logged in the activity log or in dsmffdc.log.

Customer/L2 Diagnostics (If Applicable)

For scenario 2), an SDCNTR trace reveals that the movement of extents failed. For example:

23:07:47.546 [254][sdcntr.c][4645][SdAcquireAnyContainer]:Using strategy AllocNewCntr with size 294912
23:07:47.546 [254][sdcntr.c][4781][SdAcquireContainer]:Enter: directory E:\TSM\stgdir(1), type 1, size 294912.
23:07:47.546 [254][sdcntr.c][9375][AllocNewContainer]:Enter: size 294912, type 1, getSufficientRange False
23:07:47.546 [254][sdcntr.c][9434][AllocNewContainer]:Directory is full. Requested 294912, minSize 1073741824
23:07:47.546 [254][sdcntr.c][9517][AllocNewContainer]:Couldn't allocate a new container for directory E:\TSM\stgdir(1) with rc=1001
23:07:47.546 [254][sdcntr.c][9627][AllocNewContainer]:Exit: rc 1001, cntrId 0, offset -1
23:07:47.546 [254][sdcntr.c][5233][SdAcquireContainer]:Exit: rc 1001, cntrId 0
23:07:47.572 [253][sddefrag.c][2934][SdMoveContainer]:sdRtrv failed with rc=1001
23:07:47.603 [253][sddefrag.c][3456][SdMoveContainer]:Number of chunks: 17428, moved: 17427, failed: 1

The key error in this example is return code 1001, which is defined as:

#define GRC_NO_SPACE 1001

In this instance, the move process was running while client backups were sending data into the container pool, so multiple containers were open and space was reserved in those containers at the time. The move cannot be performed because it hits an out-of-space error, but this error is not externalised to the activity log or to dsmffdc.log. The process simply ends and the container remains read-only.

This has a knock-on effect: many other containers that need to be defragmented can also change to read-only with the same error. As each container remains read-only, less space is available in the storage pool, and client sessions eventually start to fail with an out-of-space error.

IBM Spectrum Protect Versions Affected: Spectrum Protect Server versions 8.1.4 and 8.1.5 on all platforms.

Initial Impact: Medium

Additional Keywords: tsm, ANR0522W, OOS, TS001182092, TS000843636, TS001166483, defrag
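The scenario 2) trace signature can be checked mechanically. The following is a minimal sketch, not an IBM-supplied tool: it assumes the SDCNTR trace has been saved as plain text in the line format shown in the excerpt, and the function name scan_trace is hypothetical. It collects the SdMoveContainer summary counts and counts entries that report return code 1001 (GRC_NO_SPACE).

```python
import re

# Summary entry written by SdMoveContainer, as in the trace excerpt above.
SUMMARY_RE = re.compile(
    r"\[SdMoveContainer\]:Number of chunks: (\d+), moved: (\d+), failed: (\d+)"
)
# Return codes appear as "rc=1001" or "rc 1001" in the excerpt.
RC_RE = re.compile(r"\brc[= ](\d+)")
GRC_NO_SPACE = 1001  # value quoted in the APAR text


def scan_trace(lines):
    """Return (summaries, no_space_hits) for an iterable of trace lines.

    Hypothetical helper: the line format is assumed from the excerpt only.
    """
    summaries = []
    no_space_hits = 0
    for line in lines:
        match = SUMMARY_RE.search(line)
        if match:
            total, moved, failed = map(int, match.groups())
            summaries.append({"total": total, "moved": moved, "failed": failed})
        for rc in RC_RE.findall(line):
            if int(rc) == GRC_NO_SPACE:
                no_space_hits += 1
    return summaries, no_space_hits


# Three lines copied from the trace excerpt above.
sample = [
    r"23:07:47.546 [254][sdcntr.c][9517][AllocNewContainer]:Couldn't allocate a new container for directory E:\TSM\stgdir(1) with rc=1001",
    r"23:07:47.572 [253][sddefrag.c][2934][SdMoveContainer]:sdRtrv failed with rc=1001",
    r"23:07:47.603 [253][sddefrag.c][3456][SdMoveContainer]:Number of chunks: 17428, moved: 17427, failed: 1",
]

summaries, hits = scan_trace(sample)
for entry in summaries:
    if entry["failed"] > 0:
        print("move left", entry["failed"], "of", entry["total"], "chunks unmoved")
print("entries reporting GRC_NO_SPACE:", hits)
```

A summary with a non-zero failed count together with rc 1001 entries matches the pattern described for scenario 2); any other return code would need separate investigation.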
Local fix
For scenario 1), run the move again on the container.

For scenario 2):

Find the containers that are in a read-only state by issuing:

Q CONTAINER STGPOOL=<POOL> STATE=READONLY

Check the activity log for any error that could validly have placed the container into a read-only state, such as an I/O error. If you find a valid error, do not attempt to place that container back into an AVAILABLE state.

If nothing is found, you can change the state of the containers back to AVAILABLE by issuing the following commands from a DB2 command prompt.

To update ALL read-only containers, issue:

db2 connect to tsmdb1
db2 set schema tsmdb1
db2 "update sd_containers set state=0 where state=2"

For individual read-only containers, issue:

db2 connect to tsmdb1
db2 set schema tsmdb1
db2 "update sd_containers set state=0 where cntrname='container-name'"

Replace 'container-name' with the actual container name.

Once the containers are available, they can be written to again for data ingest.

Also consider spreading out the client data ingest workload and reducing the number of containers that sessions are allowed to open by lowering the SDMaxSessionContainers value from its default of 50. Be aware that changing this value can cause performance problems for client backups.
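When several individual containers need to be reset, the per-container commands above can be generated rather than typed by hand. This is a minimal sketch under stated assumptions: the helper names and the example container path are hypothetical, and the script only prints the command lines for review; it does not connect to DB2 or run anything. Only containers that have already been checked against the activity log should be listed.

```python
def quote_sql_literal(value):
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_reset_commands(container_names):
    """Build the db2 command lines from the local fix for the given containers.

    Hypothetical helper: it reuses the statements quoted above verbatim and
    only substitutes the container name.
    """
    commands = ["db2 connect to tsmdb1", "db2 set schema tsmdb1"]
    for name in container_names:
        sql = (
            "update sd_containers set state=0 where cntrname="
            + quote_sql_literal(name)
        )
        commands.append('db2 "' + sql + '"')
    return commands


# Hypothetical container name, for illustration only.
checked_containers = ["/tsm/stgdir/00/example.dcf"]

for command in build_reset_commands(checked_containers):
    print(command)
```

Printing the commands instead of executing them keeps a manual review step between the activity log check and the database update.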
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* All IBM Spectrum Protect server users.                       *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See error description.                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in level 8.1.6. Note that this is      *
* subject to change at the discretion of IBM.                  *
****************************************************************
Problem conclusion
This problem was fixed. Affected platforms for reported release: AIX, Linux, and Windows. Platforms fixed: AIX, Linux, and Windows.
Temporary fix
Comments
APAR Information
APAR number
IT26182
Reported component name
TSM SERVER
Reported component ID
5698ISMSV
Reported release
81W
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2018-09-05
Closed date
2018-09-17
Last modified date
2018-09-17
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
TSM SERVER
Fixed component ID
5698ISMSV
Applicable component levels
Document Information
Modified date:
17 September 2018