IT26182: MOVE CONTAINER CAN LEAVE THE CONTAINER IN A READONLY STATE

APAR status

Closed as program error.

Error description

When a MOVE CONTAINER command is issued, either manually or by
the internal defragmentation processing on a directory-container
storage pool, the container can be left in a readonly state.
There might not be any obvious errors or messages explaining why
the container is in the readonly state, which can lead to
out-of-space issues within the container pool. Be aware that
there are valid cases in which a move leaves a container in a
readonly state; these should be visible in the activity log.
Also note that MOVE CONTAINER first changes the container state
to readonly. Once the move completes, the state should change
appropriately.

Currently, there are two known scenarios in which this happens:

1) When a MOVE CONTAINER is run on a container at the same time
that chunks in the container are expiring, the location update
can fail when it finds that a chunk entry no longer exists. This
causes the MOVE CONTAINER process to end with a warning and
leaves the container in a readonly state, preventing further
writes to it. The following error is logged in the activity
log:

 ANR0103E sddefrag.c(4485): Error 1114 updating row in table
"SD.Chunk.Locations"


2) Extents that cannot be moved during the MOVE CONTAINER
process leave the container in a readonly state. No errors are
logged in the activity log or in dsmffdc.log.
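
The scenario 1) signature can be searched for mechanically in an exported copy of the activity log. The following is a minimal Python sketch, not an IBM-provided tool. It assumes the log has been saved as plain text; the function name has_scenario1_error is hypothetical, and the source line number in parentheses is matched loosely on the assumption that it can differ between builds.

```python
import re

# Signature of scenario 1) as quoted in this APAR. The number in parentheses
# (4485 in the example) is matched as any digits; that is an assumption.
SCENARIO1_PATTERN = re.compile(
    r'ANR0103E\s+sddefrag\.c\(\d+\):\s+Error\s+1114\s+updating\s+row\s+in\s+'
    r'table\s+"SD\.Chunk\.Locations"'
)


def has_scenario1_error(activity_log_text: str) -> bool:
    """Return True if the text contains the ANR0103E / error 1114 signature."""
    # Collapse line wraps and repeated spaces so a wrapped message still matches.
    flattened = " ".join(activity_log_text.split())
    return SCENARIO1_PATTERN.search(flattened) is not None


# The message exactly as it is wrapped in this APAR.
sample = '''ANR0103E sddefrag.c(4485): Error 1114 updating row in table
"SD.Chunk.Locations"'''
print(has_scenario1_error(sample))  # prints True
```

Scenario 2) logs nothing in the activity log, so a negative result here does not rule out this APAR.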


Customer/L2 Diagnostics (If Applicable)

For scenario 2), an SDCNTR trace reveals that the movement of
extents failed. For example:

23:07:47.546 [254][sdcntr.c][4645][SdAcquireAnyContainer]:Using
strategy AllocNewCntr with size 294912
23:07:47.546 [254][sdcntr.c][4781][SdAcquireContainer]:Enter:
directory E:\TSM\stgdir(1), type 1, size 294912.
23:07:47.546 [254][sdcntr.c][9375][AllocNewContainer]:Enter:
size 294912, type 1, getSufficientRange False
23:07:47.546 [254][sdcntr.c][9434][AllocNewContainer]:Directory
is full.  Requested 294912, minSize 1073741824
23:07:47.546 [254][sdcntr.c][9517][AllocNewContainer]:Couldn't
allocate a new container for directory E:\TSM\stgdir(1) with
rc=1001
23:07:47.546 [254][sdcntr.c][9627][AllocNewContainer]:Exit: rc
1001, cntrId 0, offset -1
23:07:47.546 [254][sdcntr.c][5233][SdAcquireContainer]:Exit: rc
1001, cntrId 0

23:07:47.572 [253][sddefrag.c][2934][SdMoveContainer]:sdRtrv
failed with rc=1001
23:07:47.603 [253][sddefrag.c][3456][SdMoveContainer]:Number of
chunks: 17428, moved: 17427, failed: 1

The key error in this example is return code 1001, which is
defined as:

#define GRC_NO_SPACE                            1001

In this instance, the move process was running at the same time
as client backups were sending data into the container pool, so
multiple containers were open with space reserved in them. The
move cannot be performed because it hits an out-of-space error,
but this error is not reported in the activity log or in
dsmffdc.log. The process simply ends, and the container remains
in a readonly state. This can have a knock-on effect: many other
containers that need to be defragmented can also change to
readonly with the same error. As each container remains in a
readonly state, less space is available in the storage pool, and
client sessions eventually start to fail with an out-of-space
error.
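
Because scenario 2) is only visible in the trace, it can help to summarize a captured SDCNTR trace rather than read it line by line. The following Python sketch assumes the trace has been saved as text in the format shown in the excerpt above. The function name summarize_move_trace and the returned keys are hypothetical, and the patterns are derived only from that excerpt, so other trace layouts may not match.

```python
import re

GRC_NO_SPACE = 1001  # value quoted in this APAR

# Summary line written by SdMoveContainer, as seen in the excerpt above.
SUMMARY_PATTERN = re.compile(
    r"\[SdMoveContainer\]:Number of chunks: (\d+), moved: (\d+), failed: (\d+)"
)
# Matches both "rc=1001" and "rc 1001" as they appear in the excerpt.
NO_SPACE_PATTERN = re.compile(r"\brc[= ]%d\b" % GRC_NO_SPACE)


def summarize_move_trace(trace_text: str) -> dict:
    """Collect move summaries and flag an out-of-space return code."""
    # Trace lines may be wrapped; collapse all whitespace to single spaces.
    flattened = " ".join(trace_text.split())
    summaries = [
        {"chunks": int(c), "moved": int(m), "failed": int(f)}
        for c, m, f in SUMMARY_PATTERN.findall(flattened)
    ]
    return {
        "summaries": summaries,
        "no_space_seen": NO_SPACE_PATTERN.search(flattened) is not None,
        "any_failed": any(s["failed"] > 0 for s in summaries),
    }


trace_sample = """23:07:47.572 [253][sddefrag.c][2934][SdMoveContainer]:sdRtrv
failed with rc=1001
23:07:47.603 [253][sddefrag.c][3456][SdMoveContainer]:Number of
chunks: 17428, moved: 17427, failed: 1"""
result = summarize_move_trace(trace_sample)
print(result["summaries"], result["no_space_seen"], result["any_failed"])
```

A summary with a nonzero failed count together with return code 1001 is consistent with scenario 2); it does not by itself prove that the container was left readonly.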



IBM Spectrum Protect Versions Affected:
Spectrum Protect Server versions 8.1.4 and 8.1.5 on all
platforms.

Initial Impact: Medium

Additional Keywords:
tsm, ANR0522W, OOS,  TS001182092, TS000843636, TS001166483,
defrag

Local fix

For scenario 1), run the MOVE CONTAINER again on the affected
container.

For scenario 2):

Find the containers that are in a readonly state by issuing:

Q CONTAINER STGPOOL=<POOL> STATE=READONLY

Check the activity log for any errors that could validly have
placed the container into a readonly state, such as an I/O
error. If you find a valid error, do not attempt to place that
container back into an AVAILABLE state. If nothing is found,
you can change the state of the containers back to AVAILABLE by
issuing the following commands from a DB2 command prompt.

To update ALL readonly containers, issue:

db2 connect to tsmdb1
db2 set schema tsmdb1
db2 "update sd_containers set state=0 where state=2"

For individual readonly containers, issue:

db2 connect to tsmdb1
db2 set schema tsmdb1
db2 "update sd_containers set state=0 where cntrname='container-name'"

Replace 'container-name' with the actual container name.
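
When several containers must be updated individually, substituting the name by hand is error-prone. The following Python sketch only builds the SQL text; it does not connect to DB2 or run anything. The helper names sql_literal and container_statements are hypothetical, the example container path is invented for illustration, and the preview SELECT is an assumption that is not part of this APAR's local fix (it reuses only the table and column names shown above).

```python
def sql_literal(value: str) -> str:
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def container_statements(container_name: str) -> dict:
    """Build a read-only preview query and the update quoted in this APAR."""
    if not container_name.strip():
        raise ValueError("container name must not be empty")
    name = sql_literal(container_name)
    return {
        # Assumption: a plain SELECT on the same table and columns is a
        # reasonable way to inspect the row first. Not part of the APAR.
        "preview": f"select cntrname, state from sd_containers where cntrname={name}",
        # Same statement as in the local fix, with the name substituted.
        "update": f"update sd_containers set state=0 where cntrname={name}",
    }


# Hypothetical container name, for illustration only.
stmts = container_statements(r"E:\TSM\stgdir\00\0000000000000001.dcf")
print(stmts["preview"])
print(stmts["update"])
```

Each generated statement would still be issued with db2 "..." after connecting and setting the schema as shown above, and only after the activity log check has found no valid reason for the readonly state.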

Once the containers are AVAILABLE, they can be written to again
for data ingest.

Also consider spreading out the client data ingest workload and
reducing the number of containers that sessions are allowed to
open by lowering the SDMaxSessionContainers value from the
default of 50. Be aware that changing this value can affect the
performance of client backups.

Problem summary

****************************************************************
* USERS AFFECTED:                                              *
* All IBM Spectrum Protect server users.                       *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See error description.                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in level 8.1.6. Note that this is      *
* subject to change at the discretion of IBM.                  *
****************************************************************

Problem conclusion

This problem was fixed.
Affected platforms for reported release:  AIX, Linux, and
Windows.
Platforms fixed:  AIX, Linux, and Windows.

Temporary fix

Comments

APAR Information

APAR number
IT26182
Reported component name
TSM SERVER
Reported component ID
5698ISMSV
Reported release
81W
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2018-09-05
Closed date
2018-09-17
Last modified date
2018-09-17

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

Fix information

Fixed component name
TSM SERVER
Fixed component ID
5698ISMSV

Applicable component levels

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSGSG7","label":"Tivoli Storage Manager"},"Component":"","ARM Category":[],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"81W","Edition":"","Line of Business":{"code":"LOB26","label":"Storage"}}]

Document Information

Modified date:
17 September 2018
