CQS structure rebuild problems
The most common structure rebuild problem is a rebuild failure. Other types of rebuild problems are much rarer, such as rebuild hanging, rebuild not being initiated when required, work hanging after a successful rebuild, rebuild losing data objects, and rebuild duplicating data objects.
Rebuild failures
Some environmental situations can cause a rebuild to fail. Follow these general steps to address any rebuild failure that you encounter:
- Collect SYSLOGs: Collect the SYSLOG for each LPAR that is running a CQS that shares queues. Evaluate each SYSLOG for the following information:
  - How the rebuild was initiated (operator command, structure failure, CF failure, or link failure).
  - How the rebuild was stopped (operator command or CQS).
  - The rebuild master (CQS0240I message).
  - The rebuild type (COPY or RECOVERY in the CQS0240I message).
  - Structure quiesced or resumed messages:
    - CQS0200I STRUCTURE strname QUIESCED FOR reason
    - CQS0201I STRUCTURE strname RESUMED AFTER reason
  - Structure status change messages (CQS0202I).
  - Structure rebuild messages:
    - CQS0240I CQS cqsname STARTED STRUCTURE copy/recovery FOR STRUCTURE strname
    - CQS0241I CQS cqsname COMPLETED STRUCTURE copy/recovery FOR STRUCTURE strname
    - CQS0242E CQS FAILED STRUCTURE copy/recovery/rebuild FOR STRUCTURE strname
    - CQS0243E CQS cqsname UNABLE TO PARTICIPATE IN REBUILD FOR STRUCTURE strname
    - CQS0244E STRUCTURE RECOVERY REQUIRED AFTER RECOVERY FAILURE FOR STRUCTURE strname
    - CQS0245E STRUCTURE strname REBUILD ERROR
  - Consult the CQS Restart and Rebuild Error Reason Codes table for the meaning of any reason codes reported in these messages.
- Check rebuild status: Check the rebuild status by issuing the following command on every LPAR where a CQS that is participating in the rebuild resides:
  D XCF,STRUCTURE,STRNAME=strname
  If the output indicates that the rebuild is waiting for a particular event, a CQS that is hung or in a loop might not be responding to that event, which hangs the rebuild. Consider dumping the address space of the CQS that is not responding and then canceling that CQS, to see whether the rebuild can then continue.
- Analyze whether the structure is still viable:
  If an operator-initiated structure copy failed, no action is needed to restore access to the structure. The structure is still viable and you still have access to it. Analyze why the structure copy failed to determine whether you need to take action to prevent a subsequent rebuild failure.
- Restore the link, if applicable:
  If a structure rebuild was initiated because of a link failure and the rebuild failed, try to restore the link to regain access to the structure. The structure itself is still viable. Analyze why the structure rebuild failed to determine whether you need to take action to prevent a subsequent rebuild failure.
- Contact IBM®: If you are unable to resolve the problem, take the following actions:
  - Copy the SYSLOG from every LPAR, including the output of the D XCF,STRUCTURE,STRNAME=strname command.
  - Dump all the CQS address spaces, including the rebuild master CQS address space. Message CQS0240I identifies the rebuild master.
  - Retain the CQS log records. The CQS log might contain important log records about data objects that were put on, moved on, or deleted from the structure. It might also contain important log records about the rebuild, such as:
    - Rebuild begin log record (4301).
    - Rebuild end log record (4302).
    - Rebuild failed log record (4303).
    - Rebuild lost UOW list log record (4304).
    - Request log records (03xx, 07xx, 08xx, 0Bxx, 0Dxx).
  - Retain the IMS log records.
  - If you suspect a rebuild hang, create a structure dump. The structure dump might contain important information about structure locks.
  - Call IBM Software Support for help.
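The SYSLOG review and log-record triage described in these steps can be made repeatable across LPARs with a small script. The following is a minimal sketch, not an IBM-supplied tool. It assumes that each SYSLOG has been exported as plain text with one message per line, and that the messages match the layouts listed above. Real prefixes, spacing, and casing may differ. It also assumes that CQS log record types such as 4301 are two-byte hexadecimal codes (X'4301'). The names `scan_syslog` and `classify_log_record` are hypothetical.

```python
import re

# Hypothetical helper, not an IBM tool. Message layouts are modeled on the
# texts listed in this section; real SYSLOG spacing and prefixes may differ.
STARTED = re.compile(
    r"CQS0240I CQS (?P<cqs>\S+) STARTED STRUCTURE (?P<type>COPY|RECOVERY)"
    r" FOR STRUCTURE (?P<str>\S+)", re.IGNORECASE)
COMPLETED = re.compile(
    r"CQS0241I CQS (?P<cqs>\S+) COMPLETED STRUCTURE (?P<type>COPY|RECOVERY)"
    r" FOR STRUCTURE (?P<str>\S+)", re.IGNORECASE)
ERROR_IDS = ("CQS0242E", "CQS0243E", "CQS0244E", "CQS0245E")
STATUS_IDS = ("CQS0200I", "CQS0201I", "CQS0202I")


def scan_syslog(lines):
    """Collect rebuild evidence from one LPAR's SYSLOG (one message per line)."""
    report = {"started": [], "completed": [], "errors": [], "status": []}
    for line in lines:
        started = STARTED.search(line)
        if started:
            # CQS0240I names the rebuild master and the type (COPY or RECOVERY).
            report["started"].append(
                (started["cqs"], started["type"].upper(), started["str"]))
            continue
        completed = COMPLETED.search(line)
        if completed:
            report["completed"].append(
                (completed["cqs"], completed["type"].upper(), completed["str"]))
        elif any(msg_id in line for msg_id in ERROR_IDS):
            report["errors"].append(line.strip())
        elif any(msg_id in line for msg_id in STATUS_IDS):
            # Quiesce, resume, and status change messages.
            report["status"].append(line.strip())
    return report


# CQS log record types named above, assumed to be two-byte hex codes (X'4301').
REBUILD_RECORDS = {
    0x4301: "rebuild begin",
    0x4302: "rebuild end",
    0x4303: "rebuild failed",
    0x4304: "rebuild lost UOW list",
}
REQUEST_FAMILIES = {0x03, 0x07, 0x08, 0x0B, 0x0D}  # 03xx, 07xx, 08xx, 0Bxx, 0Dxx


def classify_log_record(record_type):
    """Label a CQS log record type as a rebuild, request, or other record."""
    if record_type in REBUILD_RECORDS:
        return REBUILD_RECORDS[record_type]
    if (record_type >> 8) in REQUEST_FAMILIES:
        return "request"
    return "other"
```

A reasonable way to use this sketch is to run `scan_syslog` on the SYSLOG from each LPAR and compare the reports. A CQS0240I with no corresponding CQS0241I, or any CQS0242E through CQS0245E line, is a good place to start investigating. Matching on message IDs rather than full message text keeps the sketch tolerant of unknown message layouts, at the cost of possibly picking up lines that merely quote a message ID.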