IBM Support

SE51961 - IPL-SRCC9002910-LOOP Loop sending message

 APAR (Authorized Program Analysis Report)

Abstract

IPL-SRCC9002910-LOOP Loop sending message

Error Description

The IPL after a warm flash copy is essentially the same as the
IPL after a system termination in which the data in memory is
not preserved. For example, losing power without Uninterruptible
Power Supply (UPS) protection could prevent the data in memory
from being written to disk. In this case, the QSYSOPR message
queue was being recovered during the first IPL after the warm
flash copy. However, an unexpected condition was encountered and
the system looped with SRC C9002910 on the panel. A Main Store
Dump was taken and a step mode IPL was started. When the DST
signon screen appeared, service damaged the QSYSOPR message
queue and continued the IPL. This APAR is being opened to see if
there is a way to enhance QSYSOPR recovery during the IPL after
losing memory.

Problem Summary

****************************************************************
* PROBLEM: (SE51961) Licensed Program = 5770SS1                *
*           Looping Condition                                  *
****************************************************************
* USERS AFFECTED: All IBM i operating system users for i 7.1.  *
****************************************************************
* RECOMMENDATION: Apply PTF SI46804 for i 7.1.                 *
****************************************************************
A job goes into a loop while sending a message to a nonprogram
message queue, such as QHST or QSYSOPR. Messages sent to a job
log or program message queue are not affected. Frequently, the
problem occurs after a system crash, a forced IPL, or use of
flash copy support. During the IPL after one of those events,
the SCPF job goes into a loop while sending a message. Usually,
program QMHSNSTQ appears in the stack as the program that is
looping. In one case, the target of a warm flash copy stayed at
SRC C9002910 for hours. A different customer system went into a
loop sending a message to a user profile message queue, which
also involved QMHSNSTQ looping, but it did not occur during an
IPL.
                                                               
A system crash, a forced IPL, and use of flash copy all result
in the loss of data in memory, so it appears as if QSYSOPR was
interrupted in the middle of adding a message to the message
queue. This is similar to a system losing power without
Uninterruptible Power Supply (UPS) protection, which could
prevent the data in memory from being written to disk. With
flash copy, the QSYSOPR message queue was in use during the
first IPL after the flash copy when the loop was noticed. A Main
Store Dump was taken and a step mode IPL was started. When the
DST signon screen appeared, service damaged the QSYSOPR message
queue to continue the IPL. This APAR is being opened to see if
there is a way to enhance QSYSOPR recovery during the IPL after
losing memory.

Problem Conclusion

The message queue in use during the loop had logical damage or
corruption. For example, a message chain is corrupt when a
message entry points to a wrong location. The corruption can
occur when an update to a message queue is interrupted partway
through a change. The system is designed to detect several types
of message queue corruption or damage and to force an MCH0601 so
that message queue cleanup occurs. The loop occurred because the
message queue cleanup was not being invoked as expected. This
was caused by a change in a low-level system function that no
longer produced an error as it did in the past. This fix changes
the method used to force the MCH0601, so the message queue
cleanup is once again called as expected when logical damage or
corruption is detected.
                                                               
This fixes message handling send operations that are sending    
messages to a nonprogram message queue. The fix will not prevent
message queue logical damage or corruption from occurring. It  
only forces the MCH0601 to be generated, so the cleanup program
can be invoked as it was in the past, before the low-level      
system change was made to eliminate the error.                  
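
The detect-then-clean-up pattern described above can be pictured
with a small sketch. This is an illustration only, not the
operating system's code: the chain layout (a dictionary mapping an
entry offset to the next offset), the QueueCorruption exception
standing in for the forced MCH0601, and the function names are all
assumptions made for the example.

```python
class QueueCorruption(Exception):
    """Hypothetical stand-in for the forced error that triggers cleanup."""


def walk_chain(entries, head):
    """Follow next-pointers from head; raise on a bad or repeated link.

    entries maps an entry offset to the next offset (None ends the chain).
    Without the check below, a link that points back into the chain
    would make this loop run forever.
    """
    seen, order, current = set(), [], head
    while current is not None:
        if current not in entries or current in seen:
            raise QueueCorruption(f"bad link at offset {current}")
        seen.add(current)
        order.append(current)
        current = entries[current]
    return order


def cleanup(entries, head):
    """Keep the entries reachable before the first bad link and relink them."""
    seen, order, current = set(), [], head
    while current is not None and current in entries and current not in seen:
        seen.add(current)
        order.append(current)
        current = entries[current]
    return dict(zip(order, order[1:] + [None]))


def send_message(entries, head, new_offset):
    """Append new_offset to the chain, cleaning up first if damage is found.

    Assumes head itself is a valid entry.
    """
    try:
        order = walk_chain(entries, head)
    except QueueCorruption:
        entries = cleanup(entries, head)
        order = walk_chain(entries, head)
    result = dict(entries)
    result[order[-1]] = new_offset
    result[new_offset] = None
    return result


# An entry at offset 16 points back to offset 8, as if an update
# had been interrupted partway through.
damaged = {0: 8, 8: 16, 16: 8}
print(send_message(damaged, 0, 24))
```

In this sketch the send completes only because the damage raises
an error that reaches the cleanup step. If the error were never
raised, the walk would cycle between offsets 8 and 16 indefinitely,
which is the kind of loop this APAR describes.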

Temporary Fix

                        *********                              
                        * HIPER *                              
                        *********                              

Comments

Circumvention


PTFs Available

R710 SI46804 PTF Cover Letter   2279

Affected Modules

         
         

Affected Publications

Summary Information

Status............................................ CLOSED UR1
HIPER........................................... Yes
Component.................................. 5770SS1WM
Failing Module.......................... RCHMGR
Reported Release................... R710
Duplicate Of..............................






Document Information

Modified date:
13 October 2012