Topic
  • 26 replies
  • Latest Post - ‏2012-11-16T16:32:03Z by jerberstark
stevedan
stevedan
3 Posts

Pinned topic What's the best way to cause a system to create a crashdump in case of a system hang?

‏2012-09-07T18:35:35Z |
 I've been able to set up kdump on some of our systems and get crashdumps.  However, I believe our systems are hanging rather than crashing.  I think we need a way to force a crash when the system is unresponsive and we can't get to a command line to initiate one.  In some situations we might also want to force a crash on an unattended, hung system in order to get a vmcore file.
 
I am considering the following methods of forcing a crash:
 
1.  Using the service processor from a remote system to force a crash over the network.  I know a system can be powered down this way, but we need to initiate a crashdump.
 
2.  Using an NMI (non-maskable interrupt).  I believe I tried this before, using the sysctl setup, without success.
 
3.  Using the IPMI interface to set up a hardware watchdog timer.  If it works, this should get the system back into operation as well as get us a crashdump.
 
My question is whether there are any caveats to these features working on our Linux systems running on Power 5 ppc64 hardware.  Stated differently, is there a recommended method for creating a crashdump for Linux on Power?
Updated on 2012-11-16T16:32:03Z at 2012-11-16T16:32:03Z by jerberstark
  • Brian_King
    Brian_King
    23 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-07T19:49:22Z  
     The two methods I've used to trigger a crashdump are:
     
    1. Via sysrq:
     a. Enable sysrq: echo 1 > /proc/sys/kernel/sysrq
     b. Trigger the crashdump at the Linux LPAR console via: ctrl-o c
    2. Via the management console. Select the LPAR and issue a "dump restart".
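For a scripted test (for example over ssh, where the console key sequence is not available), steps 1a and 1b can be sketched as below. This is a minimal sketch, not a tested procedure for Power: it assumes the standard /proc/sysrq-trigger interface, which is mentioned later in this thread, and the function name and its two path parameters are hypothetical, added only so the function can be rehearsed against scratch files.

```shell
#!/bin/sh
# Sketch only: force a test crash through sysrq without the console keys.
# Both paths are parameters so the function can be rehearsed on scratch
# files. Called with no arguments (as root) it writes to the real /proc
# entries and CRASHES THE SYSTEM IMMEDIATELY.
force_sysrq_crash() {
    sysrq_ctl="${1:-/proc/sys/kernel/sysrq}"
    sysrq_trigger="${2:-/proc/sysrq-trigger}"

    # 1a. Enable sysrq.
    echo 1 > "$sysrq_ctl" || return 1
    # 1b. Request the crash ('c'), the file-based equivalent of ctrl-o c.
    echo c > "$sysrq_trigger"
}
```

Calling `force_sysrq_crash` with no arguments on a test system should produce a vmcore if kdump is loaded. Like the console method, it only works while the system still responds to commands, so it is useful for verifying the kdump setup rather than for a hard hang.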
  • hbabu
    hbabu
    8 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-07T20:19:03Z  
     When the system is hard hung, a soft reset is the only reliable way to get a dump. That means 'dump restart' from the HMC if an HMC is used (for example, select 'operations' for the specific LPAR -> 'restart' and 'dump'), or an NMI from the service processor / ASM. If you do not have any of these interfaces, you can press the yellow button on the system softly. Note that pressing this button hard reboots the system, so the HMC or ASM interfaces are the best options.
     
    First, please check whether kdump is set up properly: 'cat /sys/kernel/kexec_crash_loaded' should return 1. You can also take a test dump using 'echo 1 > /proc/sys/kernel/sysrq' followed by 'echo c > /proc/sysrq-trigger'.
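The check on kexec_crash_loaded can be wrapped in a small helper, sketched here under the assumption of a POSIX shell. The function name and its optional path argument are hypothetical (the argument exists only so the helper can be tried against a scratch file); the sysfs path is the one given above.

```shell
#!/bin/sh
# Report whether a kdump (crash) kernel is currently loaded.
# $1: optional path to the flag file; defaults to the real sysfs entry.
kdump_state() {
    flag="${1:-/sys/kernel/kexec_crash_loaded}"
    if [ ! -r "$flag" ]; then
        # Kernel built without kexec support, or sysfs not mounted.
        echo "unknown"
    elif [ "$(cat "$flag")" = "1" ]; then
        echo "loaded"
    else
        echo "not-loaded"
    fi
}
```

Running `kdump_state` before any forced dump is a cheap guard: if it prints anything other than `loaded`, forcing a crash will just reboot the system without writing a vmcore.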
     
     
  • robinwcox2
    robinwcox2
    11 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-07T20:41:06Z  
     Re: 1a.  My impression is that it's too late to enter commands once the system is hung.
     
    I need more information on what an LPAR is in relation to RH5 Linux on Power5.  We no longer have working HMCs.  Where would I go to put this in our context?
  • robinwcox2
    robinwcox2
    11 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-07T20:42:53Z  
     We already plan to try the service processor (item 1 at the top).  We have no working HMC.
  • Bill_Buros
    Bill_Buros
    167 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-07T21:08:19Z  
    I assume you mean your Power5 is a single-system install of RHEL5.     In that case, the "LPAR" is that single-system image.
  • robinwcox2
    robinwcox2
    11 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-10T21:16:05Z  
     I've seen reference to wd_keepalive (a simplified watchdog daemon),  watchdog man pages, IPMI watchdog on:
     
             
     
    So far, the documentation doesn't say which platforms this applies to.  Can I assume that it all applies to the P5?  Or would it apply to ALL IBM platforms?
     
    The watchdog daemon mentions interfacing with the hardware, but I don't see where it says whether this is automatic, or describes how to interface with the hardware watchdog.  I'm just scratching the surface with IPMI.
  • jerberstark
    jerberstark
    31 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-10T23:04:20Z  
    Hi Robin,
     
    The blueprint you linked to is written for System x. Below is a snippet from the Scope, requirements, and support page of the blueprint.  I also looked at the Supported features for PowerLinux systems, and we don't have IPMI listed there. Hence, it looks like IPMI isn't supported on PowerLinux systems. If someone else from the team has other info, I will gladly update the documentation. 
     

    Hardware requirements

    The hardware for this blueprint includes IPMI hardware with RHEL 5.2 or SLES 10.2 installed. If you are planning to install the latest version of IPMItool, you will need the Development Tools package group if your machine is running on RHEL. For SLES, the C/C++ Compiler & Tools package pattern is sufficient.

    Note that all the instructions here are based on IPMI 2.0 hardware.

    For more information about servers that contain BMCs and thus support IPMI, see Appendix D: System Management overview in the IBM System x Online Configuration and Options Guide (COG) at http://www.ibm.com/systems/xbc/cog/appendixD/appxsysmgmtsupport.html.

    This blueprint was tested on System x stand-alone servers and IBM BladeCenter servers with BMC hardware

    To discover the IPMI version (1.5 or 2.0) on your server, run the command:

    # ipmitool mc info
     
     
  • robinwcox2
    robinwcox2
    11 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-10T23:19:29Z  
     Thanks.
     
    I can run a man command on IPMI and get results, but I don't see anything related to watchdog in it.
     
    This leaves open whether the other watchdog interfaces with the hardware.  I don't know if the kernel is still running when the system hangs.  We can't get to any command line to poke 1 into /proc/sys/kernel/sysrq.  If the system is so hung that the kernel itself is hung, and we can't set up a hardware watchdog, we might not be able to cause a crash.
     
    I tried the 'ipmitool mc info' command on the P5 and got:
     
     Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0:  No such file or directory
    Get Device ID command failed
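The error above means that none of the three device nodes ipmitool looks for exists. A quick way to confirm that before running ipmitool is sketched below. The function name and its optional root-prefix argument are hypothetical (the prefix is there only so the check can be tried against a scratch directory); the device paths are the ones listed in the error message.

```shell
#!/bin/sh
# Print the first IPMI device node that exists, or "none".
# $1: optional root prefix (empty for the real /dev tree).
ipmi_device() {
    root="${1:-}"
    for dev in /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0; do
        if [ -e "$root$dev" ]; then
            echo "$dev"
            return 0
        fi
    done
    echo "none"
}
```

If `ipmi_device` prints `none`, no IPMI driver has created a device node, and any ipmitool command that uses the local interface would be expected to fail the same way.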
  • hbabu
    hbabu
    8 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-10T23:30:34Z  
     Generally, IPMI is used to send an NMI on x86/x86_64 systems because they have no other way of sending one. The watchdog daemon monitors the system and reboots it in hang scenarios. I think the user has to send an NMI with the IPMI tool to take a crash dump on these systems. I do not think IPMI is supported on Power.
     But, as I mentioned above, we have other reliable ways of taking a dump on a hung system:
    - Using the HMC: since your system is not connected to an HMC, this is not the right option for you.
    - Service processor (ASM interface): we can initiate a dump remotely with this interface. I'm not sure which option or interface you used. Can you explain?
    - Pressing the yellow button on the system softly.
     
     
     
  • robinwcox2
    robinwcox2
    11 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-12T03:45:47Z  
     I've been trying to set up a Firefox connection from one P5 Linux test system (a) to another P5 Linux test system's (b) service processor(s).  This is set up on system a as follows:
     
    Configure eth1 as 192.168.2.1/255.255.255.0, eth2 as  192.168.3.1/255.255.255.0
    System a's eth1 port is connected to system b's "HMC0" Ethernet port,  a's eth2 port is connected to b's "HMC1" port.
    Both connections use patch cable, although a crossover cable was tested with eth1/HMC0.
    Systems can communicate over network eth0 ports.
     
    After the network setup, it is possible to ping 192.168.2.147, but 192.168.3.147 does not respond to ping.  Firefox is brought up and pointed at:
     
              https://192.168.2.147/
     
    This connection either times out or keeps loading indefinitely.  The 192.168.3.147 address fails immediately.  Using "http://192.168.2.147/" (without the "s") produces the same results.
     
     Is there another URL required or are there other conditions necessary to access the ASMI? 
     
    The "IBM System p5 570 Technical Overview and Introduction" (http://www.redbooks.ibm.com/redpapers/pdfs/redp9117.pdf) states:
     
              The Web interface to the Advanced System Management Interface is accessible through, at
              the time of writing, Microsoft® Internet Explorer® 6.0, Netscape 7.1, Mozilla Firefox, or
              Opera 7.23 running on a PC or mobile computer connected to the service processor.
     
    Does this mean one cannot use Firefox on another P5 to access the service processor (ASMI) and must use a PC?
     
    Thanks.
     
  • stevedan
    stevedan
    3 Posts

    RE:What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-12T15:09:11Z  
    This reply was deleted by MaheshSal 2012-09-11T07:27:32Z.
    There was a post here saying that NMI watchdog works under PowerLinux, which referenced http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaai/crashdump/liaaicrashdumpnmiwatch.htm
     
    Based on the post above, is it safe to assume that this page applies only to x86 and not to PowerLinux? If so, we need to update the documentation to state this.
  • jerberstark
    jerberstark
    31 Posts

    Re: RE:What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-12T15:26:55Z  
     Hi stevedan,
     
    I agree that the doc team needs to comb through these blueprints and make it more clear which apply to Power systems running Linux, and which apply only to System x.  In your opinion, do you think we need to include this information on every page within the blueprint, or would it be adequate to update the Scope, requirements, and support topic that is contained in each blueprint?
     
     I checked the blueprint you're referencing, and it does state in the Scope, Requirement, and Support section that this wasn't tested on a Power system; but you're right that this statement doesn't exactly convey that NMI watchdog isn't supported on a Power system.
     http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaai/crashdump/liaaicrashdumpintro.htm

    Hardware and software requirements

    The instructions in this blueprint are written for Kdump servers and clients running the Red Hat Enterprise Linux (RHEL) 5.3 or SLES 10 SP2 operating systems. The Kdump server should have enough storage to receive the crash dumps from the clients.

    Kdump clients are tested on IBM System x servers; Kdump servers are tested on IBM System x and System p® servers. The Kdump utility is not supported if the Kdump client's operating system distribution does not match the Kdump client machine's.


     
    Updated on 2012-09-12T15:26:55Z at 2012-09-12T15:26:55Z by jerberstark
  • stevedan
    stevedan
    3 Posts

    Re: RE:What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-12T15:43:12Z  
     I think having it in the Scope, requirements, and support section is fine. I missed that it was in that section of the documentation.
     
    So, based on your reply, this requirement statement would indicate that the watchdog should work on Power servers, correct?
     
    But I think the conclusion here is that this watchdog approach won't work, so I'm confused, as is Robin.
     
  • jerberstark
    jerberstark
    31 Posts

    Re: RE:What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-12T16:08:36Z  
    Stevedan, I think we've concluded that watchdog will NOT work on Power servers. My previous reply was basically asking how we could make it easier to see/understand that in the docs. Sorry for the confusion.
     
    I think we need someone from the development team to weigh in and confirm that watchdog isn't an option here.
     
    hbabu - Haren, can you confirm that Robincox2 should NOT use watchdog in this case?
     

  • jerberstark
    jerberstark
    31 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-12T16:21:02Z  
     Robinwcox2, I found some information in the Systems Hardware Info Center that disagrees with the technical overview on the supported browsers. Can you try one of the browsers listed here? http://pic.dhe.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/iphby/requirements.htm
     
    Some more detailed steps for connecting and troubleshooting are also in the Systems HW Info Center: http://pic.dhe.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/iphby/connect_asmi.htm
  • hbabu
    hbabu
    8 Posts

    Re: RE:What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-12T22:49:20Z  
     Yes, nmi_watchdog should not be used on Power to generate a kdump for a system hang.
     
    This watchdog might invoke panic() as on other architectures, but it cannot stop the other CPUs (for example, if they are deadlocked) because powerpc does not have a software NMI. The kernel uses the software NMI to stop the other CPUs and bring them into the dump path so that their states are captured.
     
    So, as I mentioned above, we should always use one of the following to take the dump, or to put the system in the debugger, for system hangs:
     
    - HMC interface (operations->restart->dump)
    - ASMI (partition dump option) from the service processor
    - For blades, select the blade and click 'reboot with NMI' on the BladeCenter management module
     
    These interfaces are the recommended options, but you can also press the yellow button on the system softly, if one is available.
     
    Blades have a small hole next to the power button. Pressing into this hole softly with a pin should also invoke a soft reset, but the BladeCenter MM interface is the preferred option.
     
     
     
     
  • robinwcox2
    robinwcox2
    11 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-13T18:51:39Z  
     Netscape 7.1 works from a PC running Windows.  We don't have PCs accessible to most systems.  Firefox comes native on Linux.  If we could get a browser to work from another P5, that would eliminate a number of problems.
     
    My associate hooked the PC into the HMC2 (vs. HMC1) service processor port.  I haven't had a chance to check that with Firefox.
     
    We seem to have to jump through a number of hoops to get what seems like it should be a basic function to work.
  • jerberstark
    jerberstark
    31 Posts

    Re: RE:What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-09-28T15:05:40Z  
     I have updated the IPMI blueprint to clarify that it does not apply to Power systems. Thanks for your feedback.
  • robinwcox2
    robinwcox2
    11 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-10-22T19:13:17Z  
     We can get to the service processor from a PC running Netscape.  When the system is hung, we have selected the "System Service Aids" --> "System Dump".  This shuts down the system and reboots.  No vmcore file appears in /var/crash/<date>.
     
    We think the system is set up for kdump correctly.  We get vmcores when the system is not hung and we use "ALT-sysrq-c" (with the kernel variable kernel.sysrq set, or after placing a "1" in /proc/sys/kernel/sysrq).
    (The button on the pop-out panel looks white to me (though most have black marks from being pushed with a pen).  If that's the "yellow" button, that hasn't worked so far.  Perhaps we didn't push it lightly enough.)
     
    Are we using the wrong service processor menu option?
  • hbabu
    hbabu
    8 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-10-22T20:30:55Z  
     Yes, if Alt-SysRq-c worked, kdump was set up properly.
     
    "System Service Aids" --> "System Dump" takes an FSP (service processor) dump.
    As I mentioned above, can you try "System Service Aids" --> "Partition Dump" to take a kdump in hang scenarios when the system is not using an HMC?
    If the system is connected to a console, you can watch it boot into the kdump kernel, take the dump, and reboot.
  • robinwcox2
    robinwcox2
    11 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-10-22T20:43:42Z  
     Thanks.  I can see that I should have read your previous response more carefully.
     
    I noticed that there's a "Service Processor Dump" after the "System Dump".  Are these the same?
  • hbabu
    hbabu
    8 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-10-22T21:11:56Z  
     Here is complete information on the ASM interfaces:
     http://pic.dhe.ibm.com/infocenter/powersys/v3r1m5/topic/iphby/iphby.pdf
      (pages 44, 45, 46)
    System Dump:   to capture overall system information, system processor state, hardware scan rings, caches, and other information. This information can be used to resolve a hardware or server firmware problem.
     
    Service processor dump: can preserve error data after a service processor application failure, external reset, or user request for a service processor dump
     
    Partition dump:  By initiating a partition dump, you can preserve error data that can be used to diagnose server firmware or operating system problems. The state of the operating system is saved on the hard disk and the partition restarts. This function can be used when the operating system is in an abnormal wait state or endless loop.
  • robinwcox2
    robinwcox2
    11 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-10-23T01:48:56Z  
     Your description doesn't make it sound like these options have anything to do with getting a crashdump.  I checked /sys/kernel/kexec_crash_loaded on the system I was working on and it was always 0, both after loading kexec and after a reboot.
     
    Thanks for the link.   I'll have to check if I've seen this one.
  • hbabu
    hbabu
    8 Posts

    Re: What's the best way to cause a system to create a crashdump in case of a system hang?

    ‏2012-10-23T06:38:09Z  
     If 'cat /sys/kernel/kexec_crash_loaded' gives 0, the kdump kernel is not loaded. In that case, even Alt-SysRq-c should not succeed in taking a dump.
    Please run '/etc/init.d/kdump restart' and see whether kexec successfully loads the kdump kernel.
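If kexec_crash_loaded stays at 0 after restarting kdump, one thing worth ruling out is a missing crashkernel= boot parameter, since the crash kernel is loaded into memory reserved by that parameter. This is an assumption about a possible cause, not something established in this thread. The function name and its optional path argument below are hypothetical, added so the check can be tried against a scratch file.

```shell
#!/bin/sh
# Print the crashkernel=... token from the kernel command line,
# or "missing" if no memory was reserved for a crash kernel.
# $1: optional path to a cmdline file; defaults to /proc/cmdline.
crashkernel_param() {
    cmdline="${1:-/proc/cmdline}"
    # Split the command line into one token per line, then match.
    tr ' ' '\n' < "$cmdline" | grep '^crashkernel=' || echo "missing"
}
```

If `crashkernel_param` prints `missing`, the boot loader configuration would need a crashkernel= entry and the system a reboot before the kdump kernel can load. The correct value depends on the distribution and platform, so it is left out here.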