Linux problem analysis

Use this procedure to perform Linux® problem analysis.

If you experience a problem with your Linux system or logical partition, gather as much information about the problem as you can. This information can help you solve the problem yourself, or help your next level of support or your hardware service provider solve it more quickly and accurately.
Note: While troubleshooting Linux, you might need service tools to be available. For details, see Obtaining service and productivity tools for Linux.

Keep the following in mind while troubleshooting Linux problems:

Check the following connections:

The server has never been partitioned and there is no HMC or Integrated Virtualization Manager

  1. Is the server turned on, or can you turn on your server?
    • No: Go to step 2.
    • Yes: Ensure that the server is turned on and then go to step 4.
  2. Perform the following steps to verify that the server is receiving power:
    1. If your server is protected by an emergency power off (EPO) circuit, check that the EPO switch is not activated.
    2. If you have an uninterruptible power supply, verify that its cables are correctly connected to the server and that the power supply is functioning correctly.
    3. When a good power source is connected to the server, one of the following occurs:
      • If you have a control panel, the Function/Data display on the control (operator) panel is illuminated.
      • If you do not have a control panel, the Bulk Power Controller system lights are illuminated.
  3. Is the control (operator) panel illuminated?
    • Yes: Start the server by pressing the power button on the control (operator) panel, and then go to step 4.
      Note: If the server stops with a reference code appearing in the Function/Data display on the control (operator) panel, record the reference code and any related information, and go to the Reference codes list for customers. This ends the procedure.
    • No: There is a power problem. Verify that the power source to the server is functioning correctly (for example, the wall outlet is functioning correctly and the power cord is not damaged). If you cannot find a problem with the power source, contact your next level of support or your hardware service provider. This ends the procedure.
  4. Is the control (operator) panel displaying a reference code?
    • Yes: Continue with the next step.
    • No: Go to step 9.
  5. Is the Attention light on the control (operator) panel illuminated?
    • Yes: Go to step 9.
    • No: Continue with the next step.
  6. Are any additional messages related to this problem (for example, a device is not available or is reporting errors) displayed on the system console or sent to you in e-mail from the operating system?
    • Yes: Continue with the next step.
    • No: Contact your next level of support or your hardware service provider.
  7. Record any additional message information that is available from the control (operator) panel, attached displays, or e-mail from the operating system.
  8. If the additional message information contains recovery instructions, follow these instructions.
    Did this solve the problem?
    • Yes: This ends the procedure.
    • No: Continue with the next step.
  9. Is the operating system functioning?
    • Yes: Continue with the next step.
    • No: Perform the following steps:
      1. Refer to the ASMI's Error/Event Logs to obtain a list of error and event log entries. For details, see the Displaying error and event logs topic.
      2. Continue with step 11.
  10. Run the eServer™ stand-alone diagnostics in Problem Determination mode. For details, see Running the eServer stand-alone diagnostics from CD-ROM. Record any SRN information that is displayed or available through e-mail.
    When you run the diagnostics in Problem Determination mode, you are given the option to test the resources that the diagnostic programs find in your server. Check the list of available resources to make sure that every resource that you know is installed is also available to be tested.
    If a resource that you know is installed is not available to be tested, record any available information about the missing resource and check that it is installed correctly. If you cannot correct the problem, replace the missing resource (contact your service provider if necessary).
  11. Record all other reference codes (if any are displayed) that you are receiving on the control (operator) panel. See Collecting reference codes and system information for details.
  12. Go to the Reference codes list for customers.
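The branching in the steps above can be easier to follow when it is written out as a lookup table. The following is a minimal Python sketch, not part of any IBM tool: the names FLOW and trace are hypothetical, and the table only mirrors the yes/no branches and "continue with the next step" transitions listed in this procedure.

```python
# Hypothetical encoding of the non-partitioned procedure above.
# Each step maps an answer ("yes" or "no") to the next step number or to a
# final outcome string. Steps that ask no question use the key "next".
FLOW = {
    1: {"yes": 4, "no": 2},
    2: {"next": 3},
    3: {"yes": 4, "no": "power problem: check the power source or contact support"},
    4: {"yes": 5, "no": 9},
    5: {"yes": 9, "no": 6},
    6: {"yes": 7, "no": "contact next level of support"},
    7: {"next": 8},
    8: {"yes": "problem solved", "no": 9},
    9: {"yes": 10, "no": 11},
    10: {"next": 11},
    11: {"next": 12},
    12: {"next": "go to the Reference codes list for customers"},
}


def trace(answers, start=1):
    """Follow FLOW from `start` and return (steps visited, final outcome).

    `answers` maps a step number to "yes" or "no" for every question step
    that the path reaches.
    """
    path = []
    step = start
    while isinstance(step, int):
        path.append(step)
        branches = FLOW[step]
        if "next" in branches:
            step = branches["next"]
        else:
            step = branches[answers[step]]
    return path, step


if __name__ == "__main__":
    # Server is on, no reference code is displayed, operating system is running.
    path, outcome = trace({1: "yes", 4: "no", 9: "yes"})
    print(path, outcome)
```

A table like this is only a reading aid for checking which step a given set of answers leads to; the written procedure remains the authority.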

The server has been partitioned and there is an HMC or Integrated Virtualization Manager

If you have an HMC, it must be attached and functioning correctly.

  1. Choose from the following options:
    • If you have an HMC, ensure that you have performed the steps in Beginning problem analysis. Return here only if you are directed to do so.
    • If you are using an Integrated Virtualization Manager, continue with the next step.
  2. Can you start the server and at least one logical partition on your server?
    • No: Go to step 3.
    • Yes: Go to step 5.
  3. Perform the following steps to verify that the server is receiving power:
    1. If your server is protected by an emergency power off (EPO) circuit, check that the EPO switch is not activated.
    2. If you have an uninterruptible power supply, verify that its cables are correctly connected to the server and that the power supply is functioning correctly.
    3. When a good power source is connected to the server, one of the following occurs:
      • If you have a control panel, the Function/Data display on the control (operator) panel is illuminated.
      • If you do not have a control panel, the Bulk Power Controller system lights are illuminated.
  4. Is the control (operator) panel or Bulk Power Controller illuminated?
    • No: There is a power problem. Verify that the power source to the server is functioning correctly (for example, the wall outlet is functioning correctly and the power cord is not damaged). If you cannot find a problem with the power source, contact your next level of support or your hardware service provider. This ends the procedure.
    • Yes: Start the server.
      Note: If the server stops with a reference code appearing in the Function/Data display on the control (operator) panel, on the HMC, or on the Integrated Virtualization Manager, record the reference code and any related information, and go to the Reference codes list for customers for further information. This ends the procedure.
  5. Is the server's control (operator) panel, HMC, or Integrated Virtualization Manager displaying function 11?
    Note: If you are using the control panel, use the increment or decrement buttons to cycle through the functions to determine if function 11 exists. You can alternate between the function number and the data by pressing Enter. For details, see Collecting reference codes and system information.
    • Yes: Go to step 10.
    • No: Continue with the next step.
  6. Is the system attention light on?
    • Yes: Go to step 10.
    • No: Continue with the next step.
  7. Did you receive a message related to this problem either through the mail function or shown on the HMC or Integrated Virtualization Manager?
    • Yes: Continue with the next step.
    • No: Contact your next level of support or your hardware service provider.
  8. Record the additional message information on the problem reporting form. For details, see Using the problem reporting forms. Then follow the recovery instructions on the Additional Message Information display.
    Did this solve the problem?
    • Yes: This ends the procedure.
    • No: Continue with the next step.
  9. Record any SRN information that is displayed or available through e-mail. If you do not have any SRN information, run the eServer stand-alone diagnostics in Problem Determination mode and perform any repair actions. For details, see Running the eServer stand-alone diagnostics from CD-ROM.
  10. Perform the following:
    1. Record all the reference codes that you are receiving on the control (operator) panel, the HMC, or the Integrated Virtualization Manager. For details, see Collecting reference codes and system information.
    2. Go to the Reference codes list for customers.
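Several steps above ask you to record reference codes, SRNs, and messages before you contact support. If you prefer to keep these notes in a structured file, the following Python sketch shows one possible shape. It is an assumption for illustration only: the ProblemRecord class, its field names, and the placeholder values are hypothetical and do not correspond to the problem reporting forms or to any IBM tool.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProblemRecord:
    """Hypothetical container for information gathered during problem analysis."""

    reference_codes: list = field(default_factory=list)
    srns: list = field(default_factory=list)
    messages: list = field(default_factory=list)
    attention_light_on: bool = False

    def add_reference_code(self, code, source):
        # `source` notes where the code was seen, for example the control
        # panel, the HMC, or the Integrated Virtualization Manager.
        self.reference_codes.append({"code": code, "source": source})

    def to_json(self):
        # Serialize the record so that it can be saved or sent to support.
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    record = ProblemRecord(attention_light_on=True)
    # Placeholder values only; substitute what your system actually displays.
    record.add_reference_code("EXAMPLE-CODE", "HMC")
    record.messages.append("placeholder message text from the console")
    print(record.to_json())
```

Keeping the source of each code alongside the code itself reflects the procedure's distinction between the control (operator) panel, the HMC, and the Integrated Virtualization Manager.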