AIX problem analysis

Use this procedure to perform AIX® problem analysis.

If you experience a problem with your AIX server or logical partition, gather more information about the problem, either to solve it yourself or to help your next level of support or your hardware service provider solve it more quickly and accurately.

Keep the following in mind while troubleshooting AIX server problems:
  • Has there been an external power outage or momentary power loss?
  • Has the hardware configuration changed?
  • Has server software been added?
  • Have any new programs or program updates been installed recently?
Check the following connections:
  • Verify that the power cord is plugged in.
  • Verify that all your cables are attached securely.

The server has never been partitioned and there is no HMC or Integrated Virtualization Manager

  1. Is the server turned on, or can you turn on your server?
    • No: Go to step 2.
    • Yes: Ensure that the server is turned on and then go to step 4.
  2. Perform the following steps to verify that the server is receiving power:
    1. If your server is protected by an emergency power off (EPO) circuit, check that the EPO switch is not activated.
    2. If you have an uninterruptible power supply, verify that its cables are correctly connected to the server and that the power supply is functioning correctly.
    3. When a good power source is connected to the server, one of the following occurs:
      • If you have a control panel, the Function/Data display on the control (operator) panel is illuminated.
      • If you do not have a control panel, the Bulk Power Controller system lights are illuminated.
  3. Is the control (operator) panel illuminated?
    • Yes: Start the server by pressing the power button on the control (operator) panel, and then go to step 4.
      Note: If the server stops with a reference code appearing in the Function/Data display on the control (operator) panel, record the reference code and any related information, and go to the Reference codes list for customers. This ends the procedure.
    • No: There is a power problem. Verify that the power source to the server is functioning correctly (for example, the wall outlet is functioning correctly and the power cord is not damaged). If you cannot find a problem with the power source, contact your next level of support or your hardware service provider. This ends the procedure.
  4. Is the control (operator) panel blank?
    • Yes: Go to step 9.
    • No: Continue with the next step.
  5. Is the Attention light on the control (operator) panel illuminated?
    • Yes: Go to step 9.
    • No: Continue with the next step.
  6. Are any additional messages related to this problem displayed on the system console or sent to you in e-mail from the operating system?
    • Yes: Continue with the next step.
    • No: Contact your next level of support or your hardware service provider.
  7. Record any additional message information that is available from the control (operator) panel, attached displays, or e-mail from the operating system.
  8. If the additional message information contains recovery instructions, follow these instructions.
    Did this solve the problem?
    • Yes: This ends the procedure.
    • No: Continue with the next step.
  9. Is the operating system functioning?
    • Yes: Continue with the next step.
    • No: Perform the following steps:
      1. Obtain a list of error and event log entries from the ASMI's Error/Event Logs. For details, see the Displaying error and event logs topic.
      2. Continue with step 11.
  10. Record any SRN information that is displayed or available through e-mail.
    Note: If you have not found an SRN, you can display one from the operating system. Perform the following steps to display previous diagnostic results from online diagnostics in concurrent mode:
    1. Log in to the AIX operating system as root user, or use CE login. If you need help, contact the system administrator.
    2. Enter the diag command to load the diagnostic controller, and display the online diagnostic menus.
    3. At the Function selection menu, select Task selection.
    4. From the Task selection list menu, select Display previous diagnostic results.
    5. From the Previous diagnostic results menu, select Display diagnostic log summary.

      The Display diagnostic log screen shows a time-ordered table of events from the error log. Look in the T column for the most recent entry marked S. Press Enter to select that row, and then select Commit. The details of the entry are displayed; find the SRN near the end of the entry and record it.

  11. Record all other reference codes (if any are displayed) that you are receiving on the control (operator) panel. See Collecting reference codes and system information for details.
  12. Go to the Reference codes list for customers.
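
The log-reading part of step 10 can be sketched in code. The following Python sketch is illustrative only: it assumes the diagnostic log summary has already been transcribed by hand into a list of rows, and the `LogRow` type, its field names, and the sample values are hypothetical, not output produced by `diag`. It picks the most recent row whose T column is S, which is the row to select when looking for the SRN.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class LogRow:
    """One transcribed row of the diagnostic log summary (hypothetical layout)."""
    timestamp: datetime  # when the event was logged
    type_code: str       # value shown in the T column
    resource: str        # resource the entry refers to


def latest_s_entry(rows: Sequence[LogRow]) -> Optional[LogRow]:
    """Return the most recent row whose T column is 'S', or None if there is none."""
    s_rows = [row for row in rows if row.type_code == "S"]
    return max(s_rows, key=lambda row: row.timestamp, default=None)


# Made-up sample rows, for illustration only.
sample = [
    LogRow(datetime(2024, 1, 5, 9, 30), "I", "sysplanar0"),
    LogRow(datetime(2024, 1, 6, 14, 0), "S", "hdisk0"),
    LogRow(datetime(2024, 1, 4, 8, 15), "S", "ent0"),
]
```

If no row is marked S, the function returns `None`, which corresponds to the case where no SRN is available and only the reference codes from step 11 can be recorded.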

The server has been partitioned and there is an HMC or an Integrated Virtualization Manager

If you have an HMC, it must be attached and functioning correctly.

  1. Choose from the following options:
    • If you have an HMC, ensure you performed the steps in Beginning problem analysis. Then return here if you are directed to do so.
    • If you are using an Integrated Virtualization Manager, continue with the next step.
  2. Can you start the server and at least one logical partition on your server?
    • No: Go to step 3.
    • Yes: Go to step 5.
  3. Perform the following steps to verify that the server is receiving power:
    1. If your server is protected by an emergency power off (EPO) circuit, check that the EPO switch is not activated.
    2. If you have an uninterruptible power supply, verify that its cables are correctly connected to the server and that the power supply is functioning correctly.
    3. When a good power source is connected to the server, one of the following occurs:
      • If you have a control panel, the Function/Data display on the control (operator) panel is illuminated.
      • If you do not have a control panel, the Bulk Power Controller system lights are illuminated.
  4. Is the control (operator) panel or Bulk Power Controller illuminated?
    • No: There is a power problem. Verify that the power source to the server is functioning correctly (for example, the wall outlet is functioning correctly and the power cord is not damaged). If you cannot find a problem with the power source, contact your next level of support or your hardware service provider. This ends the procedure.
    • Yes: Start the server.
      Note: If the server stops with a reference code appearing in the Function/Data display on the control (operator) panel, or on the HMC, or on the Integrated Virtualization Manager, record the reference code and any related information, and go to the Reference codes list for customers for further information. This ends the procedure.
  5. Is the server's control (operator) panel, the HMC, or Integrated Virtualization Manager displaying function 11?
    Note: If you are using the control panel, use the increment or decrement buttons to cycle through the functions to determine if function 11 exists. You can alternate between the function number and the data by pressing Enter. For details, see Collecting reference codes and system information.
    • Yes: Go to step 9.
    • No: Continue with the next step.
  6. Is the system attention light on?
    • Yes: Go to step 9.
    • No: Continue with the next step.
  7. Did you receive a message related to this problem either through the mail function or shown on the HMC or Integrated Virtualization Manager?
    • Yes: Continue with the next step.
    • No: Contact your next level of support or your hardware service provider.
  8. Record the additional message information on the problem reporting form. For details, see Using the problem reporting forms. Then follow the recovery instructions on the Additional Message Information display.
    Did this solve the problem?
    • Yes: This ends the procedure.
    • No: Continue with the next step.
  9. Perform the following steps:
    1. Record all the reference codes that you are receiving on the control (operator) panel, the HMC, or the Integrated Virtualization Manager. For details, see Collecting reference codes and system information.
    2. Go to the Reference codes list for customers.
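
Steps 5 through 9 above amount to a small decision flow. The Python sketch below is one hypothetical way to express it; the function name, the parameter names, and the returned strings are assumptions made for illustration and are not part of any IBM tool.

```python
def next_action(function_11_displayed: bool,
                attention_light_on: bool,
                message_received: bool) -> str:
    """Map the answers to steps 5 through 7 onto the next action in the procedure."""
    # Steps 5 and 6: a function 11 display or a lit attention light
    # both lead straight to step 9.
    if function_11_displayed or attention_light_on:
        return "step 9: record reference codes"
    # Step 7, Yes branch: a related message leads to step 8.
    if message_received:
        return "step 8: record message and follow recovery instructions"
    # Step 7, No branch: nothing further to collect locally.
    return "contact next level of support or hardware service provider"
```

For example, `next_action(False, False, True)` returns the step 8 action, while a lit attention light leads to step 9 whether or not a message was received.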