Problems with loading and starting the operating system (AIX and Linux)

If the system is running partitions from partition standby (LPAR), the following procedure addresses the problem in which one partition does not boot AIX® or Linux® while other partitions boot and run the operating system successfully.

About this task

It is the customer's responsibility to move devices between partitions. If a device must be moved to another partition to run stand-alone diagnostics, contact the customer or system administrator. If the optical drive must be moved to another partition, all SCSI devices that are connected to that SCSI adapter must be moved because moves are done at the slot level, not at the device level.

Depending on the boot device, a checkpoint might be displayed on the operator panel for an extended period while the boot image is retrieved from the device. This is particularly true for tape and network boot attempts. If you are booting from an optical drive or tape drive, watch for activity on the drive's LED indicator. A flashing LED indicates that the boot image, or additional information that the operating system requires, is still being loaded. If the checkpoint is displayed for an extended period and the drive LED is not indicating any activity, there might be a problem with loading the boot image from the device.

Notes:
  1. For network boot attempts, if the system is not connected to an active network or if the target server is inaccessible (which can also result from incorrect IP parameters), the system still attempts to boot. Because time-out durations are necessarily long to accommodate retries, the system might appear to be hung. Refer to checkpoint CA00 E174.
  2. If the partition hangs with a 4-character checkpoint in the display, the partition must be deactivated, then reactivated before you attempt to reboot.
  3. If a BA06 000x error code is reported, the partition is already deactivated and in the error state. Reboot by activating the partition. If the reboot is still not successful, go to step 3.
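
Because incorrect IP parameters can make a network boot appear to hang (note 1), it can help to sanity-check the values on a workstation before entering them in SMS. The following is a minimal sketch only, not part of SMS or any IBM tool. The function name check_boot_ip_parameters is hypothetical, and the sketch assumes IPv4 addressing and that a gateway of 0.0.0.0 means no gateway is configured.

```python
import ipaddress


def check_boot_ip_parameters(client_ip, server_ip, gateway_ip, subnet_mask):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper for illustration. Assumes IPv4 and that a gateway
    of 0.0.0.0 means "no gateway configured".
    """
    problems = []

    # Derive the client's subnet from its address and mask.
    network = ipaddress.ip_network(f"{client_ip}/{subnet_mask}", strict=False)
    server = ipaddress.ip_address(server_ip)
    gateway = ipaddress.ip_address(gateway_ip)

    # A server on the client's own subnet is reachable without a gateway.
    if server not in network:
        if gateway.is_unspecified:
            problems.append("server is on another subnet but no gateway is set")
        elif gateway not in network:
            problems.append("gateway is not on the client's subnet")

    return problems


if __name__ == "__main__":
    # Example values are made up for illustration.
    for problem in check_boot_ip_parameters(
        "192.168.1.10", "10.0.0.5", "192.168.2.1", "255.255.255.0"
    ):
        print(problem)
```

A clean result from this check does not prove that the server is reachable. The SMS ping utility and the network administrator remain the authoritative checks.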

This procedure assumes that a diagnostic CD-ROM and an optical drive from which it can be booted are available, or that diagnostics can be run from a NIM (Network Installation Management) server. Booting the diagnostic image from an optical drive or a NIM server is referred to as running stand-alone diagnostics.

Procedure

  1. Is a management console attached to the managed system?
    • Yes: Continue with the next step.
    • No: Go to step 3.
  2. Look at the service action event error log on the management console.
    Complete the actions necessary to resolve any open entries that affect devices in the boot path of the partition or that indicate problems with I/O cabling. Then, try to reboot the partition. Does the partition reboot successfully?
    • Yes: This ends the procedure.
    • No: Continue with the next step.
  3. Boot to the SMS main menu. Then, choose from the following options:
    • If you are rebooting a partition from partition standby (LPAR), go to the properties of the partition and select Boot to SMS, then activate the partition.
    • If you are rebooting from platform standby:
      • For 9080-HEX systems, access the ASMI. See Setting up and accessing the ASMI. In the navigation area, expand Power/Restart Control, then Power On/Off System. In the AIX/Linux partition mode boot box, select Boot to SMS menu > Save Settings and Power On.
      • For all other systems, access the ASMI. See Logging on to the ASMI GUI. In the navigation area, expand Operations > Server power operations, and click Power Off. In the AIX/Linux partition mode boot box, select Boot to SMS menu and click Save.

    At the SMS main menu, select Select Boot Options and verify whether the intended boot device is correctly specified in the boot list. Is the intended load device correctly specified in the boot list?

    • Yes: Complete the following steps:
      1. Remove all removable media from devices in the boot list from which you do not want to load the operating system.
      2. If you are attempting to load the operating system from a network, go to step 4.
      3. If you are attempting to load the operating system from a disk drive or an optical drive, go to step 7.
    • No: Go to step 5.
  4. If you are attempting to load the operating system from the network, complete the following steps:
    • Verify that the IP parameters are correct.
    • Use the SMS ping utility to attempt to ping the target server. If the ping is not successful, have the network administrator verify the server configuration for this client.
    • Check with the network administrator to ensure that the network is up. Also, ask the network administrator to verify the settings on the server from which you are trying to load the operating system.
    • Check the network cabling to the adapter.

    Restart the partition and try loading the operating system. Does the operating system load successfully?

    • Yes: This ends the procedure.
    • No: Go to step 7.
  5. Use the SMS menus to add the intended boot device to the boot sequence.
    Can you add the device to the boot sequence?
    • Yes: Restart the partition. This ends the procedure.
    • No: Continue with the next step.
  6. Ask the customer or system administrator to verify that the device you are trying to load from is assigned to the correct partition.
    Then, select List All Devices and record the list of bootable devices that displays. Is the device from which you want to load the operating system in the list?
    • Yes: Go to step 7.
    • No: Go to step 10.
  7. Try to load and run stand-alone diagnostics against the devices in the partition, particularly against the boot device from which you want to load the operating system.
    You can run stand-alone diagnostics from an optical drive or a NIM server. To boot stand-alone diagnostics, follow the detailed procedures in Running the online and stand-alone diagnostics.
    Note: When you attempt to load diagnostics on a partition from partition standby, the device from which you are loading stand-alone diagnostics must be made available to the partition that is not able to load the operating system, if it is not already in that partition. Contact the customer or system administrator if a device must be moved between partitions to load stand-alone diagnostics.

    Did stand-alone diagnostics load and start successfully?

    • Yes: Go to step 8.
    • No: Go to step 14.
  8. Was the intended boot device present in the output of the option Display Configuration and Resource List that is run from the Task Selection menu?
    • Yes: Continue with the next step.
    • No: Go to step 10.
  9. Did running diagnostics against the intended boot device result in a No Trouble Found message?
    • Yes: Go to step 12.
    • No: Go to the list of service request numbers and complete the repair actions for the SRN reported by the diagnostics. After you complete the repair actions, go to step 13.
  10. Complete the following actions:
    1. Complete the first item in the action list below. In the list of actions below, choose SCSI or IDE based on the type of device from which you are trying to boot the operating system.
    2. Restart the system or partition.
    3. Stop at the SMS menus and select Select Boot Options.
    4. Is the device that was not appearing previously in the boot list now present?
      • Yes: Go to Verifying a repair. This ends the procedure.
      • No: Perform the next item in the action list and then return to substep 2 of step 10 (Restart the system or partition). If no more items are in the action list, go to step 11.
    Action list:
    Note: See Part locations and location codes for part numbers and links to exchange procedures.
    1. Verify that the SCSI or IDE cables are properly connected. Also, verify that the device configuration and address jumpers are set correctly.
    2. Choose from the following options:
      • SCSI boot device: If you are attempting to boot from a SCSI device, remove all hot-swap disk drives (except the intended boot device, if the boot device is a hot-swap drive). If the boot device is present in the boot list after you boot the system to the SMS menus, add the hot-swap disk drives back in one at a time, until you isolate the failing device.
      • IDE boot device: If you are attempting to boot from an IDE device, disconnect all other internal SCSI or IDE devices. If the boot device is present in the boot list after you boot the system to the SMS menus, reconnect the internal SCSI or IDE devices one at a time, until you isolate the failing device or cable.
    3. Replace the SCSI or IDE cables.
    4. Replace the SCSI backplane (or IDE backplane, if present) to which the boot device is connected.
    5. Replace the intended boot device.
    6. Replace the system backplane.
  11. Choose from the following options:
  12. Have you disconnected any other devices?
    • Yes: Reinstall each device that you disconnected, one at a time. After you reinstall each device, reboot the system. Continue this procedure until you isolate the failing device. Replace the failing device. Then, go to step 13.
    • No: Perform an operating system-specific recovery process or reinstall the operating system. This ends the procedure.
  13. Is the problem corrected?
  14. Is a SCSI boot failure (where you cannot boot from a SCSI-attached device) also occurring?
  15. Complete the following actions to determine whether another adapter is causing the problem:
    1. Remove all adapters except the one to which the optical drive is attached and the one used for the console.
    2. Reload the stand-alone diagnostics. Can you successfully reload the stand-alone diagnostics?
      • Yes: Complete the following steps:
        1. Reinstall the adapters that you removed (and attach devices as applicable) one at a time. After you reinstall each adapter, try the boot operation again until the problem recurs.
        2. Replace the adapter or device that caused the problem.
        3. Go to Verifying a repair. This ends the procedure.
      • No: Continue with the next step.
  16. The graphics adapter (if installed), optical drive, IDE or SCSI cable, or system board is most likely defective.
    Is a PCI graphics adapter installed in the system?
    • Yes: Continue with the next step.
    • No: Go to step 18.
  17. Complete the following steps to determine whether the graphics adapter is causing the problem:
    1. Remove the graphics adapter.
    2. Attach a TTY terminal to the system port.
    3. Try to reload stand-alone diagnostics. Do the stand-alone diagnostics load successfully?
      • Yes: Replace the graphics adapter. This ends the procedure.
      • No: Continue with the next step.
  18. Replace the following (if not already replaced), one at a time, until the problem is resolved:
    1. Optical drive
    2. IDE or SCSI cable that goes to the optical drive.
    3. System board that contains the integrated SCSI or IDE adapters.

    If this resolves the problem, go to Verifying a repair. If the problem still persists or if the previous descriptions did not address your particular situation, go to PFW1548: Memory and processor subsystem problem isolation procedure.

    This ends the procedure.
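
The loop in step 10 (complete one item from the action list, restart, check the boot list, and repeat until the device appears or the list is exhausted) can be sketched as follows. This is an illustrative model only. The function run_action_list and its callback parameters are hypothetical names, and the technician still performs the actions and the SMS checks by hand.

```python
from typing import Callable, Optional, Sequence


def run_action_list(
    actions: Sequence[str],
    perform: Callable[[str], None],
    device_in_boot_list: Callable[[], bool],
) -> Optional[str]:
    """Work through the action list in order, one item per restart.

    Returns the action after which the device appeared in the boot list,
    or None if the list was exhausted (in which case go to step 11).
    """
    for action in actions:
        # Complete the action, then restart and stop at the SMS menus.
        perform(action)
        # Check Select Boot Options for the previously missing device.
        if device_in_boot_list():
            return action
    return None


if __name__ == "__main__":
    # Simulated run: the device reappears once the cables are replaced.
    actions = [
        "Verify cables and jumpers",
        "Isolate other SCSI or IDE devices",
        "Replace the SCSI or IDE cables",
        "Replace the SCSI or IDE backplane",
        "Replace the intended boot device",
        "Replace the system backplane",
    ]
    done = []
    fixed_by = run_action_list(
        actions,
        perform=done.append,
        device_in_boot_list=lambda: "Replace the SCSI or IDE cables" in done,
    )
    print(fixed_by)
```

Stopping at the first action that restores the device keeps part replacements to a minimum, which is why the action list is ordered from cable checks toward backplane replacement.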