Topic
  • 5 replies
  • Latest Post - ‏2011-03-22T16:46:59Z by Miescha
Miescha
Miescha
6 Posts

x235 ServeRAID 6i: Boot error after RAID 1 disk rebuild?

‏2011-03-22T00:13:52Z |
IBM x235 Server

ServeRAID 6i Controller
6 146 GB physical drives installed
Set up as 3 logical drives / 3 arrays with RAID 1 mirroring
BIOS 7.12.13
FIRMWARE 7.12.13
DEVICE DRIVER 7.12.02 (?)

User received i9990301 DISK FAILURE OR DISK RESET FAILED error
User pulled physical drive 0 (bottom-most drive) and rebooted.
Physical drive 0 is the first drive in logical disk 1 / array A.
User states they were prompted to mark the pulled drive defunct and did so.

User got the i9990301 error.
User replaced drive 0 and pulled drive 1 (the other drive in logical disk 1 / array A).
User states they were prompted to mark the pulled drive defunct and did so.
User got i9990301 error.

User called me. I booted to the IBM ServeRAID Manager CD and discovered mismatched BIOS/firmware/device driver versions (though the mismatch did not seem to cause a problem previously). I figure I will resolve that later by updating all three to 7.12.14.
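
For anyone who wants to script the version comparison, here is a minimal Python sketch. It only compares the version strings listed above and does not query the controller in any way. The dictionary keys and the helper names `parse_version` and `find_mismatches` are hypothetical, made up for illustration.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.12.13' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def find_mismatches(versions):
    """Return the names of components that are not at the newest version present."""
    newest = max(parse_version(v) for v in versions.values())
    return sorted(
        name for name, v in versions.items() if parse_version(v) != newest
    )


# Versions as reported in this post (component names are illustrative labels).
reported = {
    "bios": "7.12.13",
    "firmware": "7.12.13",
    "device_driver": "7.12.02",
}

print(find_mismatches(reported))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes with values such as "7.12.2" versus "7.12.13".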

ServeRAID Manager showed array A offline and both physical drives 0 and 1 defunct / offline. I powered down, pulled both drives, rebooted, rescanned, inserted drive 1 and set it online, then inserted drive 0, and nothing happened.

I repeated but inserted drive 0 and set it online, then inserted drive 1 and rebuild started nearly immediately.

If memory serves me, I cannot reboot until the rebuild is complete - correct?

Also, even if the rebuild is not successful (the drive is bad) I'm thinking the system should boot and run but with this drive defunct/offline so Array A will show as critical or degraded until I get a replacement drive in place (probably tomorrow) and a good rebuild - correct?
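
To make the expectation above concrete, here is a simplified Python sketch of how I understand a two-drive RAID 1 array state to follow from the drive states. The state names are the ones used in this thread; the function `raid1_array_state` is hypothetical and is not taken from ServeRAID Manager, whose actual logic may differ.

```python
def raid1_array_state(drive_states):
    """Simplified model: derive a mirror's state from its member drive states.

    Any drive that is not 'online' (for example 'defunct' or 'rebuilding')
    is treated as not contributing a good copy of the data.
    """
    online = sum(1 for state in drive_states if state == "online")
    if online == len(drive_states):
        return "okay"
    if online >= 1:
        return "critical"
    return "offline"


# Both drives marked defunct, as ServeRAID Manager showed for array A.
print(raid1_array_state(["defunct", "defunct"]))

# One good drive and one still rebuilding or defunct.
print(raid1_array_state(["online", "rebuilding"]))
```

Under this model, a failed rebuild leaves the array critical rather than offline, as long as one member drive stays online.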

Is there something else I'm missing that could be causing the i9990301 error and boot failure?

Is this the proper way to replace a bad/critical drive given that the user had already marked both drives as defunct?

What is the best practice for replacing a critical drive before it is marked as defunct?

Enough questions - thanks for any assistance.

-Miescha
Updated on 2011-03-22T16:46:59Z by Miescha
  • Miescha
    Miescha
    6 Posts

    Re: x235 ServeRAID 6i: Boot error after RAID 1 disk rebuild ????

    ‏2011-03-22T02:08:31Z  
    UPDATE:

    Rebuild completed successfully and SR Manager shows all drives, all arrays OK.

    However, on reboot, still get i9990301 disk failure error and cannot boot to HDD (any).

    BIOS shows Planar SCSI as boot device.

    Tried removing the SR6i and rebooting, resetting the BIOS to default settings and rebooting, then reinstalling the SR6i and rebooting. No luck.

    Error log files show a SCSI drive missing or non-functioning at various points today (likely when removed or marked defunct), but the last entry shows all working.

    Do I need to copy the config from the drives to the controller? If so, why? And how long does the copy-config-from-drives-to-controller process take to complete?

    Thanks all.

    -Miescha
  • Miescha
    Miescha
    6 Posts

    Re: x235 ServeRAID 6i: Boot error after RAID 1 disk rebuild ????

    ‏2011-03-22T03:49:01Z  
    UPDATE 2

    All have the SR6i controller.

    So . . . I'm starting to wonder (this is where it gets dangerous around midnight) . . . why can't I just swap one of the other SR6i controllers into this unit, copy the config from the drives over to the controller, and boot - just as a way to test that the controller has not gone bad?

    Better still, why not just move the first drive of each array to one of the other x235 units, make sure all the BIOS/firmware/driver versions are updated to 7.12.13, copy the config from the drives back to the controller, and BOOM, boot from these machines? Of course, actually running one of these for the DC would require changes to reflect the new network card driver, and there may be other DHCP issues, and more I'm not thinking about right now.

    So I'd really rather just get the darn original x235 up and running before I fall asleep (or morning work rolls around), but are there any other options I should consider given the test options I have available with these other 4 units?

    Again, thanks for any input.

    -Miescha
  • Miescha
    Miescha
    6 Posts

    Re: x235 ServeRAID 6i: Boot error after RAID 1 disk rebuild ?

    ‏2011-03-22T03:52:07Z  
    UPDATE 2

    All have the SR6i controller.

    So . . . I'm starting to wonder (this is where it gets dangerous around midnight) . . . why can't I just swap one of the other SR6i controllers into this unit, copy the config from the drives over to the controller, and boot - just as a way to test that the controller has not gone bad?

    Better still, why not just move the first drive of each array to one of the other x235 units, make sure all the BIOS/firmware/driver versions are updated to 7.12.13, copy the config from the drives back to the controller, and BOOM, boot from these machines? Of course, actually running one of these for the DC would require changes to reflect the new network card driver, and there may be other DHCP issues, and more I'm not thinking about right now.

    So I'd really rather just get the darn original x235 up and running before I fall asleep (or morning work rolls around), but are there any other options I should consider given the test options I have available with these other 4 units?

    Again, thanks for any input.

    -Miescha
  • Miescha
    Miescha
    6 Posts

    Re: x235 ServeRAID 6i: Parts swap with other x235 ????

    ‏2011-03-22T12:08:29Z  
    UPDATE 2 - Corrected.

    Sorry about the double post - it should have been a single post with more info. Somehow, the first half of the post got cut off and the second half posted twice. No doubt operator error due to lack of sleep. Again, my apologies.

    The first half of the post merely stated that this location has four identical x235 machines, which led me to wonder in the wee hours of the night (a very dangerous thing for me to do :-). My thinking was that I could swap around controllers and/or drives in an effort to either (1) get the current machine working with a different controller or (2) get the dataset working on a different machine (though I think this would involve a lot of other setting changes as this is the DC and DHCP controller, but maybe just changing the IP address to match that of the old machine would do it).

    As mentioned in the second part of this post (now above - twice), all four x235 units have the same SR6i controller. All have dual Xeon 3.2 GHz chips and 6 GB of DDR1 memory. Only the 2008 machine has any additional hard drives.

    I'm thinking I should be able to swap some controllers for testing purposes if nothing else.

    -Miescha
  • Miescha
    Miescha
    6 Posts

    Re: x235 ServeRAID 6i: Parts swap with other x235 ????

    ‏2011-03-22T16:46:59Z  
    SYSTEM NOW WORKING (MOSTLY)

    After some coffee and clear(er) thinking, I opted not to start swapping around hardware among the various x235 servers.

    On the original 'problem' server, I was unable to get matching BIOS/firmware/device driver versions (still don't know why), so I 'downgraded' the controller BIOS and firmware to 7.12.02 to match the device driver (I know, always a bad idea to downgrade). This alone did nothing.

    I then removed the controller, reset the system BIOS to default settings and replaced the controller - again, nothing.

    I then removed the controller, reflashed the system BIOS to v1.17 and replaced the controller - SNAP, system booted to Windows Server 2003.

    There is at least one problem in that the primary database will not start due to a mismatch of create dates on the database, which I suspect occurred during the defunct drive rebuild. I'm working on this now.

    As for the long term, the controller firmware, BIOS and device driver need to be updated to 7.12.14 (though I can only find 7.12.02 for the device driver at this point). I'm told the hard drives have firmware that should be updated. I'm not certain of that process, but I'll look into it.

    I suspect the Adaptec and LSI drivers need updating also.

    For now, the system is at least up and running, save for the date error on the database of the rebuilt HDD.

    Hope this might help someone else - and I'm still open to suggestions if anyone has them.

    -Miescha