Topic
  • 22 replies
  • Latest Post - ‏2015-06-27T01:06:50Z by NSP_SLM
SystemAdmin
SystemAdmin
3234 Posts

Pinned topic SAS RAID Controller Failed - BladeCenter S

‏2013-02-28T11:13:59Z |
Hey,

I have a serious problem with my SAS RAID controller. After a power failure it failed, and the second one is now working in survivor mode. Is there any way to recover it from this state? The official documentation says that it should be replaced, but I believe there must be a way to recover it. It is currently in failed mode, and when I log into it every command returns "Unable to communicate with the controller".

I would be very grateful for any hints on what to do.
Updated on 2013-03-20T11:07:14Z at 2013-03-20T11:07:14Z by SystemAdmin
  • gez
    gez
    275 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-04T19:16:36Z  
    Reboot or reseat the RAID module.
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-14T22:32:40Z  
    Reseat the module. Then, if it comes up with errors in the AMM, log in and run shutdown -ctlr X restart, where X is the controller that is in offline mode.

    If this does not resolve it, it will need to be replaced. Good luck.
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-15T12:31:45Z  
    Check Ethernet connectivity between the two controllers in slots 3 and 4 (they use ports 7 and 14 of the Ethernet switch in slot 1). Try resetting the failed controller to defaults (via the AMM web interface, I/O modules advanced configuration), then try reseating the failed controller.

    Also check that the controller battery is correctly inserted, and try reseating the battery.

    If nothing solves the issue, you will have to replace the failed controller.

    Tomas Petrzilka
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-16T15:42:52Z  
    I believe ports 7 and 14 come into play when you use the ICPM Ethernet module in bay 1. In that situation ports 7 and 14 need to be connected to your network switch on the same network and subnet as your AMM.

    Follow Josh's suggestion.

    Also, have you looked at the event logs to determine the issue? Look at the RAID controller logs via the CLI or SCM as well to get more in-depth information.
    Another point: have you verified that the SAS module has the correct IP addresses for the SAS switch and the SAS RAID Controller? Do this by going to AMM -> I/O Module Tasks -> Configuration -> Slot x, where x is either 3 or 4. The IP addresses are displayed there.
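Once the IP addresses shown in the AMM are known, a quick reachability check can be scripted. The following is only a minimal sketch: the function names are made up for illustration, and the addresses in the usage note are placeholders from the documentation range, not values from this thread. It only tests whether the Telnet port (TCP 23) accepts a connection.

```python
import socket


def is_port_open(host: str, port: int = 23, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_modules(addresses: dict[str, str]) -> dict[str, bool]:
    """Check Telnet reachability for each named module address."""
    return {name: is_port_open(ip) for name, ip in addresses.items()}
```

Usage would look like check_modules({"RAID controller bay 3": "192.0.2.10", "RAID controller bay 4": "192.0.2.11"}) with the real addresses substituted. Note that an open Telnet port only shows the management interface answers; as reported later in this thread, a controller can accept a Telnet login and still return "Unable to communicate with the controller".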
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-18T11:13:13Z  
    Can you explain how to check those network switch ports? I can't see ports 7 and 14 anywhere. It's quite possible that a network problem caused my controller to fail. On the switch web page I now see something worrying: port EXT4 shows the status "STP: Blocking". Should I be concerned?

    Thank you for the previous answers. I hope we can figure this out soon.
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-18T12:39:25Z  
    Ports 7 and 14 come into play when you use the ICPM. See:

    https://www-947.ibm.com/support/entry/myportal/docdisplay?lndocid=migr-5091833

    The BladeCenter S I/O module bays are unique. Below is port summary. Port 7 and 14 are assigned to IO bays 3 and 4 respectively.

    Port      Bay 1            Bay 2   Bay 3     Bay 4
    Port 1    Blade 1 NIC A    n/a     Blade 1   n/a
    Port 2    Blade 2 NIC A    n/a     Blade 2   n/a
    Port 3    Blade 3 NIC A    n/a     Blade 3   n/a
    Port 4    Blade 4 NIC A    n/a     Blade 4   n/a
    Port 5    Blade 5 NIC A    n/a     Blade 5   n/a
    Port 6    Blade 6 NIC A    n/a     Blade 6   n/a
    Port 7    Switch Bay 3     n/a     n/a       n/a
    Port 8    Blade 1 NIC B    n/a     n/a       Blade 1
    Port 9    Blade 2 NIC B    n/a     n/a       Blade 2
    Port 10   Blade 3 NIC B    n/a     n/a       Blade 3
    Port 11   Blade 4 NIC B    n/a     n/a       Blade 4
    Port 12   Blade 5 NIC B    n/a     n/a       Blade 5
    Port 13   Blade 6 NIC B    n/a     n/a       Blade 6
    Port 14   Switch Bay 4     n/a     n/a       n/a

    On BladeCenter S the two onboard Ethernet connections of each blade server are mapped directly to I/O module bay 1. Because both default paths from the blade server are Ethernet, I/O module bay 1 can only support an Ethernet switch or pass-through module
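For anyone scripting checks against the bay 1 switch, the Bay 1 column of the port summary in this post can be written down as a small lookup. This is just a sketch: the names BAY1_PORT_MAP, describe_port and ports_for_io_bay are invented for illustration, and the data is only what the summary states.

```python
# Internal ports of the Ethernet module in I/O bay 1 of a BladeCenter S,
# transcribed from the Bay 1 column of the port summary in this post.
BAY1_PORT_MAP: dict[int, str] = {}
for blade in range(1, 7):
    BAY1_PORT_MAP[blade] = f"Blade {blade} NIC A"      # ports 1-6
    BAY1_PORT_MAP[blade + 7] = f"Blade {blade} NIC B"  # ports 8-13
BAY1_PORT_MAP[7] = "Switch Bay 3"
BAY1_PORT_MAP[14] = "Switch Bay 4"


def describe_port(port: int) -> str:
    """Return what an internal bay 1 port is wired to."""
    if port not in BAY1_PORT_MAP:
        raise ValueError(f"port {port} is not an internal port (1-14)")
    return BAY1_PORT_MAP[port]


def ports_for_io_bay(bay: int) -> list[int]:
    """Return the bay 1 ports that lead to the module in the given I/O bay."""
    target = f"Switch Bay {bay}"
    return sorted(p for p, what in BAY1_PORT_MAP.items() if what == target)
```

With this, ports_for_io_bay(3) gives the port to inspect on the bay 1 switch for the module in I/O bay 3, and ports_for_io_bay(4) the one for I/O bay 4.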
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-18T14:31:25Z  
    I use a Nortel switch in bay 1.

    What is strange is that I can connect to the SAS RAID Controller through Telnet without a problem, but all I get is "Unable to communicate with controller".

    The controller is reachable right after startup, while the list controller command reports its status as Starting. After a while it can no longer be communicated with.
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-18T14:33:18Z  
    What state is the other controller in when that one falls out?
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-18T23:02:58Z  
    Second one is in Survivor mode. RAID works.
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-19T08:41:14Z  
    Ethernet ports 7 and 14 of I/O module 1 are always in play; the RSSMs always use them. In my experience, configuration issues with, for example, the BNT switch in slot 1 can also cause trouble for the RSSMs. So you have to be sure that the Ethernet switch in bay 1 is working well.

    I had an issue where the RAID controllers could not be pinged, although the SAS switches were accessible. Reseating the BNT switch in bay 1 solved it.

    It would probably be a good idea to schedule a service window and try reseating the BNT switch and both RSSMs. You could also try swapping the RSSMs.
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-19T11:01:58Z  
    If I reseat the RAID controllers and swap their positions in the I/O slots, is there any danger that my RAID configuration and disk assignments will be lost?
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-19T18:08:51Z  
    Anything is possible. I'd power down your blades, stop all the IO, and backup your configurations first.
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-20T10:11:38Z  
    Back up your data, of course. The RAID configuration is written on the drives, so replacing the RAID modules will not be a problem.

    Be careful with the configuration backup via SCM: the RAID configuration can be saved there, but you cannot restore it without a RAID initialization. In other words, this SCM function does not restore any data.
  • SystemAdmin
    SystemAdmin
    3234 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-03-20T11:07:14Z  
    What I noticed today is that, before I get "Unable to communicate with the controller. Please try again" after running any command over the Telnet connection, I get this:

    "Error : Failed to get time
    Reason: Controller has asserted. Log collection under progress"

    I have now looked at the new firmware, and its release notes contain this:

    Key Fixes
    * 47424: Fixed controller assert when processing ordered commands
    * 56507: Fixed controller assert when processing AIX clear ACA commands

    Could this be connected with my problem? Maybe the controller is affected by a bug that is fixed in the latest firmware? What do you think?
  • ivfibm
    ivfibm
    13 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-08-03T13:49:08Z  
    Hi!

    We have a problem: all blade servers lost their connection to both SAN hard disks.

    I tried to update the firmware on the SAS RAID controller, but after I log into it every command returns "Unable to communicate with the controller".

    The controller is reachable right after a restart, but after a few seconds it can no longer be communicated with, so I cannot update the firmware.

    One solution suggested in this topic is:
    "Check ethernet connectivity between both controllers in slot 3 and 4 (they are using port 7 and 14 of ethernet switch in slot 1)"

    My question is: can you explain how to check those network switch ports? I don't know how.

    Thanks!

  • Josh_Corder
    Josh_Corder
    94 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-08-05T14:35:30Z  
    Do you have the copper pass-thru modules in the chassis, or network switches? Ports 7 and 14 are only valid for the copper pass-thru module. Is either controller communicating? If you can log in to one or the other, run "list controller", and paste the output here, that will help in understanding whether something is broken.

  • Guy Kempny
    Guy Kempny
    18 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-08-08T02:36:06Z  

    Actually what Josh says here makes excellent sense. A power hit might have reset the failing controller to default settings. Hence look at its IP addresses in the AMM.

    Have you done this? How are you trying to communicate with the RAID controller cards?

    1. AMM
    2. CLI
    3. SCM

  • ivfibm
    ivfibm
    13 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-08-13T09:00:39Z  

    Hi!

    Thanks for your answer!

    I don't know whether I have copper pass-thru modules in the chassis or network switches. My network switch is:
    BNT Layer 2/3 Copper Gigabit Ethernet Switch Module

    If I try to access the controller with Telnet, I receive:
    "Unable to communicate with the controller. Please try again"
    or
    "Tsal Status is TSALSTATERROR_TSAL_ASSERTED_COLLECTINGLOGS"
    (this happens for both SAS RAID Controller modules)

    In the AMM I get these errors:
    An error on I/O Module 3 was detected
    An error on I/O Module 4 was detected

    If I restart the controller from the AMM, the errors in the AMM disappear and I can access the controller with Telnet, but only for a few seconds (enough time to run only one or two commands). After that the errors appear in the AMM again, and Telnet again returns:
    "Unable to communicate with the controller. Please try again"

    The command alert -get doesn't show any error, and the command list controller shows:

    | Ctlr# | Controller | Status   | Ports | LUNs |
    |_______|____________|__________|_______|______|
    | 0     | Ctlr0      | STARTING | 1     | --   |
    | 1     | Ctlr1      | FAILED   | -     | --   |
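If the "list controller" output is captured from a Telnet session, the status column can be pulled out with a few lines of Python. This is a rough sketch that assumes the pipe-delimited layout pasted in this post; the function names are illustrative, and the real CLI output may differ in spacing or columns.

```python
def parse_list_controller(output: str) -> list[dict]:
    """Extract controller index, name, and status from pipe-delimited rows."""
    rows = []
    for line in output.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows start with a numeric controller index;
        # the header row and the underscore rule line do not.
        if len(cells) >= 3 and cells[0].isdigit():
            rows.append(
                {"ctlr": int(cells[0]), "name": cells[1], "status": cells[2]}
            )
    return rows


def failed_controllers(rows: list[dict]) -> list[int]:
    """Return the indices of controllers whose status is FAILED."""
    return [r["ctlr"] for r in rows if r["status"].upper() == "FAILED"]


# Sample text modelled on the output pasted in this post.
SAMPLE = """\
| Ctlr# | Controller | Status   | Ports | LUNs |
|_______|____________|__________|_______|______|
| 0     | Ctlr0      | STARTING | 1     | --   |
| 1     | Ctlr1      | FAILED   | -     | --   |
"""

if __name__ == "__main__":
    print(failed_controllers(parse_list_controller(SAMPLE)))
```

Polling this repeatedly after a restart would show how long a controller stays in STARTING before it drops out.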
  • Josh_Corder
    Josh_Corder
    94 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-08-13T12:33:35Z  

    In your config, the CPM modules are not in play.  It appears that the second RSSM is failing.  Try pulling it and see if you can access the RSSM in I/O Bay 3 afterwards.

  • ivfibm
    ivfibm
    13 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-08-20T06:00:53Z  

    Hi!

    Sorry for the late reply!

    I don't understand whether I have copper pass-thru modules (my network switch is the BNT Layer 2/3 Copper Gigabit Ethernet Switch Module), and I don't know how to pull the second RSSM.

    Thanks!

  • Guy Kempny
    Guy Kempny
    18 Posts

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2013-09-24T02:30:39Z  

    1. Go to the back of the BladeCenter.
    2. Locate I/O bay 3.
    3. Remove the module from this bay, i.e. flip the tabs and pull it out.
    4. Push it back in and lock the tabs up.
    5. Wait for a period of time and review the controller status.

  • NSP_SLM
    NSP_SLM
    1 Post

    Re: SAS RAID Controller Failed - BladeCenter S

    ‏2015-06-27T01:06:50Z  

    Hey,

    We are getting the same error, "Tsal Status is TSALSTATERROR_TSAL_ASSERTED_COLLECTINGLOGS", for the list controller command. There is no amber light on the blade chassis or on any of the modules.