Topic
  • 14 replies
  • Latest Post - ‏2011-10-04T20:14:20Z by Brian_King
jnendel
jnendel
1 Post

Pinned topic Monitoring Hardware RAID

‏2009-12-08T15:38:53Z |
We have a very simple installation of Red Hat Enterprise Linux 4 on a P-series server but have not been able to find a way of monitoring the hardware RAID. The software that came with the server only includes installers for the i386 and EM64T architectures. HELP!!!
  • SystemAdmin
    SystemAdmin
    706 Posts

    Re: Monitoring Hardware RAID

    ‏2009-12-11T16:29:36Z  
    Hello jnendel,
    A few questions:
    1. Can you provide the output of "uname -a" and /etc/issue ?
    2. What type of card are you using?
    3. What type of POWER system are you using?

    Thanks!
  • jnendel2
    jnendel2
    8 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-03T18:54:09Z  
    Linux Oracle.bibleonstage.com 2.6.9-100.EL #1 SMP Tue Feb 1 12:10:14 EST 2011 ppc64 ppc64 ppc64 GNU/Linux

    Kernel \r on an \m

    RAID card: IBM Model 7031-D24/T24

    P-series
  • Brian_King
    Brian_King
    20 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-03T19:30:26Z  
    Can you post the output of these two commands as well?

    lspci

    grep system_type /proc/ppc64/lparcfg

    Thanks,

    Brian
  • jnendel2
    jnendel2
    8 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T13:10:37Z  
    Appreciate your help. The P-series is a 550.

    00:01.0 RAID bus controller: IBM Citrine chipset SCSI controller (rev 11)
    0001:00:02.0 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
    0001:00:02.2 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
    0001:00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
    0001:00:02.6 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
    0001:c0:01.0 SCSI storage controller: Mylex Corporation AcceleRAID 600/500/400/Sapphire support Device (rev 04)
    0001:d0:01.0 Mass storage controller: Promise Technology, Inc. 20275 (rev 01)
    0002:00:02.0 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
    0002:00:02.2 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
    0002:00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
    0002:00:02.6 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
    0002:c0:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
    0002:c0:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
    0002:c8:01.0 USB Controller: NEC Corporation USB (rev 43)
    0002:c8:01.1 USB Controller: NEC Corporation USB (rev 43)
    0002:c8:01.2 USB Controller: NEC Corporation USB 2.0 (rev 04)

    system_type=IBM,9133-55A
  • Brian_King
    Brian_King
    20 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T14:03:18Z  
    It looks like you are using an ipr-based storage controller. A package of management utilities is included with your Linux distribution in the iprutils package. The iprconfig command lets you see the current device and array status, create RAID arrays, and perform other RAID management activities. It also provides a command line interface, which allows for use in a scripted environment. Refer to the iprutils man page for further details on this tool as well as the command line syntax.
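
    As a rough illustration of scripted use, the sketch below filters status text for arrays or disks that look unhealthy. The function name flag_unhealthy is made up for this example, and the status words it matches ("Degraded", "Failed") are the ones mentioned in this thread. The iprconfig subcommand in the usage comment is an assumption about your iprutils version, so confirm it against the man page before relying on it.

```shell
# Hypothetical helper: read status text on stdin and print only the lines
# that report a Degraded or Failed state (case-insensitive match).
flag_unhealthy() {
    grep -iE 'degraded|failed'
}

# Usage sketch (assumes your iprutils supports this subcommand; check the
# man page first):
#   iprconfig -c show-config | flag_unhealthy
```

    Because grep exits non-zero when nothing matches, the exit status can drive a cron job or alert: zero means at least one line looked unhealthy.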

    Any time the ipr RAID adapter detects a serviceable event, the driver logs an error with the details to the kernel log (dmesg), so monitoring /var/log/messages may suffice for what you are looking to accomplish.
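
    A minimal sketch of that kind of log monitoring follows. It assumes that ipr driver messages contain the driver name "ipr" as a whole word (for example as an "ipr:" prefix); the function name ipr_log_lines is hypothetical, and the match pattern may need adjusting to what your kernel actually prints.

```shell
# Hypothetical helper: print the lines of a log file that mention the ipr
# driver. -w matches "ipr" only as a whole word, so "iprinit" is skipped.
ipr_log_lines() {
    # $1: path to a log file, e.g. /var/log/messages
    grep -w 'ipr' "$1"
}

# Usage sketch:
#   ipr_log_lines /var/log/messages
```

    Running this from cron and mailing any output is one simple way to get notified; a log watcher such as logwatch or swatch would be an alternative.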

    A wealth of information on this family of RAID adapters is provided in the "PCI-X SCSI RAID Controller Reference Guide for Linux", which can be found here:

    http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base/hardware_docs/pdf/231327.pdf

    Since you are running RHEL 4, you will want to look at Part 1 of the document, which is the section for Linux distributions based on a 2.6 Linux kernel.

    Refer to Chapter 5 for an explanation of the errors that can be logged by the RAID adapters along with recommended service actions for the various errors.

    Thanks,

    Brian
  • jnendel2
    jnendel2
    8 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T16:00:42Z  
    Thanks. I knew about those utilities but I thought they only worked in the console interface prior to booting Linux. One reason I concluded that is that I do not see a single entry in the ipr error log, yet the utility says "degraded" for every RAID array. Should I be concerned?
  • Brian_King
    Brian_King
    20 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T16:19:15Z  
    Be sure to check /var/log/dmesg as well, since any errors logged prior to syslogd starting will only be visible there. There are multiple potential causes for an array status of degraded. A failed drive is one possible cause. If this is the case, iprconfig should report the status of that disk as Failed. Another possible cause is related to the adapter's write cache. Ensure the iprinit daemon is running:

    service iprinit status

    If it is not, it should be started to ensure optimal performance:

    service iprinit start
    chkconfig iprinit on
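
    The check and the start can be combined into one step, sketched below. It assumes that "service iprinit status" returns a non-zero exit code when the daemon is stopped, which is the usual init script convention but is not confirmed here for this package. The function name ensure_iprinit is made up for the example.

```shell
# Hypothetical helper: start iprinit and enable it at boot only if the
# status check reports that it is not running.
# Assumption: "service iprinit status" exits non-zero when the daemon is down.
ensure_iprinit() {
    if service iprinit status >/dev/null 2>&1; then
        echo "iprinit is running"
    else
        service iprinit start && chkconfig iprinit on
    fi
}

# Usage sketch (run as root):
#   ensure_iprinit
```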

    Thanks,

    Brian
  • jnendel2
    jnendel2
    8 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T18:17:05Z  
    iprinit is running. During the last boot there were warning messages because of "checktime reached" or "maximal mount count reached". Would those result in iprconfig reporting the RAID as degraded? It says "running e2fsck is recommended". Is there any risk in running e2fsck?

    Thank you!
  • Brian_King
    Brian_King
    20 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T18:37:49Z  
    Those messages shouldn't cause the array to be marked as degraded. There should be no problem running e2fsck, although it won't change the array status.

    Were there any errors logged by the ipr driver in /var/log/dmesg?

    If not, you can try resetting the adapter to see if any errors get logged to /var/log/messages. First, determine which SCSI host the array is connected to. On the Display hardware status screen in iprconfig, find the degraded array and look at the PCI/SCSI Location field. It is formatted like:

    PCI Location / SCSI host:SCSI bus:SCSI id:SCSI LUN

    For example, you might see something like:

    0000:41:01.0/2:0:3:0

    In this case, the SCSI host is 2. Then run the following command:

    echo 1 > /sys/class/scsi_host/host2/reset

    This will cause the adapter to be reset. The reset may take 30 seconds or so to complete, during which time all I/O will be stalled, so keep that in mind if you run this on a production machine. Once the reset has completed, all I/O will resume. You can then check /var/log/messages for any errors that may have been logged.
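
    To avoid misreading the location string by hand, the host number can be pulled out with shell parameter expansion. This is only a sketch: the function names are hypothetical, and the code builds the sysfs path without writing to it, so nothing is reset until you run the echo yourself.

```shell
# Hypothetical helper: extract the SCSI host number from a location string
# of the form "PCI Location/SCSI host:SCSI bus:SCSI id:SCSI LUN".
scsi_host_from_location() {
    loc=$1
    addr=${loc#*/}                 # drop the PCI location, e.g. "2:0:3:0"
    printf '%s\n' "${addr%%:*}"    # keep the text before the first colon
}

# Hypothetical helper: print the sysfs reset file for that host.
# This only prints the path; it does not trigger a reset.
reset_path_for_location() {
    printf '/sys/class/scsi_host/host%s/reset\n' "$(scsi_host_from_location "$1")"
}

# Usage sketch with the example location from above:
#   reset_path_for_location '0000:41:01.0/2:0:3:0'
```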

    Thanks,

    Brian
  • jnendel2
    jnendel2
    8 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T19:08:44Z  
    Yes, there was one ipr error: "8008: A permanent cache battery pack failure occurred".

    I read the following in one place:
    "NOTE: Under a certain configuration, this SRC may not represent an error that requires a service action. Depending on the configuration of the system, the Storage IOA may have been altered and/or the Storage IOA Cache may have been disabled to allow attachment of OEM Storage that emulates a Load Source drive. If this is the case, this error will be posted each time the IOA is IPLed and it can be ignored." - so I wasn't sure if it was a real error or not.

    Thank you.
  • Brian_King
    Brian_King
    20 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T19:20:53Z  
    The 8008 error can be logged for multiple reasons, but my guess at this point is that the rechargeable battery may need to be replaced. I would suggest contacting IBM hardware support to assist at this point.

    Thanks,

    Brian
  • jnendel2
    jnendel2
    8 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T19:29:44Z  
    Thanks!
  • jnendel2
    jnendel2
    8 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T19:35:01Z  
    I appreciate all your help. We have not been able to find anyone locally qualified to support this hardware. Is there a central IBM hardware support number we can call? Thanks again!
  • Brian_King
    Brian_King
    20 Posts

    Re: Monitoring Hardware RAID

    ‏2011-10-04T20:14:20Z  
    For IBM support contact information, start on this page:

    http://www.ibm.com/planetwide/region.html

    Select the country where you require service and you should see the contact number for IBM Hardware and Software Support.

    Thanks,

    Brian