14 replies Latest Post - ‏2011-10-04T20:14:20Z by Brian_King
jnendel

Pinned topic Monitoring Hardware RAID

2009-12-08T15:38:53Z
We have a very simple installation of Red Hat Enterprise Linux 4 on a P-series server, but we have not been able to find a way to monitor the hardware RAID. The software that came with the server only has installers for the i386 and EM64T architectures. HELP!!!
Updated on 2011-10-04T20:14:20Z by Brian_King
  • SystemAdmin

    Re: Monitoring Hardware RAID

    ‏2009-12-11T16:29:36Z  in response to jnendel
    Hello jnendel,
    A few questions:
    1. Can you provide the output of "uname -a" and the contents of /etc/issue?
    2. What type of card are you using?
    3. What type of POWER system are you using?

    Thanks!
    • jnendel2

      Re: Monitoring Hardware RAID

      ‏2011-10-03T18:54:09Z  in response to SystemAdmin
      Linux Oracle.bibleonstage.com 2.6.9-100.EL #1 SMP Tue Feb 1 12:10:14 EST 2011 ppc64 ppc64 ppc64 GNU/Linux

      Kernel \r on an \m

      RAID card: IBM Model 7031-D24/T24

      P-series
      • Brian_King

        Re: Monitoring Hardware RAID

        ‏2011-10-03T19:30:26Z  in response to jnendel2
        Can you post the output of these two commands as well?

        lspci

        grep system_type /proc/ppc64/lparcfg

        Thanks,

        Brian
        • jnendel2

          Re: Monitoring Hardware RAID

          ‏2011-10-04T13:10:37Z  in response to Brian_King
          Appreciate your help. The P-series is a 550.

          00:01.0 RAID bus controller: IBM Citrine chipset SCSI controller (rev 11)
          0001:00:02.0 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
          0001:00:02.2 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
          0001:00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
          0001:00:02.6 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
          0001:c0:01.0 SCSI storage controller: Mylex Corporation AcceleRAID 600/500/400/Sapphire support Device (rev 04)
          0001:d0:01.0 Mass storage controller: Promise Technology, Inc. 20275 (rev 01)
          0002:00:02.0 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
          0002:00:02.2 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
          0002:00:02.4 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
          0002:00:02.6 PCI bridge: IBM EADS-X PCI-X to PCI-X Bridge (rev 03)
          0002:c0:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
          0002:c0:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
          0002:c8:01.0 USB Controller: NEC Corporation USB (rev 43)
          0002:c8:01.1 USB Controller: NEC Corporation USB (rev 43)
          0002:c8:01.2 USB Controller: NEC Corporation USB 2.0 (rev 04)

          system_type=IBM,9133-55A
          • Brian_King

            Re: Monitoring Hardware RAID

            ‏2011-10-04T14:03:18Z  in response to jnendel2
            It looks like you are using an ipr-based storage controller. A set of management utilities is included with your Linux distribution in the iprutils package. The iprconfig command lets you see the current device and array status, create RAID arrays, and perform other RAID management activities. It also provides a command line interface, which allows it to be used in scripts. Refer to the man pages shipped with iprutils (for example, "man iprconfig") for further details on this tool and its command line syntax.
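As a rough sketch of using the command line interface from a script, the helper below filters array status text for states that need attention. It assumes the listing comes from "iprconfig -c show-arrays" and that unhealthy entries carry the words Degraded or Failed. The helper name unhealthy_lines and the cron-style pipeline are illustrative only, so check the iprconfig man page on your system before relying on them.

```shell
#!/bin/sh
# Sketch only: flag arrays or disks that are not in a healthy state.
# Assumption: the status listing is supplied on stdin, for example from
#   iprconfig -c show-arrays
# (verify that subcommand against the man page of your iprutils version).

# Print every input line containing "Degraded" or "Failed" (case-insensitive).
# Exit status is 0 when at least one such line was found, non-zero otherwise.
unhealthy_lines() {
    grep -i -E 'degraded|failed'
}

# Hypothetical usage from a cron job:
#   iprconfig -c show-arrays | unhealthy_lines && echo "RAID needs attention"
```

Because the function only reads stdin, it can be tried against saved output before it is wired into any monitoring job.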

            Whenever the ipr RAID adapter detects a serviceable event, the driver logs an error with details to the kernel log (visible with dmesg), which syslog normally writes to /var/log/messages, so monitoring /var/log/messages may suffice for what you are looking to accomplish.

            A wealth of information on this family of RAID adapters is provided in the "PCI-X SCSI RAID Controller Reference Guide for Linux", which can be found here:

            http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base/hardware_docs/pdf/231327.pdf

            Since you are running RHEL 4, you will want to look at Part 1 of the document, which is the section for Linux distributions based on a 2.6 Linux kernel.

            Refer to Chapter 5 for an explanation of the errors that can be logged by the RAID adapters along with recommended service actions for the various errors.

            Thanks,

            Brian
            • jnendel2

              Re: Monitoring Hardware RAID

              ‏2011-10-04T16:00:42Z  in response to Brian_King
              Thanks. I knew about those utilities, but I thought they only worked from the console interface before booting Linux. One reason I concluded that is that there is not a single entry in the ipr error log, yet the utility reports "degraded" for every RAID array. Should I be concerned?
              • Brian_King

                Re: Monitoring Hardware RAID

                ‏2011-10-04T16:19:15Z  in response to jnendel2
                Be sure to check /var/log/dmesg as well, since any errors logged prior to syslogd starting will only be visible there. There are multiple potential causes for an array status of degraded. A failed drive is one possible cause. If this is the case, iprconfig should report the status of that disk as Failed. Another possible cause is related to the adapter's write cache. Ensure the iprinit daemon is running:

                service iprinit status

                If it is not, it should be started to ensure optimal performance:

                service iprinit start
                chkconfig iprinit on
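A minimal sketch for checking both log locations mentioned above in one pass. It assumes that ipr driver messages contain the standalone word "ipr"; the function name ipr_log_lines is made up for illustration, and the pattern may need adjusting to match what your logs actually contain.

```shell
#!/bin/sh
# Sketch only: print lines that mention the ipr driver from each log file given.
# Assumption: driver messages contain the standalone word "ipr".
# Files that do not exist or cannot be read are skipped silently.
ipr_log_lines() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            # -w matches "ipr" as a whole word, so "iprinit" is not matched.
            grep -w 'ipr' "$f" || true
        fi
    done
}

# Hypothetical usage:
#   ipr_log_lines /var/log/dmesg /var/log/messages
```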

                Thanks,

                Brian
                • jnendel2

                  Re: Monitoring Hardware RAID

                  ‏2011-10-04T18:17:05Z  in response to Brian_King
                  iprinit is running. During the last boot there were warning messages because of "checktime reached" or "maximal mount count reached". Would those result in iprconfig reporting the RAID as degraded? It says "running e2fsck is recommended". Is there any risk in running e2fsck?

                  Thank you!
                  • Brian_King

                    Re: Monitoring Hardware RAID

                    ‏2011-10-04T18:37:49Z  in response to jnendel2
                    Those messages shouldn't cause the array to be marked as degraded. There should be no problem running e2fsck, as long as the filesystem being checked is not mounted read-write, although it won't change the array status.

                    Were there any errors logged by the ipr driver in /var/log/dmesg?

                    If not, you can try resetting the adapter to see whether any errors get logged to /var/log/messages. First, determine which SCSI host the array is connected to. On the Display hardware status screen in iprconfig, find the degraded array and look at the PCI/SCSI Location field. It is formatted like:

                    PCI Location / SCSI host:SCSI bus:SCSI id:SCSI LUN

                    For example, you might see something like:

                    0000:41:01.0/2:0:3:0

                    In this case, the SCSI host is 2. Then run the following command:

                    echo 1 > /sys/class/scsi_host/host2/reset

                    This will cause the adapter to be reset. The reset may take 30 seconds or so to complete, and all I/O will be stalled during that time, so keep this in mind before running it on a production machine. Once the reset has completed, all I/O will resume. You can then check /var/log/messages for any errors that may have been logged.
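The location parsing described above can be sketched with plain shell parameter expansion. The function names here are hypothetical, and the second helper only prints the sysfs path; it does not trigger a reset.

```shell
#!/bin/sh
# Sketch only: pull the SCSI host number out of a "PCI/SCSI Location" value,
# which is formatted as: PCI Location / SCSI host:SCSI bus:SCSI id:SCSI LUN
scsi_host_from_location() {
    scsi_addr=${1#*/}                  # drop the PCI location and the slash
    printf '%s\n' "${scsi_addr%%:*}"   # keep the text before the first colon
}

# Print (but do not write to) the sysfs reset file for that host.
reset_path_for_location() {
    printf '/sys/class/scsi_host/host%s/reset\n' "$(scsi_host_from_location "$1")"
}

# Usage:
#   scsi_host_from_location '0000:41:01.0/2:0:3:0'    # prints 2
#   reset_path_for_location '0000:41:01.0/2:0:3:0'    # prints /sys/class/scsi_host/host2/reset
```

Printing the path first, rather than writing to it directly, leaves a chance to confirm the host number before stalling I/O on the adapter.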

                    Thanks,

                    Brian
                    • jnendel2

                      Re: Monitoring Hardware RAID

                      ‏2011-10-04T19:08:44Z  in response to Brian_King
                      Yes, there was one ipr error: "8008: A permanent cache battery pack failure occurred".

                      I read the following note in one place:
                      "NOTE: Under a certain configuration, this SRC may not represent an error that requires a service action. Depending on the configuration of the system, the Storage IOA may have been altered and/or the Storage IOA Cache may have been disabled to allow attachment of OEM Storage that emulates a Load Source drive. If this is the case, this error will be posted each time the IOA is IPLed and it can be ignored." - so I wasn't sure if it was a real error or not.

                      Thank you.
                      • Brian_King

                        Re: Monitoring Hardware RAID

                        ‏2011-10-04T19:20:53Z  in response to jnendel2
                        The 8008 error can be logged for multiple reasons, but my guess at this point is that the rechargeable battery may need to be replaced. I would suggest contacting IBM hardware support to assist at this point.
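When passing details along to support, it may help to list the distinct error codes the adapter has logged. The sketch below assumes each ipr message carries a four-character code followed by a colon and a space, as in the "8008: A permanent cache battery pack failure occurred" message quoted earlier; the function name ipr_error_codes and the exact log line layout are assumptions, not something confirmed for every driver version.

```shell
#!/bin/sh
# Sketch only: list the distinct error codes found in ipr log lines on stdin.
# Assumption: each message contains a 4-character code (digits and uppercase
# A-F) followed by ": ", e.g. "8008: A permanent cache battery pack failure".
ipr_error_codes() {
    grep -w 'ipr' |
        sed -n 's/.*[^0-9A-Fa-f:]\([0-9A-F]\{4\}\): .*/\1/p' |
        sort -u
}

# Hypothetical usage:
#   ipr_error_codes < /var/log/messages
```

The codes can then be looked up in Chapter 5 of the reference guide linked earlier in the thread.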

                        Thanks,

                        Brian
                        • jnendel2

                          Re: Monitoring Hardware RAID

                          ‏2011-10-04T19:29:44Z  in response to Brian_King
                          Thanks!
                          • jnendel2

                            Re: Monitoring Hardware RAID

                            ‏2011-10-04T19:35:01Z  in response to jnendel2
                            I appreciate all your help. We have not been able to find anyone locally qualified to support this hardware. Is there a central IBM hardware support number we can call? Thanks again!
                            • Brian_King

                              Re: Monitoring Hardware RAID

                              ‏2011-10-04T20:14:20Z  in response to jnendel2
                              For IBM support contact information, start on this page:

                              http://www.ibm.com/planetwide/region.html

                              Select the country where you require service and you should see the contact number for IBM Hardware and Software Support.

                              Thanks,

                              Brian