Checking the health of an ESS configuration: a sample scenario

The scenario presented here shows how to use the gnrhealthcheck sample script to check the general health of an ESS configuration.

  1. In this example, all checks are successful.
    To run a health check on the local server nodes and place output in /tmp/gnrhealthcheck.out, issue the following command; a scripted variant of this invocation is sketched at the end of this topic:
       
    gnrhealthcheck --local | tee /tmp/gnrhealthcheck.out
    
    The system displays information similar to this:
    
    ################################################################
    # Beginning topology checks.
    ################################################################
    Topology checks successful.
    
    ################################################################
    # Beginning enclosure checks.
    ################################################################
    Enclosure checks successful.
    
    ################################################################
    # Beginning recovery group checks.
    ################################################################
    Recovery group checks successful.
    
    ################################################################
    # Beginning pdisk checks.
    ################################################################
     Pdisk checks successful.
    
     ################################################################
     # Beginning IBM Power RAID checks.
     ################################################################
     IBM Power RAID checks successful.
  2. In this example, several issues need to be investigated; commands for examining these findings are sketched after the output.
    To run a health check on the local server nodes and place output in /tmp/gnrhealthcheck.out, issue the following command:
       
    gnrhealthcheck --local | tee /tmp/gnrhealthcheck.out
    
    The system displays information similar to this:
    
    ################################################################
    # Beginning topology checks.
     ################################################################
    Found topology problems on node c45f01n01-ib0.gpfs.net
    
    DCS3700 enclosures found: 0123456789AB SV11812206 SV12616296 SV13306129
    Enclosure 0123456789AB (number 1):
    Enclosure 0123456789AB ESM A sg244[0379][scsi8 port 4] ESM B sg4[0379][scsi7 port 4]
    Enclosure 0123456789AB Drawer 1 ESM sg244 12 disks diskset "19968" ESM sg4 12 disks diskset "19968"
    Enclosure 0123456789AB Drawer 2 ESM sg244 12 disks diskset "11294" ESM sg4 12 disks diskset "11294"
    Enclosure 0123456789AB Drawer 3 ESM sg244 12 disks diskset "60155" ESM sg4 12 disks diskset "60155"
    Enclosure 0123456789AB Drawer 4 ESM sg244 12 disks diskset "03345" ESM sg4 12 disks diskset "03345"
    Enclosure 0123456789AB Drawer 5 ESM sg244 11 disks diskset "33625" ESM sg4 11 disks diskset "33625"
    Enclosure 0123456789AB sees 59 disks
    
    Enclosure SV12616296 (number 2):
    Enclosure SV12616296 ESM A sg63[0379][scsi7 port 3] ESM B sg3[0379][scsi9 port 4]
    Enclosure SV12616296 Drawer 1 ESM sg63 11 disks diskset "51519" ESM sg3 11 disks diskset "51519"
    Enclosure SV12616296 Drawer 2 ESM sg63 12 disks diskset "36246" ESM sg3 12 disks diskset "36246"
    Enclosure SV12616296 Drawer 3 ESM sg63 12 disks diskset "53750" ESM sg3 12 disks diskset "53750"
    Enclosure SV12616296 Drawer 4 ESM sg63 12 disks diskset "07471" ESM sg3 12 disks diskset "07471"
    Enclosure SV12616296 Drawer 5 ESM sg63 11 disks diskset "16033" ESM sg3 11 disks diskset "16033"
    Enclosure SV12616296 sees 58 disks
    
    Enclosure SV11812206 (number 3):
    Enclosure SV11812206 ESM A sg66[0379][scsi9 port 3] ESM B sg6[0379][scsi8 port 3]
    Enclosure SV11812206 Drawer 1 ESM sg66 11 disks diskset "23334" ESM sg6 11 disks diskset "23334"
    Enclosure SV11812206 Drawer 2 ESM sg66 12 disks diskset "16332" ESM sg6 12 disks diskset "16332"
    Enclosure SV11812206 Drawer 3 ESM sg66 12 disks diskset "52806" ESM sg6 12 disks diskset "52806"
    Enclosure SV11812206 Drawer 4 ESM sg66 12 disks diskset "28492" ESM sg6 12 disks diskset "28492"
    Enclosure SV11812206 Drawer 5 ESM sg66 11 disks diskset "24964" ESM sg6 11 disks diskset "24964"
    Enclosure SV11812206 sees 58 disks
    
    Enclosure SV13306129 (number 4):
    Enclosure SV13306129 ESM A sg64[0379][scsi8 port 2] ESM B sg353[0379][scsi7 port 2]
    Enclosure SV13306129 Drawer 1 ESM sg64 11 disks diskset "47887" ESM sg353 11 disks diskset "47887"
    Enclosure SV13306129 Drawer 2 ESM sg64 12 disks diskset "53906" ESM sg353 12 disks diskset "53906"
    Enclosure SV13306129 Drawer 3 ESM sg64 12 disks diskset "35322" ESM sg353 12 disks diskset "35322"
    Enclosure SV13306129 Drawer 4 ESM sg64 12 disks diskset "37055" ESM sg353 12 disks diskset "37055"
    Enclosure SV13306129 Drawer 5 ESM sg64 11 disks diskset "16025" ESM sg353 11 disks diskset "16025"
    Enclosure SV13306129 sees 58 disks
    
    DCS3700 configuration: 4 enclosures, 1 SSD, 7 empty slots, 233 disks total
    Location 0123456789AB-5-12 appears empty but should have an SSD
    Location SV12616296-1-3 appears empty but should have an SSD
    Location SV12616296-5-12 appears empty but should have an SSD
    Location SV11812206-1-3 appears empty but should have an SSD
    Location SV11812206-5-12 appears empty but should have an SSD
    
    scsi7[07.00.00.00] 0000:11:00.0 [P2 SV13306129 ESM B (sg353)] [P3 SV12616296 ESM A (sg63)] 
    [P4 0123456789AB ESM B (sg4)]
    scsi8[07.00.00.00] 0000:8b:00.0 [P2 SV13306129 ESM A (sg64)] [P3 SV11812206 ESM B (sg6)] 
    [P4 0123456789AB ESM A (sg244)]
    scsi9[07.00.00.00] 0000:90:00.0 [P3 SV11812206 ESM A (sg66)] [P4 SV12616296 ESM B (sg3)]
    
    ################################################################
    # Beginning enclosure checks.
    ################################################################
    Enclosure checks successful.
    
    ################################################################
    # Beginning recovery group checks.
    ################################################################
    Found recovery group BB1RGR, primary server is not the active server.
    
    ################################################################
    # Beginning pdisk checks.
    ################################################################
    Found recovery group BB1RGL pdisk e4d5s06 has 0 paths.
    
     ################################################################
     # Beginning IBM Power RAID checks.
     ################################################################
    IBM Power RAID Array is running in degraded mode.  
    Name   PCI/SCSI Location          Description               Status
    ------ -------------------------  ------------------------- -----------------
           0007:90:00.0/0:            PCI-E SAS RAID Adapter    Operational
           0007:90:00.0/0:0:1:0       Advanced Function Disk    Failed
           0007:90:00.0/0:0:2:0       Advanced Function Disk    Active
    sda    0007:90:00.0/0:2:0:0       RAID 10  Disk Array       Degraded
           0007:90:00.0/0:0:0:0       RAID 10  Array Member     Active
           0007:90:00.0/0:0:3:0       RAID 10  Array Member     Failed
           0007:90:00.0/0:0:4:0       Enclosure                 Active
           0007:90:00.0/0:0:6:0       Enclosure                 Active
           0007:90:00.0/0:0:7:0       Enclosure                 Active
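
Each problem reported in the second example can be examined more closely with the standard recovery group commands. The following is a minimal sketch, assuming the recovery group and pdisk names shown in the output above; substitute the names reported in your environment:

     # Show the active and configured servers for the recovery group that
     # reported "primary server is not the active server."
     mmlsrecoverygroup BB1RGR -L

     # List all pdisks that are not in a healthy state, which includes the
     # pdisk reported with 0 paths (e4d5s06 in recovery group BB1RGL).
     mmlspdisk all --not-ok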

For more information, see the description of the gnrhealthcheck script.
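
Because the examples pipe the script's output through tee, the shell's $? reflects the exit status of tee rather than of gnrhealthcheck. The following bash sketch shows one way to run the check unattended and surface only the problem lines. It assumes gnrhealthcheck exits nonzero when a check finds a problem, and the grep patterns are taken from the sample output above; verify both against the documentation for your release:

     #!/bin/bash
     # Run the local health check, keep a copy of the full report, and
     # print only the problem lines when something is wrong.
     OUT=/tmp/gnrhealthcheck.out

     gnrhealthcheck --local | tee "$OUT"

     # tee exits 0 on success, so read gnrhealthcheck's status from
     # PIPESTATUS instead of $?.
     if [ "${PIPESTATUS[0]}" -ne 0 ]; then
         # These patterns match the failure messages shown above; other
         # releases may word their messages differently.
         grep -E 'Found|degraded|appears empty' "$OUT" >&2
         exit 1
     fi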