Checking the health of an ESS configuration: a sample scenario
The scenario presented here shows how to use the gnrhealthcheck sample script to check the general health of an ESS configuration.
- In this example, all checks are successful. To run a health check on the local server nodes and place output in /tmp/gnrhealthcheck.out, issue the following command:
gnrhealthcheck --local | tee /tmp/gnrhealthcheck.out
The system displays information similar to this:

    ################################################################
    # Beginning topology checks.
    ################################################################
    Topology checks successful.

    ################################################################
    # Beginning enclosure checks.
    ################################################################
    Enclosure checks successful.

    ################################################################
    # Beginning recovery group checks.
    ################################################################
    Recovery group checks successful.

    ################################################################
    # Beginning pdisk checks.
    ################################################################
    Pdisk group checks successful.

    ################################
    # Beginning IBM Power RAID checks.
    #####################################
    IBM Power RAID checks successful.
- In this example, several issues need to be investigated. To run a health check on the local server nodes and place output in /tmp/gnrhealthcheck.out, issue the following command:
gnrhealthcheck --local | tee /tmp/gnrhealthcheck.out
The system displays information similar to this:

    ################################################################
    # Beginning topology checks.
    ############################################################
    Found topology problems on node c45f01n01-ib0.gpfs.net

    DCS3700 enclosures found: 0123456789AB SV11812206 SV12616296 SV13306129
    Enclosure 0123456789AB (number 1):
    Enclosure 0123456789AB ESM A sg244[0379][scsi8 port 4] ESM B sg4[0379][scsi7 port 4]
    Enclosure 0123456789AB Drawer 1 ESM sg244 12 disks diskset "19968" ESM sg4 12 disks diskset "19968"
    Enclosure 0123456789AB Drawer 2 ESM sg244 12 disks diskset "11294" ESM sg4 12 disks diskset "11294"
    Enclosure 0123456789AB Drawer 3 ESM sg244 12 disks diskset "60155" ESM sg4 12 disks diskset "60155"
    Enclosure 0123456789AB Drawer 4 ESM sg244 12 disks diskset "03345" ESM sg4 12 disks diskset "03345"
    Enclosure 0123456789AB Drawer 5 ESM sg244 11 disks diskset "33625" ESM sg4 11 disks diskset "33625"
    Enclosure 0123456789AB sees 59 disks

    Enclosure SV12616296 (number 2):
    Enclosure SV12616296 ESM A sg63[0379][scsi7 port 3] ESM B sg3[0379][scsi9 port 4]
    Enclosure SV12616296 Drawer 1 ESM sg63 11 disks diskset "51519" ESM sg3 11 disks diskset "51519"
    Enclosure SV12616296 Drawer 2 ESM sg63 12 disks diskset "36246" ESM sg3 12 disks diskset "36246"
    Enclosure SV12616296 Drawer 3 ESM sg63 12 disks diskset "53750" ESM sg3 12 disks diskset "53750"
    Enclosure SV12616296 Drawer 4 ESM sg63 12 disks diskset "07471" ESM sg3 12 disks diskset "07471"
    Enclosure SV12616296 Drawer 5 ESM sg63 11 disks diskset "16033" ESM sg3 11 disks diskset "16033"
    Enclosure SV12616296 sees 58 disks

    Enclosure SV11812206 (number 3):
    Enclosure SV11812206 ESM A sg66[0379][scsi9 port 3] ESM B sg6[0379][scsi8 port 3]
    Enclosure SV11812206 Drawer 1 ESM sg66 11 disks diskset "23334" ESM sg6 11 disks diskset "23334"
    Enclosure SV11812206 Drawer 2 ESM sg66 12 disks diskset "16332" ESM sg6 12 disks diskset "16332"
    Enclosure SV11812206 Drawer 3 ESM sg66 12 disks diskset "52806" ESM sg6 12 disks diskset "52806"
    Enclosure SV11812206 Drawer 4 ESM sg66 12 disks diskset "28492" ESM sg6 12 disks diskset "28492"
    Enclosure SV11812206 Drawer 5 ESM sg66 11 disks diskset "24964" ESM sg6 11 disks diskset "24964"
    Enclosure SV11812206 sees 58 disks

    Enclosure SV13306129 (number 4):
    Enclosure SV13306129 ESM A sg64[0379][scsi8 port 2] ESM B sg353[0379][scsi7 port 2]
    Enclosure SV13306129 Drawer 1 ESM sg64 11 disks diskset "47887" ESM sg353 11 disks diskset "47887"
    Enclosure SV13306129 Drawer 2 ESM sg64 12 disks diskset "53906" ESM sg353 12 disks diskset "53906"
    Enclosure SV13306129 Drawer 3 ESM sg64 12 disks diskset "35322" ESM sg353 12 disks diskset "35322"
    Enclosure SV13306129 Drawer 4 ESM sg64 12 disks diskset "37055" ESM sg353 12 disks diskset "37055"
    Enclosure SV13306129 Drawer 5 ESM sg64 11 disks diskset "16025" ESM sg353 11 disks diskset "16025"
    Enclosure SV13306129 sees 58 disks

    DCS3700 configuration: 4 enclosures, 1 SSD, 7 empty slots, 233 disks total
    Location 0123456789AB-5-12 appears empty but should have an SSD
    Location SV12616296-1-3 appears empty but should have an SSD
    Location SV12616296-5-12 appears empty but should have an SSD
    Location SV11812206-1-3 appears empty but should have an SSD
    Location SV11812206-5-12 appears empty but should have an SSD

    scsi7[07.00.00.00] 0000:11:00.0 [P2 SV13306129 ESM B (sg353)] [P3 SV12616296 ESM A (sg63)] [P4 0123456789AB ESM B (sg4)]
    scsi8[07.00.00.00] 0000:8b:00.0 [P2 SV13306129 ESM A (sg64)] [P3 SV11812206 ESM B (sg6)] [P4 0123456789AB ESM A (sg244)]
    scsi9[07.00.00.00] 0000:90:00.0 [P3 SV11812206 ESM A (sg66)] [P4 SV12616296 ESM B (sg3)]

    ################################################################
    # Beginning enclosure checks.
    ################################################################
    Enclosure checks successful.

    ################################################################
    # Beginning recovery group checks.
    ################################################################
    Found recovery group BB1RGR, primary server is not the active server.

    ################################################################
    # Beginning pdisk checks.
    ################################################################
    Found recovery group BB1RGL pdisk e4d5s06 has 0 paths.

    ################################
    # Beginning IBM Power RAID checks.
    ####################################
    IBM Power RAID Array is running in degraded mode.

    Name   PCI/SCSI Location         Description               Status
    ------ ------------------------- ------------------------- -----------------
           0007:90:00.0/0:           PCI-E SAS RAID Adapter    Operational
           0007:90:00.0/0:0:1:0      Advanced Function Disk    Failed
           0007:90:00.0/0:0:2:0      Advanced Function Disk    Active
    sda    0007:90:00.0/0:2:0:0      RAID 10 Disk Array        Degraded
           0007:90:00.0/0:0:0:0      RAID 10 Array Member      Active
           0007:90:00.0/0:0:3:0      RAID 10 Array Member      Failed
           0007:90:00.0/0:0:4:0      Enclosure                 Active
           0007:90:00.0/0:0:6:0      Enclosure                 Active
           0007:90:00.0/0:0:7:0      Enclosure                 Active
See the gnrhealthcheck script for more information.
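
When the saved output is long, as in the second example, it can help to list only the check sections that did not report success. The following is a minimal sketch and not part of ESS: failed_sections is a hypothetical helper name, and the sketch assumes that each section of the saved file starts with a line containing "Beginning ... checks." and reports success with a line containing "checks successful.", as in the sample output in this topic.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ESS): print the name of every check
# section in a saved gnrhealthcheck output file that has no
# "... checks successful." line.
# Assumption: the file keeps one banner or result message per line.
failed_sections() {
    awk '
        # A banner such as "# Beginning pdisk checks." opens a new section.
        match($0, /Beginning .* checks\./) {
            # Report the previous section if it never reported success.
            if (name != "" && !ok) print name
            # Strip "Beginning " (10 chars) and " checks." (8 chars).
            name = substr($0, RSTART + 10, RLENGTH - 18)
            ok = 0
            next
        }
        # Any "... checks successful." line marks the current section as clean.
        / checks successful\./ { ok = 1 }
        # Report the last section if it never reported success.
        END { if (name != "" && !ok) print name }
    ' "$1"
}

# Example use after running the command shown in this topic:
#   failed_sections /tmp/gnrhealthcheck.out
```

If the saved file has the line layout assumed here, the helper prints nothing for the first example. For the second example it should list the topology, recovery group, pdisk, and IBM Power RAID sections, which are the ones to investigate.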