Hardware service requested
It is important to be notified when a hardware component fails so that Support can notify service technicians who can replace or repair the component. For devices such as disks, a hardware failure causes the system to bring a spare disk online, and after an activation period, the spare disk transparently replaces the failed disk. However, you should still replace the failed disk with a healthy disk so that the system returns to normal operation with its full complement of spares.
In other cases, such as SPU failures, the system reroutes the work of the failed SPU to the other available SPUs. System performance degrades because the healthy resources take on the extra workload. Again, it is critical to obtain service to replace the faulty component and restore the system to its normal performance.
If you enable the HardwareServiceRequested event rule, the system generates a notification when a hardware component fails and service technicians might be required to replace or repair it. The rule template is shown below; as delivered, the rule is disabled (-on no).
-name 'HardwareServiceRequested' -on no -eventType hwServiceRequested
-eventArgsExpr '' -notifyType email -dst 'you@company.com' -ccDst ''
-msg 'NPS system $HOST - Service requested for $hwType $hwId at $eventTimestamp $eventSource.'
-bodyText '$notifyMsg\n\nlocation:$location\nerror string:$errString\ndevSerial:$devSerial\nevent source:$eventSource\n'
-eventAggrCount 0
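The -msg template embeds event variables such as $HOST, $hwType, and $hwId, which the system replaces with values from the event. The following Python sketch only imitates that substitution with the standard-library string.Template to preview what a rendered subject line might look like; it is not how NPS performs the substitution internally. The function name render_notification and the host name and event source values are hypothetical placeholders.

```python
from string import Template

# Subject-line template copied from the -msg option of the
# HardwareServiceRequested rule. string.Template uses the same
# $identifier syntax, so it is a convenient stand-in for previewing.
MSG_TEMPLATE = Template(
    "NPS system $HOST - Service requested for $hwType $hwId at "
    "$eventTimestamp $eventSource."
)


def render_notification(values: dict) -> str:
    """Substitute event variables into the -msg template (hypothetical helper).

    safe_substitute leaves any variable missing from `values` in place
    (for example, the literal text "$eventSource") instead of raising.
    """
    return MSG_TEMPLATE.safe_substitute(values)


if __name__ == "__main__":
    # Hypothetical values: "nps-host01" and "example-source" are placeholders;
    # hwType, hwId, and the timestamp come from the sample event in this section.
    print(render_notification({
        "HOST": "nps-host01",
        "hwType": "disk",
        "hwId": "1073",
        "eventTimestamp": "2012-04-05 19:52:41",
        "eventSource": "example-source",
    }))
```

Using safe_substitute rather than substitute keeps a missing variable visible as literal text, which makes gaps easy to spot while previewing a message.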
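The hwServiceRequested event passes the following arguments, which the rule can reference in its message and body text as $hwType, $hwId, and so on:

Arguments | Description | Example |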
---|---|---|
hwType | The type of hardware affected | spu, disk, pwr, fan, mm |
hwId | The hardware ID of the component that reports a problem | 1013 |
location | A string that describes the physical location of the component | |
errString | Specifies additional information about the error or condition that triggered the event. If the failed component is not in the system inventory, it is identified in this string. | |
devSerial | Specifies the serial number of the component, or Unknown if the component has no serial number. | 601S496A2012 |
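The event also reports disk read sector errors, with the details in the errString argument. For example, the following event message reports a read sector error on disk 1073:

2012-04-05 19:52:41.637742 EDT Info: received & processing event type = hwServiceRequested,
event args = 'hwType=disk, hwId=1073, location=Logical Name:'spa1.diskEncl2.disk1'
Logical Location:'1st rack, 2nd disk enclosure, disk in Row 1/Column 1',
errString=disk md: md2 sector: 2051 partition type: DATA table: 201328,
devSerial=9QJ2FMKN00009838VVR9...

In the errString value: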
- The md value specifies the RAID device on the SPU that encountered the issue.
- The sector value specifies which sector in the device has the read error.
- The partition type specifies whether the partition is a user data (DATA) or SYSTEM partition.
- The table value specifies the table ID of the user table that is affected by the bad sector.
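When many notifications arrive, it can help to extract these fields programmatically. The following Python sketch assumes that the errString layout matches the single sample shown in this section (labels md:, sector:, partition type:, and table:, in that order) and that errString is followed by a devSerial argument or ends the text. That layout, and the helper names extract_err_string and parse_read_sector_error, are illustrative assumptions, not a documented format or API.

```python
import re
from typing import Dict, Optional, Union

# ASSUMPTION: field labels and order are inferred from the one sample
# errString in this section; other releases may format it differently.
_ERRSTRING_RE = re.compile(
    r"md:\s*(?P<md>\S+)\s+"
    r"sector:\s*(?P<sector>\d+)\s+"
    r"partition type:\s*(?P<partition_type>DATA|SYSTEM)\s+"
    r"table:\s*(?P<table>\d+)"
)

# ASSUMPTION: the errString value ends at ", devSerial=" or at end of text.
_ARG_RE = re.compile(r"errString=(?P<err>.*?)(?:, devSerial=|$)")


def extract_err_string(event_args: str) -> Optional[str]:
    """Return the errString argument from an event-args payload, if present."""
    match = _ARG_RE.search(event_args)
    return match.group("err") if match else None


def parse_read_sector_error(err_string: str) -> Optional[Dict[str, Union[str, int]]]:
    """Split a read sector errString into its md, sector, partition, and table fields."""
    match = _ERRSTRING_RE.search(err_string)
    if match is None:
        return None  # not a read sector error in the expected layout
    return {
        "md": match.group("md"),                          # RAID device on the SPU
        "sector": int(match.group("sector")),             # sector with the read error
        "partition_type": match.group("partition_type"),  # DATA or SYSTEM
        "table": int(match.group("table")),               # ID of the affected user table
    }


if __name__ == "__main__":
    # Event args copied from the sample message (devSerial is truncated there too).
    args = ("hwType=disk, hwId=1073, location=Logical Name:'spa1.diskEncl2.disk1' "
            "Logical Location:'1st rack, 2nd disk enclosure, disk in Row 1/Column 1', "
            "errString=disk md: md2 sector: 2051 partition type: DATA table: 201328, "
            "devSerial=9QJ2FMKN00009838VVR9...")
    print(parse_read_sector_error(extract_err_string(args)))
    # prints {'md': 'md2', 'sector': 2051, 'partition_type': 'DATA', 'table': 201328}
```

The table value from the parsed result is the ID to mention when you contact Support about the affected user table.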
If the system notifies you of a read sector error, contact Netezza Performance Server Support for help troubleshooting and resolving the problem.