Troubleshooting
Problem
This IBM-internal technote describes a very infrequent failure, occurring only on first-time power-up, that leaves the system unable to boot. It explains how to identify the failure and provides a workaround to repair it.
Cause
This issue occurs only on the first-time power-up of a new system shipped from the IBM factory, or after a chassis FRU replacement in the field. Replacing the controller in one or both nodes does not expose the system to this failure mode. The chassis EEPROM is written only once, at first-time power-up. Because of a timing issue, both nodes can attempt to write the Vital Product Data (VPD) at the same time, which corrupts the VPD stored on the chassis midplane.
The corruption is limited to the VPD portion of the EEPROM; no other components on the system are affected. It can be corrected with the low-level Service Processor (SP) repair process provided in this technote.
Environment
N6210, N6240, and N6270 in dual-node, single-chassis configurations only (2858-C20, -C21, -C22).
Diagnosing The Problem
The following messages, displayed at boot on an N6200 Series 2858-C2x in a dual-node configuration, are the error signature for chassis midplane FRU VPD corruption:
Chassis midplane (IBM-^[O_^[O_) is not compatible with the installed controller
model (IBM-2858-CN0).
Shutting down: Unsupported configuration.
Note the garbage value "IBM-^[O_^[O_" reported for the chassis midplane model. In this failure, the midplane model may appear in a slightly different garbled form, but it will never be a valid N6200 Series Machine Type-Model (MTM). If these messages appear at boot with a valid chassis midplane model, they indicate a chassis/controller FRU mismatch rather than chassis midplane FRU VPD corruption. A mismatch is most likely caused by mixing chassis and controller FRUs, or by a field engineer programming the wrong controller model value.
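The distinction above (a garbage midplane model versus a valid MTM) can be expressed as a simple check on captured console text. The following Python sketch is illustrative only: the function and pattern names are hypothetical, and it assumes that a valid dual-node midplane model is reported as IBM-2858-C20, -C21, or -C22 (the MTMs listed under Environment). Verify the exact format against a known-good system before relying on it.

```python
import re

# Boot-time error quoted in this technote. The console may wrap it across
# two lines, so any whitespace run between "controller" and "model" is accepted.
MISMATCH_RE = re.compile(
    r"Chassis midplane \((?P<midplane>.*?)\) is not compatible with the "
    r"installed controller\s+model \((?P<controller>.*?)\)",
    re.DOTALL,
)

# ASSUMPTION (hypothetical format): a valid dual-node midplane model is
# "IBM-2858-C20", "IBM-2858-C21" or "IBM-2858-C22".
VALID_MIDPLANE_RE = re.compile(r"IBM-2858-C2[012]")


def classify_boot_error(console_text: str) -> str:
    """Return 'vpd_corruption', 'fru_mismatch', or 'not_this_error'."""
    match = MISMATCH_RE.search(console_text)
    if match is None:
        return "not_this_error"
    midplane = match.group("midplane")
    if VALID_MIDPLANE_RE.fullmatch(midplane):
        # A valid MTM was reported: chassis/controller FRU mismatch.
        return "fru_mismatch"
    # Garbage or otherwise invalid MTM: matches the VPD-corruption signature.
    return "vpd_corruption"
```

For example, passing the boot messages quoted above returns "vpd_corruption", while the same messages with a midplane model of IBM-2858-C21 return "fru_mismatch".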
Resolving The Problem
This information may be shared only with core members of the IBM N series team, strictly on a "need-to-know" basis. This procedure is not available to customers.
Before the workaround repair process became available with Service Processor (SP) FW 1.3, the only way to resolve this issue was to replace the chassis. Chassis replacement is still a valid fix; however, the repair process described in this technote is the preferred method and should be much faster to perform.
Service Processor FW 1.3 is required and has been installed on all N62xx systems since late June 2012. The SP firmware can be upgraded regardless of whether the system is in a failed state because of this issue.
SP FW 1.3 provides a recovery method for midplane FRU VPD corruption on N6200 systems in a dual-node, single-chassis configuration. The recovery procedure is as follows:
1.) Issue the following SP commands to determine whether the midplane FRU VPD is actually corrupted:
SP> system fru show 1
SP> system fru show 2
If a midplane FRU has corrupt VPD, the message "'Midplane1' inventory data checksum error detected" or "'Midplane2' inventory data checksum error detected" is displayed. If neither midplane FRU has corrupt VPD, the recovery command in Step 2 does not apply and has no effect. If both midplane FRUs have corrupt VPD, recovery is not possible and the Step 2 command has no effect; in that case, the chassis must be replaced.
2.) If exactly one midplane FRU has corrupt VPD, issue the following SP commands to recover it:
SP> priv set diag
SP> system fru mp-recover
Depending on which midplane FRU has corrupt VPD, one of the following sets of messages appears:
<<If Midplane FRU1 has corrupt VPD>>
Current midplane FRUs summary
'Midplane1' inventory data:
'Midplane1' inventory data checksum error detected
. . .
Checksum error detected on midplane FRU1.
Using midplane FRU2 content for recovery. Please wait ...
New midplane FRUs summary
. . .
Midplane FRU1 is restored with default HAOSC setting.
<<If Midplane FRU2 has corrupt VPD>>
Current midplane FRUs summary
. . .
'Midplane2' inventory data:
'Midplane2' inventory data checksum error detected
Checksum error detected on midplane FRU2.
Using midplane FRU1 content for recovery. Please wait ...
New midplane FRUs summary
. . .
Midplane FRU2 is restored with default HAOSC setting.
3.) Reboot the system.
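The decisions in Steps 1 and 2 can be summarized as a small helper that inspects console text captured from the SP session (for example, copied from a console log). This Python sketch is an assumption-based illustration, not an IBM tool: the helper names are hypothetical, no SP scripting interface is assumed, and the patterns match only the messages quoted in this technote, so real output that differs in spacing or wording would need adjusted patterns.

```python
from __future__ import annotations

import re

# Messages quoted in this technote. ASSUMPTION: SP output matches them
# exactly; adjust the patterns if the real console text differs.
CHECKSUM_RE = re.compile(r"'Midplane([12])' inventory data checksum error detected")
RESTORED_RE = re.compile(r"Midplane FRU([12]) is restored with default HAOSC setting")


def corrupt_midplanes(fru_show_output: str) -> set[int]:
    """Return the midplane FRU numbers reported with checksum errors (Step 1)."""
    return {int(n) for n in CHECKSUM_RE.findall(fru_show_output)}


def recovery_action(fru_show_output: str) -> str:
    """Decide the next step from the combined 'system fru show 1' and '2' output."""
    corrupt = corrupt_midplanes(fru_show_output)
    if not corrupt:
        # Neither FRU is corrupt: mp-recover does not apply and has no effect.
        return "no midplane VPD corruption: mp-recover does not apply"
    if len(corrupt) == 2:
        # Both FRUs corrupt: no good copy remains to recover from.
        return "both midplane FRUs corrupt: SP recovery not possible"
    (bad,) = corrupt
    good = 3 - bad  # the other midplane FRU (1 <-> 2)
    return f"run mp-recover: FRU{bad} will be rebuilt from FRU{good}"


def restored_fru(mp_recover_output: str) -> int | None:
    """Return the FRU number 'system fru mp-recover' reports as restored (Step 2)."""
    match = RESTORED_RE.search(mp_recover_output)
    return int(match.group(1)) if match else None
```

Once restored_fru returns the FRU number that Step 1 flagged as corrupt, the procedure continues with the reboot in Step 3.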
Document Information
Modified date:
10 January 2020
UID
ssg1S1004158