A fix is available
APAR status
Closed as program error.
Error description
When a system is restarted or the device is reconfigured, the OFED device may be missing. This happens because the provider's registration with OFED has failed.
Local fix
Problem summary
When a system is restarted or the device is reconfigured, the OFED device may be missing. This happens because the provider's registration with OFED has failed.
Problem conclusion
When the provider calls ofed_register(), the call also initializes the MAD buffers. If MAD messages arrive while this initialization is in progress, they are received and processed, and their buffers are replenished. However, two instances of the replenish code can run at the same time: one in the initialization path and one in the receive completion path, which reposts the buffer. When that happens, both instances try to repost buffers up to the limit. The replenish code is implemented as a "do-while" loop, so its body posts a buffer before checking the limit. As a result, when two threads execute in parallel, there is one more post than the QP receive depth (capacity) allows. This causes ib_post_receive() to fail in the provider. The receive completion path can handle that error, but the MAD initialization code cannot and bails out. As a result, ofed_register() fails and the provider cannot register the device with OFED.

The fix implements the loop as a "while-do" loop, which checks the limit before each post, so this path no longer posts the extra receive buffer.
Temporary fix
Comments
6100-09 - use AIX APAR IV52981
7100-03 - use AIX APAR IV53250
APAR Information
APAR number
IV53250
Reported component name
AIX V7.1
Reported component ID
5765H4000
Reported release
710
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Submitted date
2013-12-11
Closed date
2013-12-11
Last modified date
2014-05-22
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
AIX V7.1
Fixed component ID
5765H4000
Applicable component levels
R710 PSY U858987
UP14/05/22 I 1000
PTF to Fileset Mapping
U858987 ofed.core.rte 7.1.3.15
Document Information
Modified date:
22 May 2014