APAR status
Closed as program error.
Error description
ESS 3K customers running ESS 6.1.5 and above are seeing pems go offline, which causes pems to hang. If the customer then runs scsi-rescan, there is a potential exposure to a crash at pemsSlaveDestroy+0x46.
Local fix
If pems goes offline, do not run scsi-rescan, to avoid the crash. To recover from the offline state, restart the pemsmod module until the fix is applied.
Problem summary
ESS 3K customers running ESS 6.1.5 and above are seeing pems go offline, which causes pems to hang. If the customer then runs scsi-rescan, there is a potential exposure to a crash at pemsSlaveDestroy+0x46.
Problem conclusion
This problem is fixed in 5.1.8.1.
To see all Spectrum Scale APARs and their respective fix solutions, refer to: https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: This fix resolves a potential hang in pemsCliCmdQueue and the crash at pemsSlaveDestroy+0x46 that occurs if scsi-rescan is run while pems is offline.
Work Around: If pems goes offline, do not run scsi-rescan, to avoid the crash. To recover from the offline state, restart the pemsmod module until the fix is applied.
Problem trigger: It is not known why ESS 6.1.5 and above trigger this issue. The problem was not recreated in the lab.
Symptom: When pems goes offline, commands such as tsplatformstat, sginfo, and others fail. dmesg shows:
scsi 11:0:0:0: Device offlined - not ready after error recovery
scsi 11:0:0:0: rejecting I/O to offline device
INFO: task pemsCliCmdQueue:11928 blocked for more than 120 seconds.
If scsi-rescan is then run, the crash stack trace contains pemsSlaveDestroy+0x46.
Functional Area affected: ESS 3K
Customer Impact: High importance
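The dmesg messages listed under Symptom can be checked for before anyone runs scsi-rescan. The following is a minimal Python sketch, not an IBM-provided tool. The function names find_pems_offline_symptoms and rescan_is_risky are hypothetical, and the patterns are modeled only on the sample messages quoted in this APAR, assuming the SCSI address (11:0:0:0), the task PID, and the timeout value can differ between systems.

```python
import re

# Patterns modeled on the sample dmesg lines quoted in this APAR.
# Assumption: the SCSI address, task PID, and timeout vary per system,
# so they are matched as generic digit groups.
OFFLINE_PATTERNS = {
    "device_offlined": re.compile(
        r"scsi \d+:\d+:\d+:\d+: Device offlined - not ready after error recovery"
    ),
    "io_rejected": re.compile(
        r"scsi \d+:\d+:\d+:\d+: rejecting I/O to offline device"
    ),
    "pems_hung_task": re.compile(
        r"INFO: task pemsCliCmdQueue:\d+ blocked for more than \d+ seconds"
    ),
}


def find_pems_offline_symptoms(dmesg_text):
    """Return the sorted names of the symptom patterns found in dmesg_text."""
    found = set()
    for line in dmesg_text.splitlines():
        for name, pattern in OFFLINE_PATTERNS.items():
            # search() tolerates a leading kernel timestamp such as "[123.456] ".
            if pattern.search(line):
                found.add(name)
    return sorted(found)


def rescan_is_risky(dmesg_text):
    """Hypothetical helper: True if any symptom is present in dmesg_text."""
    return bool(find_pems_offline_symptoms(dmesg_text))
```

To use the sketch, pass it saved dmesg output as a string. If rescan_is_risky returns True, the Work Around above applies: avoid scsi-rescan and restart the pemsmod module instead. Note that the two generic SCSI messages can also appear for devices unrelated to pems, so a match is a hint to investigate rather than a confirmed diagnosis.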
Temporary fix
Comments
APAR Information
APAR number
IJ47466
Reported component name
SPEC SCALE DME
Reported component ID
5737F34AP
Reported release
518
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2023-07-05
Closed date
2023-07-27
Last modified date
2023-07-27
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE DME
Fixed component ID
5737F34AP
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"STXKQY"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"518","Line of Business":{"code":"LOB26","label":"Storage"}}]
Document Information
Modified date:
27 July 2023