A fix is available
APAR status
Closed as program error.
Error description
Machine hangs when trying to perform cd and rm operations in filesystems that have snapshots. Other commands such as snap or lspv may also hang.

The problem is caused by a race condition in the snapshot filesystem code in which two processes each hold a lock while blocked waiting for the lock the other holds. This is a classic kernel deadlock that can only be cleared by a reboot. The processes involved were running the cd and rm commands.

The stacks involved in the hang look like:

(0)> f 2657
pvthread+0A6100 STACK:
[0052D660]slock+000480 (00000000000D3870, 8000000000001032 [??])
[00009558].simple_lock+000058 ()
[0027EDC4]siAlloc+000044 (??, ??, ??, ??)
[0027CCC0]siWriterReadSMap+0003C0 (F1000A06438EDC80, 00000000015AE844, F00000002FF457D0, 0000000100000001)
[00283BC0]siIOD+000140 (??, ??, ??, ??, ??)
[00273E10]txIODUpdateSMap+000110 (??, ??, ??, ??)
[00277810]xtLog+000350 (??, ??, ??, ??)
[0028EC60]xtTruncate+000300 (??, ??, ??, ??, ??)
[00275DDC]txLog+00029C (??, ??, ??)
[002781D4]txCommit+000654 (??, ??, ??, ??)
[002AF178]j2_remove+000438 (??, ??, ??, ??)
[0057C0A4]vnop_remove+0003E4 (??, ??, ??, ??)
[00672580]kunlink+000300 (??, ??)
[00003850]ovlya_addr_sc_flih_main+000130 ()
[kdb_get_virtual_memory] no real storage @ 2FF228E8
[10052490]10052490 ()
[kdb_read_mem] no real storage @ FFFFFFFFFFF92A0
------------------------------------
and
(0)> f 2395
pvthread+095B00 STACK:
[00527BF4]complex_lock_sleep_ppc+0001D4 (00000000000D3870, 8000000000001032, 0000000044288848, F00000002FF461A0 [??])
[0052927C]lock_read_ppc+00095C (??)
[00280964]siReaderLookupSMap+0000E4 (??, ??, ??, ??, ??, ??, ??)
[00263D38]smRead+0001B8 (??, ??)
[00236D00]bmStartIOOne+000120 (??)
[0023C6FC]bmRead+00027C (??, ??, ??, ??, ??, ??)
[002896D4]xtSearch+0005D4 (??, ??, ??, ??, ??)
[00291A24]xtLookup+000064 (??, ??, ??, ??, ??, ??, ??)
[0023C9B8]bmRead+000538 (??, ??, ??, ??, ??, ??)
[00272F30]diMount+000050 (??)
[002857B4]siAttach+000414 (??, ??, ??, ??)
[00346FE8]j2_lookup+0004C8 (??, ??, ??, ??, ??, ??)
[0057E364]vnop_lookup+000184 (??, ??, ??, ??, ??, ??)
[00540CE4]lookuppn+000A04 (??, ??, ??, ??, ??, ??, ??, ??)
[005414A0]lookupname_internal+0000A0 (??, ??, ??, ??, ??, ??, ??, ??)
[0067B9D8]chdirec+000058 (??, ??, ??, ??)
[0067B844]chdir+000124 (??)
[00003850]ovlya_addr_sc_flih_main+000130 ()
[kdb_get_virtual_memory] no real storage @ 2FF22598
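The lock pattern described above can be reproduced in user space. The following is only an illustrative sketch, not the kernel code: the lock names `smap_lock` and `alloc_lock` and the thread labels are hypothetical stand-ins, and which lock each real thread held is not stated in this APAR. Timeouts are used so that the demonstration reports the deadlock instead of hanging.

```python
import threading


def demonstrate_deadlock(timeout=0.2):
    """Two threads take two locks in opposite order.

    Returns a dict mapping each thread label to whether it obtained
    its second lock before the timeout expired.
    """
    smap_lock = threading.Lock()    # hypothetical stand-in, not the kernel lock
    alloc_lock = threading.Lock()   # hypothetical stand-in, not the kernel lock
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    results = {}

    def worker(label, first, second):
        with first:
            # Make sure each thread already holds its first lock.
            both_hold_first.wait()
            got_second = second.acquire(timeout=timeout)
            results[label] = got_second
            # Keep holding the first lock until the other thread has
            # also finished its attempt, so neither can succeed late.
            both_tried_second.wait()
            if got_second:
                second.release()

    t1 = threading.Thread(target=worker, args=("rm", smap_lock, alloc_lock))
    t2 = threading.Thread(target=worker, args=("cd", alloc_lock, smap_lock))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


if __name__ == "__main__":
    print(demonstrate_deadlock())
```

In the kernel there is no timeout on these lock requests, which is why both threads in the stacks above stay blocked until the machine is rebooted.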
Local fix
There is no local fix other than rebooting after the hang is discovered. The hang appears to be fairly rare, since it is seen in downlevel code that has been around for some time.
Problem summary
Machine hangs when trying to perform cd and rm operations in filesystems that have snapshots. Other commands such as snap or lspv may also hang.

The problem is caused by a race condition in the snapshot filesystem code in which two processes each hold a lock while blocked waiting for the lock the other holds. This is a classic kernel deadlock that can only be cleared by a reboot. The processes involved were running the cd and rm commands.

The stacks involved in the hang look like:

(0)> f 2657
pvthread+0A6100 STACK:
0052D660 slock+000480 (00000000000D3870,
00009558 .simple_lock+000058 ()
0027EDC4 siAlloc+000044 (??, ??, ??, ??)
0027CCC0 siWriterReadSMap+0003C0 (F1000A06438EDC80,
00283BC0 siIOD+000140 (??, ??, ??, ??, ??)
00273E10 txIODUpdateSMap+000110 (??, ??, ??, ??)
00277810 xtLog+000350 (??, ??, ??, ??)
0028EC60 xtTruncate+000300 (??, ??, ??, ??, ??)
00275DDC txLog+00029C (??, ??, ??)
002781D4 txCommit+000654 (??, ??, ??, ??)
002AF178 j2_remove+000438 (??, ??, ??, ??)
0057C0A4 vnop_remove+0003E4 (??, ??, ??, ??)
00672580 kunlink+000300 (??, ??)
00003850 ovlya_addr_sc_flih_main+000130 ()
------------------------------------
and
(0)> f 2395
pvthread+095B00 STACK:
00527BF4 complex_lock_sleep_ppc+0001D4
0052927C lock_read_ppc+00095C (??)
00280964 siReaderLookupSMap+0000E4 (??, ??, ??, ??, ??,
00263D38 smRead+0001B8 (??, ??)
00236D00 bmStartIOOne+000120 (??)
0023C6FC bmRead+00027C (??, ??, ??, ??, ??, ??)
002896D4 xtSearch+0005D4 (??, ??, ??, ??, ??)
00291A24 xtLookup+000064 (??, ??, ??, ??, ??, ??, ??)
0023C9B8 bmRead+000538 (??, ??, ??, ??, ??, ??)
00272F30 diMount+000050 (??)
002857B4 siAttach+000414 (??, ??, ??, ??)
00346FE8 j2_lookup+0004C8 (??, ??, ??, ??, ??, ??)
0057E364 vnop_lookup+000184 (??, ??, ??, ??, ??, ??)
00540CE4 lookuppn+000A04 (??, ??, ??, ??, ??, ??, ??,
005414A0 lookupname_internal+0000A0 (??, ??, ??, ??, ??,
0067B9D8 chdirec+000058 (??, ??, ??, ??)
0067B844 chdir+000124 (??)
Problem conclusion
Fix serialization during smap page writes.
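The conclusion above does not describe the code change itself. As a general illustration of how serialization removes this class of deadlock, and not a description of the actual AIX fix, the sketch below makes two code paths take the same pair of locks in one fixed order regardless of the order in which each path names them. The lock names, ranks, and the helper `acquire_in_order` are all hypothetical.

```python
import threading
from contextlib import contextmanager


@contextmanager
def acquire_in_order(*ranked_locks):
    """Acquire (rank, lock) pairs in ascending rank.

    Every caller ends up using the same acquisition order, so no two
    callers can each hold one lock while waiting for the other.
    """
    ordered = sorted(ranked_locks, key=lambda pair: pair[0])
    taken = []
    try:
        for _rank, lock in ordered:
            lock.acquire()
            taken.append(lock)
        yield
    finally:
        # Release in reverse order of acquisition.
        for lock in reversed(taken):
            lock.release()


def run_two_paths(iterations=1000):
    """Run two threads that name the locks in opposite order."""
    smap = (1, threading.Lock())    # hypothetical rank and lock
    alloc = (2, threading.Lock())   # hypothetical rank and lock
    counter = {"n": 0}

    def path(a, b):
        for _ in range(iterations):
            with acquire_in_order(a, b):
                counter["n"] += 1   # protected by both locks

    threads = [
        threading.Thread(target=path, args=(smap, alloc)),
        threading.Thread(target=path, args=(alloc, smap)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["n"]


if __name__ == "__main__":
    print(run_two_paths())
```

Both threads run to completion because the effective lock order is the same on each path.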
Temporary fix
Comments
6100-06 - use AIX APAR IV23346
6100-07 - use AIX APAR IV33759
6100-08 - use AIX APAR IV29780
6100-09 - use AIX APAR IV30215
7100-00 - use AIX APAR IV35184
7100-01 - use AIX APAR IV34863
7100-02 - use AIX APAR IV29829
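The level-to-APAR mapping above lends itself to a small lookup. The sketch below is a convenience example only; the function name `apar_for_level` is hypothetical, and it assumes the level string has the form printed by `oslevel -r` (for example 7100-01) or a longer string that begins with that form.

```python
# Mapping copied from the Comments section of this APAR.
APAR_BY_LEVEL = {
    "6100-06": "IV23346",
    "6100-07": "IV33759",
    "6100-08": "IV29780",
    "6100-09": "IV30215",
    "7100-00": "IV35184",
    "7100-01": "IV34863",
    "7100-02": "IV29829",
}


def apar_for_level(level_string):
    """Return the APAR for a technology level, or None if it is not listed.

    Only the first two dash-separated fields are used, so a longer
    string such as '7100-01-04-1216' is reduced to '7100-01'.
    """
    fields = level_string.strip().split("-")
    level = "-".join(fields[:2])
    return APAR_BY_LEVEL.get(level)


if __name__ == "__main__":
    print(apar_for_level("7100-00"))
```

A result of None means the level is not covered by the list in this document.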
APAR Information
APAR number
IV35184
Reported component name
AIX V7.1
Reported component ID
5765H4000
Reported release
710
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Submitted date
2013-01-15
Closed date
2013-01-15
Last modified date
2013-11-23
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
AIX V7.1
Fixed component ID
5765H4000
Applicable component levels
R710 PSY U854839 UP13/04/25 I 1000
PTF to Fileset Mapping
U854839 bos.mp64 7.1.0.22
Document Information
Modified date:
23 November 2013