  • Latest Post - ‏2012-10-26T16:47:58Z by esj
15 Posts

x3650M2 slows down frequently

‏2012-10-26T07:27:17Z |
Hi *,

We have 4 GPFS servers accessing 8 storage boxes (DS3400).
Since we upgraded from RHEL 5.4 to RHEL 6.2, the servers
freeze frequently.
When that happens, we get messages like this in the kernel ring buffer:

INFO: task sh:10559 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sh D 0000000000000005 0 10559 10558 0x00000080
ffff88032593bd38 0000000000000086 0000000000000000 ffff88032ce5e780
ffff88032593bcc8 ffffffffa06f4ccc ffff88032593bcc8 ffffffff81193bfa
ffff880373ca6638 ffff88032593bfd8 000000000000fb88 ffff880373ca6638
Call Trace:
<ffffffffa06f4ccc> ? gpfs_i_permission_noacl+0x4c/0xe0 mmfslinux
<ffffffff81193bfa> ? dput+0x9a/0x150
<ffffffff814ff45e> __mutex_lock_slowpath+0x13e/0x180
<ffffffff811638b2> ? kmem_cache_alloc+0x182/0x190
<ffffffff814ff2fb> mutex_lock+0x2b/0x50
<ffffffff8118c06e> do_filp_open+0x2be/0xd60
<ffffffff8104457c> ? __do_page_fault+0x1ec/0x480
<ffffffff81500bf5> ? page_fault+0x25/0x30
<ffffffff811984f2> ? alloc_fd+0x92/0x160
<ffffffff811789a9> do_sys_open+0x69/0x140
<ffffffff81178ac0> sys_open+0x20/0x30
<ffffffff8100b0f2> system_call_fastpath+0x16/0x1b
The process that hangs varies from one occurrence to the next,
but the first function in the stack trace is always gpfs_i_permission_noacl.
Sometimes the machines crash completely without any hint of what happened.
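
One way to check the "always gpfs_i_permission_noacl" observation across a whole log is to tally the first frame that follows each "Call Trace:" line. This is only a minimal sketch. It assumes the kernel messages have been saved as text and that frames look like the ones quoted above, with or without square brackets. The name first_frames and the regular expression are illustrative and not part of GPFS or the kernel.

```python
import re
from collections import Counter

# Matches a frame such as
#   <ffffffffa06f4ccc> ? gpfs_i_permission_noacl+0x4c/0xe0 mmfslinux
# and also the bracketed form "[<addr>] sym+0x../0x.. [module]".
FRAME = re.compile(
    r"\[?<(?P<addr>[0-9a-f]+)>\]?\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[?(?P<module>\w+)\]?)?"
)


def first_frames(log_text):
    """Count the symbol of the first frame after every 'Call Trace:' line."""
    counts = Counter()
    awaiting_frame = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            awaiting_frame = True
            continue
        if awaiting_frame:
            match = FRAME.search(line)
            if match:
                counts[match.group("symbol")] += 1
                awaiting_frame = False
    return counts
```

Calling first_frames on the saved log text returns a Counter keyed by symbol name. If every report really starts in the same function, it will have a single key.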

We are already on the latest patch level for the machine and the kernel/driver.

Has anyone got a clue?

  • esj
    104 Posts

    Re: x3650M2 slows down frequently

    Here is what I was told by somebody who knows what he is talking about ....

    This is an OS call into the GPFS permission op, which actually does very little before turning around
    and calling "generic_permission" back in Linux. The traceback is strange in that dput does not make
    such a call. If the offset is the same every time (it sounds like this happens frequently), then an objdump
    of the mmfslinux module might help to locate the instruction that we're hung on. We may also need
    the entire Linux log file along with a dump of kthreads to look for a deadlock.
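
To make the objdump suggestion concrete: a frame such as gpfs_i_permission_noacl+0x4c/0xe0 means offset 0x4c into a function that is 0xe0 bytes long. Given the function's start address as printed in a disassembly of the module, the address to look for is start plus offset. The sketch below does only that arithmetic. The helper names and the start address 0x1c80 in the usage note are made up for illustration; the real start address has to be read from the objdump output for the installed mmfslinux module. Note also that frames marked "?" are ones the kernel's stack walker could not confirm, which may be why the dput entry looks out of place.

```python
import re

# symbol+0xOFFSET/0xSIZE as printed in a kernel call trace
FRAME = re.compile(
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(frame):
    """Return (symbol, offset, size) from one call-trace line."""
    match = FRAME.search(frame)
    if match is None:
        raise ValueError("not a call-trace frame: %r" % frame)
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


def disassembly_address(frame, symbol_start):
    """Address to look up in a disassembly, given where the symbol starts.

    symbol_start is an assumption supplied by the caller, for example the
    address shown next to the function label in the disassembly.
    """
    symbol, offset, size = parse_frame(frame)
    if offset > size:
        raise ValueError("offset lies outside the function")
    return symbol, symbol_start + offset
```

With the frame from the original post and the hypothetical start address 0x1c80, disassembly_address returns the symbol name and 0x1ccc. An address in a call trace is normally a return address, so the instruction of interest is usually the call just before it.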

    This cannot be handled in this forum. You are better off opening a PMR.