1 reply Latest Post - ‏2013-04-02T18:05:11Z by SystemAdmin
SystemAdmin

GPFS and kernel error

‏2013-04-02T14:35:15Z |
Hi,

This weekend our GPFS 3.5.0.4 cluster ran into problems. The services on the system stopped working, but GPFS itself reported no errors. We couldn't work with our filesystem, and there were waiters hanging forever for 'delsnaps'.
On the manager node, we also got this error in the syslog:

Mar 30 02:44:35 bulk1 kernel: [727149.364096] BUG: unable to handle kernel paging request at ffffc90022ecc6f8
Mar 30 02:44:35 bulk1 kernel: [727149.371189] IP: [<ffffffffa057c92c>] _ZNK8OpenFile28copyDataToPrevSnapshotNeededEjxx+0x2ec/0x310 [mmfs26]
Mar 30 02:44:35 bulk1 kernel: [727149.380891] PGD c2fc0c067 PUD c2fc0d067 PMD a06ccd067 PTE 0
Mar 30 02:44:35 bulk1 kernel: [727149.386608] Oops: 0000 [#1] SMP
Mar 30 02:44:35 bulk1 kernel: [727149.389956] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host1/target1:1:0/1:1:0:0/rev
Mar 30 02:44:35 bulk1 kernel: [727149.399879] CPU 4
Mar 30 02:44:35 bulk1 kernel: [727149.402001] Modules linked in: mmfs26 mmfslinux tracedev nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipip tunnel4 dm_round_robin dm_multipath scsi_dh bonding ipmi_si ipmi_devintf ipmi_msghandler snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr psmouse joydev evdev dcdbas power_meter serio_raw processor button ext3 jbd mbcache dm_mod raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod btrfs zlib_deflate crc32c libcrc32c sd_mod crc_t10dif ses enclosure usbhid hid ixgbe mpt2sas uhci_hcd ehci_hcd usbcore scsi_transport_sas dca nls_base mdio scsi_mod thermal thermal_sys bnx2 [last unloaded: scsi_wait_scan]
Mar 30 02:44:35 bulk1 kernel: [727149.464334] Pid: 13886, comm: smbd Not tainted 2.6.32-5-amd64 #1 PowerEdge R610
Mar 30 02:44:35 bulk1 kernel: [727149.471745] RIP: 0010:[<ffffffffa057c92c>] [<ffffffffa057c92c>] _ZNK8OpenFile28copyDataToPrevSnapshotNeededEjxx+0x2ec/0x310 [mmfs26]
Mar 30 02:44:35 bulk1 kernel: [727149.483873] RSP: 0018:ffff8809b05cf7e8 EFLAGS: 00010206
Mar 30 02:44:35 bulk1 kernel: [727149.489277] RAX: 0000000000005480 RBX: 0000000000000000 RCX: 0000000000000023
Mar 30 02:44:35 bulk1 kernel: [727149.496514] RDX: 0000000800000000 RSI: 0000000000005480 RDI: ffffc90021cef910
Mar 30 02:44:35 bulk1 kernel: [727149.503750] RBP: 000000000015201c R08: ffffc90022ea22f8 R09: 0000000000005480
Mar 30 02:44:35 bulk1 kernel: [727149.510988] R10: 000000000015201c R11: 000000000000001c R12: ffffc9002a029900
Mar 30 02:44:35 bulk1 kernel: [727149.518224] R13: 0000000000001057 R14: 000000000015201c R15: 000000000015201c
Mar 30 02:44:35 bulk1 kernel: [727149.525462] FS: 00007f1a0794b720(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
Mar 30 02:44:35 bulk1 kernel: [727149.533655] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 30 02:44:35 bulk1 kernel: [727149.539494] CR2: ffffc90022ecc6f8 CR3: 00000009ad801000 CR4: 00000000000006e0
Mar 30 02:44:35 bulk1 kernel: [727149.546732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 30 02:44:35 bulk1 kernel: [727149.553970] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 30 02:44:35 bulk1 kernel: [727149.561209] Process smbd (pid: 13886, threadinfo ffff8809b05ce000, task ffff8809eecc0e20)
Mar 30 02:44:35 bulk1 kernel: [727149.569485] Stack:
Mar 30 02:44:35 bulk1 kernel: [727149.571593] 0000000c0000000c 0000000000000000 0000000000000153 0100000000000292
Mar 30 02:44:35 bulk1 kernel: [727149.578943] <0> 0000000000000001 00000000104f400a 0100000000000ff4 ffffc90021cef910
Mar 30 02:44:35 bulk1 kernel: [727149.586767] <0> 0000000000000292 0000002a4038ffff ffff8809b05cfe48 ffffc9002a029900
Mar 30 02:44:35 bulk1 kernel: [727149.594788] Call Trace:
Mar 30 02:44:35 bulk1 kernel: [727149.597348] [<ffffffffa04faa8b>] ? _Z9gpfsWriteP13gpfsVfsData_tP15KernelOperationP9cxiNode_tiP8cxiUio_tP9MMFSVInfoP10cxiVattr_tSA_P10ext_cred_tPvi+0x101b/0x51e0 [mmfs26]
Mar 30 02:44:35 bulk1 kernel: [727149.612663] [<ffffffff81263983>] ? sch_direct_xmit+0x7f/0x14c
Mar 30 02:44:35 bulk1 kernel: [727149.618591] [<ffffffff81250ed3>] ? dev_queue_xmit+0x35b/0x38d
Mar 30 02:44:35 bulk1 kernel: [727149.624518] [<ffffffff812780e1>] ? ip_queue_xmit+0x311/0x386
Mar 30 02:44:35 bulk1 kernel: [727149.630356] [<ffffffff8124a4f6>] ? skb_copy_and_csum_datagram+0x6f/0x2b1
Mar 30 02:44:35 bulk1 kernel: [727149.637236] [<ffffffff810168f3>] ? read_tsc+0xa/0x20
Mar 30 02:44:35 bulk1 kernel: [727149.642382] [<ffffffff81065231>] ? remove_wait_queue+0x12/0x4d
Mar 30 02:44:35 bulk1 kernel: [727149.648396] [<ffffffff810fbfbb>] ? poll_freewait+0x3d/0x8a
Mar 30 02:44:35 bulk1 kernel: [727149.654062] [<ffffffff810fc31e>] ? do_sys_poll+0x316/0x391
Mar 30 02:44:35 bulk1 kernel: [727149.659727] [<ffffffff8124a2c3>] ? memcpy_toiovec+0x34/0x63
Mar 30 02:44:35 bulk1 kernel: [727149.665480] [<ffffffff812fce89>] ? _spin_lock_bh+0x9/0x25
Mar 30 02:44:35 bulk1 kernel: [727149.671058] [<ffffffff81243ea2>] ? release_sock+0x13/0xa0
Mar 30 02:44:35 bulk1 kernel: [727149.676638] [<ffffffff8127f48b>] ? tcp_recvmsg+0x98b/0xa9e
Mar 30 02:44:35 bulk1 kernel: [727149.682304] [<ffffffff810fd07b>] ? pollwake+0x0/0x59
Mar 30 02:44:35 bulk1 kernel: [727149.687452] [<ffffffffa0478eff>] ? gpfs_f_write+0x37f/0x500 [mmfslinux]
Mar 30 02:44:35 bulk1 kernel: [727149.694245] [<ffffffff810ef061>] ? do_sync_read+0xce/0x113
Mar 30 02:44:35 bulk1 kernel: [727149.699910] [<ffffffff8100f6c4>] ? __switch_to+0x1ad/0x297
Mar 30 02:44:35 bulk1 kernel: [727149.705575] [<ffffffff810ef8a0>] ? vfs_write+0xa9/0x102
Mar 30 02:44:35 bulk1 kernel: [727149.710977] [<ffffffff810ef950>] ? sys_pwrite64+0x57/0x77
Mar 30 02:44:35 bulk1 kernel: [727149.716553] [<ffffffff8101195b>] ? device_not_available+0x1b/0x20
Mar 30 02:44:35 bulk1 kernel: [727149.722825] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
Mar 30 02:44:35 bulk1 kernel: [727149.729094] Code: 6e f6 ff 89 d1 48 c7 c0 ff ff ff ff 31 db 48 d3 e8 b9 3f 00 00 00 48 89 c2 44 29 d9 48 c7 c0 ff ff ff ff 48 d3 e0 48 21 c2 89 f0 <49> 8b 04 c0 48 21 d0 48 39 c2 0f 95 c3 85 db 0f 94 44 24 1f e9
Mar 30 02:44:35 bulk1 kernel: [727149.748898] RIP [<ffffffffa057c92c>] _ZNK8OpenFile28copyDataToPrevSnapshotNeededEjxx+0x2ec/0x310 [mmfs26]
Mar 30 02:44:35 bulk1 kernel: [727149.758682] RSP <ffff8809b05cf7e8>
Mar 30 02:44:35 bulk1 kernel: [727149.762263] CR2: ffffc90022ecc6f8
Mar 30 02:44:35 bulk1 kernel: [727149.765992] ---[ end trace 356f418d74618589 ]---

What could be the problem here? Is this a bug in GPFS?

Thanks!
Updated on 2013-04-02T18:05:11Z by SystemAdmin
  • SystemAdmin
    ACCEPTED ANSWER

    Re: GPFS and kernel error

    ‏2013-04-02T18:05:11Z  in response to SystemAdmin
    If you have a GPFS service contract, please contact IBM service so they can collect debugging data for further analysis of the problem.
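
When opening such a ticket, it may help to quote the key facts from the oops in a few lines rather than only attaching the full syslog. The following is a minimal sketch, not an IBM tool: the function name `summarize_oops` and the dictionary keys are made up for illustration, and the regular expressions assume the 2.6.32-style oops layout shown in the question. The square brackets are treated as optional, so the sketch also accepts logs where a forum or wiki has stripped them.

```python
import re

# First line of the oops: the address the kernel could not map.
FAULT = re.compile(r"unable to handle kernel paging request at (?P<addr>[0-9a-f]+)")

# Symbolised location on an "IP:" / "RIP:" line:
#   [<address>] symbol+offset/size [module]
# Brackets are optional because some log viewers strip them.
LOCATION = re.compile(
    r"\[?<(?P<ip>[0-9a-f]{16})>\]?\s+"
    r"(?P<symbol>\w+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[?(?P<module>\w+)\]?)?"
)

# Process that was running when the fault happened.
PROCESS = re.compile(r"Pid: (?P<pid>\d+), comm: (?P<comm>\S+)")


def summarize_oops(lines):
    """Return the first fault address, faulting location and process found."""
    summary = {}
    for line in lines:
        fault = FAULT.search(line)
        if fault and "fault_address" not in summary:
            summary["fault_address"] = fault.group("addr")

        # Only "IP:" / "RIP:" lines name the faulting instruction;
        # call-trace entries are deliberately ignored here.
        location = LOCATION.search(line) if re.search(r"\bR?IP:", line) else None
        if location and "symbol" not in summary:
            summary.update(location.groupdict())

        process = PROCESS.search(line)
        if process and "pid" not in summary:
            summary["pid"] = int(process.group("pid"))
            summary["comm"] = process.group("comm")
    return summary


if __name__ == "__main__":
    # Three lines taken from the log in the question.
    sample = [
        "kernel: [727149.364096] BUG: unable to handle kernel paging request at ffffc90022ecc6f8",
        "kernel: [727149.371189] IP: [<ffffffffa057c92c>] "
        "_ZNK8OpenFile28copyDataToPrevSnapshotNeededEjxx+0x2ec/0x310 [mmfs26]",
        "kernel: [727149.464334] Pid: 13886, comm: smbd Not tainted 2.6.32-5-amd64 #1 PowerEdge R610",
    ]
    for key, value in summarize_oops(sample).items():
        print(f"{key}: {value}")
```

The symbol is a mangled C++ name. Passed through a demangler such as `c++filt`, it reads `OpenFile::copyDataToPrevSnapshotNeeded(unsigned int, long long, long long) const`, so the fault occurred inside the `mmfs26` module while `smbd` was writing to a file.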