Philip Poten
2011-Dec-04 20:59 UTC
[Gluster-users] Kernel oopses with gluster fuse on squeeze
Hi,

We've been experiencing repeated oopses (every other day) on hosts with high-load glusterfs access. Along with them (not immediately, but there is some sort of connection) we see hanging nginx processes, which are the ones accessing glusterfs. These processes cannot be stopped or killed, and they also block the shutdown of the respective openvz instance. I think I remember at least one instance where this occurred without a preceding oops. The only way to get things working again is a reboot.

A bit of googling led me to believe that the problem here might be with fuse, not glusterfs. However, the kernel bug that covers this specific problem is not available, since the kernel.org bugzilla is down. Also, there are no indications in gluster.log as to what may have caused this.

We're running Gluster 3.2.1 from the provided debian packages. The things I've tried so far are: using an older version of the kernel (lenny backports) than the squeeze one, and disabling swap (since the process that oopses is kswapd). The solution currently being tried is using nfs for the high-load hosts. I hope this doesn't crash the gluster server :).

This mail is more of a JFYI than a bug report - perhaps somebody else has seen this problem too and can provide more insight.

Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.037978] CPU 3
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038004] Modules linked in: vzethdev vznetdev simfs vzrst vzcpt vzdquota vzmon vzdev xt_tcpudp xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables x_tables ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl auth_rpcgss sunrpc 8021q garp bridge stp fuse loop snd_pcm snd_timer snd soundcore snd_page_alloc psmouse dcdbas pcspkr serio_raw joydev evdev power_meter button processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid1 md_mod sd_mod crc_t10dif sg sr_mod cdrom usbhid hid ata_generic ata_piix uhci_hcd ehci_hcd mptsas mptscsih mptbase scsi_transport_sas libata usbcore nls_base scsi_mod bnx2 thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038545] Pid: 48, comm: kswapd1 Not tainted 2.6.32-bpo.5-openvz-amd64 #1 feoktistov PowerEdge R410
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038602] RIP: 0010:[<ffffffff81102ade>] [<ffffffff81102ade>] clear_inode+0x1b/0xd0
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038665] RSP: 0018:ffff88083c919c30 EFLAGS: 00010202
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038697] RAX: 0000000000000000 RBX: ffff8805d8f9b000 RCX: ffff88083c919c90
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038734] RDX: 0000000000000000 RSI: ffffea0016f46000 RDI: ffff8805d8f9b000
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038770] RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000000
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038806] R10: 0000000000000040 R11: 0000000000000002 R12: dead000000100100
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038842] R13: ffff88083c919c90 R14: ffff8804314d3cf0 R15: ffff88083c919d24
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038879] FS: 0000000000000000(0000) GS:ffff880011a20000(0000) knlGS:0000000000000000
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038933] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038966] CR2: 00007fe22466054c CR3: 0000000001001000 CR4: 00000000000006e0
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039039] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039076] Process kswapd1 (pid: 48, veid=0, threadinfo ffff88083c918000, task ffff88043d082000)
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039155] ffff8805d8f9b000 ffffffff81103568 ffff8805d0213be0 ffff8805d0213bd8
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039199] <0> ffff8805d0213be0 ffffffff810ff58b 0000000000000100 ffff8805d0213bd8
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039263] <0> ffff8804314d3c00 ffffffff810ff820 0000000000000000 0000000000000008
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039375] [<ffffffff81103568>] ? generic_delete_inode+0xec/0x160
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039411] [<ffffffff810ff58b>] ? d_kill+0x40/0x61
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039443] [<ffffffff810ff820>] ? __shrink_dcache_sb+0x274/0x30b
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039479] [<ffffffff810ff9bc>] ? shrink_dcache_memory+0x105/0x216
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039516] [<ffffffff810c21c8>] ? shrink_slab+0x10e/0x189
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039550] [<ffffffff810c2a5d>] ? kswapd+0x4c3/0x659
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039583] [<ffffffff810bffa3>] ? isolate_pages_global+0x0/0x1ff
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039620] [<ffffffff8106680a>] ? autoremove_wake_function+0x0/0x2e
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039655] [<ffffffff810c259a>] ? kswapd+0x0/0x659
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039687] [<ffffffff8106653e>] ? kthread+0xc0/0xca
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039722] [<ffffffff81011c6a>] ? child_rip+0xa/0x20
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039754] [<ffffffff8106647e>] ? kthread+0x0/0xca
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039786] [<ffffffff81011c60>] ? child_rip+0x0/0x20
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.040100] RSP <ffff88083c919c30>
Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.040496] ---[ end trace ddd95abf24096674 ]---
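
For anyone who wants to compare their own oopses against this one, here is a rough sketch of how the call-trace entries could be pulled out of such a syslog excerpt. It is only an illustration: the helper name `extract_frames` and the regular expression are my own assumptions about the line format seen above (`[<address>] ? symbol+0xoffset/0xsize`), not part of any kernel or Gluster tool.

```python
import re

# Assumed frame format, taken from the log above:
#   [<ffffffff810ff58b>] ? d_kill+0x40/0x61
# The "? " marker is optional so the RIP line is matched as well.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{16})>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def extract_frames(log_text):
    """Return (address, symbol, offset, size) tuples in order of appearance.

    Hypothetical helper: works on both one-entry-per-line logs and
    logs that were flattened onto a single line.
    """
    return [
        (m.group(1), m.group(2), int(m.group(3), 16), int(m.group(4), 16))
        for m in FRAME_RE.finditer(log_text)
    ]


# A few entries copied from the oops above.
sample = (
    "Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.038602] RIP: 0010:[<ffffffff81102ade>] [<ffffffff81102ade>] clear_inode+0x1b/0xd0\n"
    "Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039375] [<ffffffff81103568>] ? generic_delete_inode+0xec/0x160\n"
    "Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039411] [<ffffffff810ff58b>] ? d_kill+0x40/0x61\n"
    "Dec 2 19:46:50 hn-r410-openvz01 kernel: [ 9903.039443] [<ffffffff810ff820>] ? __shrink_dcache_sb+0x274/0x30b\n"
)

if __name__ == "__main__":
    for address, symbol, offset, size in extract_frames(sample):
        print(f"{symbol} (+{offset:#x} of {size:#x}) at {address}")
```

Reading the symbols top to bottom gives the path from the faulting function (`clear_inode`) back through the dentry cache shrinker, which makes it easier to check whether two oopses share the same trace.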