Hello list,

my company administers an Apache2 cluster with three servers. The DocumentRoot directory is on an iSCSI device formatted with OCFS2.

From time to time the following kernel bug hits one server (_not_ always the same one). It drives the load above 1k on _each_ node and finally makes the whole website unreachable:

Sep 1 13:01:59 www01 kernel: [438503.058163] ------------[ cut here ]------------
Sep 1 13:01:59 www01 kernel: [438503.058211] kernel BUG at /build/buildd/linux-2.6.32/fs/ocfs2/dlmglue.c:742!
Sep 1 13:01:59 www01 kernel: [438503.058267] invalid opcode: 0000 [#1] SMP
Sep 1 13:01:59 www01 kernel: [438503.058320] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
Sep 1 13:01:59 www01 kernel: [438503.058405] CPU 7
Sep 1 13:01:59 www01 kernel: [438503.058446] Modules linked in: ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs crc32c mptctl ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi power_meter serio_raw ixgbe mdio ioatdma lp parport usbhid hid mptsas mptscsih mptbase scsi_transport_sas igb dca
Sep 1 13:01:59 www01 kernel: [438503.058831] Pid: 3296, comm: ocfs2dc Not tainted 2.6.32-42-server #95-Ubuntu PRIMERGY BX920 S2
Sep 1 13:01:59 www01 kernel: [438503.058918] RIP: 0010:[<ffffffffa023f4b4>] [<ffffffffa023f4b4>] ocfs2_lock_res_free+0x1d4/0x4c0 [ocfs2]
Sep 1 13:01:59 www01 kernel: [438503.059036] RSP: 0018:ffff880627f2fd20 EFLAGS: 00010286
Sep 1 13:01:59 www01 kernel: [438503.059086] RAX: 0000000000000062 RBX: ffff8805eea9e618 RCX: 0000000000000000
Sep 1 13:01:59 www01 kernel: [438503.059168] RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000246
Sep 1 13:01:59 www01 kernel: [438503.059251] RBP: ffff880627f2fd50 R08: 00000000ffffffff R09: ffffffff815b0480
Sep 1 13:01:59 www01 kernel: [438503.059333] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000100080000
Sep 1 13:01:59 www01 kernel: [438503.059415] R13: ffff88062d188000 R14: ffff88022cc5c000 R15: ffff88022cc5c720
Sep 1 13:01:59 www01 kernel: [438503.059498] FS: 0000000000000000(0000) GS:ffff88024e460000(0000) knlGS:0000000000000000
Sep 1 13:01:59 www01 kernel: [438503.059584] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 1 13:01:59 www01 kernel: [438503.059635] CR2: 00007ff8a7d2a008 CR3: 000000062d98d000 CR4: 00000000000006e0
Sep 1 13:01:59 www01 kernel: [438503.059718] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 1 13:01:59 www01 kernel: [438503.059800] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 1 13:01:59 www01 kernel: [438503.059883] Process ocfs2dc (pid: 3296, threadinfo ffff880627f2e000, task ffff88062d188000)
Sep 1 13:01:59 www01 kernel: [438503.059968] Stack:
Sep 1 13:01:59 www01 kernel: [438503.060006] ffff88022cc5c000 ffff88022cc5c720 ffff880627f2fd50 ffff8805eea9e600
Sep 1 13:01:59 www01 kernel: [438503.060070] <0> ffff8805eea9e618 ffff88022cc5c000 ffff880627f2fd80 ffffffffa0231197
Sep 1 13:01:59 www01 kernel: [438503.060167] <0> ffff8805eea9e618 ffff8805eea9e628 0000000000000282 ffff88022cc5c000
Sep 1 13:01:59 www01 kernel: [438503.060294] Call Trace:
Sep 1 13:01:59 www01 kernel: [438503.060352] [<ffffffffa0231197>] ocfs2_dentry_lock_put+0x87/0x110 [ocfs2]
Sep 1 13:01:59 www01 kernel: [438503.060424] [<ffffffffa0240217>] ocfs2_dentry_post_unlock+0x17/0x20 [ocfs2]
Sep 1 13:01:59 www01 kernel: [438503.060497] [<ffffffffa02453e5>] ocfs2_process_blocked_lock+0x115/0x310 [ocfs2]
Sep 1 13:01:59 www01 kernel: [438503.060599] [<ffffffffa02456aa>] ocfs2_downconvert_thread_do_work+0xca/0x190 [ocfs2]
Sep 1 13:01:59 www01 kernel: [438503.060702] [<ffffffffa02457ee>] ocfs2_downconvert_thread+0x7e/0x1c0 [ocfs2]
Sep 1 13:01:59 www01 kernel: [438503.060793] [<ffffffff81086470>] ? autoremove_wake_function+0x0/0x40
Sep 1 13:01:59 www01 kernel: [438503.060865] [<ffffffffa0245770>] ? ocfs2_downconvert_thread+0x0/0x1c0 [ocfs2]
Sep 1 13:01:59 www01 kernel: [438503.060950] [<ffffffff810860f6>] kthread+0x96/0xa0
Sep 1 13:01:59 www01 kernel: [438503.061003] [<ffffffff810141aa>] child_rip+0xa/0x20
Sep 1 13:01:59 www01 kernel: [438503.061054] [<ffffffff81086060>] ? kthread+0x0/0xa0
Sep 1 13:01:59 www01 kernel: [438503.061103] [<ffffffff810141a0>] ? child_rip+0x0/0x20
Sep 1 13:01:59 www01 kernel: [438503.061152] Code: e8 b2 a3 df e0 66 90 85 c0 74 24 49 bc 00 00 08 00 01 00 00 00 4c 85 25 fb 22 f6 ff 74 0d 4c 85 25 fa 22 f6 ff 0f 84 d8 01 00 00 <0f> 0b eb fe 83 7b 6c 00 74 24 49 bc 00 00 08 00 01 00 00 00 4c
Sep 1 13:01:59 www01 kernel: [438503.061547] RIP [<ffffffffa023f4b4>] ocfs2_lock_res_free+0x1d4/0x4c0 [ocfs2]
Sep 1 13:01:59 www01 kernel: [438503.061648] RSP <ffff880627f2fd20>
Sep 1 13:01:59 www01 kernel: [438503.062063] ---[ end trace b5022849011f56ab ]---

Since we upgraded the kernel last week, this error has occurred twice. Before the upgrade it occurred only about twice a year.

Here are some details about the systems:

root@www01:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.04
DISTRIB_CODENAME=lucid
DISTRIB_DESCRIPTION="Ubuntu 10.04.4 LTS"

root@www01:~# uname -a
Linux www01 2.6.32-42-server #95-Ubuntu SMP Wed Jul 25 16:10:49 UTC 2012 x86_64 GNU/Linux

root@www01:~# cat /etc/ocfs2/cluster.conf
node:
	name = www01
	cluster = ocfs2
	number = 0
	ip_address = 192.168.1.1
	ip_port = 7777

node:
	name = www02
	cluster = ocfs2
	number = 1
	ip_address = 192.168.1.2
	ip_port = 7777

node:
	name = www03
	cluster = ocfs2
	number = 2
	ip_address = 192.168.1.3
	ip_port = 7777

cluster:
	name = ocfs2
	node_count = 3

Does anybody know this bug and how to fix it?

Thanks in advance,
Jakob