Our OCFS2 filesystems are having real troubles and we're getting these kernel
traces ... any idea about what can be causing this?

------------[ cut here ]------------
kernel BUG at fs/ocfs2/dlmglue.c:536!
invalid opcode: 0000 [#1]
SMP
Modules linked in: sg iptable_filter ip_tables x_tables nfs lockd nfs_acl
sunrpc ipv6 dm_snapshot dm_mirror ide_disk e752x_edac psmouse floppy iTCO_wdt
edac_mc serio_raw pcspkr shpchp pci_hotplug evdev joydev tsdev dm_round_robin
dm_emc dm_multipath dm_mod ext3 mbcache sd_mod ide_generic ide_cd cdrom
ata_generic libata cciss qla2xxx firmware_class uhci_hcd scsi_transport_fc
ehci_hcd tg3 scsi_mod piix generic ide_core usbcore thermal processor fan
ocfs2_dlmfs ocfs2 jbd ocfs2_dlm ocfs2_nodemanager configfs
CPU: 2
EIP: 0060:[<f8943902>] Not tainted VLI
EFLAGS: 00210046 (2.6.21-1-686 #1)
EIP is at ocfs2_cluster_unlock+0xd2/0x2ae [ocfs2]
eax: 00000000 ebx: f54cfa48 ecx: 00200246 edx: f54cfa50
esi: f761a000 edi: 00000005 ebp: f54cfa50 esp: f48b9cf0
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process maildrop (pid: 5514, ti=f48b8000 task=f5ca0540 task.ti=f48b8000)
Stack: f54cfc34 f761a000 f6183f28 f592e8b4 f71935b8 00200246 f761a000 00000005
       f54cfc34 f54cf980 f8943c16 f761a000 f54cfc34 00000000 ffffffe4 f761a000
       f48b9e08 00000000 f8969b3a f761a000 f54cfc34 f39ba540 f48b9e08 00000000
Call Trace:
 [<f8943c16>] ocfs2_meta_unlock+0x138/0x18f [ocfs2]
 [<f8969b3a>] ocfs2_reserve_cluster_bitmap_bits+0x23/0xde [ocfs2]
 [<f8968a60>] ocfs2_free_alloc_context+0x1c/0x40 [ocfs2]
 [<f8969f1c>] ocfs2_reserve_clusters+0x327/0x396 [ocfs2]
 [<f89459d1>] ocfs2_data_lock_full+0x1db/0x2eb [ocfs2]
 [<f894cf71>] ocfs2_extend_file+0x589/0x12ee [ocfs2]
 [<f8946b81>] ocfs2_meta_lock_full+0x450/0x721 [ocfs2]
 [<c0179271>] mntput_no_expire+0x11/0x6a
 [<f894e506>] ocfs2_prepare_inode_for_write+0x830/0x913 [ocfs2]
 [<c0165ddf>] get_unused_fd+0x4a/0xaa
 [<f894eed2>] ocfs2_file_aio_write+0x1a5/0x31d [ocfs2]
 [<c016f619>] do_path_lookup+0x16e/0x189
 [<c016711c>] do_sync_write+0xc7/0x10a
 [<c01700f0>] open_namei+0x6e/0x56a
 [<c0132969>] autoremove_wake_function+0x0/0x35
 [<c0135544>] hrtimer_start+0xf7/0x101
 [<c0167055>] do_sync_write+0x0/0x10a
 [<c0167900>] vfs_write+0xa8/0x12a
 [<c0167e8b>] sys_write+0x41/0x67
 [<c0103d88>] syscall_call+0x7/0xb
 [<c0290000>] __find_acq_core+0x2a8/0x30d
 ======================
Code: 00 00 c7 04 24 8e 6c 97 f8 89 44 24 04 e8 1a ed 7d c7 85 db 75 04 0f 0b eb fe 83 ff 03 74 16 83 ff 05 75 22 8b 43 4c 85 c0 75 04 <0f> 0b eb fe 48 89 43 4c eb 15 8b 43 48 85 c0 75 04 0f 0b eb fe
EIP: [<f8943902>] ocfs2_cluster_unlock+0xd2/0x2ae [ocfs2] SS:ESP 0068:f48b9cf0
(166,3):ocfs2_lock_res_free:497 ERROR: bug expression: res->l_ex_holders
(166,3):ocfs2_lock_res_free:497 ERROR: Lockres W000000000000000b37790d0805ff41 has 1 ex holders
------------[ cut here ]------------
kernel BUG at fs/ocfs2/dlmglue.c:497!
invalid opcode: 0000 [#2]
SMP
Modules linked in: tcp_diag inet_diag sg iptable_filter ip_tables x_tables nfs
lockd nfs_acl sunrpc ipv6 dm_snapshot dm_mirror ide_disk e752x_edac psmouse
floppy iTCO_wdt edac_mc serio_raw pcspkr shpchp pci_hotplug evdev joydev
tsdev dm_round_robin dm_emc dm_multipath dm_mod ext3 mbcache sd_mod
ide_generic ide_cd cdrom ata_generic libata cciss qla2xxx firmware_class
uhci_hcd scsi_transport_fc ehci_hcd tg3 scsi_mod piix generic ide_core
usbcore thermal processor fan ocfs2_dlmfs ocfs2 jbd ocfs2_dlm
ocfs2_nodemanager configfs
CPU: 3
EIP: 0060:[<f8944342>] Not tainted VLI
EFLAGS: 00010292 (2.6.21-1-686 #1)
EIP is at ocfs2_lock_res_free+0x50e/0x578 [ocfs2]
eax: 00000063 ebx: 00000003 ecx: 00000046 edx: 00000000
esi: f761a000 edi: 0000006a ebp: 00000000 esp: c21cfe5c
ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
Process kswapd0 (pid: 166, ti=c21ce000 task=c2194580 task.ti=c21ce000)
Stack: f8977233 000000a6 00000003 f8973a43 000001f1 f0cef6ac 00000001 f7164940
       0000006a f8953db2 c011af62 c21cfed4 00000003 c2001890 00000001 c2001890
       c21cfed4 00000001 f0cef934 f0cef680 f8e326e3 dc5e0d94 dc5e0e2c f55ff200
Call Trace:
 [<f8953db2>] ocfs2_clear_inode+0x50c/0xbdc [ocfs2]
 [<c011af62>] __wake_up_common+0x32/0x55
 [<f8e326e3>] nfs3_forget_cached_acls+0x44/0x4e [nfs]
 [<c017755c>] clear_inode+0xa6/0xf4
 [<c0177825>] dispose_list+0x46/0xc4
 [<c0177a3d>] shrink_icache_memory+0x19a/0x1c2
 [<c01548c5>] shrink_slab+0xd9/0x13e
 [<c0154c83>] kswapd+0x2c2/0x3cf
 [<c0132969>] autoremove_wake_function+0x0/0x35
 [<c01549c1>] kswapd+0x0/0x3cf
 [<c013289e>] kthread+0xb2/0xdc
 [<c01327ec>] kthread+0x0/0xdc
 [<c01049a3>] kernel_thread_helper+0x7/0x10
 ======================
Code: 54 24 18 89 4c 24 14 c7 44 24 10 f1 01 00 00 c7 44 24 0c 43 3a 97 f8 89 5c 24 08 89 44 24 04 c7 04 24 33 72 97 f8 e8 c1 e2 7d c7 <0f> 0b eb fe 8d 7b 50 b9 13 00 00 00 f3 ab c7 43 20 00 00 00 00
EIP: [<f8944342>] ocfs2_lock_res_free+0x50e/0x578 [ocfs2] SS:ESP 0068:c21cfe5c
Could you tell me something about the kernel?

As in, is this 2.6.21 from kernel.org or is it from some distro?
Any patches applied wrt ocfs2?

Did this problem start after you did something? As in, after a kernel
upgrade or something else.

Are they random or do you notice a pattern?

We do have a cumulative patch for ocfs2 atop 2.6.21 but none of the
patches seem relevant.
http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.21/

Isaac Clerencia wrote:
> Our OCFS2 filesystems are having real troubles and we're getting these kernel
> traces ... any idea about what can be causing this?
Check whether the ocfs2 code in the Debian kernel is the same as in
mainline 2.6.21. We run continuous tests against mainline kernels and
we've not encountered these problems.

The cumulative fix patch atop 2.6.21 is here:
http://www.kernel.org/pub/linux/kernel/people/mfasheh/ocfs2/backports/2.6.21/

The breakdown of the above can be viewed here:
http://git.kernel.org/?p=linux/kernel/git/mfasheh/ocfs2.git;a=shortlog;h=2.6.21_fixes

Keep in mind, none of those fixes addresses the problem you encountered.
I would make sure that the code is what it is supposed to be.

Isaac Clerencia wrote:
> On Friday 01 June 2007 19:50:43 you wrote:
>
>> Could you tell me something about the kernel?
>>
>> As in, is this 2.6.21 from kernel.org or is it from some distro?
>> Any patches applied wrt ocfs2?
>>
> 2.6.21 from Debian, it shouldn't have any changes wrt ocfs2.
>
>> Did this problem start after you did something? As in, after a kernel
>> upgrade or something else.
>>
>> Are they random or do you notice a pattern?
>>
> We are using an EMC Clariion SAN.
>
> With our current usage it reports 100% utilization for the volume holding the
> OCFS2 filesystem. When the filesystem is not being used intensively
> everything works fine.
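
One way to carry out the check suggested above (that the distro's ocfs2 code matches mainline 2.6.21) is to compare the two fs/ocfs2 source directories file by file. The following is only a minimal sketch in Python, assuming both source trees are already unpacked locally; the function names and the example paths are hypothetical and do not come from this thread.

```python
import hashlib
import sys
from pathlib import Path


def file_digests(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(vanilla_dir, distro_dir):
    """Return (only_in_vanilla, only_in_distro, changed) as sorted lists."""
    vanilla = file_digests(vanilla_dir)
    distro = file_digests(distro_dir)
    only_in_vanilla = sorted(set(vanilla) - set(distro))
    only_in_distro = sorted(set(distro) - set(vanilla))
    changed = sorted(
        name for name in set(vanilla) & set(distro)
        if vanilla[name] != distro[name]
    )
    return only_in_vanilla, only_in_distro, changed


# Hypothetical usage:
#   python compare_ocfs2.py linux-2.6.21/fs/ocfs2 debian-linux-2.6.21/fs/ocfs2
if __name__ == "__main__" and len(sys.argv) == 3:
    missing, extra, changed = compare_trees(sys.argv[1], sys.argv[2])
    for label, names in (("only in vanilla", missing),
                         ("only in distro", extra),
                         ("changed", changed)):
        for name in names:
            print(f"{label}: {name}")
```

If all three lists come back empty, the two directories hold identical files; any entry under "changed" (for example dlmglue.c) would be a file worth diffing by hand.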