Hi,

we ran into a BUG() in ocfs2_get_clusters_nocache:

[Fri Oct 18 10:52:28 2013] ------------[ cut here ]------------
[Fri Oct 18 10:52:28 2013] Kernel BUG at ffffffffa028ad5a [verbose debug info unavailable]
[Fri Oct 18 10:52:28 2013] invalid opcode: 0000 [#1] SMP
[Fri Oct 18 10:52:28 2013] Modules linked in: vhost_net vhost macvtap macvlan drbd ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables ocfs2_stack_o2cb rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd fscache sunrpc bridge stp llc w83795 coretemp kvm_intel kvm lru_cache dlm sctp libcrc32c ocfs2_dlm ocfs2_dlmfs ocfs2 ocfs2_stackglue ocfs2_nodemanager configfs quota_tree snd_pcm e1000e snd_page_alloc snd_timer ixgbe snd joydev hid_generic usbmouse usbkbd psmouse usbhid soundcore iTCO_wdt i7core_edac ioatdma gpio_ich hid ptp edac_core iTCO_vendor_support i2c_i801 pcspkr mac_hid lpc_ich serio_raw ses mdio enclosure pps_core dca [last unloaded: evbug]
[Fri Oct 18 10:52:28 2013] CPU: 3 PID: 16938 Comm: qemu-system-x86 Tainted: G W 3.11.4 #1
[Fri Oct 18 10:52:28 2013] Hardware name: Supermicro X8DT6/X8DT6, BIOS 2.0c 05/15/2012
[Fri Oct 18 10:52:28 2013] task: ffff880c69b62ee0 ti: ffff88130978e000 task.ti: ffff88130978e000
[Fri Oct 18 10:52:28 2013] RIP: 0010:[<ffffffffa028ad5a>] [<ffffffffa028ad5a>] ocfs2_get_clusters_nocache.isra.11+0x4aa/0x530 [ocfs2]
[Fri Oct 18 10:52:28 2013] RSP: 0018:ffff88130978f708 EFLAGS: 00010297
[Fri Oct 18 10:52:28 2013] RAX: 00000000000000fa RBX: 0000000000000000 RCX: 000000000012cbd4
[Fri Oct 18 10:52:28 2013] RDX: ffff880868180fe0 RSI: 000000000012cbd3 RDI: ffff880868180030
[Fri Oct 18 10:52:28 2013] RBP: ffff88130978f788 R08: 000000000012cbd4 R09: 00000000000000fc
[Fri Oct 18 10:52:28 2013] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88130978f7c8
[Fri Oct 18 10:52:28 2013] R13: ffff880868180030 R14: ffff88176cc7a000 R15: 0000000000000000
[Fri Oct 18 10:52:28 2013] FS: 00007f32c4ff9700(0000) GS:ffff8817dfc60000(0000) knlGS:0000000000000000
[Fri Oct 18 10:52:28 2013] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[Fri Oct 18 10:52:28 2013] CR2: 00007f34f4074000 CR3: 0000002c5d211000 CR4: 00000000000027e0
[Fri Oct 18 10:52:28 2013] DR0: 0000000000000001 DR1: 0000000000000002 DR2: 0000000000000001
[Fri Oct 18 10:52:28 2013] DR3: 000000000000000a DR6: 00000000ffff0ff0 DR7: 0000000000000400
[Fri Oct 18 10:52:28 2013] Stack:
[Fri Oct 18 10:52:28 2013]  ffff881300000000 0000000000000000 ffff88130978f7e4 ffff880868180000
[Fri Oct 18 10:52:28 2013]  ffff882fb66ded80 0012cbd300000001 ffff88130978f8d4 ffff8808ef23f270
[Fri Oct 18 10:52:28 2013]  ffff88130978f778 ffffffffa02969fb ffff8817dfc545b0 0000000000000000
[Fri Oct 18 10:52:28 2013] Call Trace:
[Fri Oct 18 10:52:28 2013]  [<ffffffffa02969fb>] ? ocfs2_read_inode_block_full+0x3b/0x60 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffffa028b2be>] ocfs2_get_clusters+0x23e/0x3b0 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffff8109a9ad>] ? sched_clock_cpu+0xbd/0x110
[Fri Oct 18 10:52:28 2013]  [<ffffffffa028b48a>] ocfs2_extent_map_get_blocks+0x5a/0x190 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffffa026eb3a>] ocfs2_direct_IO_get_blocks+0x5a/0x160 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffff811c87c1>] ? inode_dio_done+0x31/0x40
[Fri Oct 18 10:52:28 2013]  [<ffffffff811ea90c>] do_blockdev_direct_IO+0xdfc/0x1fb0
[Fri Oct 18 10:52:28 2013]  [<ffffffffa026eae0>] ? ocfs2_dio_end_io+0x110/0x110 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffff811ebb15>] __blockdev_direct_IO+0x55/0x60
[Fri Oct 18 10:52:28 2013]  [<ffffffffa026eae0>] ? ocfs2_dio_end_io+0x110/0x110 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffffa026e9d0>] ? ocfs2_direct_IO+0x80/0x80 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffffa026e9c3>] ocfs2_direct_IO+0x73/0x80 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffffa026eae0>] ? ocfs2_dio_end_io+0x110/0x110 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffffa026e9d0>] ? ocfs2_direct_IO+0x80/0x80 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffff81146e2b>] generic_file_aio_read+0x6bb/0x720
[Fri Oct 18 10:52:28 2013]  [<ffffffff8172168e>] ? _raw_spin_lock+0xe/0x20
[Fri Oct 18 10:52:28 2013]  [<ffffffffa02843db>] ? __ocfs2_cluster_unlock.isra.32+0x9b/0xe0 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffffa02847a9>] ? ocfs2_inode_unlock+0xb9/0x130 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffffa028dcf9>] ocfs2_file_aio_read+0xd9/0x3c0 [ocfs2]
[Fri Oct 18 10:52:28 2013]  [<ffffffff811ae425>] do_sync_readv_writev+0x65/0x90
[Fri Oct 18 10:52:28 2013]  [<ffffffff811afba2>] do_readv_writev+0xd2/0x2b0
[Fri Oct 18 10:52:28 2013]  [<ffffffff811eeda2>] ? fsnotify+0x1d2/0x2b0
[Fri Oct 18 10:52:28 2013]  [<ffffffff811ae500>] ? do_sync_write+0xb0/0xb0
[Fri Oct 18 10:52:28 2013]  [<ffffffff811f8886>] ? eventfd_write+0x1a6/0x210
[Fri Oct 18 10:52:28 2013]  [<ffffffff811afe09>] vfs_readv+0x39/0x50
[Fri Oct 18 10:52:28 2013]  [<ffffffff811b0062>] SyS_preadv+0xc2/0xd0
[Fri Oct 18 10:52:28 2013]  [<ffffffff8172a59d>] system_call_fastpath+0x1a/0x1f
[Fri Oct 18 10:52:28 2013] Code: b9 00 02 00 00 49 c7 c0 f0 8d 2f a0 48 c7 c7 b8 28 30 a0 e8 82 b1 48 e1 e9 07 fd ff ff 0f 1f 40 00 bb 01 00 00 00 e9 68 fe ff ff <0f> 0b 48 8b 55 a0 48 c7 c6 10 8e 2f a0 bb e2 ff ff ff 4c 8b 47
[Fri Oct 18 10:52:28 2013] RIP [<ffffffffa028ad5a>] ocfs2_get_clusters_nocache.isra.11+0x4aa/0x530 [ocfs2]
[Fri Oct 18 10:52:28 2013] RSP <ffff88130978f708>
[Fri Oct 18 10:52:28 2013] ---[ end trace 1831bd3aefe19b02 ]---

https://gist.github.com/David-Weber/f3072dd5c44a6ce593b6

(gdb) list *(ocfs2_get_clusters_nocache+0x4aa)
0xa6a is in ocfs2_get_clusters_nocache (fs/ocfs2/extent_map.c:475).
470                     goto out_hole;
471             }
472
473             rec = &el->l_recs[i];
474
475             BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos));
476
477             if (!rec->e_blkno) {
478                     ocfs2_error(inode->i_sb, "Inode %lu has bad extent "
479                                 "record (%u, %u, 0)", inode->i_ino,

This has now happened a second time, but I don't have a reproducer.
It is a KVM host with a dual-primary DRBD/OCFS2 system.
The kernel is 3.11.4.

Thanks!

Cheers,
David
Goldwyn Rodrigues
2013-Oct-23 12:09 UTC
[Ocfs2-devel] Kernel BUG in ocfs2_get_clusters_nocache
Hi David,

On 10/21/2013 02:53 AM, David Weber wrote:
> Hi,
>
> we ran into a BUG() in ocfs2_get_clusters_nocache:
>
> [...]
>
> (gdb) list *(ocfs2_get_clusters_nocache+0x4aa)
> 0xa6a is in ocfs2_get_clusters_nocache (fs/ocfs2/extent_map.c:475).
> 470                     goto out_hole;
> 471             }
> 472
> 473             rec = &el->l_recs[i];
> 474
> 475             BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos));
> 476
> 477             if (!rec->e_blkno) {
> 478                     ocfs2_error(inode->i_sb, "Inode %lu has bad extent "
> 479                                 "record (%u, %u, 0)", inode->i_ino,
>
> This happened the second time but I don't have a reproducer.
> It is a KVM host with a dual Primary DRBD/OCFS2 System.
> Kernel is 3.11.4
>

It seems your on-disk data structures are corrupted. Have you tried running fsck.ocfs2 yet? If so, what errors is fsck fixing?

-- 
Goldwyn
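
For readers following the thread, here is a rough user-space model of the two checks visible in the quoted gdb listing. It is only a sketch, not the kernel code: `struct extent_rec`, `search_extent_list`, `check_record`, `lookup_cluster` and the `LOOKUP_*` status values are hypothetical names made up for illustration, the record layout is simplified, and byte-order conversion (`le32_to_cpu`) is left out. Where the kernel halts with `BUG_ON` or reports through `ocfs2_error`, the sketch just returns a status.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for an extent record; this is NOT
 * the kernel's struct ocfs2_extent_rec. */
struct extent_rec {
    uint32_t cpos;     /* first logical cluster covered by the record */
    uint32_t clusters; /* number of clusters the record covers */
    uint64_t blkno;    /* physical start block; 0 is treated as invalid */
};

enum lookup_status { LOOKUP_OK, LOOKUP_HOLE, LOOKUP_CORRUPT };

/* Return the index of the record covering v_cluster, or -1 for a hole. */
static int search_extent_list(const struct extent_rec *recs, int count,
                              uint32_t v_cluster)
{
    for (int i = 0; i < count; i++) {
        uint32_t start = recs[i].cpos;
        uint32_t end = start + recs[i].clusters;

        if (v_cluster >= start && v_cluster < end)
            return i;
    }
    return -1;
}

/* Re-validate a record that was picked for v_cluster. The first test
 * is the condition from the quoted BUG_ON line; the second mirrors the
 * "bad extent record (%u, %u, 0)" case. Both are reported as a status
 * here instead of stopping the machine. */
static enum lookup_status check_record(const struct extent_rec *rec,
                                       uint32_t v_cluster)
{
    if (v_cluster < rec->cpos)
        return LOOKUP_CORRUPT; /* record starts after the requested cluster */
    if (rec->blkno == 0)
        return LOOKUP_CORRUPT; /* record has no physical block */
    return LOOKUP_OK;
}

/* Search first, then validate the record that the search returned. */
static enum lookup_status lookup_cluster(const struct extent_rec *recs,
                                         int count, uint32_t v_cluster)
{
    int i = search_extent_list(recs, count, v_cluster);

    if (i < 0)
        return LOOKUP_HOLE;
    return check_record(&recs[i], v_cluster);
}
```

In this model a record found by the search always starts at or before the requested cluster, so the first check can only fail when the record handed to `check_record` is not the one the search saw, for example because its contents changed or were damaged in between. Whether that matches what happened on David's system cannot be told from the trace alone.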