Hi, while running extensive qgroup and tree mod log tests I got to the following BUG, which is probably not related to tree mod log: [71793.530286] ------------[ cut here ]------------ [71793.585407] kernel BUG at fs/btrfs/ctree.c:3166! [71793.640487] invalid opcode: 0000 [#1] PREEMPT SMP [71793.697953] Modules linked in: btrfs mpt2sas scsi_transport_sas raid_class [last unloaded: btrfs] [71793.804482] CPU 3 [71793.826370] Pid: 31128, comm: fsstress Tainted: G W 3.6.0+ #2 Supermicro X8SIL/X8SIL [71793.931342] RIP: 0010:[<ffffffffa050a348>] [<ffffffffa050a348>] btrfs_set_item_key_safe+0x1d8/0x250 [btrfs] [71794.048765] RSP: 0018:ffff8801cfd0fa28 EFLAGS: 00010246 [71794.112142] RAX: ffff8801cfd0fb68 RBX: ffff8801cfd0fa38 RCX: 000000000000006c [71794.197304] RDX: 0000000000000132 RSI: 00000000000f6000 RDI: 00000000000f6000 [71794.282467] RBP: ffff8801cfd0fa88 R08: 0000000000000000 R09: ffff8801cfd0fa40 [71794.367629] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880231425000 [71794.452790] R13: ffff88023125f000 R14: 0000000000000000 R15: ffff8801d25f9ce0 [71794.537951] FS: 00007f4d458fc700(0000) GS:ffff880236c00000(0000) knlGS:0000000000000000 [71794.634523] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [71794.703088] CR2: 00007f4d44d76cdc CR3: 00000001cfe57000 CR4: 00000000000007e0 [71794.788250] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [71794.873410] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [71794.958573] Process fsstress (pid: 31128, threadinfo ffff8801cfd0e000, task ffff88023273c500) [71795.060330] Stack: [71795.084294] ffff8801cfd0fb68 ffff8802315fb000 0000000000000132 000000000f60006c [71795.172981] 0000000000000000 ffff8801d0307000 ffff8801cfd0fa88 ffff880231425000 [71795.261668] ffff8801d25f9ce0 0000000000105000 00000000000a8000 0000000000000002 [71795.350357] Call Trace: [71795.379520] [<ffffffffa05452ec>] __btrfs_drop_extents+0x5bc/0xba0 [btrfs] [71795.461564] [<ffffffffa0511c94>] ? btrfs_search_slot+0xb34/0xb40 [btrfs] [71795.542581] [<ffffffffa0569679>] btrfs_log_changed_extents+0x659/0x6f0 [btrfs] [71795.629816] [<ffffffffa056ca6c>] btrfs_log_inode+0x4dc/0x650 [btrfs] [71795.706774] [<ffffffff811b0b00>] ? d_hash_and_lookup+0x30/0x70 [71795.777735] [<ffffffffa056cd70>] btrfs_log_inode_parent+0x190/0x4b0 [btrfs] [71795.862367] [<ffffffff811b0b57>] ? dget_parent+0x17/0xd0 [71795.927105] [<ffffffffa056d13f>] btrfs_log_dentry_safe+0x3f/0x60 [btrfs] [71796.008427] [<ffffffffa0543611>] btrfs_sync_file+0x121/0x290 [btrfs] [71796.085695] [<ffffffff811c7f73>] vfs_fsync_range+0x23/0x30 [71796.152496] [<ffffffff811c7f97>] vfs_fsync+0x17/0x20 [71796.213098] [<ffffffff811c81c4>] do_fsync+0x34/0x60 [71796.272615] [<ffffffff811c81fe>] sys_fdatasync+0xe/0x20 [71796.336305] [<ffffffff8193f5a2>] system_call_fastpath+0x16/0x1b [71796.408292] Code: 7d f8 c9 c3 0f 1f 80 00 00 00 00 72 1e 0f b6 48 08 40 38 ce 76 0d 48 8b 70 09 e9 5f ff ff ff 0f 1f 40 00 73 36 66 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 eb f8 66 0f 1f 44 00 00 48 3b 48 09 0f [71796.642304] RIP [<ffffffffa050a348>] btrfs_set_item_key_safe+0x1d8/0x250 [btrfs] [71796.732134] RSP <ffff8801cfd0fa28> [71796.774296] ---[ end trace f2639a7e4750b6d5 ]--- Due to added printk statements, my line numbers are shifted. The corresponding BUG is: 3150 void btrfs_set_item_key_safe(struct btrfs_trans_handle *trans, 3151 struct btrfs_root *root, struct btrfs_path *path, 3152 struct btrfs_key *new_key) 3153 { 3154 struct btrfs_disk_key disk_key; 3155 struct extent_buffer *eb; 3156 int slot; 3157 3158 eb = path->nodes[0]; 3159 slot = path->slots[0]; 3160 if (slot > 0) { 3161 btrfs_item_key(eb, &disk_key, slot - 1); 3162 BUG_ON(comp_keys(&disk_key, new_key) >= 0); 3163 } 3164 if (slot < btrfs_header_nritems(eb) - 1) { 3165 btrfs_item_key(eb, &disk_key, slot + 1); 3166 BUG_ON(comp_keys(&disk_key, new_key) <= 0); 3167 } -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 26, 2012 at 12:35:43AM -0600, Jan Schmidt wrote:> Hi, > > while running extensive qgroup and tree mod log tests I got to the following > BUG, which is probably not related to tree mod log:So our keys are out of order...are you able to reproduce this? -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, October 26, 2012 at 14:19 (+0200), Chris Mason wrote:> On Fri, Oct 26, 2012 at 12:35:43AM -0600, Jan Schmidt wrote: >> Hi, >> >> while running extensive qgroup and tree mod log tests I got to the following >> BUG, which is probably not related to tree mod log: > > So our keys are out of order...are you able to reproduce this?No reproducer yet. My tests ran fine for about 15 hours before that happened. I''ll have an extensive amount of (tree mod log related) debug information if I should get there again. My guess is that it''s not related to tree mod log, though, as there are tree log functions (fs/btrfs/tree-log.c) on the stack. I don''t think rewinded buffers can make it there. I''ll call back when I''ll hit this again. -Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 26, 2012 at 06:30:06AM -0600, Jan Schmidt wrote:> On Fri, October 26, 2012 at 14:19 (+0200), Chris Mason wrote: > > On Fri, Oct 26, 2012 at 12:35:43AM -0600, Jan Schmidt wrote: > >> Hi, > >> > >> while running extensive qgroup and tree mod log tests I got to the following > >> BUG, which is probably not related to tree mod log: > > > > So our keys are out of order...are you able to reproduce this? > > No reproducer yet. My tests ran fine for about 15 hours before that happened. > I''ll have an extensive amount of (tree mod log related) debug information if I > should get there again. > > My guess is that it''s not related to tree mod log, though, as there are tree log > functions (fs/btrfs/tree-log.c) on the stack. I don''t think rewinded buffers can > make it there. > > I''ll call back when I''ll hit this again.This is more likely to be memory corruption. I''d suggest turning on slab debug and CONFIG_DEBUG_PAGE_ALLOC...hopefully you''ll get a faster crash. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On Fri, Oct 26, 2012 at 08:35:43AM +0200, Jan Schmidt wrote:> while running extensive qgroup and tree mod log tests I got to the following > BUG, which is probably not related to tree mod log:I''ve left the fsx DIO tester overnight, after Josef identified the fix for the csum mismatch bug, and found it crashed at the same point as Jan reported: [80582.012720] ------------[ cut here ]------------ [80582.013045] kernel BUG at fs/btrfs/ctree.c:2945! [80582.013045] invalid opcode: 0000 [#1] PREEMPT SMP io_blk virtio_pci sr_mod 8139cp parport microcode virtio_ring floppy virtio sg pcspkr button i2c_piix4 cdrom processor thermal_sys scsi_dh_alua scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_ge neric ata_piix [last unloaded: btrfs] [80582.013045] CPU 16 [80582.013045] Pid: 13045, comm: btrfs-endio-wri Tainted: G W 3.8.0-rc4-desktop+ #54 Bochs Bochs [80582.013045] RIP: 0010:[<ffffffffa0393a25>] [<ffffffffa0393a25>] btrfs_set_item_key_safe+0x65/0x1e0 [btrfs] [80582.013045] RSP: 0018:ffff880adb9d9ad8 EFLAGS: 00010206 [80582.013045] RAX: 000000000000006c RBX: ffff880adb9d9c18 RCX: 000000000000012e [80582.013045] RDX: 0000000000016000 RSI: ffff880adeeee076 RDI: ffff880adb9d9af9 [80582.013045] RBP: ffff880adb9d9b38 R08: ffff880ae4cd9dc0 R09: 0000000000000000 [80582.013045] R10: ffff880bee7c0f40 R11: 0000000000000015 R12: 0000000000000001 [80582.013045] R13: ffff880add976c78 R14: ffff880adb9d9ae8 R15: ffff880be92301c0 [80582.013045] FS: 0000000000000000(0000) GS:ffff880c1f200000(0000) knlGS:0000000000000000 [80582.013045] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [80582.013045] CR2: ffffffffff600400 CR3: 0000000be6cb8000 CR4: 00000000000006e0 [80582.013045] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [80582.013045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [80582.013045] Process btrfs-endio-wri (pid: 13045, threadinfo ffff880adb9d8000, task ffff880b301545c0) [80582.013045] Stack: [80582.013045] ffff880bec37a000 ffff880b932a5170 000000000000012e 000000000160006c [80582.013045] ffff880adb9d9b00 ffffffffa03a1ad0 000000000000012e 0000000000010000 [80582.013045] ffff880add976c78 ffff880be92301c0 0000000000016000 0000000000000f96 [80582.013045] Call Trace: [80582.013045] [<ffffffffa03a1ad0>] ? btrfs_inc_extent_ref+0x90/0xb0 [btrfs] [80582.013045] [<ffffffffa03cdc50>] __btrfs_drop_extents+0x530/0xb70 [btrfs] [80582.013045] [<ffffffffa03cecab>] btrfs_drop_extents+0x6b/0x90 [btrfs] [80582.013045] [<ffffffffa03bce1a>] insert_reserved_file_extent+0x8a/0x2a0 [btrfs] [80582.013045] [<ffffffffa03c0c6e>] btrfs_finish_ordered_io+0x39e/0x3e0 [btrfs] [80582.013045] [<ffffffffa03c0cc0>] finish_ordered_fn+0x10/0x20 [btrfs] [80582.013045] [<ffffffffa03e7205>] worker_loop+0x165/0x4c0 [btrfs] [80582.013045] [<ffffffffa03e70a0>] ? check_pending_worker_creates+0xe0/0xe0 [btrfs] [80582.013045] [<ffffffffa03e70a0>] ? check_pending_worker_creates+0xe0/0xe0 [btrfs] [80582.013045] [<ffffffff8106d026>] kthread+0xc6/0xd0 [80582.013045] [<ffffffff8106cf60>] ? kthread_freezable_should_stop+0x70/0x70 [80582.013045] [<ffffffff815f7a7c>] ret_from_fork+0x7c/0xb0 [80582.013045] [<ffffffff8106cf60>] ? kthread_freezable_should_stop+0x70/0x70 [80582.013045] Code: 00 00 00 4c 89 ef 48 63 d2 4c 89 f6 48 8d 14 92 48 8d 54 92 65 e8 6c 0c 04 00 48 8b 0b 48 39 4d b0 48 8b 55 b9 0f b6 45 b8 76 0b <0f> 0b eb fe 0f 1f 80 00 00 00 00 72 1e 3a 43 08 77 ee 66 0f 1f [80582.013045] RIP [<ffffffffa0393a25>] btrfs_set_item_key_safe+0x65/0x1e0 [btrfs] [80582.013045] RSP <ffff880adb9d9ad8> [80582.071249] ---[ end trace 5360a95c1506db85 ]--- The lines are shifted, but it''s the same code: 2933 void btrfs_set_item_key_safe(struct btrfs_trans_handle *trans, 2934 struct btrfs_root *root, struct btrfs_path *path, 2935 struct btrfs_key *new_key) 2936 { 2937 struct btrfs_disk_key disk_key; 2938 struct extent_buffer *eb; 2939 int slot; 2940 2941 eb = path->nodes[0]; 2942 slot = path->slots[0]; 2943 if (slot > 0) { 2944 btrfs_item_key(eb, &disk_key, slot - 1); 2945 BUG_ON(comp_keys(&disk_key, new_key) >= 0); ^^^^ 2946 } 2947 if (slot < btrfs_header_nritems(eb) - 1) { 2948 btrfs_item_key(eb, &disk_key, slot + 1); 2949 BUG_ON(comp_keys(&disk_key, new_key) <= 0); 2950 } 2951 2952 btrfs_cpu_key_to_disk(&disk_key, new_key); 2953 btrfs_set_item_key(eb, &disk_key, slot); 2954 btrfs_mark_buffer_dirty(eb); 2955 if (slot == 0) 2956 fixup_low_keys(trans, root, path, &disk_key, 1); 2957 } It was unexpected so I haven''t put any debugging into config or code. According to the timestamps, the test has been running for 16h 58m. Test setup: * kvm virtual machine with 48 cpus * 48G memory (single bit ECC) * 5x 136GB 15K SAS disks (virtio,cache=none) filesystem was freshly created before the test as data/raid0 and meta/radi0 and then converted via balance to data/single meta/single (that was one of the reproducers of the csum problem). balance log: [19441.016374] btrfs: disk space caching is enabled [19452.900668] btrfs: relocating block group 4332584960 flags 9 [19455.935676] btrfs: found 4 extents [19456.247759] btrfs: found 4 extents [19456.471688] btrfs: relocating block group 37683200 flags 12 [19459.345387] btrfs: found 16 extents [19459.688217] btrfs: relocating block group 20971520 flags 10 [19460.194694] btrfs: found 1 extents [19460.465852] btrfs: relocating block group 12582912 flags 1 [19463.829548] btrfs: found 1 extents [19464.123115] btrfs: found 1 extents [19464.512113] btrfs: relocating block group 4194304 flags 4 [19475.407355] btrfs: found 1 extents # btrfs fi df /mnt/btrfs/ Data: total=3.00GiB, used=11.29MiB System, RAID1: total=32.00MiB, used=4.00KiB System: total=4.00MiB, used=0.00 Metadata, RAID1: total=2.00GiB, used=336.00KiB # btrfs fi show Label: none uuid: a15e1f6c-0e19-420e-93e2-82b4aa072a47 Total devices 5 FS bytes used 10.41MiB devid 1 size 136.70GiB used 1.04GiB path /dev/vda devid 2 size 136.70GiB used 2.00GiB path /dev/vdb devid 3 size 136.70GiB used 1.00GiB path /dev/vdc devid 4 size 136.70GiB used 1.03GiB path /dev/vdd devid 5 size 136.70GiB used 2.00GiB path /dev/vde the sources used were cmason/for-linus (0d90060f330ef49a60f56df2d9439ba263c979d9) plus https://patchwork.kernel.org/patch/2020661/ (Btrfs: put csums on the right ordered extent) plus https://patchwork.kernel.org/patch/1987481/ (Btrfs: fix wrong max device number for single profile) btrfs-debug-tree dumped the filesystem without problems, so it''s an in-memory issue. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html