When running a practical stress-test on 2.6.39-rc2 trying to reproduce
an older (extent refcounting) issue, I am consistently able to hit an
oops [1] and an assertion failure [2].

Here, I'm testing with 8 block ramdisks, configured in the kernel to
256MB each (intentionally testing free-space handling):

for i in `seq 0 7`; do mknod /dev/ram$i b 1 $i; dd if=/dev/zero of=/dev/ram$i bs=1024k count=256; done
mkfs.btrfs -m raid10 -d raid10 /dev/ram0 /dev/ram1 /dev/ram2 /dev/ram3 /dev/ram4 /dev/ram5 /dev/ram6 /dev/ram7
mount /dev/ram0 /mnt -o space_cache,ssd,nobarrier,compress # try without compress also
cp -xa / /mnt

The next steps are executed in parallel:

while :; do cp -xa / /mnt; done &
while :; do btrfs filesystem balance /mnt; done &
while :; do find /mnt -print0 | xargs -0 btrfs filesystem defragment -c; done &

--- [1]

general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/bus/hid/drivers/generic-usb/new_id
CPU 0
Modules linked in: brd loop [last unloaded: brd]
Pid: 28000, comm: btrfs Tainted: G W 2.6.39-rc2-350cd #2 Supermicro X8STi/X8STi
RIP: 0010:[<ffffffff812d40a4>] [<ffffffff812d40a4>] btrfs_write_out_cache+0x9d4/0xdf0
RSP: 0018:ffff8802af913968 EFLAGS: 00010246
RAX: db73880000000000 RBX: 0000000000000000 RCX: 0000000000000200
RDX: 0000000000001000 RSI: ffff8802ba9b1048 RDI: db73880000000000
RBP: ffff8802af913ae8 R08: 0000000000000001 R09: 0000000000000000
R10: ffffffff810e8130 R11: 0000000000000000 R12: ffff8802510a3be0
R13: ffff8802acf8b948 R14: ffff8802510a3bb0 R15: ffff8802b9f561c8
FS: 00007fabcef8d740(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000003c960c8 CR3: 00000002afa29000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 28000, threadinfo ffff8802af912000, task ffff8803090d0000)
Stack:
 ffff8802af913988 0000000000000001 ffff8802af913998 ffff8802af913a88
 ffff8802af9139a8 0000000000000000 0000000000000040 ffff880215059770
 ffff880215059710 0000000000000010 ffff8802acf8b908 000000000000000f
Call Trace:
 [<ffffffff810506b1>] ? get_parent_ip+0x11/0x50
 [<ffffffff8105584d>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff8128a98b>] btrfs_write_dirty_block_groups+0x2ab/0x300
 [<ffffffff81296a35>] commit_cowonly_roots+0x105/0x1e0
 [<ffffffff8129782d>] btrfs_commit_transaction+0x37d/0x720
 [<ffffffff81080ad0>] ? wake_up_bit+0x40/0x40
 [<ffffffff812e0afc>] relocate_block_group+0x4bc/0x600
 [<ffffffff812e0de8>] btrfs_relocate_block_group+0x1a8/0x2d0
 [<ffffffff812c14ed>] btrfs_relocate_chunk+0x6d/0x3b0
 [<ffffffff810506b1>] ? get_parent_ip+0x11/0x50
 [<ffffffff8105584d>] ? sub_preempt_count+0x9d/0xd0
 [<ffffffff812c20dd>] btrfs_balance+0x20d/0x280
 [<ffffffff812c9ec0>] btrfs_ioctl+0x450/0x590
 [<ffffffff81152e8d>] do_vfs_ioctl+0x8d/0x330
 [<ffffffff81141444>] ? fget_light+0x274/0x3c0
 [<ffffffff81106cc0>] ? __do_fault+0x150/0x5d0
 [<ffffffff8115317a>] sys_ioctl+0x4a/0x80
 [<ffffffff81709ffb>] system_call_fastpath+0x16/0x1b
Code: 89 ad 38 ff ff ff 49 89 c7 4c 8b ad 48 ff ff ff e9 e4 00 00 00 66 90 40 f6 c7 04 0f 85 6e 01 00 00 89 d1 c1 e9 03 f6 c2 04 89 c9 <f3> 48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f
RIP [<ffffffff812d40a4>] btrfs_write_out_cache+0x9d4/0xdf0
 RSP <ffff8802af913968>
---[ end trace a7919e7f17c0a728 ]---

--- [2]

kernel BUG at fs/btrfs/relocation.c:4282!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/bdi/btrfs-1/uevent
CPU 0
Modules linked in: brd loop
Pid: 7775, comm: flush-btrfs-1 Tainted: G W 2.6.39-rc2-350cd #2 Supermicro X8STi/X8STi
RIP: 0010:[<ffffffff812da5ab>] [<ffffffff812da5ab>] btrfs_reloc_cow_block+0x28b/0x2c0
RSP: 0018:ffff8803057817f0 EFLAGS: 00010246
RAX: ffff880305728000 RBX: ffff880305640000 RCX: ffff880235d92e40
RDX: ffff880209c1f5f0 RSI: ffff880308bdd168 RDI: ffff8802ff1fb220
RBP: ffff880305781850 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff812d8630 R11: 0000000000000000 R12: ffff880308bdd168
R13: ffff880209c1f5f0 R14: ffff8802ff1fb220 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f7355663650 CR3: 00000001f75f7000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process flush-btrfs-1 (pid: 7775, threadinfo ffff880305780000, task ffff880308fa8000)
Stack:
 ffff880305781850 ffffffff81276c5d fffffffffffffff7 ffffea0006cd7e90
 0000000000000000 ffff880235d92e40 ffff880305781850 ffff880308bdd168
 ffff880235d92e40 ffff880209c1f5f0 ffff8802ff1fb220 0000000000000000
Call Trace:
 [<ffffffff81276c5d>] ? update_ref_for_cow+0x26d/0x360
 [<ffffffff81277401>] __btrfs_cow_block+0x6b1/0x980
 [<ffffffff81277e4b>] btrfs_cow_block+0x11b/0x2c0
 [<ffffffff8127b915>] btrfs_search_slot+0x3c5/0x790
 [<ffffffff812762d5>] ? btrfs_alloc_path+0x15/0x30
 [<ffffffff812a1640>] btrfs_truncate_inode_items+0x110/0x770
 [<ffffffff810506b1>] ? get_parent_ip+0x11/0x50
 [<ffffffff817094d0>] ? _raw_spin_unlock+0x30/0x60
 [<ffffffff812a21fb>] btrfs_evict_inode+0x18b/0x200
 [<ffffffff8115b511>] evict+0x81/0x180
 [<ffffffff8115b9c6>] iput_final+0xe6/0x1a0
 [<ffffffff8115bab6>] iput+0x36/0x50
 [<ffffffff811672de>] writeback_sb_inodes+0x12e/0x1d0
 [<ffffffff81167e9b>] writeback_inodes_wb+0x7b/0x180
 [<ffffffff8116825b>] wb_writeback+0x2bb/0x320
 [<ffffffff8115c882>] ? get_nr_inodes+0x62/0xb0
 [<ffffffff811684dc>] wb_do_writeback+0x21c/0x230
 [<ffffffff81168582>] bdi_writeback_thread+0x92/0x180
 [<ffffffff811684f0>] ? wb_do_writeback+0x230/0x230
 [<ffffffff81080596>] kthread+0xb6/0xc0
 [<ffffffff8109629d>] ? trace_hardirqs_on_caller+0x14d/0x190
 [<ffffffff8170b154>] kernel_thread_helper+0x4/0x10
 [<ffffffff81055718>] ? finish_task_switch+0x78/0x110
 [<ffffffff81709884>] ? retint_restore_args+0xe/0xe
 [<ffffffff810804e0>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8170b150>] ? gs_change+0xb/0xb
Code: ff ff e8 79 bf 42 00 e9 ae fe ff ff eb 02 90 90 e8 6b bf 42 00 eb 01 90 e9 33 fe ff ff 48 83 be 47 01 00 00 f7 0f 85 c2 fd ff ff <0f> 0b eb fe 48 3b 50 20 0f 84 04 ff ff ff 0f 0b eb fe 83 7d c4
RIP [<ffffffff812da5ab>] btrfs_reloc_cow_block+0x28b/0x2c0
 RSP <ffff8803057817f0>
---[ end trace a7919e7f17c0a728 ]---

--
Daniel J Blueman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/07/2011 03:21 AM, Daniel J Blueman wrote:
> When running a practical stress-test on 2.6.39-rc2 trying to reproduce
> an older (extent refcounting) issue, I am consistently able to hit an
> oops [1] and an assertion failure [2].
>

Sorry about that, please apply the patch I just sent this morning:

[PATCH] Btrfs: deal with the case that we run out of space in the cache

Thanks,

Josef
Hi Josef, Chris,

On 8 April 2011 00:23, Josef Bacik <josef@redhat.com> wrote:
> On 04/07/2011 03:21 AM, Daniel J Blueman wrote:
>>
>> When running a practical stress-test on 2.6.39-rc2 trying to reproduce
>> an older (extent refcounting) issue, I am consistently able to hit an
>> oops [] and an assertion failure [].
>
> Sorry about that, please apply the patch I just sent this morning
>
> [PATCH] Btrfs: deal with the case that we run out of space in the cache

Superb work - the btrfs_write_out_cache oops is addressed, so now we
(separately) hit a few other assertions at: volumes.c:2013 [1],
volumes.c:2063 [2] and volumes.c:2703 [3] with the previous
reproducer.

Let me know if adding any debugging or other testing may be useful.

Thanks,
Daniel

--- [1]

kernel BUG at fs/btrfs/volumes.c:2013!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/block/ram7/removable
CPU 0
Modules linked in: ppp_generic slhc tun brd loop
Pid: 17040, comm: btrfs Tainted: G W 2.6.39-rc2-350cd+ #3 Supermicro X8STi/X8STi
RIP: 0010:[<ffffffff812c214b>] [<ffffffff812c214b>] btrfs_balance+0x27b/0x280
RSP: 0018:ffff88015c923e08 EFLAGS: 00010282
RAX: 00000000fffffffb RBX: ffff880301d6e1b0 RCX: 0000000000000040
RDX: 00000000fffffffb RSI: 0000000000000000 RDI: ffffffff8112e425
RBP: ffff88015c923e88 R08: 0000000000000000 R09: ffff8802f8ee53f0
R10: 0000000000000012 R11: 0000000000000098 R12: ffff8802f909a490
R13: ffff8802f909bc38 R14: 0000000010000000 R15: 00007fffd1599ce0
FS: 00007f3c4b6f4740(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000f00098 CR3: 000000015c921000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 17040, threadinfo ffff88015c922000, task ffff88030b898000)
Stack:
 ffff880307cd5498 ffff880301d6c120 ffff88015c923e38 ffffffff81085b9e
 ffff880308a5d700 0000000000000008 ffff88015c923f48 ffffffff81031d5c
 ffffea000a9e7b40 ffff88015c923f58 ffff88030b898000 ffff88015c8aa300
Call Trace:
 [<ffffffff81085b9e>] ? up_read+0x1e/0x40
 [<ffffffff81031d5c>] ? do_page_fault+0x1cc/0x440
 [<ffffffff812c9ec0>] btrfs_ioctl+0x450/0x590
 [<ffffffff81152e8d>] do_vfs_ioctl+0x8d/0x330
 [<ffffffff81141444>] ? fget_light+0x274/0x3c0
 [<ffffffff81106cc0>] ? __do_fault+0x150/0x5d0
 [<ffffffff8115317a>] sys_ioctl+0x4a/0x80
 [<ffffffff8170a03b>] system_call_fastpath+0x16/0x1b
Code: 81 c7 d8 22 00 00 e8 05 4b 44 00 8b 45 80 e9 e7 fd ff ff 31 c0 eb d2 85 c0 74 a7 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 90 55 48 89 e5 48 83 ec 40 8b 05 e2 62 72 00 4c 89
RIP [<ffffffff812c214b>] btrfs_balance+0x27b/0x280
 RSP <ffff88015c923e08>

--- [2]

kernel BUG at fs/btrfs/volumes.c:2063!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/block/ram7/removable
CPU 0
Modules linked in: brd loop
Pid: 13460, comm: btrfs Tainted: G W 2.6.39-rc2-350cd+ #3 Supermicro X8STi/X8STi
RIP: 0010:[<ffffffff812c213b>] [<ffffffff812c213b>] btrfs_balance+0x26b/0x280
RSP: 0018:ffff8800b1827e08 EFLAGS: 00010282
RAX: 00000000fffffffb RBX: ffff88030934d168 RCX: 0000000000000006
RDX: 00000000fffffffb RSI: ffff880308fc06f0 RDI: ffff880308fc0000
RBP: ffff8800b1827e88 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8802ff5455e8
R13: ffff8800b1827e38 R14: 000000010d560000 R15: ffff8800b1827e18
FS: 00007fce737e5740(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002371688 CR3: 00000000b1ff8000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 13460, threadinfo ffff8800b1826000, task ffff880308fc0000)
Stack:
 0000000000000100 ffff88030934e1b0 0000000000000100 0000010d560000e4
 ffff880308837a00 0000000000000008 0000000000000100 00000113bbffffe4
 ffff880308fc0600 ffff8800b1827f58 ffff880308fc0000 ffff8801f8c56c00
Call Trace:
 [<ffffffff812c9ec0>] btrfs_ioctl+0x450/0x590
 [<ffffffff81152e8d>] do_vfs_ioctl+0x8d/0x330
 [<ffffffff8114148f>] ? fget_light+0x2bf/0x3c0
 [<ffffffff8109629d>] ? trace_hardirqs_on_caller+0x14d/0x190
 [<ffffffff8115317a>] sys_ioctl+0x4a/0x80
 [<ffffffff8170a03b>] system_call_fastpath+0x16/0x1b
Code: 7c 90 fb ff 48 8b 55 88 48 8b ba 58 01 00 00 48 81 c7 d8 22 00 00 e8 05 4b 44 00 8b 45 80 e9 e7 fd ff ff 31 c0 eb d2 85 c0 74 a7 <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 90
RIP [<ffffffff812c213b>] btrfs_balance+0x26b/0x280
 RSP <ffff8800b1827e08>

--- [3]

kernel BUG at fs/btrfs/volumes.c:2703!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/virtual/bdi/btrfs-3/uevent
CPU 0
Modules linked in: brd loop
Pid: 14333, comm: btrfs-delalloc- Tainted: G W 2.6.39-rc2-350cd+ #3 Supermicro X8STi/X8STi
RIP: 0010:[<ffffffff812c08c2>] [<ffffffff812c08c2>] __finish_chunk_alloc+0x212/0x220
RSP: 0018:ffff8803007e7af0 EFLAGS: 00010286
RAX: 00000000ffffffe4 RBX: ffff88024e54e000 RCX: 0000000000000040
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8112e425
RBP: ffff8803007e7b70 R08: 0000000000000000 R09: ffff8803072fe168
R10: 0000000000000012 R11: 0000000000000098 R12: ffff880303c192a8
R13: ffff88020a461e70 R14: ffff8801c2632090 R15: 00000000000000b0
FS: 0000000000000000(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002953c98 CR3: 00000002fdfd3000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs-delalloc- (pid: 14333, threadinfo ffff8803007e6000, task ffff880308ccc020)
Stack:
 00000002007e7b70 0000000000000003 00000000006e0000 000000007ffc0000
 ffff8801c2634120 0000000000370000 0000000000000100 0000007ffc0000e4
 0000000000370000 ffff88024e54e000 0000000000000246 ffff8801c2632090
Call Trace:
 [<ffffffff812c3d0e>] btrfs_alloc_chunk+0x8e/0xa0
 [<ffffffff81281ed6>] do_chunk_alloc+0x1b6/0x280
 [<ffffffff812844e4>] btrfs_reserve_extent+0xb4/0x170
 [<ffffffff81706c39>] ? mutex_unlock+0x9/0x10
 [<ffffffff812980c7>] ? start_transaction+0x247/0x2b0
 [<ffffffff8129db9e>] submit_compressed_extents+0xfe/0x460
 [<ffffffff810506b1>] ? get_parent_ip+0x11/0x50
 [<ffffffff8129df7f>] async_cow_submit+0x7f/0x90
 [<ffffffff812c452b>] run_ordered_completions+0x7b/0xc0
 [<ffffffff812c4f9c>] worker_loop+0x16c/0x3c0
 [<ffffffff812c4e30>] ? check_pending_worker_creates+0xd0/0xd0
 [<ffffffff81080596>] kthread+0xb6/0xc0
 [<ffffffff8170b194>] kernel_thread_helper+0x4/0x10
 [<ffffffff81055718>] ? finish_task_switch+0x78/0x110
 [<ffffffff817098c4>] ? retint_restore_args+0xe/0xe
 [<ffffffff810804e0>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8170b190>] ? gs_change+0xb/0xb
Code: 1d 07 00 44 89 a3 58 07 00 00 4c 89 ef e8 c7 ef e6 ff 31 c0 48 83 c4 58 5b 41 5c 41 5d 41 5e 41 5f c9 c3 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe eb 08 90 90 90 90 90 90 90 90 55 49 89 ca 48 89 e5
RIP [<ffffffff812c08c2>] __finish_chunk_alloc+0x212/0x220
 RSP <ffff8803007e7af0>

--
Daniel J Blueman
On 04/07/2011 10:26 PM, Daniel J Blueman wrote:
> Hi Josef, Chris,
>
> On 8 April 2011 00:23, Josef Bacik<josef@redhat.com> wrote:
>> On 04/07/2011 03:21 AM, Daniel J Blueman wrote:
>>>
>>> When running a practical stress-test on 2.6.39-rc2 trying to reproduce
>>> an older (extent refcounting) issue, I am consistently able to hit an
>>> oops [] and an assertion failure [].
>>
>> Sorry about that, please apply the patch I just sent this morning
>>
>> [PATCH] Btrfs: deal with the case that we run out of space in the cache
>
> Superb work - the btrfs_write_out_cache oops is addressed, so now we
> (separately) hit a few other assertions at: volumes.c:2013 [1],
> volumes.c:2063 [2] and volumes.c:2703 [3] with the previous
> reproducer.
>
> Let me know if adding any debugging or other testing may be useful.
>
> Thanks,
> Daniel

Looks like the first 2 panics are basically the same thing. You are
getting -EIO back from btrfs_shrink_device(), which could either come
from searching or it could come from the stuff in relocation.c. So will
you put printk's at the 2 places in relocation.c where we return -EIO
and figure out which one is getting tripped? Once we know who is
returning EIO we can go from there.

As for the last one, that's just a normal ENOSPC, but it's because
we're allocating a chunk in the submission path, so that's going to be
a little trickier to deal with. Let's fix these first two panics first
and then hopefully that last one will just go away :).

Thanks,

Josef
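
The error codes named in this reply can be cross-checked against the register dumps in the three volumes.c traces. The following is a minimal sketch, assuming RAX still holds the failing call's 32-bit return value at the point of the BUG (plausible here, but not something the traces prove). The helper name to_signed32 is hypothetical, and the errno numbers are the standard Linux ones (EIO = 5, ENOSPC = 28).

```shell
#!/bin/sh
# Interpret a register value from an oops as a signed 32-bit int,
# which is how a kernel "int" return value such as -EIO is stored.
# to_signed32 is a hypothetical helper name used only for this sketch.
to_signed32() {
    v=$(( $1 ))
    # Values with the top bit of the low 32 bits set are negative.
    if [ "$v" -ge $(( 0x80000000 )) ]; then
        v=$(( v - 0x100000000 ))
    fi
    echo "$v"
}

to_signed32 0x00000000fffffffb   # RAX in the volumes.c:2013 and :2063 traces
to_signed32 0x00000000ffffffe4   # RAX in the volumes.c:2703 trace
```

Run as-is, this prints -5 and -28, which match -EIO for the two btrfs_balance traces and -ENOSPC for the __finish_chunk_alloc trace.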