Hello! We recently upgraded our backup server kernel (rsync with snapshots and compression) to Linus git master from yesterday (3.2-rc4+ 09d9673d53005) that contains the btrfs for-linus as of yesterday. We''ve been seeing a few warnings and bugs: ------------[ cut here ]------------ WARNING: at mm/page-writeback.c:1763 __set_page_dirty_nobuffers+0x17b/0x190() Hardware name: PowerEdge 1950 Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2 Pid: 14299, comm: btrfs-delalloc- Tainted: G W 3.2.0-rc4-hw+ #71 Call Trace: [<ffffffff810dec2b>] ? __set_page_dirty_nobuffers+0x17b/0x190 [<ffffffff8105b050>] warn_slowpath_common+0x80/0xc0 [<ffffffff8105b0a5>] warn_slowpath_null+0x15/0x20 [<ffffffff810dec2b>] __set_page_dirty_nobuffers+0x17b/0x190 [<ffffffff81303b95>] compress_file_range+0x535/0x5e0 [<ffffffff811174ee>] ? kfree+0xee/0x120 [<ffffffff81303c70>] async_cow_start+0x30/0x50 [<ffffffff813220a3>] worker_loop+0x173/0x530 [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310 [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310 [<ffffffff8107c7f6>] kthread+0x96/0xb0 [<ffffffff816e09b4>] kernel_thread_helper+0x4/0x10 [<ffffffff8107c760>] ? kthread_worker_fn+0x190/0x190 [<ffffffff816e09b0>] ? gs_change+0x13/0x13 ---[ end trace 52453f1ad38744b8 ]--- (several hours later) ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:1587! invalid opcode: 0000 [#1] SMP CPU 2 Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2 Pid: 4477, comm: btrfs-fixup-0 Tainted: G W 3.2.0-rc4-hw+ #71 Dell Inc. PowerEdge 1950/0NK937 RIP: 0010:[<ffffffff812fbe10>] [<ffffffff812fbe10>] btrfs_writepage_fixup_worker+0x160/0x170 RSP: 0018:ffff88040ff1dde0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000013d6000 RCX: 0000000000000000 RDX: 0000000000000065 RSI: 00000000013d6000 RDI: ffff8800996fe8e0 RBP: ffff88040ff1de30 R08: ffff88040ff1dd34 R09: ffff88040ff1dda0 R10: dead000000200200 R11: 0000000000000000 R12: ffffea000ea54840 R13: 00000000013d6fff R14: ffff8800996fe9b0 R15: ffff8800996fe850 FS: 0000000000000000(0000) GS:ffff88043fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000002051c80 CR3: 0000000427492000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process btrfs-fixup-0 (pid: 4477, threadinfo ffff88040ff1c000, task ffff8804135b9630) Stack: ffffffff8106b2c0 ffff880261a9bae0 0000000000000000 0000000000000286 ffff8801b10103f0 ffff880261a9bae8 ffff880261a9bb10 ffff880412f606c0 ffff880412f60710 ffff880412f606d8 ffff88040ff1dee0 ffffffff813220a3 Call Trace: [<ffffffff8106b2c0>] ? del_timer+0xd0/0xd0 [<ffffffff813220a3>] worker_loop+0x173/0x530 [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310 [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310 [<ffffffff8107c7f6>] kthread+0x96/0xb0 [<ffffffff816e09b4>] kernel_thread_helper+0x4/0x10 [<ffffffff8107c760>] ? kthread_worker_fn+0x190/0x190 [<ffffffff816e09b0>] ? gs_change+0x13/0x13 Code: 5d 41 5e 41 5f c9 c3 0f 1f 40 00 48 8d 4d d0 41 b8 50 00 00 00 4c 89 ea 48 89 de 4c 89 ff e8 f8 c1 01 00 eb ba 66 0f 1f 44 00 00 <0f> 0b eb fe 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 RIP [<ffffffff812fbe10>] btrfs_writepage_fixup_worker+0x160/0x170 RSP <ffff88040ff1dde0> ---[ end trace 52453f1ad38744b9 ]--- Simon- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Simon Kirby
2011-Dec-10 02:21 UTC
Re: warning and bug on 3.2-rc4 + for-linus from yesterday
On Fri, Dec 09, 2011 at 12:39:48PM -0800, Simon Kirby wrote:> Hello! > > We recently upgraded our backup server kernel (rsync with snapshots and > compression) to Linus git master from yesterday (3.2-rc4+ 09d9673d53005) > that contains the btrfs for-linus as of yesterday. We''ve been seeing a > few warnings and bugs:Then it kept pinging but didn''t accept SSH anymore, with this captured via serial console: [79214.481458] ------------[ cut here ]------------ [79214.485335] kernel BUG at fs/btrfs/inode.c:2893! [79214.485335] invalid opcode: 0000 [#2] SMP [79214.485335] CPU 0 [79214.485335] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2 [79214.485335] [79214.485335] Pid: 24202, comm: btrfsctl Tainted: G D W 3.2.0-rc4-hw+ #71 Dell Inc. PowerEdge 1950/0NK937 [79214.485335] RIP: 0010:[<ffffffff81304d88>] [<ffffffff81304d88>] btrfs_unlink_subvol+0x268/0x270 [79214.485335] RSP: 0018:ffff880344babd28 EFLAGS: 00010286 [79214.485335] RAX: 00000000ffffffe4 RBX: 0000000000000c46 RCX: ffff880336fd1588 [79214.485335] RDX: 00000000ffffffe4 RSI: 0000000000000000 RDI: ffff880336fd15a8 [79214.485335] RBP: ffff880344babda8 R08: 0000000000000000 R09: 0000000000000000 [79214.485335] R10: 0000000000000000 R11: 9000000000000001 R12: ffff880405cf5e88 [79214.485335] R13: ffff880428a9ba20 R14: ffff880405158c00 R15: 0000000000000100 [79214.485335] FS: 00007f27ff13d740(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000 [79214.485335] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [79214.485335] CR2: 00007fffdf79f950 CR3: 00000003f79fe000 CR4: 00000000000006f0 [79214.485335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [79214.485335] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [79214.485335] Process btrfsctl (pid: 24202, threadinfo ffff880344baa000, task ffff8803dcec0000) [79214.485335] Stack: [79214.485335] ffff8804037d53f8 ffff880300000010 0000001044babd58 ffff8804037d53f8 [79214.485335] 00000000000008a0 ffff8803fd8b43f0 00000000000008a0 ffffffffffffff84 [79214.485335] 00000000000000ff 0000000000000268 ffff880037e73008 0000000000000000 [79214.485335] Call Trace: [79214.485335] [<ffffffff81323975>] btrfs_ioctl_snap_destroy+0x3b5/0x480 [79214.485335] [<ffffffff81326ef2>] btrfs_ioctl+0x3a2/0x10d0 [79214.485335] [<ffffffff816da594>] ? do_page_fault+0x254/0x4b0 [79214.485335] [<ffffffff8112cdc0>] do_vfs_ioctl+0xa0/0x520 [79214.485335] [<ffffffff8112d28a>] sys_ioctl+0x4a/0x80 [79214.485335] [<ffffffff816de8d2>] system_call_fastpath+0x16/0x1b [79214.485335] Code: 48 8d 54 92 65 e8 89 f2 00 00 48 8b 5d b9 4c 89 ef e8 4d 2c fd ff 48 89 5d c8 e9 ca fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 1f 40 00 <0f> 0b eb fe 0f 0b eb fe 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c [79214.485335] RIP [<ffffffff81304d88>] btrfs_unlink_subvol+0x268/0x270 [79214.485335] RSP <ffff880344babd28> [79214.700401] ---[ end trace 52453f1ad38744ba ]--- Simon- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2011-Dec-12 00:30 UTC
Re: warning and bug on 3.2-rc4 + for-linus from yesterday
Hi, On Fri, Dec 09, 2011 at 12:39:48PM -0800, Simon Kirby wrote:> ------------[ cut here ]------------ > WARNING: at mm/page-writeback.c:1763 __set_page_dirty_nobuffers+0x17b/0x190() > Hardware name: PowerEdge 1950 > Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2 > Pid: 14299, comm: btrfs-delalloc- Tainted: G W 3.2.0-rc4-hw+ #71 > Call Trace: > [<ffffffff810dec2b>] ? __set_page_dirty_nobuffers+0x17b/0x190 > [<ffffffff8105b050>] warn_slowpath_common+0x80/0xc0 > [<ffffffff8105b0a5>] warn_slowpath_null+0x15/0x20 > [<ffffffff810dec2b>] __set_page_dirty_nobuffers+0x17b/0x190 > [<ffffffff81303b95>] compress_file_range+0x535/0x5e0 > [<ffffffff811174ee>] ? kfree+0xee/0x120 > [<ffffffff81303c70>] async_cow_start+0x30/0x50 > [<ffffffff813220a3>] worker_loop+0x173/0x530 > [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310 > [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310 > [<ffffffff8107c7f6>] kthread+0x96/0xb0 > [<ffffffff816e09b4>] kernel_thread_helper+0x4/0x10 > [<ffffffff8107c760>] ? kthread_worker_fn+0x190/0x190 > [<ffffffff816e09b0>] ? gs_change+0x13/0x13 > ---[ end trace 52453f1ad38744b8 ]--- > > (several hours later)1761 if (mapping2) { /* Race with truncate? */ 1762 BUG_ON(mapping2 != mapping); 1763 WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); ^^^^ 1764 account_page_dirtied(page, mapping); 1765 radix_tree_tag_set(&mapping->page_tree, 1766 page_index(page), PAGECACHE_TAG_DIRTY); 1767 } The warning pops up just the first time, so I think it may happen more often, would be interesting to verify this.> ------------[ cut here ]------------ > kernel BUG at fs/btrfs/inode.c:1587! > invalid opcode: 0000 [#1] SMP > CPU 2 > Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2 > > Pid: 4477, comm: btrfs-fixup-0 Tainted: G W 3.2.0-rc4-hw+ #71 Dell Inc. PowerEdge 1950/0NK937 > RIP: 0010:[<ffffffff812fbe10>] [<ffffffff812fbe10>] btrfs_writepage_fixup_worker+0x160/0x170I was chasing this BUG last week with 3.1+cmason/for-linus and I''m able to trigger it quite reliably with looped xfstests/209 and low writeback activity like looped make all & clean in kernel tree (reliably means like from 10 minutes to 1 day). Chris told me to instrument SetPageDirty and based on a similar tracing patchset (http://lwn.net/Articles/315511/) I tried but to no avail, the instrumentation must have significantly changed timing and the bug did not fire at all. I skip the details of my debugging results for now, the first stacktrace contains something I was hoping for :) async_cow_start -- calls compress_file_range directly, compress_file_range+0x535 (gdb) l *(compress_file_range+0x535) 0x31605 is in compress_file_range (fs/btrfs/inode.c:540). 535 * for the async work queue to run cow_file_range to do 536 * the normal delalloc dance 537 */ 538 if (page_offset(locked_page) >= start && 539 page_offset(locked_page) <= end) { 540 __set_page_dirty_nobuffers(locked_page); 541 /* unlocked later on in the async handlers */ 542 } 543 add_async_extent(async_cow, start, end - start + 1, 544 0, NULL, 0, BTRFS_COMPRESS_NONE); this code is reached from (or in a general case when compression is not done): 356 /* 357 * we don''t want to send crud past the end of i_size through 358 * compression, that''s just a waste of CPU time. So, if the 359 * end of the file is before the start of our current 360 * requested range of bytes, we bail out to the uncompressed 361 * cleanup code that can deal with all of this. 362 * 363 * It isn''t really the fastest way to fix things, but this is a 364 * very uncommon corner. 365 */ 366 if (actual_end <= start) 367 goto cleanup_and_bail_uncompressed; ... but seems that ''uncommon'' happens and could trigger the bug in fixup worker if there is some race with truncate. I''ll try to put together details and logs from my debugging. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Simon Kirby
2011-Dec-13 00:18 UTC
Re: warning and bug on 3.2-rc4 + for-linus from yesterday
Hello! On Mon, Dec 12, 2011 at 01:30:31AM +0100, David Sterba wrote:> On Fri, Dec 09, 2011 at 12:39:48PM -0800, Simon Kirby wrote: > > ------------[ cut here ]------------ > > WARNING: at mm/page-writeback.c:1763 __set_page_dirty_nobuffers+0x17b/0x190() > > Hardware name: PowerEdge 1950 > > Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2 > > Pid: 14299, comm: btrfs-delalloc- Tainted: G W 3.2.0-rc4-hw+ #71 > > Call Trace: > > [<ffffffff810dec2b>] ? __set_page_dirty_nobuffers+0x17b/0x190 > > [<ffffffff8105b050>] warn_slowpath_common+0x80/0xc0 > > [<ffffffff8105b0a5>] warn_slowpath_null+0x15/0x20 > > [<ffffffff810dec2b>] __set_page_dirty_nobuffers+0x17b/0x190 > > [<ffffffff81303b95>] compress_file_range+0x535/0x5e0 > > [<ffffffff811174ee>] ? kfree+0xee/0x120 > > [<ffffffff81303c70>] async_cow_start+0x30/0x50 > > [<ffffffff813220a3>] worker_loop+0x173/0x530 > > [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310 > > [<ffffffff81321f30>] ? btrfs_queue_worker+0x310/0x310 > > [<ffffffff8107c7f6>] kthread+0x96/0xb0 > > [<ffffffff816e09b4>] kernel_thread_helper+0x4/0x10 > > [<ffffffff8107c760>] ? kthread_worker_fn+0x190/0x190 > > [<ffffffff816e09b0>] ? gs_change+0x13/0x13 > > ---[ end trace 52453f1ad38744b8 ]--- > > > > (several hours later) > > 1761 if (mapping2) { /* Race with truncate? */ > 1762 BUG_ON(mapping2 != mapping); > 1763 WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); > ^^^^ > 1764 account_page_dirtied(page, mapping); > 1765 radix_tree_tag_set(&mapping->page_tree, > 1766 page_index(page), PAGECACHE_TAG_DIRTY); > 1767 } > > The warning pops up just the first time, so I think it may happen more > often, would be interesting to verify this.We''ll try again with s/WARN_ON_ONCE/WARN_ON/ here. Meanwhile, other problems have come up on this kernel, like a BUG() on snapshot delete: ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:2893! invalid opcode: 0000 [#1] SMP CPU 1 Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler aoe bnx2 Pid: 20842, comm: btrfsctl Tainted: G W 3.2.0-rc4-hw+ #71 Dell Inc. PowerEdge 1950/0NK937 RIP: 0010:[<ffffffff81304d88>] [<ffffffff81304d88>] btrfs_unlink_subvol+0x268/0x270 RSP: 0018:ffff88008eca7d28 EFLAGS: 00010286 RAX: 00000000ffffffe4 RBX: 0000000000000c46 RCX: ffff8800922a2928 RDX: 00000000ffffffe4 RSI: 0000000000000000 RDI: ffff8800922a2948 RBP: ffff88008eca7da8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 9000000000000001 R12: ffff880075b20548 R13: ffff88042876aab0 R14: ffff88026d2f3400 R15: 0000000000000100 FS: 00007ffc81a2c740(0000) GS:ffff88043fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007ffc813e8480 CR3: 00000002e7a92000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process btrfsctl (pid: 20842, threadinfo ffff88008eca6000, task ffff8803d8239630) Stack: ffff8802080323f8 ffff880300000010 000000108eca7d58 ffff8802080323f8 00000000000008a0 ffff8803fda1f708 00000000000008a0 ffffffffffffff84 00000000000000ff 0000000000000268 ffff880088788008 0000000000000000 Call Trace: [<ffffffff81323975>] btrfs_ioctl_snap_destroy+0x3b5/0x480 [<ffffffff81326ef2>] btrfs_ioctl+0x3a2/0x10d0 [<ffffffff816da594>] ? do_page_fault+0x254/0x4b0 [<ffffffff8112cdc0>] do_vfs_ioctl+0xa0/0x520 [<ffffffff8112d28a>] sys_ioctl+0x4a/0x80 [<ffffffff816de8d2>] system_call_fastpath+0x16/0x1b Code: 48 8d 54 92 65 e8 89 f2 00 00 48 8b 5d b9 4c 89 ef e8 4d 2c fd ff 48 89 5d c8 e9 ca fe ff ff 0f 0b eb fe 0f 0b eb fe 0f 1f 40 00 <0f> 0b eb fe 0f 0b eb fe 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c RIP [<ffffffff81304d88>] btrfs_unlink_subvol+0x268/0x270 RSP <ffff88008eca7d28> ---[ end trace 485334ee1ecd2cc8 ]--- After this, we hit a number of other BUG() and WARN() cases during mounting on the next boot: kernel BUG at fs/btrfs/extent-tree.c:6343! WARNING: at fs/btrfs/extent-tree.c:4754 __btrfs_free_extent+0x627/0x720() WARNING: at fs/btrfs/extent-tree.c:4801 __btrfs_free_extent+0x6cc/0x720() kernel BUG at fs/btrfs/extent-tree.c:2287! I posted them here if they are of any use: http://0x.ca/sim/ref/3.2-rc4/ We''ll probably go back to 2.6.38 again for now, the latest mostly-working kernel on this box. Simon- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html