Saran Neti
2014-May-01 04:48 UTC
Unable to rebuild a 3 drive raid1 - blocked for more than 120 seconds.
Hi,

I had 3 x 3 TB drives in an almost full btrfs raid1 setup containing only large (~20 GB) files, written linearly and not modified afterwards. Then one of the drives got busted. Mounting the fs in degraded mode and adding a fresh drive to rebuild the raid1 generated several "...blocked for more than 120 seconds." messages. I left it running for a couple of days, but the "btrfs device add..." command wouldn't return.

I did a hard reboot, and after a degraded mount I am unable to unmount, add a drive, or delete missing without getting stuck with the same error. iostat shows no disk activity. When attempting an unmount, both the "umount" and "[btrfs-transacti]" processes become defunct. I tried -o skip_balance as well, to no avail.

Two possible causes are described in https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg30017.html: fragmentation due to COW, and hardlinks. I think both are unlikely in this case.

I can mount in degraded mode and read files, but that's about it. Is there something I'm missing? Any debugging tips would be appreciated. Please let me know if I can provide more information.

--- Info ---

# uname -a
Linux localhost 3.14.1-1-ARCH #1 SMP PREEMPT Mon Apr 14 20:40:47 CEST 2014 x86_64 GNU/Linux

# btrfs --version
Btrfs v3.14

# btrfs fi show
Label: 'cohenraid1' uuid: 288723c3-2e98-4a6c-87d3-058451d87d26
    Total devices 3 FS bytes used 3.44TiB
    devid 1 size 2.73TiB used 2.19TiB path /dev/sdg1
    devid 2 size 2.73TiB used 2.46TiB path /dev/sdf1
    *** Some devices missing

# btrfs fi df /mnt/cohenraid1
Data, RAID1: total=3.54TiB, used=3.43TiB
System, RAID1: total=32.00MiB, used=528.00KiB
Metadata, RAID1: total=6.00GiB, used=3.57GiB

(Originally, there were two 2.19 TiB filled drives and one 2.46 TiB filled drive. All drives, including the new one I'm unable to add, pass the SMART long test.)

Kernel messages:

Apr 30 09:49:34 localhost kernel: INFO: task btrfs-transacti:4080 blocked for more than 120 seconds.
Apr 30 09:49:34 localhost kernel: Not tainted 3.14.1-1-ARCH #1
Apr 30 09:49:34 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 30 09:49:34 localhost kernel: btrfs-transacti D ffff8804fe89eb40 0 4080 2 0x00000000
Apr 30 09:49:34 localhost kernel: ffff8804b0bdbdc0 0000000000000046 ffff8804cb3193a0 ffff8804b0bdbfd8
Apr 30 09:49:34 localhost kernel: 00000000000142c0 00000000000142c0 ffff8804cb3193a0 00000000000142c0
Apr 30 09:49:34 localhost kernel: ffff8804cb3193a0 0000000000000000 0000000200000000 0000000000000009
Apr 30 09:49:34 localhost kernel: Call Trace:
Apr 30 09:49:34 localhost kernel: [<ffffffffa076a7f8>] ? start_transaction+0x138/0x5a0 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffff8109c648>] ? __enqueue_entity+0x78/0x80
Apr 30 09:49:34 localhost kernel: [<ffffffff8109580e>] ? set_task_cpu+0x6e/0x1d0
Apr 30 09:49:34 localhost kernel: [<ffffffff8107279b>] ? lock_timer_base.isra.35+0x2b/0x50
Apr 30 09:49:34 localhost kernel: [<ffffffff814d7eb9>] schedule+0x29/0x70
Apr 30 09:49:34 localhost kernel: [<ffffffffa076934f>] wait_current_trans.isra.19+0x9f/0x100 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffff810aa350>] ? __wake_up_sync+0x20/0x20
Apr 30 09:49:34 localhost kernel: [<ffffffffa076a978>] start_transaction+0x2b8/0x5a0 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffffa076ad17>] btrfs_attach_transaction+0x17/0x20 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffffa0765acb>] transaction_kthread+0x16b/0x240 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffffa0765960>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffff810872a2>] kthread+0xd2/0xf0
Apr 30 09:49:34 localhost kernel: [<ffffffff810871d0>] ? kthread_create_on_node+0x180/0x180
Apr 30 09:49:34 localhost kernel: [<ffffffff814e2ffc>] ret_from_fork+0x7c/0xb0
Apr 30 09:49:34 localhost kernel: [<ffffffff810871d0>] ? kthread_create_on_node+0x180/0x180
Apr 30 09:49:34 localhost kernel: INFO: task umount:4298 blocked for more than 120 seconds.
Apr 30 09:49:34 localhost kernel: Not tainted 3.14.1-1-ARCH #1
Apr 30 09:49:34 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 30 09:49:34 localhost kernel: umount D ffff8804ab5f99f0 0 4298 4296 0x00000004
Apr 30 09:49:34 localhost kernel: ffff8804ab5f9960 0000000000000082 ffff8804d72ff5c0 ffff8804ab5f9fd8
Apr 30 09:49:34 localhost kernel: 00000000000142c0 00000000000142c0 ffff8804d72ff5c0 ffff880509cd26a8
Apr 30 09:49:34 localhost kernel: 0000000000000080 ffff8804ab5f98f8 ffffffff81251ea8 ffff8804e7f34260
Apr 30 09:49:34 localhost kernel: Call Trace:
Apr 30 09:49:34 localhost kernel: [<ffffffff81251ea8>] ? submit_bio+0x78/0x160
Apr 30 09:49:34 localhost kernel: [<ffffffffa0793841>] ? btrfs_map_bio+0x2a1/0x550 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffff8101d3c9>] ? read_tsc+0x9/0x20
Apr 30 09:49:34 localhost kernel: [<ffffffff81133760>] ? filemap_fdatawait+0x30/0x30
Apr 30 09:49:34 localhost kernel: [<ffffffff814d7eb9>] schedule+0x29/0x70
Apr 30 09:49:34 localhost kernel: [<ffffffff814d815f>] io_schedule+0x8f/0xe0
Apr 30 09:49:34 localhost kernel: [<ffffffff8113376e>] sleep_on_page+0xe/0x20
Apr 30 09:49:34 localhost kernel: [<ffffffff814d84d2>] __wait_on_bit+0x62/0x90
Apr 30 09:49:34 localhost kernel: [<ffffffff8113352f>] wait_on_page_bit+0x7f/0x90
Apr 30 09:49:34 localhost kernel: [<ffffffff810aa390>] ? autoremove_wake_function+0x40/0x40
Apr 30 09:49:34 localhost kernel: [<ffffffff81141471>] ? pagevec_lookup_tag+0x21/0x30
Apr 30 09:49:34 localhost kernel: [<ffffffff811336aa>] filemap_fdatawait_range+0x10a/0x190
Apr 30 09:49:34 localhost kernel: [<ffffffffa078333f>] btrfs_wait_ordered_range+0x6f/0x140 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffffa07a9c30>] __btrfs_write_out_cache+0x6d0/0x8e0 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffffa07aaf1d>] btrfs_write_out_cache+0x8d/0xe0 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffffa075a393>] btrfs_write_dirty_block_groups+0x593/0x680 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffffa0768023>] commit_cowonly_roots+0x163/0x230 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffffa076a118>] btrfs_commit_transaction+0x428/0x9d0 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffffa07641ff>] btrfs_commit_super+0x8f/0xa0 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffffa0765e10>] close_ctree+0x270/0x2a0 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffff811bee6c>] ? evict_inodes+0x11c/0x130
Apr 30 09:49:34 localhost kernel: [<ffffffffa073d049>] btrfs_put_super+0x19/0x20 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffff811a66b2>] generic_shutdown_super+0x72/0xf0
Apr 30 09:49:34 localhost kernel: [<ffffffff811a68f2>] kill_anon_super+0x12/0x20
Apr 30 09:49:34 localhost kernel: [<ffffffffa073cdd6>] btrfs_kill_super+0x16/0x90 [btrfs]
Apr 30 09:49:34 localhost kernel: [<ffffffff811a6c4d>] deactivate_locked_super+0x3d/0x60
Apr 30 09:49:34 localhost kernel: [<ffffffff811a7206>] deactivate_super+0x46/0x60
Apr 30 09:49:34 localhost kernel: [<ffffffff811c25c5>] mntput_no_expire+0xe5/0x170
Apr 30 09:49:34 localhost kernel: [<ffffffff811c3890>] SyS_umount+0x90/0x3c0
Apr 30 09:49:34 localhost kernel: [<ffffffff814e30a9>] system_call_fastpath+0x16/0x1b

--
Saran