Marc MERLIN
2014-May-22 09:09 UTC
3.15.0-rc5: btrfs and sync deadlock: call_rwsem_down_read_failed
I got my laptop to hang all IO to one of its devices again, this time drive #2. This is the third time this has happened, and I've already lost data as a result, since anything that hasn't hit disk by that point never makes it.

I was doing a balance and a btrfs send/receive. Then cron started a scrub in the background too. IO to drive #1 was working fine, so I didn't even notice that IO to drive #2 was hung. Then I typed sync and it never returned.

legolas:~# ps -eo pid,user,args,wchan | grep sync
23605 root     sync                        call_rwsem_down_read_failed
31885 root     sync                        call_rwsem_down_read_failed

What does it mean when sync is stuck that way?

When I'm in that state, accessing btrfs on drive 1 still works (read and write). Any access to drive 2 through btrfs hangs.

Both block devices still work:
legolas:~# dd if=/dev/sda of=/dev/null bs=1M
2593128448 bytes (2.6 GB) copied, 6.47656 s, 400 MB/s
legolas:~# dd if=/dev/sdb of=/dev/null bs=1M
148897792 bytes (149 MB) copied, 7.99576 s, 18.6 MB/s

So at least it shows that I don't have a hardware problem, right?

After reboot, most of the data written to disk 1 made it, so at least sync worked there.

How can I confirm that it is btrfs deadlocking and not something else in the kernel?
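One thing that might help narrow this down is to look at every task stuck in uninterruptible sleep (state D), not just sync, and see where each one is waiting in the kernel. The following is only a rough sketch: the function names filter_blocked and show_blocked are made up for illustration, and reading /proc/PID/stack assumes root and a kernel built with stack trace support.

```shell
#!/bin/sh
# Sketch: list tasks in uninterruptible sleep (state D) and where they wait.
# filter_blocked and show_blocked are hypothetical helper names.

# filter_blocked: read `ps -eo pid,stat,wchan:32,args` style output on stdin
# and keep the header line plus every row whose STAT column starts with D.
filter_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# show_blocked: print the blocked tasks, then the kernel stack of each one.
# /proc/PID/stack is only readable by root and may be absent on some kernels,
# so read errors are silently ignored.
show_blocked() {
    ps -eo pid,stat,wchan:32,args | filter_blocked
    for pid in $(ps -eo pid,stat | awk 'NR > 1 && $2 ~ /^D/ { print $1 }'); do
        echo "== $pid =="
        cat "/proc/$pid/stack" 2>/dev/null
    done
}
```

Running show_blocked as root while the hang is in progress should show whether the stuck tasks are all waiting inside btrfs functions. If sysrq is enabled, "echo w > /proc/sysrq-trigger" asks the kernel to dump all blocked tasks to the kernel log, which gives similar information in dmesg.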
The state of btrfs is:
legolas:~# ps -eo pid,user,args,wchan | grep btrfs
  527 root     [btrfs-worker]              rescuer_thread
  528 root     [btrfs-worker-hi]           rescuer_thread
  529 root     [btrfs-delalloc]            rescuer_thread
  530 root     [btrfs-flush_del]           rescuer_thread
  531 root     [btrfs-cache]               rescuer_thread
  532 root     [btrfs-submit]              rescuer_thread
  533 root     [btrfs-fixup]               rescuer_thread
  534 root     [btrfs-endio]               rescuer_thread
  535 root     [btrfs-endio-met]           rescuer_thread
  536 root     [btrfs-endio-met]           rescuer_thread
  537 root     [btrfs-endio-rai]           rescuer_thread
  538 root     [btrfs-rmw]                 rescuer_thread
  539 root     [btrfs-endio-wri]           rescuer_thread
  540 root     [btrfs-freespace]           rescuer_thread
  541 root     [btrfs-delayed-m]           rescuer_thread
  542 root     [btrfs-readahead]           rescuer_thread
  543 root     [btrfs-qgroup-re]           rescuer_thread
  544 root     [btrfs-cleaner]             cleaner_kthread
  545 root     [btrfs-transacti]           transaction_kthread
 2111 root     [btrfs-worker]              rescuer_thread
 2112 root     [btrfs-worker-hi]           rescuer_thread
 2113 root     [btrfs-delalloc]            rescuer_thread
 2114 root     [btrfs-flush_del]           rescuer_thread
 2115 root     [btrfs-cache]               rescuer_thread
 2116 root     [btrfs-submit]              rescuer_thread
 2117 root     [btrfs-fixup]               rescuer_thread
 2119 root     [btrfs-endio]               rescuer_thread
 2120 root     [btrfs-endio-met]           rescuer_thread
 2121 root     [btrfs-endio-met]           rescuer_thread
 2122 root     [btrfs-endio-rai]           rescuer_thread
 2123 root     [btrfs-rmw]                 rescuer_thread
 2124 root     [btrfs-endio-wri]           rescuer_thread
 2125 root     [btrfs-freespace]           rescuer_thread
 2126 root     [btrfs-delayed-m]           rescuer_thread
 2127 root     [btrfs-readahead]           rescuer_thread
 2128 root     [btrfs-qgroup-re]           rescuer_thread
 3205 root     [btrfs-cleaner]             cleaner_kthread
 3206 root     [btrfs-transacti]           transaction_kthread
19156 root     gvim /etc/cron.d/btrfs_back poll_schedule_timeout
19729 root     btrfs send var_ro.20140521_ pipe_wait
19730 root     btrfs receive /mnt/btrfs_po sleep_on_page
19824 root     btrfs balance start -dusage btrfs_wait_and_free_delalloc_work
24611 root     /bin/sh -c cd /mnt/btrfs_po wait
24619 root     btrfs subvolume snapshot /m btrfs_start_delalloc_inodes
32044 root     /sbin/btrfs scrub start -Bd futex_wait_queue_me

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901