Marc MERLIN
2014-May-22 09:09 UTC
3.15.0-rc5: btrfs and sync deadlock: call_rwsem_down_read_failed
I got my laptop to hang all IO to one of its devices again, this time
drive #2.
This is the 3rd time this has happened, and I've already lost data as a
result, since anything that hasn't hit disk by this point never makes it.
I was doing balance and btrfs send/receive.
Then cron started a scrub in the background too.
IO to drive #1 was working fine, so I didn't even notice that IO to
drive #2 was hung.
And then I typed sync and it never returned.
legolas:~# ps -eo pid,user,args,wchan | grep sync
23605 root sync call_rwsem_down_read_failed
31885 root sync call_rwsem_down_read_failed
What does this mean when sync is stuck that way?
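
For reference, here is a rough sketch of how one could list every task stuck in uninterruptible sleep (state D) together with its wait channel, rather than grepping for sync alone. The helper name blocked_tasks is made up for illustration, and the ps invocation assumes procps.

```shell
# Hypothetical helper: keep only ps rows whose STAT column starts with D
# (uninterruptible sleep), skipping the header line.
blocked_tasks() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# Assumes procps ps; wchan:40 widens the column so symbol names are not cut off.
ps -eo pid,stat,user,wchan:40,args | blocked_tasks
```

If the output is empty, nothing was in state D at that moment. It only shows where each task sleeps, not which lock holder it is waiting for.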
When I'm in that state, accessing btrfs on drive 1 still works (read and
write).
Any access to drive 2 through btrfs hangs.
Both block devices still work.
legolas:~# dd if=/dev/sda of=/dev/null bs=1M
2593128448 bytes (2.6 GB) copied, 6.47656 s, 400 MB/s
legolas:~# dd if=/dev/sdb of=/dev/null bs=1M
148897792 bytes (149 MB) copied, 7.99576 s, 18.6 MB/s
So at least it shows that I don't have a hardware problem, right?
After reboot, most of the data written to drive 1 had made it to disk, so
at least sync worked there.
How can I confirm that it is btrfs deadlocking and not something else in
the kernel?
The state of btrfs is:
legolas:~# ps -eo pid,user,args,wchan | grep btrfs
527 root [btrfs-worker] rescuer_thread
528 root [btrfs-worker-hi] rescuer_thread
529 root [btrfs-delalloc] rescuer_thread
530 root [btrfs-flush_del] rescuer_thread
531 root [btrfs-cache] rescuer_thread
532 root [btrfs-submit] rescuer_thread
533 root [btrfs-fixup] rescuer_thread
534 root [btrfs-endio] rescuer_thread
535 root [btrfs-endio-met] rescuer_thread
536 root [btrfs-endio-met] rescuer_thread
537 root [btrfs-endio-rai] rescuer_thread
538 root [btrfs-rmw] rescuer_thread
539 root [btrfs-endio-wri] rescuer_thread
540 root [btrfs-freespace] rescuer_thread
541 root [btrfs-delayed-m] rescuer_thread
542 root [btrfs-readahead] rescuer_thread
543 root [btrfs-qgroup-re] rescuer_thread
544 root [btrfs-cleaner] cleaner_kthread
545 root [btrfs-transacti] transaction_kthread
2111 root [btrfs-worker] rescuer_thread
2112 root [btrfs-worker-hi] rescuer_thread
2113 root [btrfs-delalloc] rescuer_thread
2114 root [btrfs-flush_del] rescuer_thread
2115 root [btrfs-cache] rescuer_thread
2116 root [btrfs-submit] rescuer_thread
2117 root [btrfs-fixup] rescuer_thread
2119 root [btrfs-endio] rescuer_thread
2120 root [btrfs-endio-met] rescuer_thread
2121 root [btrfs-endio-met] rescuer_thread
2122 root [btrfs-endio-rai] rescuer_thread
2123 root [btrfs-rmw] rescuer_thread
2124 root [btrfs-endio-wri] rescuer_thread
2125 root [btrfs-freespace] rescuer_thread
2126 root [btrfs-delayed-m] rescuer_thread
2127 root [btrfs-readahead] rescuer_thread
2128 root [btrfs-qgroup-re] rescuer_thread
3205 root [btrfs-cleaner] cleaner_kthread
3206 root [btrfs-transacti] transaction_kthread
19156 root gvim /etc/cron.d/btrfs_back poll_schedule_timeout
19729 root btrfs send var_ro.20140521_ pipe_wait
19730 root btrfs receive /mnt/btrfs_po sleep_on_page
19824 root btrfs balance start -dusage btrfs_wait_and_free_delalloc_work
24611 root /bin/sh -c cd /mnt/btrfs_po wait
24619 root btrfs subvolume snapshot /m btrfs_start_delalloc_inodes
32044 root /sbin/btrfs scrub start -Bd futex_wait_queue_me
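
The wchan column only names the function each task sleeps in. A possible next step, sketched here under assumptions, is to read the full kernel stack of the stuck tasks to see whether btrfs functions appear in them. The helper name dump_stacks is hypothetical; reading /proc/PID/stack needs root and a kernel built with CONFIG_STACKTRACE, and the sysrq route needs sysrq enabled.

```shell
# Hypothetical helper: print the kernel stack of each PID given as an argument.
# Falls back to a short note when the stack file is missing or unreadable.
dump_stacks() {
    for pid in "$@"; do
        echo "== $pid =="
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}

# Example, using the two stuck sync PIDs from the listing above:
#   dump_stacks 23605 31885
#
# Alternatively, have the kernel log all blocked tasks and read the result:
#   echo w > /proc/sysrq-trigger
#   dmesg | tail -n 200
```

Stacks that contain btrfs_ symbols would point at the filesystem, but that alone does not prove where the lock is actually held.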
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" -
A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901