Chris Kastorff
2013-Dec-19 09:26 UTC
Unmountable Array After Drive Failure During Device Deletion
I'm running btrfs with RAID10 for both data and metadata, directly on the drives (not on md or any other fanciness).

I was removing a drive (btrfs dev del) and, during that operation, a different drive in the array failed. Having not had this happen before, I shut down the machine immediately because of the extremely loud piezo buzzer on the drive controller card. I attempted to shut down cleanly, but the buzzer cut through my patience and after 4 minutes I cut the power.

Afterwards, I located and removed the failed drive from the system, and then booted back into Linux. The array no longer mounts ("failed to read the system array on sdc"), with nearly identical messages when mounting with -o recovery and -o recovery,ro.

btrfsck asserts and coredumps, as usual.

The drive that was being removed is devid 9 in the array, and is /dev/sdm1 in the btrfs fi show output below.

Kernel 3.12.4-1-ARCH, btrfs-progs v0.20-rc1-358-g194aa4a-dirty (archlinux build).

Can I recover the array?

== dmesg during failure =
...
sd 0:2:3:0: [sdd] Unhandled error code
sd 0:2:3:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
sd 0:2:3:0: [sdd] CDB: cdb[0]=0x2a: 2a 00 26 89 5b 00 00 00 80 00
end_request: I/O error, dev sdd, sector 646535936
btrfs_dev_stat_print_on_error: 7791 callbacks suppressed
btrfs: bdev /dev/sdd errs: wr 315858, rd 230194, flush 0, corrupt 0, gen 0
sd 0:2:3:0: [sdd] Unhandled error code
sd 0:2:3:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
sd 0:2:3:0: [sdd] CDB: cdb[0]=0x2a: 2a 00 26 89 5b 80 00 00 80 00
end_request: I/O error, dev sdd, sector 646536064
...

== dmesg after new boot, mounting attempt =
btrfs: device label lake devid 11 transid 4893967 /dev/sda
btrfs: disk space caching is enabled
btrfs: failed to read the system array on sdc
btrfs: open_ctree failed

== dmesg after new boot, mounting attempt with -o recovery,ro =
btrfs: device label lake devid 11 transid 4893967 /dev/sda
btrfs: enabling auto recovery
btrfs: disk space caching is enabled
btrfs: failed to read the system array on sdc
btrfs: open_ctree failed

== btrfsck =
deep# btrfsck /dev/sda
warning, device 14 is missing
warning devid 14 not found already
parent transid verify failed on 87601116364800 wanted 4893969 found 4893913
parent transid verify failed on 87601116364800 wanted 4893969 found 4893913
parent transid verify failed on 87601116381184 wanted 4893969 found 4893913
parent transid verify failed on 87601116381184 wanted 4893969 found 4893913
parent transid verify failed on 87601115320320 wanted 4893969 found 4893913
parent transid verify failed on 87601115320320 wanted 4893969 found 4893913
parent transid verify failed on 87601117097984 wanted 4893969 found 4892460
parent transid verify failed on 87601117097984 wanted 4893969 found 4892460
Ignoring transid failure
Checking filesystem on /dev/sda
UUID: d5e17c49-d980-4bde-bd96-3c8bc95ea077
checking extents
parent transid verify failed on 87601117159424 wanted 4893969 found 4893913
parent transid verify failed on 87601117159424 wanted 4893969 found 4893913
parent transid verify failed on 87601116368896 wanted 4893969 found 4893913
parent transid verify failed on 87601116368896 wanted 4893969 found 4893913
parent transid verify failed on 87601117163520 wanted 4893969 found 4893913
parent transid verify failed on 87601117163520 wanted 4893969 found 4893913
parent transid verify failed on 87601117638656 wanted 4893969 found 4893913
parent transid verify failed on 87601117638656 wanted 4893969 found 4893913
Ignoring transid failure
parent transid verify failed on 87601117171712 wanted 4893969 found 4893913
parent transid verify failed on 87601117171712 wanted 4893969 found 4893913
parent transid verify failed on 87601117175808 wanted 4893969 found 4893913
parent transid verify failed on 87601117175808 wanted 4893969 found 4893913
parent transid verify failed on 87601117188096 wanted 4893969 found 4893913
parent transid verify failed on 87601117188096 wanted 4893969 found 4893913
parent transid verify failed on 87601116807168 wanted 4893969 found 4893913
parent transid verify failed on 87601116807168 wanted 4893969 found 4893913
Ignoring transid failure
parent transid verify failed on 87601117642752 wanted 4893969 found 4893913
parent transid verify failed on 87601117642752 wanted 4893969 found 4893913
Ignoring transid failure
parent transid verify failed on 87601117650944 wanted 4893969 found 4893913
parent transid verify failed on 87601117650944 wanted 4893969 found 4893913
Ignoring transid failure
Couldn't map the block 5764607523034234880
btrfsck: volumes.c:1019: btrfs_num_copies: Assertion `!(!ce)' failed.
zsh: abort (core dumped)  btrfsck /dev/sda

== btrfs fi show =
Label: 'lake'  uuid: d5e17c49-d980-4bde-bd96-3c8bc95ea077
    Total devices 10 FS bytes used 7.43TB
    devid    9 size 1.82TB used 1.61TB path /dev/sdm1
    devid   12 size 1.82TB used 1.47TB path /dev/sdb
    devid   16 size 1.82TB used 1.47TB path /dev/sde
    devid   13 size 1.82TB used 1.47TB path /dev/sdc
    devid   11 size 1.82TB used 1.47TB path /dev/sda
    devid   19 size 1.82TB used 1.47TB path /dev/sdk
    devid   17 size 1.82TB used 1.47TB path /dev/sdf
    devid   18 size 1.82TB used 1.47TB path /dev/sdg
    devid   15 size 1.82TB used 1.47TB path /dev/sdd
    *** Some devices missing
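
== summarizing the transid failures (sketch) =
The btrfsck output repeats each "parent transid verify failed" line, which makes it hard to see how many distinct blocks are affected and how far behind each one is. The following is a minimal Python sketch, not part of btrfs-progs, that condenses such output. The function name summarize_transid_failures is hypothetical, and the regular expression only assumes the message format exactly as it appears in the btrfsck log in this message; other btrfs-progs versions may word it differently.

```python
import re
import sys

# Matches the message format seen in the btrfsck output in this message.
# Assumption: other btrfs-progs versions may print this differently.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def summarize_transid_failures(text):
    """Map each distinct block address to its (wanted, found) generations.

    Repeated reports for the same block collapse into a single entry;
    the last report seen for a block wins.
    """
    blocks = {}
    for match in TRANSID_RE.finditer(text):
        bytenr, wanted, found = (int(group) for group in match.groups())
        blocks[bytenr] = (wanted, found)
    return blocks


if __name__ == "__main__":
    # Read saved btrfsck output from standard input.
    failures = summarize_transid_failures(sys.stdin.read())
    for bytenr, (wanted, found) in sorted(failures.items()):
        lag = wanted - found
        print(f"{bytenr}: wanted {wanted}, found {found}, behind by {lag}")
    print(f"{len(failures)} distinct blocks")
```

To use it, save the btrfsck output to a file and pipe it in on standard input. Because it searches the whole input rather than parsing line by line, it also works on output whose line breaks were lost in transit.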