thr3ads.net - Btrfs devel - Recovering btrfs fs after "failed to read chunk root" [Oct 2013]


Daniele Buono

2013-Oct-04 08:50 UTC

Recovering btrfs fs after "failed to read chunk root"

So I'm writing up my (mis)adventure with btrfs here, hoping to help the
developers or someone with similar problems.

I had a btrfs filesystem at work, using two 1TB disks, raid1 for both
data and metadata.

A week ago one of the two disks started showing hundreds of reallocated
sectors, so I decided to replace it.
I remove the failing disk, mount with -o degraded and everything works fine.
The next day I decide to use a USB drive as a temporary
secondary copy.
The USB drive is a few GB smaller than the old drive, so I add it
together with a small partition on another disk to avoid running out
of space during the rebalance.

I start the "btrfs device delete missing" and btrfs starts rebalancing
the drive. After a while I get

Sep 30 11:35:31 tambura kernel: [264654.275303] kernel BUG at
fs/btrfs/relocation.c:1055!

The complete error is here http://paste.fedoraproject.org/44139/
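
As a rough sketch, the sequence up to this point corresponds to
commands along these lines. The device names, the mount point and the
dry-run wrapper are placeholders made up for illustration; they are
not copied from the actual session.

```shell
#!/bin/sh
# Sketch of the degraded-mount / add / delete-missing sequence.
# All device names and the mount point are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"      # default: only print the commands
SURVIVOR=/dev/sdX1           # remaining healthy raid1 member (placeholder)
NEW_USB=/dev/sdY1            # temporary USB drive (placeholder)
NEW_EXTRA=/dev/sdZ2          # small extra partition (placeholder)
MNT=/mnt/data                # mount point (placeholder)

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

replace_missing_disk() {
    # Mount the surviving member without its failed partner.
    run mount -o degraded "$SURVIVOR" "$MNT"
    # Add the replacement devices to the filesystem.
    run btrfs device add "$NEW_USB" "$NEW_EXTRA" "$MNT"
    # Drop the absent device; this triggers the relocation onto the new ones.
    run btrfs device delete missing "$MNT"
}

replace_missing_disk
```

With DRY_RUN left at 1 the script only prints the three commands, so
the order can be checked before anything touches a disk.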

I look around and find that the problem has been fixed in the latest
Linux kernel (which, however, is not yet marked as stable). Since I
run Fedora 19, I manage to install kernel
3.12.0-0.rc2.git3.1.fc21.x86_64 from rawhide, reboot and try again.

The rebalancing works well this time but is, as expected, very slow
(about 50GB in half a day). I go home and leave the PC running all
night. In the morning I find this:

Sep 30 23:11:53 tambura kernel: [37787.652143] BTRFS error (device
sdc2) in btrfs_create_pending_block_groups:8516: errno=-5 IO failure
Sep 30 23:11:53 tambura kernel: [37787.652145] BTRFS info (device
sdc2): forced readonly

The whole messages.log is here: http://paste.fedoraproject.org/44143/24287138/

N.B.: sdc2 is the partition on the new drive, which however shows no
reallocated or pending sectors according to SMART data.

I do not panic (the drive seems readable and the fs is mounted) and
decide to reboot the PC.
It hangs during the reboot, so I turn it off after a few minutes and restart.

Now comes the fun part:
btrfs: allowing degraded mounts
btrfs: disk space caching is enabled
btrfs: failed to read chunk root on sdb1
btrfs: open_ctree failed

From this point I found no way of accessing my data. Nothing works,
including mount -o recovery and even btrfs-restore.

I find this discussion:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg25990.html

So I tried several times with btrfs chunk-recover:

If I keep all the disks, it goes like this:
[root@tambura btrfs-progs]# ./btrfs chunk-recover -v /dev/sdb1 > log_2.txt
All Devices:
        Device: id = 3, name = /dev/sdc2
        Device: id = 4, name = /dev/sda2
        Device: id = 1, name = /dev/sdb1
btrfs: cmds-chunk.c:125: process_extent_buffer: Assertion
`!(exist->nmirrors >= 2)' failed.

So I tried removing sdc2 (the drive btrfs was replicating data to).
This time chunk-recover runs past the assertion, but ends like this:

Fail to recover the chunk tree.

The complete output is here: http://cwillu.com:8080/131.114.3.240/4

After some digging (adding printf calls on the error paths) I found
that the failure comes from the function

"build_device_maps_by_chunk_records"

But I can't tell you more for the moment.


Now, the good part of the story is that the disk with reallocated
sectors was not too bad after all, and I managed to recover all the
lost files from it; so I'm not really interested in recovering
this filesystem.

However, I'm writing this hoping that it may help the developers in
some way, perhaps by improving the chunk-recover function.
If useful, I can "offer" my filesystem for testing new versions of
chunk-recover, or any other tool.


Daniele Buono
PhD Student in Computer Science
Department of Computer Science, University of Pisa
