thr3ads.net - Btrfs devel - scrub fails, any way to recover? [Jan 2013]

Neil Schemenauer

2013-Jan-20 15:54 UTC

scrub fails, any way to recover?

I have a ~350 GB Btrfs filesystem that is corrupted.  I think the
damage was caused by a bad SATA cable.  I can mount the filesystem
and read most of the data (I already have backups of most everything).

The scrub is aborted after a few seconds with the following error in
the kernel log:

    parent transid verify failed on 795639808 wanted 102145 found 101462
    parent transid verify failed on 795639808 wanted 102145 found 101462
    verify_parent_transid: 16273 callbacks suppressed
    ...

Trying to remove the corrupted directory tree results in the following:

    device label DATA devid 1 transid 102169 /dev/sda2
    btrfs: enabling auto recovery
    btrfs: disk space caching is enabled
    verify_parent_transid: 12197 callbacks suppressed
    parent transid verify failed on 795062272 wanted 102145 found 101462
    ...
    ------------[ cut here ]------------
    WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x43/0xb6()
    Hardware name: MS-7388
    btrfs: Transaction aborted
    ...
    Pid: 24332, comm: rm Not tainted 3.8.0-rc4 #66
    Call Trace:
    [<ffffffff81162000>] ? __btrfs_abort_transaction+0x39/0xb6
    [<ffffffff8102cf85>] warn_slowpath_common+0x7e/0x97
    [<ffffffff8102d032>] warn_slowpath_fmt+0x41/0x43
    [<ffffffff81198d6e>] ? set_extent_dirty+0x1b/0x1d
    [<ffffffff8116200a>] __btrfs_abort_transaction+0x43/0xb6
    [<ffffffff81170db9>] __btrfs_free_extent+0x612/0x64e
    [<ffffffff811950eb>] ? btrfs_get_token_32+0x79/0xc7
    [<ffffffff811b8fa9>] ? btrfs_merge_delayed_refs+0x24b/0x266
    [<ffffffff81173cfe>] run_clustered_refs+0x7e3/0x8b9
    [<ffffffff81176b20>] btrfs_run_delayed_refs+0xde/0x268
    [<ffffffff811843f8>] __btrfs_end_transaction+0xd8/0x2cf
    [<ffffffff8118461a>] btrfs_end_transaction+0xb/0xd
    [<ffffffff81186b15>] __unlink_end_trans+0x5e/0x63
    [<ffffffff8118baf1>] btrfs_unlink+0x86/0xa0
    [<ffffffff810bc29f>] vfs_unlink+0x6f/0xdc
    [<ffffffff810bc3f9>] do_unlinkat+0xed/0x199
    [<ffffffff810b1d2e>] ? vfs_write+0x100/0x127
    [<ffffffff810b1f32>] ? sys_write+0x44/0x75
    [<ffffffff810bdf8a>] sys_unlinkat+0x1d/0x29
    [<ffffffff8147f9d2>] system_call_fastpath+0x16/0x1b
    ---[ end trace ce4d352b0ec7d230 ]---
    BTRFS error (device sda2) in __btrfs_free_extent:5184: IO failure
    btrfs is forced readonly
    btrfs: run_one_delayed_ref returned -5

I've tried btrfsck but it fails as well.  Is there some way I can
remove the damaged data and save the good or is a re-format the only
solution?

  Neil
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Sterba

2013-Jan-21 11:45 UTC

Re: scrub fails, any way to recover?

On Sun, Jan 20, 2013 at 09:54:38AM -0600, Neil Schemenauer wrote:
> I have a ~350 GB Btrfs filesystem that is corrupted.  I think the
> damage was caused by a bad SATA cable.  I can mount the filesystem
> and read most of the data (I already have backups of most everything).
> 
> The scrub is aborted after a few seconds with the following error in
> the kernel log:
> 
>     parent transid verify failed on 795639808 wanted 102145 found 101462
>     parent transid verify failed on 795639808 wanted 102145 found 101462
>     verify_parent_transid: 16273 callbacks suppressed

The difference between 102145 and 101462 is small and looks like a batch
of lost writes (i.e. not random corruption); this supports the 'bad
cable' root cause.
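
To make the arithmetic concrete, here is a small Python sketch that pulls the wanted/found generations out of log lines like the ones quoted in this thread and computes the gap. The regex only matches the exact message wording shown here (other kernel versions may phrase it differently), and `transid_gaps` is a hypothetical helper, not part of any btrfs tool.

```python
import re

# Matches kernel messages of the form quoted in this thread:
#   parent transid verify failed on <bytenr> wanted <gen> found <gen>
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)

def transid_gaps(log_lines):
    """Map each failing block bytenr to (wanted, found, wanted - found)."""
    gaps = {}
    for line in log_lines:
        m = TRANSID_RE.search(line)
        if m:
            bytenr, wanted, found = map(int, m.groups())
            gaps[bytenr] = (wanted, found, wanted - found)  # duplicates collapse
    return gaps

log = [
    "parent transid verify failed on 795639808 wanted 102145 found 101462",
    "parent transid verify failed on 795639808 wanted 102145 found 101462",
    "parent transid verify failed on 795062272 wanted 102145 found 101462",
]

for bytenr, (wanted, found, gap) in sorted(transid_gaps(log).items()):
    print(f"block {bytenr}: expected generation {wanted}, "
          f"on disk {found}, {gap} transactions behind")
```

Both blocks in the logs come out 102145 - 101462 = 683 transactions behind. Seeing the same "found" generation on different blocks fits the lost-writes reading better than scattered random damage would.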

Judging from '16273 callbacks suppressed', there is a large number of
broken b-tree connections.
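
The "callbacks suppressed" line comes from the kernel's rate-limited logging: once a burst of messages has been printed within a time window, further messages are only counted, and the count is reported when the next window opens. Below is a rough Python model of that behaviour. The 5 s interval and burst of 10 are assumptions meant to resemble the kernel's default ratelimit (the real code lives in lib/ratelimit.c and works in jiffies), and `RateLimitState`/`simulate` are hypothetical names. Note that the model counts failed checks, not distinct blocks.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RateLimitState:
    """Toy model of kernel-style printk rate limiting (assumed semantics)."""
    interval: float = 5.0  # window length in seconds (assumption)
    burst: int = 10        # messages printed per window (assumption)
    begin: Optional[float] = None
    printed: int = 0
    missed: int = 0

    def allow(self, now: float, func: str, out: List[str]) -> bool:
        if self.begin is None:
            self.begin = now
        # Window expired: report how many messages were dropped, then reset.
        if now > self.begin + self.interval:
            if self.missed:
                out.append(f"{func}: {self.missed} callbacks suppressed")
            self.begin = now
            self.printed = 0
            self.missed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1  # dropped silently, only counted
        return False

def simulate(failures_in_burst: int = 1000) -> List[str]:
    rs = RateLimitState()
    out: List[str] = []
    msg = "parent transid verify failed on 795639808 wanted 102145 found 101462"
    # A burst of failed checks inside one window, then one more after it expires.
    times = [i * 0.002 for i in range(failures_in_burst)] + [6.0]
    for t in times:
        if rs.allow(t, "verify_parent_transid", out):
            out.append(msg)
    return out

log = simulate()
print(len(log))
for line in log[-2:]:
    print(line)
```

With 1000 failures in the first window, 10 are printed and the remaining 990 only show up later as a single "990 callbacks suppressed" line.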

So far, the rescue procedure is to run btrfs-restore and copy the data out.

> I've tried btrfsck but it fails as well.  Is there some way I can
> remove the damaged data and save the good or is a re-format the only
> solution?

IIRC removing the damaged data hasn't been proposed yet; there was a
patch to ignore the failures in a read-only mount:

https://patchwork.kernel.org/patch/913642/
(probably does not apply today)

I think the -o recovery mode could be extended so that a read-only +
recovery mount would ignore the failures.
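
The check behind these messages compares the generation (transid) recorded in the parent's pointer with the generation stored in the child block's own header. The following toy Python model shows that comparison plus the hypothetical read-only + recovery behaviour suggested above, where a mismatch is reported and the block skipped instead of failing the operation. `Block`, `read_child` and the skip policy are illustrative assumptions, not kernel code and not the linked patch.

```python
from dataclasses import dataclass
from typing import Optional

class TransidMismatch(Exception):
    pass

@dataclass
class Block:
    bytenr: int
    generation: int  # transid stored in the block's own header

def read_child(block: Block, parent_expected_gen: int,
               read_only: bool = False, recovery: bool = False) -> Optional[Block]:
    """Return the block if its generation matches what the parent expects.

    Hypothetical policy: with both read_only and recovery set, a mismatch
    is reported and the block skipped (None) instead of raising an error.
    """
    if block.generation == parent_expected_gen:
        return block
    if read_only and recovery:
        print(f"skipping block {block.bytenr}: wanted {parent_expected_gen} "
              f"found {block.generation}")
        return None
    raise TransidMismatch(
        f"parent transid verify failed on {block.bytenr} "
        f"wanted {parent_expected_gen} found {block.generation}"
    )

stale = Block(bytenr=795639808, generation=101462)
try:
    read_child(stale, parent_expected_gen=102145)
except TransidMismatch as err:
    print("error:", err)
print(read_child(stale, 102145, read_only=True, recovery=True))
```

Skipping keeps the rest of the tree readable, at the cost of silently missing whatever hangs below the stale block.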

I see two ways to fix the on-disk b-tree structure (via fsck):

1) wipe the broken blocks and unlink them from the b-tree, but a broken
   node at a high level would kill lots of data unpredictably

2) in some cases it would be possible to promote the old transids to the
   current ones (to satisfy the transid verify check); however, some
   blocks may already have been overwritten, so it only pushes the
   problem further along
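
A toy Python model of the two options, to show why option 1 is unpredictable: unlinking a broken node discards everything beneath it, so the loss grows with the node's level, while option 2 makes every check pass without restoring any content. The tree shape, the fanout of 4 and the helpers `prune_broken`/`promote`/`break_one` are illustrative assumptions, not btrfsck code; real btrfs nodes have a much larger, variable fanout.

```python
from dataclasses import dataclass, field
from typing import List

STALE_GEN = 101462    # generation found on disk in the thread's logs
CURRENT_GEN = 102145  # generation the parent pointers expect

@dataclass
class Node:
    level: int  # 0 = leaf holding items
    children: List["Node"] = field(default_factory=list)
    generation: int = CURRENT_GEN  # transid in this block's header
    expected: int = CURRENT_GEN    # transid recorded in the parent's pointer

def build(level: int, fanout: int) -> Node:
    """Full toy tree with `fanout` children per internal node."""
    if level == 0:
        return Node(0)
    return Node(level, [build(level - 1, fanout) for _ in range(fanout)])

def count_leaves(n: Node) -> int:
    return 1 if n.level == 0 else sum(count_leaves(c) for c in n.children)

def prune_broken(n: Node) -> Node:
    """Option 1: unlink every child whose generation fails the check."""
    n.children = [prune_broken(c) for c in n.children if c.generation == c.expected]
    return n

def promote(n: Node) -> Node:
    """Option 2: bump stale generations so the check passes.
    Block contents are untouched: whatever old version is on disk stays."""
    n.generation = n.expected
    for c in n.children:
        promote(c)
    return n

def break_one(root: Node, level: int) -> None:
    """Mark the left-most node at `level` as stale (a lost write)."""
    node = root
    while node.level > level:
        node = node.children[0]
    node.generation = STALE_GEN

for level in (0, 1, 2):
    root = build(3, 4)
    break_one(root, level)
    total = count_leaves(root)
    kept = count_leaves(prune_broken(root))
    print(f"option 1, broken node at level {level}: {total - kept} of {total} leaves lost")

root = build(3, 4)
break_one(root, 2)
promote(root)
print(f"option 2: {count_leaves(prune_broken(root))} of 64 leaves pass the check")
```

With 64 leaves, a broken leaf costs 1 leaf, a broken level-1 node 4, and a broken level-2 node 16; after promotion all 64 pass the check, even though the stale subtree still holds old data.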

The level of success depends on the amount of data lost during the cable
unplug and on whether data or metadata were affected. The filesystem is
more likely to be rescued if less metadata was affected.

david