Hi

I've been running btrfs in various VMs for a while, and periodically I've
experienced corruption in the filesystems being used. None of the data is
important, but I'd like to track down how the corruption occurred in the
first place.

Trying to mount any of the corrupt filesystems fails with an error of this
form:

[   47.805146] device label baserock devid 1 transid 90 /dev/sdb1
[   47.810073] btrfs: disk space caching is enabled
[   47.817261] parent transid verify failed on 1636728832 wanted 76 found 95
[   47.818081] parent transid verify failed on 1636728832 wanted 76 found 95
[   47.818522] Failed to read block groups: -5
[   47.826103] btrfs: open_ctree failed

This is with Linux master as of 29/Aug/2012, so including the latest
'for-linus' branch from the btrfs tree. Attempts to run btrfs-debug-tree
on the disk images fail with the same error, and btrfsck segfaults.

I've not yet been able to reliably reproduce the cause of the corruption,
but I know that in at least one case the VM was compiling code and then
had a forced power-off. However, in at least one case the corruption
appeared after a clean shutdown. I suspect it may be linked to suspending
the host machine, which causes weird things to happen to time in the VM's
universe: btrfs has been working fine with the same kernel in a VM on a
machine that is never suspended or powered off. I've not yet managed to
prove anything, though.

I'll keep trying to reproduce the issue. In the meantime I'm interested in
how common this sort of issue is, whether anyone has tips for repairing
the image, and whether there is work in progress to prevent or fix the
corruption.

Thanks
Sam
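[A minimal sketch of the kind of harness that could exercise the
forced-power-off case described above; the image path, memory size and
sleep interval are hypothetical, and partition-offset handling for the
loop mount is omitted for brevity.]

  #!/bin/sh
  # Boot the guest, let its compile workload run for a while, then kill
  # QEMU outright to simulate a forced power-off, and check whether the
  # filesystem in the image still mounts.
  IMG=/var/vm/btrfs-test.img            # hypothetical disk image path
  while true; do
      qemu-system-x86_64 -hda "$IMG" -m 1024 &
      QEMU_PID=$!
      sleep 300                         # let the in-guest build run
      kill -9 "$QEMU_PID"               # simulated forced power-off
      if ! mount -o loop,ro "$IMG" /mnt; then
          echo "corruption reproduced after forced power-off"
          break
      fi
      umount /mnt
  done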
On Mon, Sep 3, 2012 at 8:23 AM, Sam Thursfield
<sam.thursfield@codethink.co.uk> wrote:
> Hi
>
> I've been running btrfs in various VMs for a while, and periodically I've
> experienced corruption in the filesystems being used. None of the data is
> important, but I'd like to track down how the corruption occurred in the
> first place.
>
> Trying to mount any of the corrupt filesystems fails with an error of this
> form:
>
> [   47.805146] device label baserock devid 1 transid 90 /dev/sdb1
> [   47.810073] btrfs: disk space caching is enabled
> [   47.817261] parent transid verify failed on 1636728832 wanted 76 found 95
> [   47.818081] parent transid verify failed on 1636728832 wanted 76 found 95
> [   47.818522] Failed to read block groups: -5
> [   47.826103] btrfs: open_ctree failed

Try mounting with -o recovery.
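[Spelled out, the suggestion above amounts to something like the
following; the device and mount point here are examples rather than
anything from the report.]

  # Try a read-only recovery mount first; it falls back to an older tree
  # root and writes nothing to the device:
  mount -t btrfs -o ro,recovery /dev/sdb1 /mnt

  # If that succeeds, a read-write recovery mount can then be attempted:
  mount -t btrfs -o recovery /dev/sdb1 /mnt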
[I'm resending this mail to the list instead of as a personal reply, sorry]

On 09/03/2012 03:35 PM, cwillu wrote:
> On Mon, Sep 3, 2012 at 8:23 AM, Sam Thursfield
> <sam.thursfield@codethink.co.uk> wrote:
>> Hi
>>
>> I've been running btrfs in various VMs for a while, and periodically I've
>> experienced corruption in the filesystems being used. None of the data is
>> important, but I'd like to track down how the corruption occurred in the
>> first place.
>>
>> Trying to mount any of the corrupt filesystems fails with an error of this
>> form:
>>
>> [   47.805146] device label baserock devid 1 transid 90 /dev/sdb1
>> [   47.810073] btrfs: disk space caching is enabled
>> [   47.817261] parent transid verify failed on 1636728832 wanted 76 found 95
>> [   47.818081] parent transid verify failed on 1636728832 wanted 76 found 95
>> [   47.818522] Failed to read block groups: -5
>> [   47.826103] btrfs: open_ctree failed
>
> Try mounting with -o recovery.

Thanks, this gets more interesting! For two of the filesystems I got the
exact same error message. For a much larger (40GB) filesystem the recovery
silently succeeded. At this point I ran 'find' in the root directory, which
gave frequent

  find: ./foo: Input/output error

messages for various small files. I aborted and found all this in dmesg:

[   29.498581] device fsid 7aaaea86-e354-46f7-aa9e-2278c858170a devid 1 transid 35 /dev/sdb1
[   42.937330] parent transid verify failed on 31920128 wanted 9 found 26
[   42.961755] parent transid verify failed on 31920128 wanted 9 found 26
[   42.999560] parent transid verify failed on 31875072 wanted 9 found 26
[   43.035490] parent transid verify failed on 31875072 wanted 9 found 26
[   43.078782] parent transid verify failed on 31907840 wanted 9 found 26
[   43.079767] parent transid verify failed on 31907840 wanted 9 found 26
[   43.081685] parent transid verify failed on 31920128 wanted 9 found 26
[   43.082478] parent transid verify failed on 31920128 wanted 9 found 26
[   43.110576] parent transid verify failed on 31952896 wanted 9 found 27
[   43.112616] parent transid verify failed on 31952896 wanted 9 found 27

So it seems to have improved matters, but am I correct in thinking this FS
would now only be suitable for extracting as much of the data as possible
and then discarding the whole thing? Or is the intention that an FS in such
a state should be recovered to the point of being usable again?

Thanks
Sam
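[For the "extract as much data as possible" path asked about above, a
sketch using tools that shipped with btrfs-progs around this time; the
destination directory is a placeholder, and on older progs the restore
tool is the standalone btrfs-restore binary rather than a btrfs
subcommand.]

  # Copy whatever file data is still reachable off the damaged filesystem
  # without mounting it; this reads the device directly and never writes
  # to it.
  mkdir -p /srv/rescued
  btrfs restore /dev/sdb1 /srv/rescued

  # If the default tree root is too damaged, btrfs-find-root lists older
  # root candidates, and restore can be pointed at one with -t:
  btrfs-find-root /dev/sdb1
  btrfs restore -t <bytenr> /dev/sdb1 /srv/rescued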