thr3ads.net - Btrfs devel - Is `btrfsck --repair` supposed to actually repair problems? [Oct 2013]

Charles Cazabon

2013-Oct-01 21:12 UTC

Is `btrfsck --repair` supposed to actually repair problems?

Greetings,

I've been using btrfs for bulk-storage purposes for a couple of years now (on
vanilla linux-stable kernels on a few machines).  I recently set up a new
filesystem and have been copying data to it, when I had an unrelated kernel
lockup.  As expected, after rebooting btrfsck reported some checksum verify
errors like:

checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
checksum verify failed on 846990413824 found FB9C4BDC wanted AA2E389E

There's a few dozen of these.

Running btrfsck with the --repair option, however, does not appear to fix
these problems.  I'll attach the complete output of running with the --repair
option; running btrfsck in check-only mode afterwards reports largely the same
checksum errors as it did originally, prior to "repair".

Shouldn't `btrfsck --repair` actually repair these errors?  Am I doing
something wrong?

System details:
  -current kernel is linux-stable 3.9.11 x86_64
  -btrfs-progs built from
    git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git, which
    doesn't appear to have changed in a long time
  -filesystem is 16.4TiB btrfs on LVM on md_crypt on an mdadm RAID-6 array.
  I know this is perhaps an odd setup, but btrfs didn't support RAID-6 when I
  started using it.

Any advice appreciated.  Thanks,

Charles

-- 
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at:               http://pyropus.ca/software/
-----------------------------------------------------------------------

Chris Murphy

2013-Oct-01 22:01 UTC

Re: Is `btrfsck --repair` supposed to actually repair problems?

On Oct 1, 2013, at 3:12 PM, Charles Cazabon
<charlesc-lists-btrfs@pyropus.ca> wrote:
> Greetings,
>
> I've been using btrfs for bulk-storage purposes for a couple of years now (on
> vanilla linux-stable kernels on a few machines).  I recently set up a new
> filesystem and have been copying data to it, when I had an unrelated kernel
> lockup.  As expected, after rebooting btrfsck reported some checksum verify
> errors like:
>
> checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
> checksum verify failed on 806795800576 found 01A8A8FB wanted 51361541
> checksum verify failed on 846990413824 found FB9C4BDC wanted AA2E389E
>
> There's a few dozen of these.
>
> Running btrfsck with the --repair option, however, does not appear to fix
> these problems.  I'll attach the complete output of running with the --repair
> option; running btrfsck in check-only mode afterwards reports largely the same
> checksum errors as it did originally, prior to "repair".
>
> Shouldn't `btrfsck --repair` actually repair these errors?  Am I doing
> something wrong?

It looks like the file system thinks the file has changed and isn't
matching checksum. That's not obviously fixable unless both data and
metadata are raid1. More information is needed:

btrfs fi df <mountpoint>
btrfs show
dmesg | grep -i btrfs
dmesg | grep ata<port#>

I'm assuming it's a SATA drive. If so, run the last command without a port
number first to figure out which port the drive is on. For me I get a line:

[    1.388091] ata1.00: ATA-8: WDC WD5000BEVT-22ZAT0, 01.01A01, max UDMA/133

So I'd use dmesg | grep ata1

Do that for all drives in the btrfs volume.

And report the version of btrfs-progs.


Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Charles Cazabon

2013-Oct-01 23:46 UTC

Re: Is `btrfsck --repair` supposed to actually repair problems?

Hi, Chris,

Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 1, 2013, at 3:12 PM, Charles Cazabon
> <charlesc-lists-btrfs@pyropus.ca> wrote:
>
> > Running btrfsck with the --repair option, however, does not appear to fix
> > these [checksum verify] problems.  I'll attach the complete output of
> > running with the --repair option; running btrfsck in check-only mode
> > afterwards reports largely the same checksum errors as it did originally,
> > prior to "repair".  Am I doing something wrong?
>
> It looks like the file system thinks the file has changed and isn't matching
> checksum. That's not obviously fixable unless both data and metadata are
> raid1.

Perhaps this wasn't clear from my original message, but I'm not using btrfs'
RAID or lvm-like capabilities.  The filesystem is on an LVM logical volume,
with the actual underlying storage being an 8-disk RAID-6 array (mdadm array).
So the stack is:

    vanilla btrfs filesystem (not using subvolumes, btrfs' multiple device
       support or any other advanced features)

    LVM logical volume

    LVM volume group

    LVM physical volume

    md_crypt / LUKS encrypted volume

    mdadm RAID-6 array

    8 x SATA disks

> More information is needed:

Okay:

  # btrfs fi df /media/bigbackup/
  Data: total=4.53TB, used=4.22TB
  System, DUP: total=8.00MB, used=508.00KB
  System: total=4.00MB, used=0.00
  Metadata, DUP: total=18.00GB, used=17.13GB
  Metadata: total=8.00MB, used=0.00

> btrfs show

This fails with `btrfs: unknown token 'show'`.

> dmesg | grep -i btrfs

After mounting the filesystem read-only, the following ends up in the syslog:

  [13333.117462] Btrfs loaded
  [13333.157078] device label bigbackup devid 1 transid 5249
      /dev/mapper/extbackup-bigbackup
  [13333.158445] btrfs: disk space caching is enabled

That's the only btrfs-related info that gets logged.

> dmesg | grep ata<port#>
>
> I'm assuming it's a SATA drive,

As I say, it's 8 disks (yes, SATA).  What info exactly do you want about the
disks and ports?  The log is quite noisy because these are behind SATA port
multipliers, and there are a bunch of other SATA drives in the system.  But if
I filter out all the extra stuff, then when I power up the port-multiplier
boxes that the disks are in, what's logged is 126 lines (much of it garbage
from not all possible multiplier ports being in use), log attached.

The 8 disks are, as you can see, all identical Seagate units:

  ATA-8: ST3000DM001-1E6166, CC45, max UDMA/133

> And report the version of btrfs-progs.

Btrfs v0.20-rc1-358-g194aa4a-dirty

That's what I get when I build from the git repository at
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

git insists I'm fully up to date, though the last time I pulled before today
was over a month ago.

Charles

Chris Murphy

2013-Oct-02 00:42 UTC

Re: Is `btrfsck --repair` supposed to actually repair problems?

On Oct 1, 2013, at 5:46 PM, Charles Cazabon
<charlesc-lists-btrfs@pyropus.ca> wrote:
>
>  # btrfs fi df /media/bigbackup/
>  Data: total=4.53TB, used=4.22TB
>  System, DUP: total=8.00MB, used=508.00KB
>  System: total=4.00MB, used=0.00
>  Metadata, DUP: total=18.00GB, used=17.13GB
>  Metadata: total=8.00MB, used=0.00

Since there's only one copy of the data, there isn't a way to repair it; it
just notes that there is a checksum mismatch.

>> btrfs show
>
> This fails with `btrfs: unknown token 'show'`.

I meant 'btrfs fi show'

> As I say, it's 8 disks (yes, SATA).  What info exactly do you want about the
> disks and ports?

Looking for problems that relate to this one.

When was the last time you did a scrub on the md device? And what was the
result?

What is the 'smartctl -l scterc /dev/sdX' result for one of the drives?

This sounds to me like it could be a bit flip, and btrfs is catching it but
doesn't have a 2nd copy of the data. Just a guess.

Chris Murphy

Charles Cazabon

2013-Oct-02 03:13 UTC

Re: Is `btrfsck --repair` supposed to actually repair problems?

Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 1, 2013, at 5:46 PM, Charles Cazabon wrote:
> >
> >  # btrfs fi df /media/bigbackup/
> >  Data: total=4.53TB, used=4.22TB
> >  System, DUP: total=8.00MB, used=508.00KB
> >  System: total=4.00MB, used=0.00
> >  Metadata, DUP: total=18.00GB, used=17.13GB
> >  Metadata: total=8.00MB, used=0.00
>
> Since there's only one copy of the data, there isn't a way to repair it, it
> just notes that there is a checksum mismatch.

Ah, I'm not looking to repair the files -- I can recopy the files easily
enough, and rsync will pick up any files whose contents have been corrupted.
I'd like to get the filesystem fixed, though.  i.e., even deleting the
affected files would be fine.  This is a new filesystem to replace my existing
(full) backups filesystem.  The existing backups one is ext4 but this new one
is too big for mkfs.ext4 to handle, so btrfs it is.  I wasn't expecting
problems as I've been running btrfs for other purposes for years.

Am I misunderstanding something here?  It seems to me like btrfsck is telling
me there's problems with the filesystem itself when it continues to report
these checksum errors even after a `btrfsck --repair`.

> I meant 'btrfs fi show'

  Label: 'bigbackup'  uuid: c18dfd04-d931-4269-b999-e94df3b1918c
  Total devices 1 FS bytes used 4.23TB
  devid    1 size 16.37TB used 4.56TB path /dev/dm-9

> > As I say, it's 8 disks (yes, SATA).  What info exactly do you want about
> > the disks and ports?
>
> Looking for problems that relate to this one.
>
> When was the last time you did a scrub on the md device? And what was the
> result?

It's a brand new array.  The initial sync is actually still going on (about
half complete; it'll take several days to initialize an array this size on
this hardware).

So in short, the underlying array is clean.

> What is the 'smartctl -l scterc /dev/sdX' result for one of the drives?

  Warning: device does not support SCT Error Recovery Control command

> This sounds to me like it could be a bit flip, and btrfs is catching it but
> doesn't have a 2nd copy of the data. Just a guess.

If one of the disks flipped a bit, it would be caught at the md RAID-6 level,
no?

Charles

Chris Murphy

2013-Oct-02 03:50 UTC

Re: Is `btrfsck --repair` supposed to actually repair problems?

On Oct 1, 2013, at 9:13 PM, Charles Cazabon
<charlesc-lists-btrfs@pyropus.ca> wrote:
>
> Ah, I'm not looking to repair the files -- I can recopy the files easily
> enough, and rsync will pick up any files whose contents have been corrupted.
> I'd like to get the filesystem fixed, though.  i.e., even deleting the
> affected files would be fine.

If you run a scrub, dmesg should contain the path for affected files which you
can then delete. If it's just a checksum problem with files, the file
system doesn't need fixing. I'd wait until the raid is finished syncing.
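
The scrub-then-delete workflow described above can be sketched roughly as
follows. This assumes the kernel logs each data checksum failure on a line
ending in a "(path: ...)" field; the exact message format differs between
kernel versions, and the helper name scrub_bad_paths is made up for
illustration.

```shell
# Print the unique file paths named in scrub warnings read from stdin.
# Assumption: each relevant log line ends with "(path: some/file)".
# Adjust the pattern if your kernel words the message differently.
scrub_bad_paths() {
    sed -n 's/.*(path: \(.*\))$/\1/p' | sort -u
}

# Typical use, as root, on the mounted filesystem:
#   btrfs scrub start -B /media/bigbackup   # -B: stay in the foreground until done
#   dmesg | scrub_bad_paths
```

Review the list before deleting anything; the reported paths may be relative
to the filesystem root rather than to the mountpoint.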
> This is a new filesystem to replace my existing
> (full) backups filesystem.  The existing backups one is ext4 but this new one
> is too big for mkfs.ext4 to handle, so btrfs it is.  I wasn't expecting
> problems as I've been running btrfs for other purposes for years.

It's still experimental. I'd expect almost anything.

> Am I misunderstanding something here?  It seems to me like btrfsck is telling
> me there's problems with the filesystem itself when it continues to report
> these checksum errors even after a `btrfsck --repair`.

Well I haven't seen the entire btrfsck or the entire dmesg so like I
said I'm sorta guessing it's just a file problem, but maybe
you've stumbled on something else.

> It's a brand new array.  The initial sync is actually still going on (about
> half complete; it'll take several days to initialize an array this size on
> this hardware).

OK maybe someone else can comment if this is expected to work, maybe on
linux-raid even. But now you tell us this? You didn't think it might be
important to mention that you've got a raid initially syncing, that
you've formatted btrfs, copied files over, and at some point you got a
kernel lock up, and then once restarted you ran a btrfsck?

I would expect problems with any file system, with a system that locks up while
the raid is still syncing.

> So in short, the underlying array is clean.

Well except you've got either file system corruption, or corrupt files.

>> What is the 'smartctl -l scterc /dev/sdX' result for one of the drives?
>
>  Warning: device does not support SCT Error Recovery Control command

These drives aren't well suited for RAID of any kind. Hopefully, at
least, you will change the scsi layer time out for each drive using

echo 121 >/sys/block/sdX/device/timeout

That may not even be long enough, but without more information about what the
ERC timeout of the drive is, which the manufacturer might have in the exhaustive
version of their spec book, it's a guess. Consumer drives try to
recover for up to a couple minutes. If the scsi layer resets in 30 seconds (the
default) then sector problems are never fixed because the drive never reports
the read error back to the kernel. And md won't write over the bad
sector with reconstructed data. So you get an accumulation of bad sectors,
rather than them being taken care of normally.

Your application layer might get frustrated, or worse, with up to 2 minute
delays in the storage stack.
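
For an 8-disk array the timeout has to be written once per member disk. A
minimal sketch of doing that in a loop, assuming every sd* device on the
machine should get the same value (narrow the glob if that is not the case);
the function name and the optional sysfs-root argument are made up for
illustration, the latter so the loop can be tried against a scratch directory
first.

```shell
# Write a SCSI command timeout (in seconds) for every sd* disk.
# $1: timeout in seconds; $2: sysfs root (hypothetical override, default /sys).
set_scsi_timeouts() {
    secs=$1
    root=${2:-/sys}
    for f in "$root"/block/sd*/device/timeout; do
        [ -e "$f" ] || continue   # the glob matched nothing
        echo "$secs" > "$f"
    done
}

# As root:
#   set_scsi_timeouts 121
```

The value is not kept across reboots, so it would need to be reapplied at
boot, for example from a startup script.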
>
>> This sounds to me like it could be a bit flip, and btrfs is catching it but
>> doesn't have a 2nd copy of the data. Just a guess.
>
> If one of the disks flipped a bit, it would be caught at the md RAID-6 level,
> no?

No. In normal operation the parity is never consulted, so it would have no idea
if there's a flipped bit. The hardware ought to catch it, but we know
that isn't always true.
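
md only compares data against parity when a check (the md-level scrub asked
about earlier) is requested through sysfs. A minimal sketch, assuming the
array is md0 (a placeholder name); the function names and the optional
sysfs-root argument are made up for illustration.

```shell
# Ask md to read every stripe and compare data against parity.
# $1: md device name (e.g. md0, hypothetical); $2: sysfs root, default /sys.
md_start_check() {
    echo check > "${2:-/sys}/block/$1/md/sync_action"
}

# Print the mismatch count recorded by the most recent check.
md_mismatch_count() {
    cat "${2:-/sys}/block/$1/md/mismatch_cnt"
}

# As root, once any resync has finished:
#   md_start_check md0
#   cat /proc/mdstat          # watch progress
#   md_mismatch_count md0     # read after the check completes
```

A nonzero count says that data and parity disagreed somewhere; it does not say
which block was the wrong one.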



Chris Murphy

Charles Cazabon

2013-Oct-02 16:53 UTC

Re: Is `btrfsck --repair` supposed to actually repair problems?

Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 1, 2013, at 9:13 PM, Charles Cazabon wrote:
> >
> > Ah, I'm not looking to repair the files -- I can recopy the files easily
> > enough, and rsync will pick up any files whose contents have been corrupted.
>
> If you run a scrub, dmesg should contain the path for affected files which
> you can then delete. If it's just a checksum problem with files, the file
> system doesn't need fixing.

Okay, I'll do that.

> I'd wait until the raid is finished syncing.

Strictly speaking, this shouldn't be necessary.  mdadm arrays are fully usable
from creation during the initial sync; the system tracks which bits have been
initialized and which haven't.

> > It's a brand new array.  The initial sync is actually still going on
> > (about half complete; it'll take several days to initialize an array this
> > size on this hardware).
>
> OK maybe someone else can comment if this is expected to work, maybe on
> linux-raid even.

https://raid.wiki.kernel.org/index.php/Initial_Array_Creation talks about the
initial (re)sync.  It explicitly states:

  This can take quite a time and the array is not fully resilient whilst this
  is happening (it is however fully useable).

> But now you tell us this? You didn't think it might be important to mention
> that you've got a raid initially syncing, that you've formatted btrfs,
> copied files over, and at some point you got a kernel lock up, and then once
> restarted you ran a btrfsck?

Yes.  The array uses a write-intent bitmap, so the kernel lockup during the
initial sync does not cause corruption; when the system is brought back up, it
may re-initialize a portion that it had already initialized (i.e. it's not
100% efficient), but it doesn't result in corruption.

> I would expect problems with any file system, with a system that locks up
> while the raid is still syncing.

No, this doesn't cause any particular problems.  It's just like the normal
case of a single-drive filesystem and the system crashing during a write.
You just fsck to address any problems the interrupted write caused and recover
the journal (if applicable).

Charles

Chris Murphy

2013-Oct-02 19:13 UTC

Re: Is `btrfsck --repair` supposed to actually repair problems?

On Oct 2, 2013, at 10:53 AM, Charles Cazabon
<charlesc-lists-btrfs@pyropus.ca> wrote:
>> I'd wait until the raid is finished syncing.
>
> Strictly speaking, this shouldn't be necessary.  mdadm arrays are fully usable
> from creation during the initial sync; the system tracks which bits have been
> initialized and which haven't.

I know but it's a 16TB array, do you really want to start over from
scratch? No. And neither do most people. So this isn't a use case
that's probably getting a ton of testing.

>> But now you tell us this? You didn't think it might be important to mention
>> that you've got a raid initially syncing, that you've formatted btrfs,
>> copied files over, and at some point you got a kernel lock up, and then once
>> restarted you ran a btrfsck?
>
> Yes.  The array uses a write-intent bitmap, so the kernel lockup during the
> initial sync does not cause corruption; when the system is brought back up, it
> may re-initialize a portion that it had already initialized (i.e. it's not
> 100% efficient), but it doesn't result in corruption.

OK except there is corruption. We just don't know for sure if it's just
files or if it's the file system. If you don't know already what caused
it, it's not really correct to say what doesn't result in corruption.

Also the write-intent bitmap isn't configured by default, and you didn't
previously say that it was. Is this an internal or external bitmap?

>> I would expect problems with any file system, with a system that locks up
>> while the raid is still syncing.
>
> No, this doesn't cause any particular problems.  It's just like the normal
> case of a single-drive filesystem and the system crashing during a write.
> You just fsck to address any problems the interrupted write caused and recover
> the journal (if applicable).

If only hardware worked exactly per spec, and also didn't lie about
committing data to disk rather than merely keeping it in cache, this may be
true. But hardware lies, it has bugs. And the kernel isn't bug free
either.


Chris Murphy

Charles Cazabon

2013-Oct-02 19:56 UTC

Re: Is `btrfsck --repair` supposed to actually repair problems?

Chris Murphy <lists@colorremedies.com> wrote:
> On Oct 2, 2013, at 10:53 AM, Charles Cazabon wrote:
>
> >> I'd wait until the raid is finished syncing.
> >
> > Strictly speaking, this shouldn't be necessary.
>
> I know but it's a 16TB array, do you really want to start over from scratch?
> No. And neither do most people. So this isn't a use case that's probably
> getting a ton of testing.

Fair enough.  The sync should be done late today or early tomorrow, and I am
waiting for it to complete before continuing to debug this.  I'll start with
the scrub you mentioned.

> Also the write-intent bitmap isn't configured by default, and you didn't
> previously say that it was. Is this an internal or external bitmap?

Internal.

Thanks for your assistance to date.

Charles