since my last debian kernel update to 2.6.38-2-amd64 i've had trouble with csum failures. it's a volume full of huge kvm images on md-RAID1 and LVM, so i used the mount options 'noatime,nodatasum' to maximize performance.

it happened two weeks ago for the first time, and now a kvm image is unreadable again. i have to use an older snapshot to substitute the virtual machine.

these are the entries in dmesg/the kernel log on any access:
...
[2412668.409442] btrfs csum failed ino 258 off 2331529216 csum 3632892464 private 2115348581
...

it's a production machine, so i cannot experiment on it too much. do you see an obvious way to solve this problem?

thanks!
martin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 03, 2011 at 11:56:32PM +0200, Martin Schitter wrote:
> since my last debian kernel update to 2.6.38-2-amd64 i've had trouble with
> csum failures. it's a volume full of huge kvm images on md-RAID1 and
> LVM, so i used the mount options 'noatime,nodatasum' to maximize
> performance.
>
> it happened two weeks ago for the first time, and now a kvm image
> is unreadable again. i have to use an older snapshot to substitute the
> virtual machine.
>
> these are the entries in dmesg/the kernel log on any access:
> ...
> [2412668.409442] btrfs csum failed ino 258 off 2331529216 csum
> 3632892464 private 2115348581
> ...
>
> it's a production machine, so i cannot experiment on it too much.
> do you see an obvious way to solve this problem?

Wait, why are you running btrfs in production? What OS is in this vm image?

Thanks,

Josef
Am 2011-05-04 02:28, schrieb Josef Bacik:
> Wait, why are you running btrfs in production?

do you know a better alternative for continuous snapshots? :)

it has worked surprisingly well for more than a year. the performance could be better for vm image hosting, but it works.

we used cache='writeback' for a long time, but now all virtual instances are set to cache='none'.

> What OS is in this vm image?

2.6.30-bpo.1-amd64 with the virtio driver.

could you give me some advice on how to debug/report this specific problem more precisely?

thanks
martin
On Wed, May 4, 2011 at 7:44 AM, Martin Schitter <ms@mur.at> wrote:
> Am 2011-05-04 02:28, schrieb Josef Bacik:
>> Wait, why are you running btrfs in production?
>
> do you know a better alternative for continuous snapshots? :)

zfs :D

> it has worked surprisingly well for more than a year.
> the performance could be better for vm image hosting, but it works.
>
> we used cache='writeback' for a long time, but now all virtual instances
> are set to cache='none'.
>
>> What OS is in this vm image?
>
> 2.6.30-bpo.1-amd64 with the virtio driver.
>
> could you give me some advice on how to debug/report this specific problem
> more precisely?

If it's not reproducible then I'd suspect it'd be hard to do. Checksum errors are usually an early sign of hardware failure (most commonly disk or power supply).

--
Fajar
Am 2011-05-04 04:18, schrieb Fajar A. Nugraha:
>> could you give me some advice on how to debug/report this specific
>> problem more precisely?
>
> If it's not reproducible then I'd suspect it'd be hard to do.

the last working snapshot is from 2011-05-02 17:13. i can reproduce this file system corruption on one specific file in any hourly snapshot after that. whenever i run a simple:

  cat snapshot-2011-05-02-18:13/sata-images/image_xy.raw > /dev/null

i get an "Input/output error" and the quoted debug messages in dmesg and the kernel log.

could this serve as a useful starting point for further investigation?

> Checksum errors are usually an early sign of hardware failure (most
> commonly disk or power supply).

that looks very implausible to me. there is a RAID1 layer beneath btrfs in our setup and i don't see any errors there. and shouldn't the 'nodatasum' option also ignore csum issues?

martin
On Wed, May 04, 2011 at 01:39:46PM +0200, Martin Schitter wrote:
> and shouldn't the 'nodatasum' option also ignore csum issues?

No, "nodatasum" will prevent newly-written data from being checksummed. However, if a checksum already exists (because the data was written to a filesystem mounted without the "nodatasum" option), btrfs will still verify the checksum, regardless of the current setting of nodatasum.

There is currently no way of preventing btrfs from verifying checksums if they exist; I don't believe that there's any way of removing an existing checksum, either.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Charting the inexorable advance of Western syphilisation... ---
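[Editor's note: the asymmetry Hugo describes — checksums are only *written* when datasum is enabled, but any checksum that already exists is verified on read regardless of the current mount option — can be sketched with a toy model. This is illustrative Python only, not btrfs code, and note that Jan Schmidt questions this behavior for data later in the thread.]

```python
import zlib

class ToyFs:
    """Toy model: csums stored per write depending on the mount option,
    but verified on read whenever a stored csum exists."""

    def __init__(self):
        self.blocks = {}   # offset -> bytes
        self.csums = {}    # offset -> checksum stored at write time

    def write(self, off, data, nodatasum):
        self.blocks[off] = data
        if not nodatasum:
            # csum is only recorded when mounted without nodatasum
            self.csums[off] = zlib.crc32(data)

    def read(self, off, nodatasum):
        data = self.blocks[off]
        # verification ignores the *current* mount option: if a csum
        # exists from an earlier mount, it is still checked
        if off in self.csums and self.csums[off] != zlib.crc32(data):
            raise IOError("csum failed off %d" % off)
        return data

fs = ToyFs()
fs.write(0, b"old data", nodatasum=False)   # csum recorded
fs.blocks[0] = b"corrupted!"                # simulate silent corruption
try:
    fs.read(0, nodatasum=True)              # fails despite nodatasum
except IOError as e:
    print(e)                                # prints: csum failed off 0
```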
On Wed, May 4, 2011 at 5:39 AM, Martin Schitter <ms@mur.at> wrote:
> the last working snapshot is from 2011-05-02 17:13. i can reproduce this
> file system corruption on one specific file in any hourly snapshot after
> that.

That's not surprising; any later snapshots will be sharing the same corrupted block.

> that looks very implausible to me. there is a RAID1 layer beneath btrfs in
> our setup and i don't see any errors there.

That doesn't rule out the possibility of corruption when it was written in the first place, or some similar problem that the raid1 faithfully reproduced on both mirrors. That's not to say that it's impossible that the problem is in btrfs, just that it's not the only plausible possibility.

> and shouldn't the 'nodatasum' option also ignore csum issues?

No, it only affects writing new checksums; any existing checksums are still checked.
Am 2011-05-04 13:51, schrieb cwillu:
>> that looks very implausible to me. there is a RAID1 layer beneath btrfs in
>> our setup and i don't see any errors there.
>
> That doesn't rule out the possibility of corruption when it was
> written in the first place, or some similar problem that the raid1
> faithfully reproduced on both mirrors. That's not to say that it's
> impossible that the problem is in btrfs, just that it's not the only
> plausible possibility.

well, i do a backup of all images every night. this process should work like a simple "scrub", because all data (and its checksums) will be read. that's how i stumbled over this problem!

>> and shouldn't the 'nodatasum' option also ignore csum issues?
>
> No, it only affects writing new checksums; any existing checksums are
> still checked.

would it make sense to remount the volume with checksumming enabled and run additional tests to find similar suspect blocks, to prevent this kind of suddenly broken file?

martin
Hey Martin,

On 05/04/11 13:39, Martin Schitter wrote:
>> Checksum errors are usually an early sign of hardware failure (most
>> commonly disk or power supply).
>
> that looks very implausible to me. there is a RAID1 layer beneath btrfs
> in our setup and i don't see any errors there.

Is the btrfs RAID1 itself inside a virtual machine? I've had data corruption with virtio block devices > 1TB on early squeeze kernels.

Kaspar
Excerpts from Martin Schitter's message of 2011-05-03 17:56:32 -0400:
> since my last debian kernel update to 2.6.38-2-amd64 i've had trouble with
> csum failures. it's a volume full of huge kvm images on md-RAID1 and
> LVM, so i used the mount options 'noatime,nodatasum' to maximize
> performance.
>
> it happened two weeks ago for the first time, and now a kvm image
> is unreadable again. i have to use an older snapshot to substitute the
> virtual machine.
>
> these are the entries in dmesg/the kernel log on any access:
> ...
> [2412668.409442] btrfs csum failed ino 258 off 2331529216 csum
> 3632892464 private 2115348581
> ...
>
> it's a production machine, so i cannot experiment on it too much.
> do you see an obvious way to solve this problem?

What OS is inside these virtual machines? The btrfs unstable tree has some fixes for windows-based OSes. Is your kvm config using O_DIRECT?

I've also got patches here that force us to honor nodatasum even when the file has csums; that can help if the contents of the file are actually good.

-chris
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> owner@vger.kernel.org] On Behalf Of Martin Schitter
>
> well, i do a backup of all images every night. this process
> should work like a simple "scrub", because all data (and its checksums)
> will be read.

Sorry, not correct. When you read all the data using something in userland, the OS only needs to read one side of the data. It can accelerate reads by staggering the requests across multiple disks, so some sectors remain unread on some disks.

When you scrub, it reads all the data from all the redundant copies (mirrored or raid) on all the individual disks in the raid set.

For this reason, you always want to use JBOD, not hardware raid. If there's an undetected hardware error, hardware raid makes it impossible for the OS to examine the individual disks to identify the failing one.

I know all the above is true for reading & scrubbing in another filesystem; I don't actually know any of this for a fact in btrfs, but it seems so basic that I would be flabbergasted if I learned that wasn't the btrfs behavior.
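[Editor's note: the read-vs-scrub distinction above can be sketched with a toy model: a normal read is satisfied from one mirror, so corruption on the other copy goes unnoticed until a scrub compares every copy. Illustrative Python only; real md RAID1 mirror selection is more involved.]

```python
# Two mirrors of the same one-block device; both start in sync.
mirrors = [
    {0: b"good"},   # disk A
    {0: b"good"},   # disk B
]

def read(off):
    # The kernel is free to satisfy a read from either disk;
    # here it always picks disk A, so disk B is never inspected.
    return mirrors[0][off]

def scrub():
    # A scrub reads *every* copy and compares them.
    mismatches = []
    for off in mirrors[0]:
        if mirrors[0][off] != mirrors[1][off]:
            mismatches.append(off)
    return mismatches

mirrors[1][0] = b"bad!"              # silent corruption on disk B
assert read(0) == b"good"            # a nightly backup read sees nothing
print("scrub mismatches:", scrub())  # prints: scrub mismatches: [0]
```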
Am 2011-05-04 14:31, schrieb Kaspar Schleiser:
> Is the btrfs RAID1 itself inside a virtual machine? I've had data
> corruption with virtio block devices > 1TB on early squeeze kernels.

no, it's on the (native) host side, and we use a very recent kernel from debian 'testing' (2.6.38-2).

martin
Am 2011-05-04 14:39, schrieb Chris Mason:
> What OS is inside these virtual machines? The btrfs unstable tree has
> some fixes for windows-based OSes.

we have only linux guests of different flavors, no windows guests. both corruptions during these last weeks belong to different virtual block device images of the same guest instance.

> Is your kvm config using O_DIRECT?

yes, the kvm/qemu option cache="none" implies O_DIRECT.

> I've also got patches here that force us to honor nodatasum even when
> the file has csums; that can help if the contents of the file are
> actually good.

that sounds interesting! in our case it may be easier to use some recent backup data, but it could be very helpful in similar situations.

i would really like to help isolate the reasons for this failure and find a practical strategy to prevent additional breakdowns.

thanks
martin
On 04.05.2011 13:51, cwillu wrote:
> On Wed, May 4, 2011 at 5:39 AM, Martin Schitter <ms@mur.at> wrote:
>> and shouldn't the 'nodatasum' option also ignore csum issues?
>
> No, it only affects writing new checksums; any existing checksums are
> still checked.

From the report I assume this must be the case for metadata, but it does not hold true for data. I was just looking at btrfs_readpage_end_io_hook for some other reason and realized it skips checksum checking when the file system is mounted nodatasum.

-Jan
On 05/03/2011 08:44 PM, Martin Schitter wrote:
> Am 2011-05-04 02:28, schrieb Josef Bacik:
>> Wait, why are you running btrfs in production?
>
> do you know a better alternative for continuous snapshots? :)
>
> it has worked surprisingly well for more than a year.
> the performance could be better for vm image hosting, but it works.
>
> we used cache='writeback' for a long time, but now all virtual instances
> are set to cache='none'.
>
>> What OS is in this vm image?
>
> 2.6.30-bpo.1-amd64 with the virtio driver.
>
> could you give me some advice on how to debug/report this specific problem
> more precisely?

So there is a problem with DIO: since userspace can modify pages in flight, we can end up with the wrong checksums, because the data can change between checksum calculation and the write completing. I was trying to come up with a way to fix this, but there's really nothing to be done at the moment other than turning off checksumming per file. Windows was particularly bad about this, but I hadn't seen it with Linux guests (even though it should still be happening). So I'll come up with something to turn off checksumming per file to get around this for now; I'll try to get to that soonish. Thanks,

Josef
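[Editor's note: the in-flight modification race Josef describes can be illustrated with a toy model: with direct I/O the page being written still belongs to userspace, so the guest can rewrite it after the filesystem has computed the checksum but before the block lands on disk. Illustrative Python with crc32 as a stand-in checksum; this is not btrfs code.]

```python
import zlib

buf = bytearray(b"guest data v1   ")          # page handed to O_DIRECT write

csum_at_submit = zlib.crc32(bytes(buf))       # fs computes csum at submit time

buf[0:13] = b"guest data v2"                  # guest rewrites the page in flight

on_disk = bytes(buf)                          # the *modified* page is what lands
stored_csum = csum_at_submit                  # ...but the *old* csum was stored

# Later reads verify the stored csum against the on-disk data and fail:
print("csum mismatch:", stored_csum != zlib.crc32(on_disk))  # prints: csum mismatch: True
```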
Am 2011-05-04 15:23, schrieb Edward Ned Harvey:
>> well, i do a backup of all images every night. this
>> process should work like a simple "scrub", because all data (and its
>> checksums) will be read.
>
> Sorry, not correct. When you read all the data using something in
> userland, the OS only needs to read one side of the data. It can
> accelerate reads by staggering the requests across multiple disks, so
> some sectors remain unread on some disks.
>
> When you scrub, it reads all the data from all the redundant copies
> (mirrored or raid) on all the individual disks in the raid set.

ok, i see -- you're right!

i know there are some benefits in the way btrfs and zfs implement RAID / multiple-disk usage and checksumming, but i also want to stay on the safe side when it comes to real practical problems. so i decided to use 'classical' linux software RAID-1 as the base layer. that's a very old-fashioned solution, but it usually simply works... and you can change a broken disk without any regard to the filesystem(s) used. in general i try to use btrfs only for its snapshot features, in a very simple way.

it looks very strange to me that i don't see any SMART warnings on the hard disks or errors on other filesystems on the same raid array. there was also no reboot, power failure or similar when the corruption suddenly appeared. so i think a btrfs bug would be the most evident explanation.

martin
Excerpts from Martin Schitter's message of 2011-05-04 10:42:51 -0400:
> i know there are some benefits in the way btrfs and zfs implement RAID /
> multiple-disk usage and checksumming, but i also want to stay on the
> safe side when it comes to real practical problems. so i decided to use
> 'classical' linux software RAID-1 as the base layer. that's a very old-
> fashioned solution, but it usually simply works... and you can change a
> broken disk without any regard to the filesystem(s) used. in general i
> try to use btrfs only for its snapshot features, in a very simple way.
>
> it looks very strange to me that i don't see any SMART warnings on the
> hard disks or errors on other filesystems on the same raid array. there
> was also no reboot, power failure or similar when the corruption
> suddenly appeared. so i think a btrfs bug would be the most evident
> explanation.

That's the bad news: it can be very hard to tell. The disk could be returning garbage, or btrfs could be messing up the csums.

The btrfs unstable tree does have one fix that is related to O_DIRECT and kvm, but we've only ever seen it happen with a windows guest. This doesn't mean it is impossible for a linux guest to trigger it, though.

-chris