thr3ads.net - Btrfs devel - btrfs csum failed on git .pack file [Sep 2009]

Markus Trippelsdorf

2009-Sep-07 20:35 UTC

btrfs csum failed on git .pack file

Just got this error today in my dmesg:
btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798

linux % find . -inum 1483065
./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack

It's the main pack file from my git linux kernel tree:

linux % ls -l ./.git/objects/pack/
total 562848
-rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
-rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
-rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
-r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
-r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
-rw------- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER

I'm running the latest git kernel and I've been using btrfs as my root
fs for the last few weeks without problems so far.
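
For readers following along, the dmesg line in this report can be picked apart mechanically. The sketch below is only an illustration: the function name and the 4096-byte block size are assumptions, and reading "csum" as the checksum computed from the data just read and "private" as the stored one is an interpretation of the message, not something stated in this thread.

```python
import re

# The dmesg line quoted in the report.
LINE = "btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798"

PATTERN = re.compile(
    r"btrfs csum failed ino (\d+) off (\d+) csum (\d+) private (\d+)"
)


def parse_csum_failure(line, block_size=4096):
    """Split a 'btrfs csum failed' line into its numeric fields.

    block_size is an assumed value (4 KiB), not read from the filesystem.
    Returns None if the line does not match.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    ino, off, csum, private = (int(group) for group in match.groups())
    return {
        "ino": ino,                        # inode number, as used with find -inum
        "off": off,                        # byte offset within the file
        "block": off // block_size,        # index of the affected block
        "aligned": off % block_size == 0,  # does the offset sit on a block boundary?
        "csum": csum,                      # assumed: checksum of the data as read
        "private": private,                # assumed: checksum that was stored
    }


if __name__ == "__main__":
    print(parse_csum_failure(LINE))
```

With the assumed 4 KiB block size the reported offset falls exactly on a block boundary (block 38692), which is what one would expect if checksums are verified per block.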

-- 
Markus
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jens Axboe

2009-Sep-08 20:00 UTC

Re: btrfs csum failed on git .pack file

On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
> Just got this error today in my dmesg:
> btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
>
> linux % find . -inum 1483065
> ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
>
> It's the main pack file from my git linux kernel tree:
>
> linux % ls -l ./.git/objects/pack/
> total 562848
> -rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
> -rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
> -rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
> -r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
> -r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> -rw------- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER
>
> I'm running the latest git kernel and I've been using btrfs as my root
> fs for the last few weeks without problems so far.

Hmm, I ran into something very similar. Care to check what the corrupted
block of data looks like (and how big it is)?

-- 
Jens Axboe


Markus Trippelsdorf

2009-Sep-08 20:22 UTC

Re: btrfs csum failed on git .pack file

On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
> Hmm, I ran into something very similar. Care to check what the corrupted
> block of data looks like (and how big it is)?

I've already deleted the file in question unfortunately.
On IRC Chris decided that either bad RAM or a harddrive error was the
most likely reason for this checksum mismatch.

-- 
Markus

Jens Axboe

2009-Sep-08 20:32 UTC

Re: btrfs csum failed on git .pack file

On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
> I've already deleted the file in question unfortunately.
> On IRC Chris decided that either bad RAM or a harddrive error was the
> most likely reason for this checksum mismatch.

Darn, that's too bad. The corruption issue I had was also in a git pack
file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
in the file, and I blamed it on the (cheap) SSD drive that hosted the
local git repo. It's still the most likely explanation given the nature
of the problem, however it would have been really interesting to see
what corruption you had.

-- 
Jens Axboe


Tomasz Torcz

2009-Sep-08 20:55 UTC

Re: btrfs csum failed on git .pack file

On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
> Darn, that's too bad. The corruption issue I had was also in a git pack
> file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
> in the file, and I blamed it on the (cheap) SSD drive that hosted the
> local git repo. It's still the most likely explanation given the nature
> of the problem, however it would have been really interesting to see
> what corruption you had.

  BTW, I had a similar issue. One file on btrfs failed its csum check.
I copied it using dd_rescue and, surprise, reading the new file yields
this error too. How can I retrieve the block that fails the csum check
from a btrfs volume?
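
A partial answer to the question, as a sketch only: the snippet below does not recover the failing data, it just narrows down which offsets of a file cannot be read. It assumes that a checksum failure surfaces to userspace as a read error (EIO) and that checking in 4096-byte steps is fine-grained enough; the function name is hypothetical.

```python
import errno
import os


def find_unreadable_blocks(path, block_size=4096):
    """Return the byte offsets of blocks whose read fails with EIO.

    Assumption: the filesystem reports a failed checksum as EIO on read.
    Any other error is re-raised unchanged.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        for offset in range(0, size, block_size):
            try:
                os.pread(fd, block_size, offset)
            except OSError as exc:
                if exc.errno != errno.EIO:
                    raise
                bad_offsets.append(offset)
    finally:
        os.close(fd)
    return bad_offsets
```

Offsets returned this way can be compared against the "off" value in the dmesg line. Note that blocks already held in the page cache may be served without touching the disk again.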

-- 
Tomasz Torcz
xmpp: zdzichubg@chrome.pl


Tracy Reed

2009-Sep-08 21:53 UTC

Re: btrfs csum failed on git .pack file

On Tue, Sep 08, 2009 at 10:22:11PM +0200, Markus Trippelsdorf spake thusly:
> I've already deleted the file in question unfortunately.
> On IRC Chris decided that either bad RAM or a harddrive error was the
> most likely reason for this checksum mismatch.

Which raises an interesting point: I know reiserfs had its problems,
but it also turned up a lot of machines with bad RAM, which contributed
to giving the fs a bad name. With more and more complicated and
memory-consuming filesystem data structures being stored in RAM, larger
volumes of RAM in systems, and RAM not really getting any more reliable,
will we ever see a day where something like btrfs is not recommended for
use in any machine that doesn't have ECC? Does the filesystem do
anything to protect itself from bad hardware?

-- 
Tracy Reed
http://tracyreed.org

Markus Trippelsdorf

2009-Sep-09 06:55 UTC

Re: btrfs csum failed on git .pack file

On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
> Darn, that's too bad. The corruption issue I had was also in a git pack
> file. It was fine one day, bad the next. Turned out to be 16kb of 0xff
> in the file, and I blamed it on the (cheap) SSD drive that hosted the
> local git repo. It's still the most likely explanation given the nature
> of the problem, however it would have been really interesting to see
> what corruption you had.

If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
be using the same hardware (30GB Vertex in my case).
What a strange coincidence that it affected git pack files in both cases.
It's almost too improbable...

-- 
Markus

Jens Axboe

2009-Sep-09 07:01 UTC

Re: btrfs csum failed on git .pack file

On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
> If by cheap SSD drive you mean an Indilinx Barefoot based one, we might
> be using the same hardware (30GB Vertex in my case).

Spooky, yes indeed that's the very same drive I'm using. Also see my
postings on this very issue here, top two entries:

http://axboe.livejournal.com/

So that pretty much looks like it reaffirms some of my suspicions. Is
the drive in a laptop that you suspend and resume?

> What a strange coincidence that it affected git pack files in both cases.
> It's almost too improbable...

Probably more than a coincidence I think, the question is what though...

-- 
Jens Axboe


Markus Trippelsdorf

2009-Sep-09 07:23 UTC

Re: btrfs csum failed on git .pack file

On Wed, Sep 09, 2009 at 09:01:41AM +0200, Jens Axboe wrote:
> Spooky, yes indeed that's the very same drive I'm using. Also see my
> postings on this very issue here, top two entries:
>
> http://axboe.livejournal.com/
>
> So that pretty much looks like it reaffirms some of my suspicions. Is
> the drive in a laptop that you suspend and resume?

No. I use it in my workstation, which I normally never switch off.

> > What a strange coincidence that it affected git pack files in both cases.
> > It's almost too improbable...
>
> Probably more than a coincidence I think, the question is what though...

If it really was an SSD error, then it should happen randomly, messing up
random files. But (contrary to your experience) I never had any issues with
this SSD until this single failed checksum.

-- 
Markus

Gregory Maxwell

2009-Sep-09 07:28 UTC

Re: btrfs csum failed on git .pack file

On Tue, Sep 8, 2009 at 5:53 PM, Tracy Reed<treed@ultraviolet.org> wrote:
> On Tue, Sep 08, 2009 at 10:22:11PM +0200, Markus Trippelsdorf spake thusly:
>> I've already deleted the file in question unfortunately.
>> On IRC Chris decided that either bad RAM or a harddrive error was the
>> most likely reason for this checksum mismatch.
>
> Which raises an interesting point: I know reiserfs had its problems
> but it also turned up a lot of machines with bad RAM which contributed
> to giving the fs a bad name. With more and more complicated and memory
> consuming filesystem datastructures being stored in RAM, larger volumes
> of RAM in systems, and RAM not really getting any more reliable will
> we ever see a day where something like btrfs is not recommended for
> use in any machine that doesn't have ECC? Does the filesystem do
> anything to protect itself from bad hardware?

Such as the checksums that started this thread?  That *is* a
protection-against-bad-hardware feature.

A large part of reiserfs' problem was a religious degree of "panic on
inconsistency!", so failures of identical severity that might slip by
unnoticed on other file systems were more likely to be noticed. Sadly,
shooting the messenger is still a popular sport, and the qualities of
BTRFS which make it more resistant to bad hardware may well give it a bad
reputation.  I don't know that there is much that can be done about
that.

On Wed, Sep 9, 2009 at 3:01 AM, Jens Axboe<jens.axboe@oracle.com> wrote:
> On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
>> What a strange coincidence that it affected git pack files in both cases.
>> It's almost too improbable...
>
> Probably more than a coincidence I think, the question is what though...

Could this have been the same data in both cases?  Either way, if the
hardware was randomly corrupting high-entropy blocks with very low
probability, it's quite possible that you two would have seen it while
anyone else who did chalked it up to some other problem.

I've encountered telecom equipment where particular packet data
interacted poorly with the clock recovery hardware. "Any file
transfers fine, except for this one. This one stalls and never
finishes, but if I unzip it, it's fine!" Ugh. Or it could be some
busted ECC that always 'corrects' a particular class of perfectly
valid blocks to something wrong... or it could be a million other
things. At the end of the day you just need to accept that the
hardware is junk. Blacklist it, give the vendor the best black eye
that you can, and move on.

I can only expect that this is going to get worse over time. I really
wish that it had become the norm for drive makers to expose an
optional raw interface to the flash. Alas, we're stuck with the
equivalent of running Linux on a hypervisor provided by Microsoft...
except the SSD makers are less experienced.

Jens Axboe

2009-Sep-09 07:29 UTC

Re: btrfs csum failed on git .pack file

On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
> No. I use it in my workstation, that I never switch off normally.

OK, so we can rule out any interactions between suspending and resuming
the drive. That's at least something.

> If it really was an SSD error, then it should happen randomly, messing up
> random files. But (contrary to your experience) I never had any issues with
> this SSD until this single failed checksum.

Not necessarily, there may be some pattern to how the pack files are
accessed (that propagates through to the drive). The fact is, 0xff is an
extremely weird piece of corruption that just reeks of bad flash blocks.
It's almost impossible that it is a software error. If it was all
zeroes, or a bit flip, the likely causes would be very different.
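
The distinction drawn above (a run of 0xff versus zeroes) is easy to check for when a damaged copy of a file is still available. A minimal sketch, assuming a 4096-byte block size and a hypothetical helper name; it only reports aligned blocks filled with a single byte value and says nothing about the cause.

```python
def find_fill_runs(data, block_size=4096, fill_bytes=(0x00, 0xFF)):
    """Return (offset, length, fill_byte) for runs of uniformly filled blocks.

    A block counts only if every byte in it equals one of fill_bytes.
    Adjacent blocks with the same fill byte are merged into one run.
    """
    runs = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fill = block[0]
        if fill not in fill_bytes or block != bytes([fill]) * len(block):
            continue
        if runs and runs[-1][2] == fill and runs[-1][0] + runs[-1][1] == offset:
            # Extend the previous run instead of starting a new one.
            start, length, _ = runs[-1]
            runs[-1] = (start, length + len(block), fill)
        else:
            runs.append((offset, len(block), fill))
    return runs


if __name__ == "__main__":
    # Synthetic example: 16 KiB of 0xff embedded in otherwise varied data.
    sample = bytes(range(256)) * 32 + b"\xff" * 16384 + bytes(range(256)) * 16
    print(find_fill_runs(sample))
```

Long uniform runs are unusual in compressed data such as a git pack, which is why a hit there is suspicious; in ordinary files zero-filled blocks can be perfectly legitimate.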

-- 
Jens Axboe


Daniel J Blueman

2009-Sep-09 08:18 UTC

Re: btrfs csum failed on git .pack file

On Wed, Sep 9, 2009 at 8:01 AM, Jens Axboe<jens.axboe@oracle.com> wrote:
> So that pretty much looks like it reaffirms some of my suspicions. Is
> the drive in a laptop that you suspend and resume?

If you're on firmware < 1.30, the changelog includes some fixes which
may be relevant, e.g. if "block 0" is relative, or you're
suspending/resuming:

- Race condition occurred during soft reset handler
- If read fail occurs during reading stamp information, firmware
  corrupted block 0.
- Power off recovery had bug in certain circumstances

http://www.ocztechnologyforum.com/forum/showthread.php?t=57516
-- 
Daniel J Blueman

Jens Axboe

2009-Sep-09 08:26 UTC

Re: btrfs csum failed on git .pack file

On Wed, Sep 09 2009, Daniel J Blueman wrote:
> If you're on firmware < 1.30, the changelog includes some fixes which
> may be relevant, eg if "block 0" is relative, or you're
> suspending/resuming:
>
> - Race condition occurred during soft reset handler
> - If read fail occurs during reading stamp information, firmware
>   corrupted block 0.
> - Power off recovery had bug in certain circumstances
>
> http://www.ocztechnologyforum.com/forum/showthread.php?t=57516

The issue is pretty much moot at this point, since OCZ support were not
really interested in providing any sort of real technical support to
find out what really caused this issue. My main worry was reliability of
these cheaper SSD drives, and that worry is still not resolved. If you
read the blog entries, I do comment on the apparently scary basic bugs
that are still being fixed on the Indilinx controllers. I do expect some
basic level of data integrity from a consumer product and at least some
interest in resolving weird corruption issues if things go wrong. Since
OCZ cannot provide anything like that, I have a hard time recommending
these drives for anything but very casual use. Fast, cheap, reliable.
Pick any two.

My drive was running 1.10 at the time of the problem.

-- 
Jens Axboe


Daniel J Blueman

2009-Sep-09 08:37 UTC

Re: btrfs csum failed on git .pack file

On Wed, Sep 9, 2009 at 9:26 AM, Jens Axboe<jens.axboe@oracle.com>
wrote:> On Wed, Sep 09 2009, Daniel J Blueman wrote:
>> On Wed, Sep 9, 2009 at 8:01 AM, Jens Axboe<jens.axboe@oracle.com>
wrote:
>> > On Wed, Sep 09 2009, Markus Trippelsdorf wrote:
>> >> On Tue, Sep 08, 2009 at 10:32:14PM +0200, Jens Axboe wrote:
>> >> > On Tue, Sep 08 2009, Markus Trippelsdorf wrote:
>> >> > > On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe
wrote:
>> >> > > > On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
>> >> > > > > Just got this error today in my dmesg:
>> >> > > > > btrfs csum failed ino 1483065 off
158482432 csum 4283543305 private 43905798
>> >> > > > >
>> >> > > > > linux % find . -inum 1483065
>> >> > > > >
./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
>> >> > > > >
>> >> > > > > It''s the main pack file from my
git linux kernel tree:
>> >> > > > >
>> >> > > >
>> >> > > > Hmm, I ran into something very similar. Care to
check what the corrupted
>> >> > > > block of data looks like (and how big it is)?
>> >> > >
>> >> > > I''ve already deleted the file in question
unfortunately.
>> >> > > On IRC Chris decided that either bad RAM or a
harddrive error was the
>> >> > > most likely reason for this chechsum mismatch.
>> >> >
>> >> > Darn, that''s too bad. The corruption issue I had
was also in a git pack
>> >> > file. It was fine one day, bad the next. Turned out to be
16kb of 0xff
>> >> > in the file, and I blamed it on the (cheap) SSD drive
that hosted the
>> >> > local git repo. It''s still the most likely
explanation given the nature
>> >> > of the problem, however it would have been really
interesting to see
>> >> > what corruption you had.
>> >>
>> >> If by cheap SSD drive you mean an Indilinx Barefoot based one,
we might
>> >> be using the same hardware (30GB Vertex in my case).
>> >
>> > Spooky, yes indeed that''s the very same drive
I''m using. Also see my
>> > postings on this very issue here, top two entries:
>> >
>> > http://axboe.livejournal.com/
>> >
>> > So that pretty much looks like it reaffirms some of my suspicions.
Is
>> > the drive in a laptop that you suspend and resume?
>>
>> If you''re on firmware < 1.30, the changlog includes some
fixes which
>> may be relevant, eg if "block 0" is relative, or
you''re
>> suspending/resuming:
>>
>> - Race condition occurred during soft reset handler
>> - If read fail occurs during reading stamp information, firmware
>> corrupted block 0.
>> - Power off recovery had bug in certain circumstances
>>
>> http://www.ocztechnologyforum.com/forum/showthread.php?t=57516
>
> The issue is pretty much moot at this point, since OCZ support were not
> really interested in providing any sort of real technical support to
> find out what really caused this issue. My main worry was reliability of
> these cheaper SSD drives, and that worry is still not resolved. If you
> read the blog entries, I do comment on the apparently scary basic bugs
> that are still being fixed on the Indilinx controllers. I do expect some
> basic level of data integrity from a consumer product and at least some
> interest in resolving weird corruption issues if things go wrong. Since
> OCZ cannot provide anything like that, I have a hard time recommending
> these drives for anything but very casual use. Fast, cheap, reliable.
> Pick any two.
>
> My drive was running 1.10 at the time of the problem.
It looks like we need a small tool which performs patterned block I/O
to the device, updating a checksum as it goes, and performing
integrity sweeps at intervals, lower level than fsx. It must be
trusted or not.

I had a problem like this with nVidia CK804/MCP55 chipsets corrupting
data under a triple-edge case workload.
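
A minimal sketch of the kind of tool described above, under stated assumptions: it writes a deterministic pattern in 4 KiB blocks, records a CRC32 per block, and later sweeps the blocks to see which ones changed. The names `pattern_block`, `write_pattern` and `sweep`, the block size, and the use of an ordinary scratch file instead of a raw device are all choices made for this illustration. A real tool would also have to bypass the page cache (for example with direct I/O) so that the sweep reads from the device rather than from RAM.

```python
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def pattern_block(index: int, seed: int = 0) -> bytes:
    """Return a deterministic BLOCK_SIZE-byte pattern for block `index`."""
    # Repeat an 8-byte tag built from the seed and the block index, so a
    # misplaced or stale block differs from the expected one.
    tag = ((seed << 32) | index).to_bytes(8, "little")
    return tag * (BLOCK_SIZE // len(tag))


def write_pattern(path: str, blocks: int, seed: int = 0) -> list[int]:
    """Write `blocks` pattern blocks to `path`; return their CRC32 values."""
    checksums = []
    with open(path, "wb") as f:
        for i in range(blocks):
            data = pattern_block(i, seed)
            f.write(data)
            checksums.append(zlib.crc32(data))
        f.flush()
        os.fsync(f.fileno())  # push the data out of the process and the OS buffers
    return checksums


def sweep(path: str, checksums: list[int]) -> list[int]:
    """Re-read every block and return the indices whose CRC32 no longer matches."""
    bad = []
    with open(path, "rb") as f:
        for i, expected in enumerate(checksums):
            if zlib.crc32(f.read(BLOCK_SIZE)) != expected:
                bad.append(i)
    return bad
```

Running `sweep` at intervals against the checksums returned by `write_pattern` gives the list of damaged block indices, which can then be dumped and inspected.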
-- 
Daniel J Blueman

Chris Mason

2009-Sep-09 11:19 UTC

Re: btrfs csum failed on git .pack file

On Wed, Sep 09, 2009 at 09:37:42AM +0100, Daniel J Blueman wrote:
> >>
> >> http://www.ocztechnologyforum.com/forum/showthread.php?t=57516
> >
> > The issue is pretty much moot at this point, since OCZ support were not
> > really interested in providing any sort of real technical support to
> > find out what really caused this issue. My main worry was reliability of
> > these cheaper SSD drives, and that worry is still not resolved. If you
> > read the blog entries, I do comment on the apparently scary basic bugs
> > that are still being fixed on the Indilinx controllers. I do expect some
> > basic level of data integrity from a consumer product and at least some
> > interest in resolving weird corruption issues if things go wrong. Since
> > OCZ cannot provide anything like that, I have a hard time recommending
> > these drives for anything but very casual use. Fast, cheap, reliable.
> > Pick any two.
> >
> > My drive was running 1.10 at the time of the problem.
> 
> It looks like we need a small tool which performs patterned block I/O
> to the device, updating a checksum as it goes, and performing
> integrity sweeps at intervals, lower level than fsx. It must be
> trusted or not.
> 
> I had a problem like this with nVidia CK804/MCP55 chipsets corrupting
> data under a triple-edge case workload.
Well, just use git ;)  Apply a bunch of patches (say the mm tree) with
guilt and repack in a loop.

-chris


Oliver Mattos

2009-Sep-09 21:01 UTC

Re: btrfs csum failed on git .pack file

>> What a strange coincidence that it affected git pack files in both cases.
>> It's almost too improbable...
> Probably more than a coincidence I think, the question is what though...

Some SSD drives (or rather the cheap wear levelling controllers in things
like USB sticks) have firmware which tries to recognise certain data
structures of common filesystems (like FAT and NTFS), and uses information
in those data structures to optimise the allocation and erasure of blocks
(for example the free space linked list in FAT).  If the data you were
saving to the disk was similar to one of those data structures, you
might've triggered one of those algorithms, which would cause data
corruption.  This is common in high performance USB sticks because they
want to pre-erase blocks on file deletion for operating systems not
supporting SCSI TRIM - I imagine the same technology might carry across to
cheap SSDs.

Not much BTRFS can do about it though.  If the piece of data that triggers
the bug could be identified, workarounds could possibly be introduced for
the particular buggy controllers.

Oliver Mattos

(resent as I emailed the wrong recipients before)


Bryan Østergaard

2009-Sep-10 10:49 UTC

Re: btrfs csum failed on git .pack file

On Wed, Sep 9, 2009 at 11:01 PM, Oliver Mattos
<oliver.mattos08@imperial.ac.uk> wrote:
>
>>> What a strange coincidence that it affected git pack files in both cases.
>>> It's almost too improbable...
>

I had similar problems with a broken git repository about two weeks
ago. This was on a regular laptop hard drive that's never reported any
errors.

Unfortunately I rm'ed the repository and cloned it again so I can't
check exactly what caused the corruption. Interestingly I've just
discovered a broken tar.bz2 file that shows symptoms similar to what's
been described here earlier.

The first (and by far largest) chunk of the file consists entirely of
0x01 bytes followed by a smaller chunk that appears to be a PNG file
and then arch/sparc/include/asm/fhc.h from the linux kernel. After
this I have a small chunk of 0x00 bytes followed by
arch/sparc/include/asm/floppy.h.

This pattern is repeated several times with different include files
from the kernel sources and the file ends with a small chunk of 0x01
bytes again.

The harddisk in question is:
=== START OF INFORMATION SECTION ===
Model Family:     Fujitsu MHV series
Device Model:     FUJITSU MHV2080BH
Serial Number:    NW05T6425FRY
Firmware Version: 00840028
User Capacity:    80,025,280,000 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Thu Sep 10 12:40:10 2009 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

As already mentioned it's never reported any errors and I also haven't
seen any problems like this before when using ext3 or ext4. The broken
file is available at http://omploader.org/vMmJtbg if that's any help.

Regards,
Bryan Østergaard

Markus Trippelsdorf

2009-Sep-17 05:05 UTC

Re: btrfs csum failed on git .pack file

On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
> On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
> > Just got this error today in my dmesg:
> > btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
> >
> > linux % find . -inum 1483065
> > ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> >
> > It's the main pack file from my git linux kernel tree:
> >
> > linux % ls -l ./.git/objects/pack/
> > total 562848
> > -rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
> > -rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
> > -rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
> > -r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
> > -r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> > -rw------- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER
> >
> > I'm running the latest git kernel and I've been using btrfs as my root
> > fs for the last few weeks without problems so far.
>
> Hmm, I ran into something very similar. Care to check what the corrupted
> block of data looks like (and how big it is)?

I've hit the same problem again today:

btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 1660028275

The file in question is:
./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack

I can't read the file directly, because of the csum mismatch:

08F3FF80   58 C8 18 3D  36 58 B0 B0  CC 35 3A 3D  72 95 8E 71  9E AA 34 14  0B C4 B4 41  5F E0 6F 66  03 B9 0B 79  X..=6X...5:=r..q..4....A_.of...y
08F3FFA0   9C 94 6B 15  F9 CA 93 AC  C4 34 6E 2C  FA 4C 99 31  55 35 36 3B  46 04 71 7E  2E 66 21 1C  89 FC 1B 92  ..k......4n,.L.1U56;F.q~.f!.....
08F3FFC0   90 FE B2 4D  0D 28 A9 3F  CC D8 B1 9A  38 28 51 86  10 69 88 CA  46 A6 07 FE  EC 0F 2B 7E  81 65 30 86  ...M.(.?....8(Q..i..F.....+~.e0.
08F3FFE0   8E 2A 37 E9  88 CC 6F 1A  8D CF 82 7C  9D 43 A5 B1  FF 2C 62 72  2E 06 E6 44  44 02 45 03  BC 12 EA 3B  .*7...o....|.C...,br...DD.E....;
08F40000

where 0x8F40000=150208512.
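
For anyone wanting to check that conversion, here is a small sketch that turns the `off` value from the kernel message into hex and into a block index. The 4 KiB block size and the helper name `describe_offset` are assumptions made for this illustration, not something stated in the thread.

```python
BLOCK_SIZE = 4096  # assumption: 4 KiB data blocks


def describe_offset(off: int) -> str:
    """Format a byte offset as decimal, hex, block index and offset within the block."""
    block, within = divmod(off, BLOCK_SIZE)
    return f"offset {off} = 0x{off:08X}, block {block}, byte {within} within block"


# The offset reported in the second csum failure above.
print(describe_offset(150208512))
```

Under that block-size assumption the failing offset sits exactly on a block boundary, which matches the hexdump stopping at 08F40000.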

# hdparm --fibmap /usr/src/linux/.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack
0,13: device not found in /dev

does not work unfortunately. How do I get the LBAs of the file instead?

I did a hex-search on the raw device with hexedit for
"90FEB24D0D28A93FCCD8B19A38285186106988CA46A607FEEC0F2B7E81653086",
but there is no obvious corruption in the vicinity of the few places
where it is found.

-- 
Markus

Jens Axboe

2009-Sep-17 06:44 UTC

Re: btrfs csum failed on git .pack file

On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
> On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
> > On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
> > > Just got this error today in my dmesg:
> > > btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
> > >
> > > linux % find . -inum 1483065
> > > ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> > >
> > > It's the main pack file from my git linux kernel tree:
> > >
> > > linux % ls -l ./.git/objects/pack/
> > > total 562848
> > > -rw-r--r-- 1 markus markus   1891324 2008-11-29 19:49 pack-011b43fa6956667db5e67fba859e40cb4b154226.idx
> > > -rw-r--r-- 1 markus markus  44002938 2008-11-29 19:54 pack-011b43fa6956667db5e67fba859e40cb4b154226.pack.temp
> > > -rw-r--r-- 1 markus markus    730332 2008-11-29 19:49 pack-67be92b3fab3dab175683582dab0b719517e55a5.idx
> > > -r--r--r-- 1 markus markus  36061684 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.idx
> > > -r--r--r-- 1 markus markus 335202742 2009-09-06 21:48 pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> > > -rw------- 1 markus markus 158457856 2009-09-07 22:15 tmp_pack_OUdxER
> > >
> > > I'm running the latest git kernel and I've been using btrfs as my root
> > > fs for the last few weeks without problems so far.
> >
> > Hmm, I ran into something very similar. Care to check what the corrupted
> > block of data looks like (and how big it is)?
>
> I've hit the same problem again today:
>
> btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 1660028275
>
> The file in question is:
> ./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack
>
> I can't read the file directly, because of the csum mismatch:

Chris, is there a way to force reading the file? Seems like that would
be a very handy feature.

Markus, not sure if that works, but you could always try and remount
with data checksumming disabled.

mount /dev/fooX -o remount,rw,nodatasum

should do the trick.

-- 
Jens Axboe


Markus Trippelsdorf

2009-Sep-17 09:04 UTC

Re: btrfs csum failed on git .pack file

On Thu, Sep 17, 2009 at 08:44:56AM +0200, Jens Axboe wrote:
> On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
> > On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
> > > On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
> > > > Just got this error today in my dmesg:
> > > > btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
> > > >
> > > > linux % find . -inum 1483065
> > > > ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> > > >
> > > > It's the main pack file from my git linux kernel tree:
> > > >
> > >
> > > Hmm, I ran into something very similar. Care to check what the corrupted
> > > block of data looks like (and how big it is)?
> >
> > I've hit the same problem again today:
> >
> > btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 1660028275
> >
> > The file in question is:
> > ./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack
> >
> > I can't read the file directly, because of the csum mismatch:
>
> Chris, is there a way to force reading the file? Seems like that would
> be a very handy feature.
>
> Markus, not sure if that works, but you could always try and remount
> with data checksumming disabled.
>
> mount /dev/fooX -o remount,rw,nodatasum
>
> should do the trick.

That doesn't work unfortunately, btrfs still calculates and compares the
checksums (it won't write new ones I guess).

-- 
Markus

Jens Axboe

2009-Sep-17 09:05 UTC

Re: btrfs csum failed on git .pack file

On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
> On Thu, Sep 17, 2009 at 08:44:56AM +0200, Jens Axboe wrote:
> > On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
> > > On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
> > > > On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
> > > > > Just got this error today in my dmesg:
> > > > > btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
> > > > >
> > > > > linux % find . -inum 1483065
> > > > > ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> > > > >
> > > > > It's the main pack file from my git linux kernel tree:
> > > > >
> > > >
> > > > Hmm, I ran into something very similar. Care to check what the corrupted
> > > > block of data looks like (and how big it is)?
> > >
> > > I've hit the same problem again today:
> > >
> > > btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 1660028275
> > >
> > > The file in question is:
> > > ./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack
> > >
> > > I can't read the file directly, because of the csum mismatch:
> >
> > Chris, is there a way to force reading the file? Seems like that would
> > be a very handy feature.
> >
> > Markus, not sure if that works, but you could always try and remount
> > with data checksumming disabled.
> >
> > mount /dev/fooX -o remount,rw,nodatasum
> >
> > should do the trick.
>
> That doesn't work unfortunately, btrfs still calculates and compares the
> checksums (it won't write new ones I guess).

Ah ok, as mentioned I wasn't sure whether that would work or not. I'll
defer to Chris :-)

-- 
Jens Axboe


Markus Trippelsdorf

2009-Sep-17 12:15 UTC

Re: btrfs csum failed on git .pack file

On Thu, Sep 17, 2009 at 11:05:49AM +0200, Jens Axboe
wrote:
> On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
> > On Thu, Sep 17, 2009 at 08:44:56AM +0200, Jens Axboe wrote:
> > > On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
> > > > On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
> > > > > On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
> > > > > > Just got this error today in my dmesg:
> > > > > > btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
> > > > > >
> > > > > > linux % find . -inum 1483065
> > > > > > ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> > > > > >
> > > > > > It's the main pack file from my git linux kernel tree:
> > > > > >
> > > > >
> > > > > Hmm, I ran into something very similar. Care to check what the corrupted
> > > > > block of data looks like (and how big it is)?
> > > >
> > > > I've hit the same problem again today:
> > > >
> > > > btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 1660028275
> > > >
> > > > The file in question is:
> > > > ./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack
> > > >
> > > > I can't read the file directly, because of the csum mismatch:
> > >
> > > Chris, is there a way to force reading the file? Seems like that would
> > > be a very handy feature.
> > >
> > > Markus, not sure if that works, but you could always try and remount
> > > with data checksumming disabled.
> > >
> > > mount /dev/fooX -o remount,rw,nodatasum
> > >
> > > should do the trick.
> >
> > That doesn't work unfortunately, btrfs still calculates and compares the
> > checksums (it won't write new ones I guess).
>
> Ah ok, as mentioned I wasn't sure whether that would work or not. I'll
> defer to Chris :-)

Understood.

I did some further investigation and was able to reconstruct exactly
the same pack file by starting from an older backup copy of my git repo
and then running the same git commands as before.
Then I did a binary comparison between this reconstructed file and a
corrupted backup copy from the time before the csum errors occurred (I
automatically back up every 4h).

This is the result (first line good pack file, second line corrupted
file):

vbindiff debug/.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack debug2/.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack

0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 B7 FD AB DA 74 2D 1C
0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 33 FD AB DA 74 2D 1C

06CD DF90: B0 22 6B 46 9F ED 6E 47  73 5E 7E EB DA 5F D6 11
06CD DF90: B0 22 6B 46 9F ED 6E 47  73 1E 7E EB DA 5F D6 11

06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 4B 08 94 C0 65 17 3A
06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 0B 08 94 C0 65 17 3A

0802 C3C0: 5C A5 E1 4A 1C BC 14 04  16 4A 29 D3 CC EF A6 80
0802 C3C0: 5C 25 E1 4A 1C BC 14 04  16 48 29 D3 CC EF A6 80

081A B3C0: 7D 7A 2C CD 20 89 E5 F2  A8 D3 32 38 04 BA 8A B5
081A B3C0: 7D 3A 2C CD 20 89 E5 F2  A8 D3 32 38 04 BA 8A B5

098E C430: FE 24 4A 19 09 F4 D5 1F  22 E8 36 FA F8 55 B2 6E
098E C430: FE 24 4A 19 09 F4 D5 1F  22 E0 36 FA F8 55 B2 6E

098E C440: 1B 3F C1 B4 BB 80 F8 5A  FB EE 0D A3 3F C5 A4 DB
098E C440: 1B 3D C1 B4 BB 80 F8 5A  FB EE 0D A3 3F C5 A4 DB

098E C4D0: F8 6C E2 65 18 7A 5D 33  2E 35 77 64 B2 81 BE DF
098E C4D0: F8 6C E2 65 18 7A 5D 33  2E 25 77 64 B2 81 BE DF

098E C4E0: 05 18 DE E3 00 78 D2 2C  4F 91 8F AF 0B F6 0C 31
098E C4E0: 05 1C DE E3 00 78 D2 2C  4F 91 8F AF 0B F6 0C 31

098E C500: 0A 12 D3 E7 FA B8 40 DE  0D 71 94 88 5D 4C 97 21
098E C500: 0A 12 D3 E7 FA B8 40 DE  0D 51 94 88 5D 4C 97 21

098E C540: 93 F2 58 C7 49 9A AA EB  30 3D 28 AA E3 09 4B 7B
098E C540: 93 F2 58 C7 49 9A AA EB  30 3C 28 AA E3 09 4B 7B

0FDE C420: F3 6A C2 38 76 43 9E 86  0D 9C 89 86 F1 E6 B0 F2
0FDE C420: F3 6A C2 38 76 43 9E 86  0D DC 89 86 F1 E6 B0 F2

0FDE C430: 38 E4 69 2E 22 1D E4 FF  90 A7 C6 E8 9F 08 4C 98
0FDE C430: 38 E4 69 2E 22 1D E4 FF  90 A5 C6 E8 9F 08 4C 98

1214 A4C0: 24 D6 56 AC 8B D8 D0 9B  D2 62 7B 83 C7 0B 3D BE
1214 A4C0: 24 D4 56 AC 8B D8 D0 9B  D2 62 7B 83 C7 0B 3D BE

1214 A500: EC 51 D3 FF C5 7D 30 DD  6D 45 50 FE E9 64 A4 FC
1214 A500: EC 11 D3 FF C5 7D 30 DD  6D 45 50 FE E9 64 A4 FC

1214 A520: D9 4D 63 EB 77 4D F0 BE  5E B3 6B DE E6 D2 28 67
1214 A520: D9 4D 63 EB 77 4D F0 BE  5E 33 6B DE E6 D2 28 67
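
The same kind of comparison can be scripted when vbindiff is not at hand. The following is only a sketch: `diff_bytes` is a hypothetical helper written for this illustration, it compares just the common prefix of the two files, and the paths in the usage comment are placeholders.

```python
from typing import Iterator, Tuple


def diff_bytes(path_a: str, path_b: str,
               chunk: int = 1 << 20) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, byte_a, byte_b) for every differing byte.

    Only the common prefix is compared; a length difference is not reported.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if not a or not b:
                break
            if a != b:  # skip the per-byte loop for identical chunks
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        yield offset + i, x, y
            offset += min(len(a), len(b))


# Usage (placeholder paths):
# for off, good, bad in diff_bytes("good.pack", "corrupted.pack"):
#     print(f"{off:08X}: {good:02X} -> {bad:02X}")
```

Printing the offsets in hex makes the output easy to line up against a listing like the one above.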

-- 
Markus

Markus Trippelsdorf

2009-Sep-17 13:58 UTC

Re: btrfs csum failed on git .pack file

On Thu, Sep 17, 2009 at 02:15:01PM +0200, Markus Trippelsdorf wrote:
> On Thu, Sep 17, 2009 at 11:05:49AM +0200, Jens Axboe wrote:
> > On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
> > > On Thu, Sep 17, 2009 at 08:44:56AM +0200, Jens Axboe wrote:
> > > > On Thu, Sep 17 2009, Markus Trippelsdorf wrote:
> > > > > On Tue, Sep 08, 2009 at 10:00:42PM +0200, Jens Axboe wrote:
> > > > > > On Mon, Sep 07 2009, Markus Trippelsdorf wrote:
> > > > > > > Just got this error today in my dmesg:
> > > > > > > btrfs csum failed ino 1483065 off 158482432 csum 4283543305 private 43905798
> > > > > > >
> > > > > > > linux % find . -inum 1483065
> > > > > > > ./.git/objects/pack/pack-f9251bcc6a8afe3c92193e14d1d742f2f0182ce5.pack
> > > > > > >
> > > > > > > It's the main pack file from my git linux kernel tree:
> > > > > > >
> > > > > >
> > > > > > Hmm, I ran into something very similar. Care to check what the corrupted
> > > > > > block of data looks like (and how big it is)?
> > > > >
> > > > > I've hit the same problem again today:
> > > > >
> > > > > btrfs csum failed ino 1826333 off 150208512 csum 4148434891 private 1660028275
> > > > >
> > > > > The file in question is:
> > > > > ./.git/objects/pack/pack-a2330b703d5a7fd62626b39a5fdfb6eecf739d0d.pack
> > > > >
> > > > > I can't read the file directly, because of the csum mismatch:
> > > >
> > > > Chris, is there a way to force reading the file? Seems like that would
> > > > be a very handy feature.
> > > >
> > > > Markus, not sure if that works, but you could always try and remount
> > > > with data checksumming disabled.
> > > >
> > > > mount /dev/fooX -o remount,rw,nodatasum
> > > >
> > > > should do the trick.
> > >
> > > That doesn't work unfortunately, btrfs still calculates and compares the
> > > checksums (it won't write new ones I guess).
> >
> > Ah ok, as mentioned I wasn't sure whether that would work or not. I'll
> > defer to Chris :-)
>
> Understood.
>
> I did some further investigations and was able to reconstruct exactly
> the same pack file in question by starting from an older backup copy of
> my git repro and then running the same git commands as previous.
> Then I did a binary comparison between this reconstructed file and a
> corrupted backup copy from the time before the csum errors occured (I
> automatically backup every 4h).
>

Thanks to Chris' patch (from IRC) I was able to compare the file with
the csum error to the reconstructed one. You'll find the results as
attachments.

-- 
Markus

Zach Brown

2009-Sep-17 17:00 UTC

Re: btrfs csum failed on git .pack file

> 0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 B7 FD AB DA 74 2D 1C
> 0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 33 FD AB DA 74 2D 1C

B7 = 10110111
33 = 00110011

> 06CD DF90: B0 22 6B 46 9F ED 6E 47  73 5E 7E EB DA 5F D6 11
> 06CD DF90: B0 22 6B 46 9F ED 6E 47  73 1E 7E EB DA 5F D6 11

5E = 01011110
1E = 00011110

> 06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 4B 08 94 C0 65 17 3A
> 06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 0B 08 94 C0 65 17 3A

4B = 01001011
0B = 00001011

And so on.

It looks like a few bits are getting flipped at the same byte offset.
One can imagine software bugs that would do this, certainly, but upset
hardware seems awfully likely too.
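
The binary expansions above can be checked mechanically by XOR-ing each good/bad byte pair; the set bits of the result are the flipped positions. A small sketch follows. The helper name `flipped_bits` is made up for this illustration, and the three pairs are the ones quoted above.

```python
# (good byte, corrupted byte) pairs taken from the three rows quoted above.
PAIRS = [(0xB7, 0x33), (0x5E, 0x1E), (0x4B, 0x0B)]


def flipped_bits(good: int, bad: int) -> list[int]:
    """Return the bit positions (0 = least significant) where two bytes differ."""
    diff = good ^ bad
    return [bit for bit in range(8) if diff & (1 << bit)]


for good, bad in PAIRS:
    print(f"{good:02X} -> {bad:02X}: xor {good ^ bad:08b}, "
          f"flipped bits {flipped_bits(good, bad)}")
```

Feeding it every differing pair from the full comparison would show how many bits differ at each position.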

- z

Markus Trippelsdorf

2009-Sep-17 17:10 UTC

Re: btrfs csum failed on git .pack file

On Thu, Sep 17, 2009 at 10:00:28AM -0700, Zach Brown wrote:
>
> > 0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 B7 FD AB DA 74 2D 1C
> > 0130 9FA0: E2 3B 43 AA 63 BF 28 B3  87 33 FD AB DA 74 2D 1C
> 
> B7 = 10110111
> 33 = 00110011
> 
> > 06CD DF90: B0 22 6B 46 9F ED 6E 47  73 5E 7E EB DA 5F D6 11
> > 06CD DF90: B0 22 6B 46 9F ED 6E 47  73 1E 7E EB DA 5F D6 11
> 
> 5E = 01011110
> 1E = 00011110
> 
> > 06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 4B 08 94 C0 65 17 3A
> > 06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 0B 08 94 C0 65 17 3A
> 
> 4B = 01001011
> 0B = 00001011
> 
> And so on.
> 
> It looks like a few bits are getting flipped at the same byte offset.
> One can imagine software bugs that would do this, certainly, but upset
> hardware seems awfully likely too.
I'm afraid you're right. I did some further tests and now I'm pretty
sure that a bad RAM module was the root cause of it all...
Oh well.

-- 
Markus

Tomasz Torcz

2009-Sep-17 17:50 UTC

Re: btrfs csum failed on git .pack file

On Thu, Sep 17, 2009 at 07:10:06PM +0200, Markus Trippelsdorf wrote:
> > > 06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 4B 08 94 C0 65 17 3A
> > > 06CD DFC0: 0D 86 2B B2 57 A4 5A CD  78 0B 08 94 C0 65 17 3A
> > 
> > 4B = 01001011
> > 0B = 00001011
> > 
> > And so on.
> > 
> > It looks like a few bits are getting flipped at the same byte offset.
> > One can imagine software bugs that would do this, certainly, but upset
> > hardware seems awfully likely too.
> 
> I'm afraid you're right. I did some further tests and now I'm pretty
> sure that a bad RAM module was the root cause of it all...
> Oh well.
  On the other hand, that's what's so great about checksumming filesystems.
You found the bad module thanks to btrfs; otherwise you wouldn't have
suspected anything was wrong. If you had had raid-1 for data, this
corruption would have been fixed by btrfs.
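
As a rough illustration of that last point, and not of btrfs internals: with two copies of a block and a stored checksum, a reader can return whichever copy still matches. The helper `read_with_repair` is hypothetical, and `zlib.crc32` is used here only as a stand-in checksum.

```python
import zlib
from typing import Optional, Sequence


def read_with_repair(copies: Sequence[bytes], expected_crc: int) -> Optional[bytes]:
    """Return the first mirror copy whose checksum matches, or None if all are bad."""
    for data in copies:
        if zlib.crc32(data) == expected_crc:
            return data
    return None


good = b"pack data block"
stored_crc = zlib.crc32(good)               # checksum recorded when the block was written
bad = bytes([good[0] ^ 0x40]) + good[1:]    # same block with a single bit flipped

print(read_with_repair([bad, good], stored_crc) == good)
```

This only helps when at least one copy matches the stored checksum; if every copy fails the check, the reader gets an error instead of data.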

-- 
Tomasz Torcz
xmpp: zdzichubg@chrome.pl

