thr3ads.net - Btrfs devel - Kernel 2.6.36 btrfs csum bugreport [Oct 2010]

Andreas Bauer

2010-Oct-31 12:55 UTC

Kernel 2.6.36 btrfs csum bugreport

Hi everybody,

Today while playing around with btrfs I uncovered what must be a bug in the
btrfs checksum code. My kernel log received a couple of these messages with
various ino and off numbers:

btrfs csum failed ino 5098 off 524288 csum 2981133980 private 959545494
[..]

This happens on reading from the btrfs filesystem.
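For anyone who wants to tally such messages, here is a minimal sketch that extracts the inode and offset from each line and groups the offsets per inode. It assumes the exact message format quoted above (other kernel versions may word it differently), and the function name parse_csum_failures is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the message format quoted in this report; this is an assumption,
# other kernel versions may print the fields differently.
CSUM_RE = re.compile(
    r"btrfs csum failed ino (\d+) off (\d+) csum (\d+) private (\d+)"
)


def parse_csum_failures(lines):
    """Return {inode: sorted list of failing offsets} from kernel log lines."""
    failures = defaultdict(set)
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            inode = int(match.group(1))
            offset = int(match.group(2))
            failures[inode].add(offset)
    return {inode: sorted(offsets) for inode, offsets in failures.items()}


if __name__ == "__main__":
    sample = [
        "btrfs csum failed ino 5098 off 524288 csum 2981133980 private 959545494",
        "some unrelated kernel message",
    ]
    print(parse_csum_failures(sample))
```

Feeding it the saved kernel log (for example the output of dmesg written to a file) gives one entry per affected inode, which makes it easier to compare runs.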

The funny thing is that the files are read correctly, as verified by md5sum. I
have cross-checked this on another machine (with the same kernel and btrfs
utils): same result. A full filesystem md5sum check showed no errors. The
md5sums were of course computed before the data was copied to the btrfs.
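The kind of check described here can be sketched roughly as follows. This is not the exact procedure used in the report, just an illustration: it assumes a manifest in the usual md5sum output format (32 hex digits, two separator characters, then a path relative to the tree root), and the names md5_of_file and verify_manifest are hypothetical. Escaped filenames, which md5sum marks with a leading backslash, are not handled.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, root):
    """Return the relative paths that are missing or whose MD5 differs.

    Assumes md5sum-style lines: '<32 hex digits>  <relative path>'.
    """
    mismatches = []
    with open(manifest_path, "r", encoding="utf-8") as manifest:
        for line in manifest:
            line = line.rstrip("\n")
            if len(line) < 35:
                continue  # skip blank or malformed lines
            expected, rel_path = line[:32], line[34:]
            full_path = os.path.join(root, rel_path)
            if (not os.path.isfile(full_path)
                    or md5_of_file(full_path) != expected.lower()):
                mismatches.append(rel_path)
    return mismatches
```

An empty list from verify_manifest means every listed file was read back with the same content it had when the manifest was written.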

So I conclude that these messages are faulty, because the data is read
correctly. In addition, when you have more than one btrfs you cannot tell from
the message which fs it is referring to.

Here is my setup, maybe it has something to do with the (nowadays) unusual
kernel target:

- unmodified upstream 2.6.36 kernel
- Debian Squeeze
- Standard Debian gcc 4.3.5 with target i486
- CPU AMD Geode LX800 on ALIX board
- btrfs on USB-ATA connected IDE drive Seagate Barracuda 7200.8 ST3400832A
- btrfs utils v0.19
- about 300GB of data of all sorts in 50000+ files on the fs
- the csum errors occur on read, while the data is being rsynced to another
btrfs volume of 1TB

Hope that some of this information rings a bell in someone's mind. If so,
please let me know ;)

bye, Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

cwillu

2010-Oct-31 22:15 UTC

Re: Kernel 2.6.36 btrfs csum bugreport

> Today while playing around with btrfs I uncovered what must be a bug in the
> btrfs checksum code. My kernel log received a couple of these messages with
> various ino and off numbers:
>
> btrfs csum failed ino 5098 off 524288 csum 2981133980 private 959545494
> [..]
>
> This happens on reading from the btrfs filesystem.
>
> The funny thing is that the files are read correct, as verified by md5sum.
> I have cross-checked this on another machine (with same kernel and btrfs
> utils): same result. A full filesystem md5sum check showed no errors. The
> md5sums obviously were computed before the data was copied to the btrfs.
>
> So I conclude that these messages are faulty because data is read
> correctly. In addition, when you have more than one btrfs you cannot see
> from the message which fs it is referring to.

Is this a raid1 or a dup array?

Andreas Bauer

2010-Nov-01 00:35 UTC

RE: Kernel 2.6.36 btrfs csum bugreport

>> So I conclude that these messages are faulty because data is read correctly.
>> In addition, when you have more than one btrfs you cannot see from the
>> message which fs it is referring to.
>
> Is this a raid1 or a dup array?

No, a plain vanilla partition on a physical hard disk. The btrfs was made with
the command "mkfs.btrfs /dev/sdc1", no extra arguments.

bye, A.B.

Andreas Bauer

2010-Nov-01 10:39 UTC

RE: Kernel 2.6.36 btrfs csum bugreport

To follow up on this matter, I have created another two btrfs volumes (also
plain - no options - also on two external USB-SATA disks), and am at the moment
copying heaps of data between these two. No errors as of yet. All copies are
verified by md5sum after the deed.

The volume in question can still "reliably" reproduce the csum errors on read,
though. Approx. 30 csum errors occur when the whole fs is read. The data is
still fine. I can put it aside for further debugging until at most Wednesday
morning.

If someone wants me to run diagnostics on it, please let me know. I am glad to
be of help (until Wednesday morning).

Andreas

Daniel J Blueman

2010-Nov-01 10:55 UTC

Re: Kernel 2.6.36 btrfs csum bugreport

On 1 November 2010 00:35, Andreas Bauer <ab@voltage.de> wrote:
>>> So I conclude that these messages are faulty because data is read
>>> correctly. In addition, when you have more than one btrfs you cannot see
>>> from the message which fs it is referring to.
>>
>> Is this a raid1 or a dup array?
>
> No, plain vanilla partition on physical hard disk. Btrfs was made with the
> command "mkfs.btrfs /dev/sdc1" no extra arguments.

By default, metadata is duplicated, thus it could be that BTRFS is
using the correct copy of the metadata after finding checksum errors
in the first copy.

Daniel
-- 
Daniel J Blueman

cwillu

2010-Nov-01 11:02 UTC

Re: Kernel 2.6.36 btrfs csum bugreport

On Mon, Nov 1, 2010 at 4:55 AM, Daniel J Blueman
<daniel.blueman@gmail.com> wrote:
> On 1 November 2010 00:35, Andreas Bauer <ab@voltage.de> wrote:
>>>> So I conclude that these messages are faulty because data is read
>>>> correctly. In addition, when you have more than one btrfs you cannot
>>>> see from the message which fs it is referring to.
>>>
>>> Is this a raid1 or a dup array?
>>
>> No, plain vanilla partition on physical hard disk. Btrfs was made with
>> the command "mkfs.btrfs /dev/sdc1" no extra arguments.
>
> By default, metadata is duplicated, thus it could be that BTRFS is
> using the correct copy of the metadata after finding checksum errors
> in the first copy.

Ahhhhhhh, and that makes this make sense:

Andreas, have you checked which file(s) are giving the errors?  If
not, you can use "find /whatever/mountpoint -xdev -inum 5098 -print"
to get the filename.  And I would bet that it's small enough that it's
being inlined into the metadata block group, and therefore covered
under the default "dup" profile of that block group, which is why
you're getting the actual file data back.
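
For scripting the lookup over many inode numbers, the same idea as the find command above can be sketched in Python. This is only a rough equivalent: the function name find_paths_by_inode is made up here, and the walk stays on the starting filesystem by comparing st_dev, in the spirit of -xdev.

```python
import os


def find_paths_by_inode(mountpoint, inode):
    """Walk one filesystem and return every path whose inode number matches."""
    root_dev = os.lstat(mountpoint).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(mountpoint):
        same_fs_dirs = []
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if st.st_dev != root_dev:
                continue  # different filesystem, like find -xdev
            if name in dirnames:
                same_fs_dirs.append(name)
            if st.st_ino == inode:
                matches.append(path)
        # Pruning dirnames in place stops os.walk from descending
        # into directories on other filesystems.
        dirnames[:] = same_fs_dirs
    return matches
```

Calling find_paths_by_inode("/whatever/mountpoint", 5098) should list the same names as the find invocation, including every hard link to that inode.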

Andreas Bauer

2010-Nov-07 15:37 UTC

RE: Kernel 2.6.36 btrfs csum bugreport

On Mon, Nov 01, 2010 at 12:02:10PM CET, cwillu wrote:

> Ahhhhhhh, and that makes this make sense:
>
> Andreas, have you checked which file(s) are giving the errors?  if
> not, you can use "find /whatever/mountpoint -xdev -inum 5098 -print"
> to get the filename.  And I would bet that it's small enough that it's
> being inlined into the metadata block group, and therefore covered
> under the default "dup" profile of that block group, which is why
> you're getting the actual file data back.

Sorry to disappoint, but the files hit range from big (8 GB) to small. I took
the opportunity to compare the syslogs from both machines I tested on,
and the csum ino and off values are completely different in each case.

The filesystem which showed this behaviour has now been destroyed, and
in further testing I wasn't able to reproduce the bug.

To summarize:

- a btrfs about 400GB in size showed several csum errors
on reading while the data read was correct. The same thing happened
when the filesystem was mounted on another machine (same kernel).

- the errors could be consistently reproduced by reading enough data. 

- about 60 - 120 csum errors occurred while reading about 250 GB of data.

- the csum errors hit different inodes each time (and each run)

As I don't have enough time at the moment to familiarize myself with
the btrfs code, I have to let go of this issue at this point. Thank
you for your work.

-- A.B.