Tomasz Kusmierz
2013-Jan-14 11:17 UTC
btrfs for files > 10GB = random spontaneous CRC failure.
Hi,

Since I had some free time over Christmas, I decided to run a few tests on
btrfs to see how it copes with "real life storage" for normal "gray users",
and I've found that the filesystem will always mess up files that are larger
than 10GB.

Long story: I used a set of data that I have nicely backed up on a personal
RAID 5 to populate the btrfs volumes: music, SLR pics and video (and just a
few documents). The disks used in the tests are all "green" 2TB disks from WD.

1. First I started by creating btrfs (4k blocks) on one disk, filling it up
and then adding a second disk -> convert to raid1 through balance -> convert
to raid10 through balance. Unfortunately converting to raid1 failed because
of CRC errors in 49 files that were bigger than 10GB. At this point I was a
bit spooked that my controllers were failing or that the drives had some bad
sectors. I tested everything (it took a few days) and it turns out there is
no "apparent" issue with the hardware (no bad sectors or I/O errors down to
the disks).

2. At this point I thought "cool, this will be a perfect test case for scrub
to show its magical power!". Created raid1 over two volumes -> try scrubbing
-> FAIL... It turns out that magically I've got corrupted CRCs at exactly the
same logical locations on two different disks (~34 files > 10GB affected),
hence scrub can't do anything with them. It only reports them as
"uncorrectable errors".

3. Performed the same test on a raid10 setup (still 4k blocks). Same results
(just a different file count).

OK, time to dig into this more, because it's getting intriguing. I'm running
Ubuntu Server 12.10 (64bit) with the stock kernel, so my next step was to get
a 3.7.1 kernel + new btrfs tools straight from the git repo. Unfortunately
1 & 2 & 3 still give the same results: corrupt CRCs only in files > 10GB.

At this point I thought "fine, maybe if I enlarge the allocation block it will
take fewer blocks for a big file to fit, resulting in those files being stored
properly" -> time for 16K leaves :) (-n 16K -l 16K; sectors are still 4K for
known reasons :P). Well, it does exactly the same thing -> 1 & 2 & 3, same
results, big files get automagically corrupted.

Something about the test data:
music  - files no bigger than 200MB (typical mix of mp3 & aac), ~10K files
pics   - no bigger than 20MB (typical point & shoot + DSLR), ~6K files
video1 - collection of small ones, more than 300MB but less than 1.5GB, ~400 files
video2 - collection of 5GB - 18GB files, ~400 files

I guess stating that only "files > 10GB" are affected is a long shot, but so
far I have not seen a file smaller than 10GB affected (I was not really
thorough about checking sizes, but all the affected files I did check were
larger than 10GB).

ps. As a footnote I'll add that I've tried shuffling tests 1, 2 & 3 without
video2 and it all works just fine.

If you've got any ideas for a workaround (other than zfs :D) I'm happy to try
them out.

Tom.
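For reference, the single-disk-then-convert sequence described in test 1 maps
roughly onto the commands below. This is only a sketch: the device names and
mount point are placeholders, and the exact options used in the thread are not
shown.

  # create the initial single-device filesystem and fill it with the data set
  # (the 16K-leaf variant mentioned above would add -n 16K -l 16K)
  mkfs.btrfs /dev/sdb
  mount /dev/sdb /mnt/test
  # ... copy the test data into /mnt/test ...

  # add a second device and convert data + metadata to raid1 via balance
  btrfs device add /dev/sdc /mnt/test
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/test

  # with four devices attached, a further balance converts to raid10
  btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/test

  # verify all data against the stored checksums
  btrfs scrub start /mnt/test
  btrfs scrub status /mnt/test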
Roman Mamedov
2013-Jan-14 11:25 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
Hello,

On Mon, 14 Jan 2013 11:17:17 +0000
Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:

> this point I was a bit spooked that my controllers were failing or

Which controller manufacturer/model?

--
With respect,
Roman
~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
Tomasz Kusmierz
2013-Jan-14 11:43 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 14/01/13 11:25, Roman Mamedov wrote:
> Hello,
>
> On Mon, 14 Jan 2013 11:17:17 +0000
> Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:
>
>> this point I was a bit spooked that my controllers were failing or
>
> Which controller manufacturer/model?

Well, this is a "home server" (which I prefer to tinker on). Two controllers
were used: the motherboard's built-in one and a crappy Adaptec PCIe one.

00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
02:00.0 RAID bus controller: Adaptec Serial ATA II RAID 1430SA (rev 02)

ps. The MoBo is an ASUS M4A79T Deluxe.
Chris Mason
2013-Jan-14 14:59 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Mon, Jan 14, 2013 at 04:09:47AM -0700, Tomasz Kusmierz wrote:
> Hi,
>
> Since I had some free time over Christmas, I decided to run a few
> tests on btrfs to see how it copes with "real life storage" for
> normal "gray users", and I've found that the filesystem will always
> mess up files that are larger than 10GB.

Hi Tom,

I'd like to nail down the test case a little better.

1) Create on one drive, fill with data
2) Add a second drive, convert to raid1
3) Find corruptions?

What happens if you start with two drives in raid1? In other words, I'm
trying to see if this is a problem with the conversion code.

-chris
Tomasz Kusmierz
2013-Jan-14 15:22 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 14/01/13 14:59, Chris Mason wrote:
> [...]
> I'd like to nail down the test case a little better.
>
> 1) Create on one drive, fill with data
> 2) Add a second drive, convert to raid1
> 3) Find corruptions?
>
> What happens if you start with two drives in raid1? In other words, I'm
> trying to see if this is a problem with the conversion code.
>
> -chris

OK, my description might have been a bit enigmatic, so to cut a long story
short the tests are:

1) Create a single-drive default btrfs volume on a single partition ->
fill with test data -> scrub -> admire errors.
2) Create a raid1 (-d raid1 -m raid1) volume with two partitions on
separate disks, each the same size etc. -> fill with test data -> scrub ->
admire errors.
3) Create a raid10 (-d raid10 -m raid1) volume with four partitions on
separate disks, each the same size etc. -> fill with test data -> scrub ->
admire errors.

All disks are the same age + size + model... two different batches to avoid
simultaneous failure.
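The three layouts above can be created roughly like this (a sketch; the
partition names are placeholders):

  # test 1: single partition, defaults
  mkfs.btrfs /dev/sdb1

  # test 2: raid1 data and metadata across two disks
  mkfs.btrfs -d raid1 -m raid1 /dev/sdb1 /dev/sdc1

  # test 3: raid10 data, raid1 metadata across four disks
  mkfs.btrfs -d raid10 -m raid1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

In each case the volume is then mounted, filled with the data set, and checked
with "btrfs scrub start <mount>" followed by "btrfs scrub status <mount>".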
Chris Mason
2013-Jan-14 15:57 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Mon, Jan 14, 2013 at 08:22:36AM -0700, Tomasz Kusmierz wrote:
> On 14/01/13 14:59, Chris Mason wrote:
> > [...]
> OK, my description might have been a bit enigmatic, so to cut a long story
> short the tests are:
> 1) Create a single-drive default btrfs volume on a single partition ->
> fill with test data -> scrub -> admire errors.
> 2) Create a raid1 (-d raid1 -m raid1) volume with two partitions on
> separate disks, each the same size etc. -> fill with test data -> scrub ->
> admire errors.
> 3) Create a raid10 (-d raid10 -m raid1) volume with four partitions on
> separate disks, each the same size etc. -> fill with test data -> scrub ->
> admire errors.
>
> All disks are the same age + size + model... two different batches to avoid
> simultaneous failure.

Ok, so we have two possible causes: #1 btrfs is writing garbage to your
disks, or #2 something in your kernel is corrupting your data.

Since you're able to see this 100% of the time, let's assume that if #2
were true, we'd be able to trigger it on other filesystems.

So, I've attached an old friend, stress.sh. Use it like this:

stress.sh -n 5 -c <your source directory> -s <your btrfs mount point>

It will run in a loop with 5 parallel processes and make 5 copies of
your data set into the destination. It will run forever until there are
errors. You can use a higher process count (-n) to force more
concurrency and use more ram. It may help to pin down all but 2 or 3 GB
of your memory.

What I'd like you to do is find a data set and command line that make
the script find errors on btrfs. Then, try the same thing on xfs or
ext4 and let it run at least twice as long. Then report back ;)

-chris
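The stress.sh attachment itself is not reproduced in this archive. A minimal
sketch of the same idea - several parallel workers repeatedly copying the
source tree and comparing each copy against the original until a mismatch
shows up - could look like the script below; it is a hypothetical stand-in,
not Chris's actual script.

  #!/bin/bash
  # usage: ./stress-sketch.sh <nprocs> <source dir> <destination dir>
  NPROCS=$1; SRC=$2; DEST=$3

  worker() {
      local n=$1 pass=0
      while true; do
          pass=$((pass + 1))
          rm -rf "$DEST/copy-$n"
          cp -a "$SRC" "$DEST/copy-$n"
          sync
          echo 3 > /proc/sys/vm/drop_caches   # re-read from disk, not page cache (needs root)
          if ! diff -qr "$SRC" "$DEST/copy-$n"; then
              echo "worker $n: mismatch on pass $pass" >&2
              return 1
          fi
          echo "worker $n: pass $pass ok"
      done
  }

  for i in $(seq 1 "$NPROCS"); do
      worker "$i" &
  done
  wait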
Roman Mamedov
2013-Jan-14 16:20 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Mon, 14 Jan 2013 15:22:36 +0000
Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:

> 1) Create a single-drive default btrfs volume on a single partition ->
> fill with test data -> scrub -> admire errors.

Did you try ruling out btrfs as the cause of the problem? Maybe something else
in your system is corrupting data, and btrfs just lets you know about that.

I.e. on the same drive, create an ext4 filesystem and copy some data to it
which has known checksums (use md5sum or cfv to generate them in advance for
the data that is on another drive and waiting to be copied); copy to that
drive, flush the caches, then verify the checksums of the files at the
destination.

--
With respect,
Roman
~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
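Roman's checksum procedure translates to something like the following (a
sketch; the mount points and the checksum file path are placeholders):

  # record checksums of the source data in advance
  cd /mnt/source && find . -type f -exec md5sum {} + > /tmp/checksums.md5

  # copy the data onto the freshly created ext4 filesystem
  cp -a /mnt/source/. /mnt/ext4test/

  # flush the page cache so the verification reads come from disk (needs root)
  sync
  echo 3 > /proc/sys/vm/drop_caches

  # verify at the destination
  cd /mnt/ext4test && md5sum -c /tmp/checksums.md5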
Tomasz Kusmierz
2013-Jan-14 16:32 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 14/01/13 15:57, Chris Mason wrote:
> [...]
> So, I've attached an old friend, stress.sh. Use it like this:
>
> stress.sh -n 5 -c <your source directory> -s <your btrfs mount point>
> [...]
> What I'd like you to do is find a data set and command line that make
> the script find errors on btrfs. Then, try the same thing on xfs or
> ext4 and let it run at least twice as long. Then report back ;)
>
> -chris

Chris,

Will do, just please remember that 2TB of test data on "customer grade"
SATA drives will take a while to test :)
Tomasz Kusmierz
2013-Jan-14 16:34 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 14/01/13 16:20, Roman Mamedov wrote:
> [...]
> Did you try ruling out btrfs as the cause of the problem? Maybe something else
> in your system is corrupting data, and btrfs just lets you know about that.
>
> I.e. on the same drive, create an ext4 filesystem and copy some data to it
> which has known checksums [...]

Hi Roman,

Chris just provided his good old friend "stress.sh", which should do exactly
that. So I'll dive into more testing :)

Tom.
Chris Mason
2013-Jan-14 16:34 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Mon, Jan 14, 2013 at 09:32:25AM -0700, Tomasz Kusmierz wrote:
> On 14/01/13 15:57, Chris Mason wrote:
> > [...]
> > What I'd like you to do is find a data set and command line that make
> > the script find errors on btrfs. Then, try the same thing on xfs or
> > ext4 and let it run at least twice as long. Then report back ;)
> >
> > -chris
>
> Chris,
>
> Will do, just please remember that 2TB of test data on "customer grade"
> SATA drives will take a while to test :)

Many thanks. You might want to start with a smaller data set, 20GB or
so total.

-chris
Lars Weber
2013-Jan-15 16:54 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
Hi,

I had a similar scenario to Tomasz's:

- Started with a single 3TB disk.
- Filled the 3TB disk with a lot of files (more than 30 of them 10-30GB).
- Added 2x 1.5TB disks.
- btrfs balance start -dconvert=raid1 -mconvert=raid1 $MOUNT
- # btrfs scrub start $MOUNT
- # btrfs scrub status $MOUNT
  scrub status for $ID
        scrub started at Tue Jan 15 07:10:15 2013 and finished after 24020 seconds
        total bytes scrubbed: 4.30TB with 0 errors

So at least it is no general bug in btrfs - maybe this helps you...

# uname -a
Linux n40l 3.7.2 #1 SMP Sun Jan 13 11:46:56 CET 2013 x86_64 GNU/Linux
# btrfs version
Btrfs v0.20-rc1-37-g91d9ee

Regards
Lars

On 14.01.2013 17:34, Chris Mason wrote:
> [...]
> Many thanks. You might want to start with a smaller data set, 20GB or
> so total.
>
> -chris

--
ADC-Ingenieurbüro Wiedemann | In der Borngasse 12 | 57520 Friedewald |
Tel: 02743-930233 | Fax: 02743-930235 | www.adc-wiedemann.de
GF: Dipl.-Ing. Hendrik Wiedemann | Umsatzsteuer-ID: DE 147979431
Tom Kusmierz
2013-Jan-15 23:32 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 14/01/13 16:34, Chris Mason wrote:
> On Mon, Jan 14, 2013 at 09:32:25AM -0700, Tomasz Kusmierz wrote:
>> [...]
>> Will do, just please remember that 2TB of test data on "customer grade"
>> SATA drives will take a while to test :)
>
> Many thanks. You might want to start with a smaller data set, 20GB or
> so total.
>
> -chris

Chris & all,

Sorry for not replying for so long, but Chris's old friend "stress.sh" has
proven that all my storage is affected by this bug, and the first thing was
to bring everything down before the corruption could spread any further.
Anyway, for the subject's sake: the btrfs stress failed after 2h, the ext4
stress failed after 8h (according to "time ./stress.sh blablabla") - so that
might be related to ext4 always having seemed slower on my machine than btrfs.

Anyway, I wanted to use this opportunity to thank Chris and everybody involved
in btrfs development - your filesystem found a hidden bug in my setup that
would have stayed there until it had pretty much corrupted everything. I don't
even want to think how much my main storage got corrupted over time (ext4 over
LVM over md RAID 5).

p.s. Bizarre that when I "fill" an ext4 partition with test data everything
checks out OK (CRC over all files), but with Chris's tool it gets corrupted -
for both the crappy Adaptec PCIe controller and the motherboard's built-in
one. Also, since the course of history has proven that my testing facilities
are crap - any suggestions on how I can test the RAM, CPU & controllers would
be appreciated.
Chris Mason
2013-Jan-15 23:44 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Tue, Jan 15, 2013 at 04:32:10PM -0700, Tom Kusmierz wrote:
> Chris & all,
>
> Sorry for not replying for so long, but Chris's old friend "stress.sh" has
> proven that all my storage is affected by this bug, and the first thing was
> to bring everything down before the corruption could spread any further.
> Anyway, for the subject's sake: the btrfs stress failed after 2h, the ext4
> stress failed after 8h (according to "time ./stress.sh blablabla") - so that
> might be related to ext4 always having seemed slower on my machine than btrfs.

Ok, great. These problems are really hard to debug, and I'm glad we've
nailed it down to the lower layers.

> Anyway, I wanted to use this opportunity to thank Chris and everybody involved
> in btrfs development [...]
>
> p.s. Bizarre that when I "fill" an ext4 partition with test data everything
> checks out OK (CRC over all files), but with Chris's tool it gets corrupted -
> for both the crappy Adaptec PCIe controller and the motherboard's built-in
> one.

One really hard part of tracking down corruption is that our boxes have so
much ram right now that problems are often hidden by the page cache. My first
advice is to boot with much less ram (1G/2G) or pin down all your ram for
testing. A problem that triggers in 10 minutes is a billion times easier to
figure out than one that triggers in 8 hours.

> Also, since the course of history has proven that my testing facilities
> are crap - any suggestions on how I can test the RAM, CPU & controllers
> would be appreciated.

Step one is to figure out if you've got a CPU/memory problem or an IO
problem. memtest is often able to find CPU and memory problems, but if
you pass memtest I like to use gcc for extra hard testing.

If you have the ram, make a copy of the linux kernel tree in /dev/shm or
any ramdisk/tmpfs mount. Then run make -j ; make clean in a loop until
your box either crashes, gcc reports an internal compiler error, or 16
hours go by. Your loop will need to check for failed makes and stop once
you get the first failure.

Hopefully that will catch it. Otherwise we need to look at the IO stack.

-chris
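Chris's compile-loop test could be scripted along these lines (a sketch under
the stated assumptions: a kernel tree already copied to /dev/shm/linux and a
16-hour time limit):

  #!/bin/bash
  # Repeatedly build the kernel from tmpfs; stop on the first failed build
  # (e.g. a gcc internal compiler error) or after 16 hours of clean builds.
  set -u
  TREE=/dev/shm/linux
  DEADLINE=$(( $(date +%s) + 16 * 3600 ))

  cd "$TREE" || exit 1
  make defconfig > /dev/null          # "make clean" keeps .config, so configure once

  while [ "$(date +%s)" -lt "$DEADLINE" ]; do
      if ! make -j"$(nproc)" > /dev/null 2> build.log; then
          echo "build failed - check build.log for internal compiler errors"
          exit 1
      fi
      make clean > /dev/null
  done
  echo "16 hours of clean builds - CPU/RAM look OK, suspect the IO stack"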
Bernd Schubert
2013-Jan-16 09:21 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 01/16/2013 12:32 AM, Tom Kusmierz wrote:

> p.s. Bizarre that when I "fill" an ext4 partition with test data everything
> checks out OK (CRC over all files), but with Chris's tool it gets corrupted -
> for both the crappy Adaptec PCIe controller and the motherboard's built-in
> one. Also, since the course of history has proven that my testing facilities
> are crap - any suggestions on how I can test the RAM, CPU & controllers
> would be appreciated.

Similar issues were the reason we wrote ql-fstest at q-leap. Maybe you could
try that? You can easily see the pattern of the corruption with it. But maybe
Chris's stress.sh also provides that. Anyway, yesterday I added support for
specifying min and max file sizes, as before it only used 1MiB to 1GiB
sizes... It's a bit cryptic with bits, though; I will improve that later.
https://bitbucket.org/aakef/ql-fstest/downloads

Cheers,
Bernd

PS: But see my other thread - using ql-fstest I yesterday entirely broke a
btrfs test file system, resulting in kernel panics.
Tomasz Kusmierz
2013-Feb-05 10:16 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 16/01/13 09:21, Bernd Schubert wrote:
> [...]
> Similar issues were the reason we wrote ql-fstest at q-leap. Maybe you could
> try that? You can easily see the pattern of the corruption with it.
> [...]
> https://bitbucket.org/aakef/ql-fstest/downloads

Hi,

It's been a while, but I think I should provide a "definitive answer", or
simply what the cause of the whole problem was:

It was a printer!

Long story short, I was going nuts trying to diagnose which bit of my server
was going bad, and eventually I was down to blaming the interface card that
connects the hot-swappable disks to the mobo / PCIe controllers. When I got
back from my holiday I sat in front of the server and decided to go with
ql-fstest, which in a very nice way reports errors with a very low lag
(~2 minutes) after they occur. At this point my printer kicked in with a
"self clean" and an error showed up after about two minutes - so I restarted
the printer, and while it was going through its own POST with self clean,
another error showed up. The issue turned out to be that I was using one of
those fantastic PCI 4-port ethernet cards with the printer connected directly
to it - after moving it and everything else to a switch, all the problems and
issues went away. At the moment the server has been running for 2 weeks
without any corruption, random kernel btrfs crashes, etc.

Anyway, I wanted to thank Chris and the rest of the btrfs dev people again
for this fantastic filesystem, which let me discover what a stupid setup I
was running and how deep into shiet I had put myself.

CHEERS LADS!

Tom.
Chris Mason
2013-Feb-05 12:49 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Tue, Feb 05, 2013 at 03:16:34AM -0700, Tomasz Kusmierz wrote:
> [...]
> It was a printer!
>
> Long story short, [...] The issue turned out to be that I was using one of
> those fantastic PCI 4-port ethernet cards with the printer connected directly
> to it - after moving it and everything else to a switch, all the problems and
> issues went away. At the moment the server has been running for 2 weeks
> without any corruption, random kernel btrfs crashes, etc.

Wow, I've never heard that one before. You might want to try a different
4-port card and/or report it to the driver maintainer. That shouldn't
happen ;)

ql-fstest looks neat, I'll check it out (thanks Bernd).

-chris
Roman Mamedov
2013-Feb-05 13:46 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Tue, 05 Feb 2013 10:16:34 +0000
Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:

> that I was using one of those fantastic PCI 4-port ethernet cards with the
> printer connected directly to it - after moving it and everything else to a
> switch, all the problems and issues went away. At the moment the server has
> been running for 2 weeks without any corruption, random kernel btrfs
> crashes, etc.

If moving the printer over to a switch helped, perhaps it is indeed an
electrical interference problem, but if your card is an old one from Sun, keep
in mind that they also have some problems with DMA on machines with large
amounts of RAM:

"sunhme" experiences corrupt packets if machine has more than 2GB of memory
https://bugzilla.kernel.org/show_bug.cgi?id=10790

It's not hard to envision a horror-story scenario where a rogue network card
shreds your filesystem buffer cache with network packets DMAed all over it,
like bullets from a machine gun :) But in reality, afaik, the IOMMU is
supposed to protect against this.

--
With respect,
Roman
Tomasz Kusmierz
2013-Feb-05 14:10 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 05/02/13 12:49, Chris Mason wrote:
> [...]
> Wow, I've never heard that one before. You might want to try a different
> 4-port card and/or report it to the driver maintainer. That shouldn't
> happen ;)
>
> ql-fstest looks neat, I'll check it out (thanks Bernd).
>
> -chris

I forgot to mention that the server sits on a UPS while the printer is
connected directly to the mains - when you think about it, that creates a
ground-shift effect, since nothing on a cheap PSU has a "real" ground. But
anyway, this is not the fault of the 4-port card: I tried moving the printer
to a cheap ne2000 card and to the motherboard's integrated one, and the effect
was the same. Also, diagnostics were veeery problematic, because besides the
corruption on the HDDs, memtest was returning corruption in RAM, though on
very rare occasions, and a CPU test was returning corruption on a
once-per-day basis. I replaced nearly everything in this server - including
the PSU (with the 1400W one from my dev rig) - to NO difference. I should
mention as well that this printer is a colour laser printer with 4 drums to
clean, so I would assume it produces enough static electricity to power a
small castle.

ps. It shouldn't be a driver issue, since the errors in RAM were 1-4 bits
wide, located within the same 32-bit word - hence I think a single transfer
had to be corrupted, rather than a whole ethernet packet being shoved into
random memory.
Tomasz Kusmierz
2013-Feb-05 14:18 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 05/02/13 13:46, Roman Mamedov wrote:
> On Tue, 05 Feb 2013 10:16:34 +0000
> Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:
>
> [...]
>
> If moving the printer over to a switch helped, perhaps it is indeed an
> electrical interference problem, but if your card is an old one from Sun, keep
> in mind that they also have some problems with DMA on machines with large
> amounts of RAM:
>
> "sunhme" experiences corrupt packets if machine has more than 2GB of memory
> https://bugzilla.kernel.org/show_bug.cgi?id=10790
>
> It's not hard to envision a horror-story scenario where a rogue network card
> shreds your filesystem buffer cache with network packets DMAed all over it,
> like bullets from a machine gun :) But in reality, afaik, the IOMMU is
> supposed to protect against this.

As I said in my reply to Chris, it was definitely an electrical issue. Back in
the days when cat5 ethernet was a novelty, I learned a simple lesson the hard
way - don't be a skimp, always separate with a switch. I learned it on
networks where the parties were not necessarily powered from the same circuit
or even the same supply phase. Since this setup is limited to my home, I
violated my own old rule - and it backfired on me.

Anyway, thanks for the info on "sunhme" - WOW...