Tomasz Kusmierz
2013-Jan-14 11:17 UTC
btrfs for files > 10GB = random spontaneous CRC failure.
Hi,

Since I had some free time over Christmas, I decided to run a few tests on
btrfs to see how it copes with "real life storage" for normal "gray users",
and I've found that the filesystem will always mess up files that are larger
than 10GB.

Long story: I used a set of data that I have nicely backed up on a personal
RAID 5 to populate the btrfs volumes: music, SLR pics and video (and just a
few documents). The disks used in the tests are all "green" 2TB disks from WD.

1. First I started by creating btrfs (4k blocks) on one disk, filling it up
and then adding a second disk -> convert to raid1 through balance -> convert
to raid10 through balance. Unfortunately converting to raid1 failed because
of CRC errors in 49 files that were bigger than 10GB. At this point I was a
bit spooked that my controllers were failing or that the drives had some bad
sectors. I tested everything (it took a few days) and it turns out there is
no "apparent" issue with the hardware (no bad sectors or I/O errors down to
the disks).

2. At this point I thought "cool, this will be a perfect test case for scrub
to show its magical power!". Created raid1 over two volumes -> try scrubbing
-> FAIL... It turns out that magically I've got corrupted CRCs at exactly the
same logical locations on two different disks (~34 files > 10GB affected),
hence scrub can't do anything with them. It only reports them as
"uncorrectable errors".

3. Performed the same test on a raid10 setup (still 4k blocks). Same results
(just a different file count).

OK, time to dig into this more, because it's getting intriguing. I'm running
Ubuntu Server 12.10 (64bit) with the stock kernel, so my next step was to get
a 3.7.1 kernel + new btrfs tools straight from the git repo. Unfortunately
1 & 2 & 3 still give the same results: corrupt CRCs only in files > 10GB.

At this point I thought "fine, maybe if I enlarge the allocation block it will
take fewer blocks for a big file to fit, resulting in those files being stored
properly" -> time for 16K leaves :) (-n 16K -l 16K; sectors are still 4K for
known reasons :P). Well, it does exactly the same thing -> 1 & 2 & 3, same
results, big files get automagically corrupted.

Something about the test data:
music  - files no bigger than 200MB (typical mix of mp3 & aac), ~10K files
pics   - no bigger than 20MB (typical point & shoot + DSLR), ~6K files
video1 - collection of small ones, more than 300MB but less than 1.5GB, ~400 files
video2 - collection of 5GB - 18GB files, ~400 files

I guess stating that only "files > 10GB" are affected is a long shot, but so
far I have not seen a file smaller than 10GB affected (I was not really
thorough about checking sizes, but all the affected files I did check were
larger than 10GB).

ps. As a footnote I'll add that I've tried shuffling tests 1, 2 & 3 without
video2 and it all works just fine.

If you've got any ideas for a workaround (other than zfs :D) I'm happy to try
them out.

Tom.
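For reference, the single-disk-then-convert sequence described in test 1 maps
roughly onto the commands below. This is only a sketch: the device names and
mount point are placeholders, and the exact options used in the thread are not
shown.

  # create the initial single-device filesystem and fill it with the data set
  # (the 16K-leaf variant mentioned above would add -n 16K -l 16K)
  mkfs.btrfs /dev/sdb
  mount /dev/sdb /mnt/test
  # ... copy the test data into /mnt/test ...

  # add a second device and convert data + metadata to raid1 via balance
  btrfs device add /dev/sdc /mnt/test
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/test

  # with four devices attached, a further balance converts to raid10
  btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/test

  # verify all data against the stored checksums
  btrfs scrub start /mnt/test
  btrfs scrub status /mnt/test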
Roman Mamedov
2013-Jan-14 11:25 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
Hello,

On Mon, 14 Jan 2013 11:17:17 +0000
Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:

> this point I was a bit spooked that my controllers were failing or

Which controller manufacturer/model?

--
With respect,
Roman
~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
Tomasz Kusmierz
2013-Jan-14 11:43 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 14/01/13 11:25, Roman Mamedov wrote:
> Hello,
>
> On Mon, 14 Jan 2013 11:17:17 +0000
> Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:
>
>> this point I was a bit spooked that my controllers were failing or
>
> Which controller manufacturer/model?

Well, this is a "home server" (which I prefer to tinker on). Two controllers
were used: the motherboard's built-in one and a crappy Adaptec PCIe one.

00:11.0 SATA controller: Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
02:00.0 RAID bus controller: Adaptec Serial ATA II RAID 1430SA (rev 02)

ps. The MoBo is an ASUS M4A79T Deluxe.
Chris Mason
2013-Jan-14 14:59 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Mon, Jan 14, 2013 at 04:09:47AM -0700, Tomasz Kusmierz wrote:
> Hi,
>
> Since I had some free time over Christmas, I decided to run a few
> tests on btrfs to see how it copes with "real life storage" for
> normal "gray users", and I've found that the filesystem will always
> mess up files that are larger than 10GB.

Hi Tom,

I'd like to nail down the test case a little better.

1) Create on one drive, fill with data
2) Add a second drive, convert to raid1
3) Find corruptions?

What happens if you start with two drives in raid1? In other words, I'm
trying to see if this is a problem with the conversion code.

-chris
Tomasz Kusmierz
2013-Jan-14 15:22 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 14/01/13 14:59, Chris Mason wrote:
> [...]
> I'd like to nail down the test case a little better.
>
> 1) Create on one drive, fill with data
> 2) Add a second drive, convert to raid1
> 3) Find corruptions?
>
> What happens if you start with two drives in raid1? In other words, I'm
> trying to see if this is a problem with the conversion code.
>
> -chris

OK, my description might have been a bit enigmatic, so to cut a long story
short the tests are:

1) Create a single-drive default btrfs volume on a single partition ->
fill with test data -> scrub -> admire errors.
2) Create a raid1 (-d raid1 -m raid1) volume with two partitions on
separate disks, each the same size etc. -> fill with test data -> scrub ->
admire errors.
3) Create a raid10 (-d raid10 -m raid1) volume with four partitions on
separate disks, each the same size etc. -> fill with test data -> scrub ->
admire errors.

All disks are the same age + size + model... two different batches to avoid
simultaneous failure.
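The three layouts above can be created roughly like this (a sketch; the
partition names are placeholders):

  # test 1: single partition, defaults
  mkfs.btrfs /dev/sdb1

  # test 2: raid1 data and metadata across two disks
  mkfs.btrfs -d raid1 -m raid1 /dev/sdb1 /dev/sdc1

  # test 3: raid10 data, raid1 metadata across four disks
  mkfs.btrfs -d raid10 -m raid1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

In each case the volume is then mounted, filled with the data set, and checked
with "btrfs scrub start <mount>" followed by "btrfs scrub status <mount>".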
Chris Mason
2013-Jan-14 15:57 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Mon, Jan 14, 2013 at 08:22:36AM -0700, Tomasz Kusmierz wrote:
> On 14/01/13 14:59, Chris Mason wrote:
> > [...]
> OK, my description might have been a bit enigmatic, so to cut a long story
> short the tests are:
> 1) Create a single-drive default btrfs volume on a single partition ->
> fill with test data -> scrub -> admire errors.
> 2) Create a raid1 (-d raid1 -m raid1) volume with two partitions on
> separate disks, each the same size etc. -> fill with test data -> scrub ->
> admire errors.
> 3) Create a raid10 (-d raid10 -m raid1) volume with four partitions on
> separate disks, each the same size etc. -> fill with test data -> scrub ->
> admire errors.
>
> All disks are the same age + size + model... two different batches to avoid
> simultaneous failure.

Ok, so we have two possible causes: #1 btrfs is writing garbage to your
disks, or #2 something in your kernel is corrupting your data.

Since you're able to see this 100% of the time, let's assume that if #2
were true, we'd be able to trigger it on other filesystems.

So, I've attached an old friend, stress.sh. Use it like this:

stress.sh -n 5 -c <your source directory> -s <your btrfs mount point>

It will run in a loop with 5 parallel processes and make 5 copies of
your data set into the destination. It will run forever until there are
errors. You can use a higher process count (-n) to force more
concurrency and use more ram. It may help to pin down all but 2 or 3 GB
of your memory.

What I'd like you to do is find a data set and command line that make
the script find errors on btrfs. Then, try the same thing on xfs or
ext4 and let it run at least twice as long. Then report back ;)

-chris
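The stress.sh attachment itself is not reproduced in this archive. A minimal
sketch of the same idea - several parallel workers repeatedly copying the
source tree and comparing each copy against the original until a mismatch
shows up - could look like the script below; it is a hypothetical stand-in,
not Chris's actual script.

  #!/bin/bash
  # usage: ./stress-sketch.sh <nprocs> <source dir> <destination dir>
  NPROCS=$1; SRC=$2; DEST=$3

  worker() {
      local n=$1 pass=0
      while true; do
          pass=$((pass + 1))
          rm -rf "$DEST/copy-$n"
          cp -a "$SRC" "$DEST/copy-$n"
          sync
          echo 3 > /proc/sys/vm/drop_caches   # re-read from disk, not page cache (needs root)
          if ! diff -qr "$SRC" "$DEST/copy-$n"; then
              echo "worker $n: mismatch on pass $pass" >&2
              return 1
          fi
          echo "worker $n: pass $pass ok"
      done
  }

  for i in $(seq 1 "$NPROCS"); do
      worker "$i" &
  done
  wait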
Roman Mamedov
2013-Jan-14 16:20 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Mon, 14 Jan 2013 15:22:36 +0000
Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:

> 1) Create a single-drive default btrfs volume on a single partition ->
> fill with test data -> scrub -> admire errors.

Did you try ruling out btrfs as the cause of the problem? Maybe something else
in your system is corrupting data, and btrfs just lets you know about that.

I.e. on the same drive, create an ext4 filesystem and copy some data to it
which has known checksums (use md5sum or cfv to generate them in advance for
the data that is on another drive and waiting to be copied); copy to that
drive, flush the caches, then verify the checksums of the files at the
destination.

--
With respect,
Roman
~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
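Roman's checksum procedure translates to something like the following (a
sketch; the mount points and the checksum file path are placeholders):

  # record checksums of the source data in advance
  cd /mnt/source && find . -type f -exec md5sum {} + > /tmp/checksums.md5

  # copy the data onto the freshly created ext4 filesystem
  cp -a /mnt/source/. /mnt/ext4test/

  # flush the page cache so the verification reads come from disk (needs root)
  sync
  echo 3 > /proc/sys/vm/drop_caches

  # verify at the destination
  cd /mnt/ext4test && md5sum -c /tmp/checksums.md5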
Tomasz Kusmierz
2013-Jan-14 16:32 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 14/01/13 15:57, Chris Mason wrote:
> [...]
> So, I've attached an old friend, stress.sh. Use it like this:
>
> stress.sh -n 5 -c <your source directory> -s <your btrfs mount point>
> [...]
> What I'd like you to do is find a data set and command line that make
> the script find errors on btrfs. Then, try the same thing on xfs or
> ext4 and let it run at least twice as long. Then report back ;)
>
> -chris

Chris,

Will do, just please remember that 2TB of test data on "customer grade"
SATA drives will take a while to test :)
Tomasz Kusmierz
2013-Jan-14 16:34 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 14/01/13 16:20, Roman Mamedov wrote:
> [...]
> Did you try ruling out btrfs as the cause of the problem? Maybe something else
> in your system is corrupting data, and btrfs just lets you know about that.
>
> I.e. on the same drive, create an ext4 filesystem and copy some data to it
> which has known checksums [...]

Hi Roman,

Chris just provided his good old friend "stress.sh", which should do exactly
that. So I'll dive into more testing :)

Tom.
Chris Mason
2013-Jan-14 16:34 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Mon, Jan 14, 2013 at 09:32:25AM -0700, Tomasz Kusmierz wrote:
> On 14/01/13 15:57, Chris Mason wrote:
> > [...]
> > What I'd like you to do is find a data set and command line that make
> > the script find errors on btrfs. Then, try the same thing on xfs or
> > ext4 and let it run at least twice as long. Then report back ;)
> >
> > -chris
>
> Chris,
>
> Will do, just please remember that 2TB of test data on "customer grade"
> SATA drives will take a while to test :)

Many thanks. You might want to start with a smaller data set, 20GB or
so total.

-chris
Lars Weber
2013-Jan-15 16:54 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
Hi,

I had a similar scenario to Tomasz's:

- Started with a single 3TB disk.
- Filled the 3TB disk with a lot of files (more than 30 of them 10-30GB).
- Added 2x 1.5TB disks.
- btrfs balance start -dconvert=raid1 -mconvert=raid1 $MOUNT
- # btrfs scrub start $MOUNT
- # btrfs scrub status $MOUNT
  scrub status for $ID
        scrub started at Tue Jan 15 07:10:15 2013 and finished after 24020 seconds
        total bytes scrubbed: 4.30TB with 0 errors

So at least it is no general bug in btrfs - maybe this helps you...

# uname -a
Linux n40l 3.7.2 #1 SMP Sun Jan 13 11:46:56 CET 2013 x86_64 GNU/Linux
# btrfs version
Btrfs v0.20-rc1-37-g91d9ee

Regards
Lars

On 14.01.2013 17:34, Chris Mason wrote:
> [...]
> Many thanks. You might want to start with a smaller data set, 20GB or
> so total.
>
> -chris

--
ADC-Ingenieurbüro Wiedemann | In der Borngasse 12 | 57520 Friedewald |
Tel: 02743-930233 | Fax: 02743-930235 | www.adc-wiedemann.de
GF: Dipl.-Ing. Hendrik Wiedemann | Umsatzsteuer-ID: DE 147979431
Tom Kusmierz
2013-Jan-15 23:32 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 14/01/13 16:34, Chris Mason wrote:
> On Mon, Jan 14, 2013 at 09:32:25AM -0700, Tomasz Kusmierz wrote:
>> [...]
>> Will do, just please remember that 2TB of test data on "customer grade"
>> SATA drives will take a while to test :)
>
> Many thanks. You might want to start with a smaller data set, 20GB or
> so total.
>
> -chris

Chris & all,

Sorry for not replying for so long, but Chris's old friend "stress.sh" has
proven that all my storage is affected by this bug, and the first thing was
to bring everything down before the corruption could spread any further.
Anyway, for the subject's sake: the btrfs stress failed after 2h, the ext4
stress failed after 8h (according to "time ./stress.sh blablabla") - so that
might be related to ext4 always having seemed slower on my machine than btrfs.

Anyway, I wanted to use this opportunity to thank Chris and everybody involved
in btrfs development - your filesystem found a hidden bug in my setup that
would have stayed there until it had pretty much corrupted everything. I don't
even want to think how much my main storage got corrupted over time (ext4 over
LVM over md RAID 5).

p.s. Bizarre that when I "fill" an ext4 partition with test data everything
checks out OK (CRC over all files), but with Chris's tool it gets corrupted -
for both the crappy Adaptec PCIe controller and the motherboard's built-in
one. Also, since the course of history has proven that my testing facilities
are crap - any suggestions on how I can test the RAM, CPU & controllers would
be appreciated.
Chris Mason
2013-Jan-15 23:44 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Tue, Jan 15, 2013 at 04:32:10PM -0700, Tom Kusmierz wrote:
> Chris & all,
>
> Sorry for not replying for so long, but Chris's old friend "stress.sh" has
> proven that all my storage is affected by this bug, and the first thing was
> to bring everything down before the corruption could spread any further.
> Anyway, for the subject's sake: the btrfs stress failed after 2h, the ext4
> stress failed after 8h (according to "time ./stress.sh blablabla") - so that
> might be related to ext4 always having seemed slower on my machine than btrfs.

Ok, great. These problems are really hard to debug, and I'm glad we've
nailed it down to the lower layers.

> Anyway, I wanted to use this opportunity to thank Chris and everybody involved
> in btrfs development [...]
>
> p.s. Bizarre that when I "fill" an ext4 partition with test data everything
> checks out OK (CRC over all files), but with Chris's tool it gets corrupted -
> for both the crappy Adaptec PCIe controller and the motherboard's built-in
> one.

One really hard part of tracking down corruption is that our boxes have so
much ram right now that problems are often hidden by the page cache. My first
advice is to boot with much less ram (1G/2G) or pin down all your ram for
testing. A problem that triggers in 10 minutes is a billion times easier to
figure out than one that triggers in 8 hours.

> Also, since the course of history has proven that my testing facilities
> are crap - any suggestions on how I can test the RAM, CPU & controllers
> would be appreciated.

Step one is to figure out if you've got a CPU/memory problem or an IO
problem. memtest is often able to find CPU and memory problems, but if
you pass memtest I like to use gcc for extra hard testing.

If you have the ram, make a copy of the linux kernel tree in /dev/shm or
any ramdisk/tmpfs mount. Then run make -j ; make clean in a loop until
your box either crashes, gcc reports an internal compiler error, or 16
hours go by. Your loop will need to check for failed makes and stop once
you get the first failure.

Hopefully that will catch it. Otherwise we need to look at the IO stack.

-chris
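Chris's compile-loop test could be scripted along these lines (a sketch under
the stated assumptions: a kernel tree already copied to /dev/shm/linux and a
16-hour time limit):

  #!/bin/bash
  # Repeatedly build the kernel from tmpfs; stop on the first failed build
  # (e.g. a gcc internal compiler error) or after 16 hours of clean builds.
  set -u
  TREE=/dev/shm/linux
  DEADLINE=$(( $(date +%s) + 16 * 3600 ))

  cd "$TREE" || exit 1
  make defconfig > /dev/null          # "make clean" keeps .config, so configure once

  while [ "$(date +%s)" -lt "$DEADLINE" ]; do
      if ! make -j"$(nproc)" > /dev/null 2> build.log; then
          echo "build failed - check build.log for internal compiler errors"
          exit 1
      fi
      make clean > /dev/null
  done
  echo "16 hours of clean builds - CPU/RAM look OK, suspect the IO stack"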
Bernd Schubert
2013-Jan-16 09:21 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 01/16/2013 12:32 AM, Tom Kusmierz wrote:

> p.s. Bizarre that when I "fill" an ext4 partition with test data everything
> checks out OK (CRC over all files), but with Chris's tool it gets corrupted -
> for both the crappy Adaptec PCIe controller and the motherboard's built-in
> one. Also, since the course of history has proven that my testing facilities
> are crap - any suggestions on how I can test the RAM, CPU & controllers
> would be appreciated.

Similar issues were the reason we wrote ql-fstest at q-leap. Maybe you could
try that? You can easily see the pattern of the corruption with it. But maybe
Chris's stress.sh also provides that. Anyway, yesterday I added support for
specifying min and max file sizes, as before it only used 1MiB to 1GiB
sizes... It's a bit cryptic with bits, though; I will improve that later.
https://bitbucket.org/aakef/ql-fstest/downloads

Cheers,
Bernd

PS: But see my other thread - using ql-fstest I yesterday entirely broke a
btrfs test file system, resulting in kernel panics.
Tomasz Kusmierz
2013-Feb-05 10:16 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 16/01/13 09:21, Bernd Schubert wrote:
> [...]
> Similar issues were the reason we wrote ql-fstest at q-leap. Maybe you could
> try that? You can easily see the pattern of the corruption with it.
> [...]
> https://bitbucket.org/aakef/ql-fstest/downloads

Hi,

It's been a while, but I think I should provide a "definitive answer", or
simply what the cause of the whole problem was:

It was a printer!

Long story short, I was going nuts trying to diagnose which bit of my server
was going bad, and eventually I was down to blaming the interface card that
connects the hot-swappable disks to the mobo / PCIe controllers. When I got
back from my holiday I sat in front of the server and decided to go with
ql-fstest, which in a very nice way reports errors with a very low lag
(~2 minutes) after they occur. At this point my printer kicked in with a
"self clean" and an error showed up after about two minutes - so I restarted
the printer, and while it was going through its own POST with self clean,
another error showed up. The issue turned out to be that I was using one of
those fantastic PCI 4-port ethernet cards with the printer connected directly
to it - after moving it and everything else to a switch, all the problems and
issues went away. At the moment the server has been running for 2 weeks
without any corruption, random kernel btrfs crashes, etc.

Anyway, I wanted to thank Chris and the rest of the btrfs dev people again
for this fantastic filesystem, which let me discover what a stupid setup I
was running and how deep into shiet I had put myself.

CHEERS LADS!

Tom.
Chris Mason
2013-Feb-05 12:49 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Tue, Feb 05, 2013 at 03:16:34AM -0700, Tomasz Kusmierz wrote:
> [...]
> It was a printer!
>
> Long story short, [...] The issue turned out to be that I was using one of
> those fantastic PCI 4-port ethernet cards with the printer connected directly
> to it - after moving it and everything else to a switch, all the problems and
> issues went away. At the moment the server has been running for 2 weeks
> without any corruption, random kernel btrfs crashes, etc.

Wow, I've never heard that one before. You might want to try a different
4-port card and/or report it to the driver maintainer. That shouldn't
happen ;)

ql-fstest looks neat, I'll check it out (thanks Bernd).

-chris
Roman Mamedov
2013-Feb-05 13:46 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On Tue, 05 Feb 2013 10:16:34 +0000
Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:

> that I was using one of those fantastic PCI 4-port ethernet cards with the
> printer connected directly to it - after moving it and everything else to a
> switch, all the problems and issues went away. At the moment the server has
> been running for 2 weeks without any corruption, random kernel btrfs
> crashes, etc.

If moving the printer over to a switch helped, perhaps it is indeed an
electrical interference problem, but if your card is an old one from Sun, keep
in mind that they also have some problems with DMA on machines with large
amounts of RAM:

"sunhme" experiences corrupt packets if machine has more than 2GB of memory
https://bugzilla.kernel.org/show_bug.cgi?id=10790

It's not hard to envision a horror-story scenario where a rogue network card
shreds your filesystem buffer cache with network packets DMAed all over it,
like bullets from a machine gun :) But in reality, afaik, the IOMMU is
supposed to protect against this.

--
With respect,
Roman
Tomasz Kusmierz
2013-Feb-05 14:10 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 05/02/13 12:49, Chris Mason wrote:
> [...]
> Wow, I've never heard that one before. You might want to try a different
> 4-port card and/or report it to the driver maintainer. That shouldn't
> happen ;)
>
> ql-fstest looks neat, I'll check it out (thanks Bernd).
>
> -chris

I forgot to mention that the server sits on a UPS while the printer is
connected directly to the mains - when you think about it, that creates a
ground-shift effect, since nothing on a cheap PSU has a "real" ground. But
anyway, this is not the fault of the 4-port card: I tried moving the printer
to a cheap ne2000 card and to the motherboard's integrated one, and the effect
was the same. Also, diagnostics were veeery problematic, because besides the
corruption on the HDDs, memtest was returning corruption in RAM, though on
very rare occasions, and a CPU test was returning corruption on a
once-per-day basis. I replaced nearly everything in this server - including
the PSU (with the 1400W one from my dev rig) - to NO difference. I should
mention as well that this printer is a colour laser printer with 4 drums to
clean, so I would assume it produces enough static electricity to power a
small castle.

ps. It shouldn't be a driver issue, since the errors in RAM were 1-4 bits
wide, located within the same 32-bit word - hence I think a single transfer
had to be corrupted, rather than a whole ethernet packet being shoved into
random memory.
Tomasz Kusmierz
2013-Feb-05 14:18 UTC
Re: btrfs for files > 10GB = random spontaneous CRC failure.
On 05/02/13 13:46, Roman Mamedov wrote:
> On Tue, 05 Feb 2013 10:16:34 +0000
> Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:
>
> [...]
>
> If moving the printer over to a switch helped, perhaps it is indeed an
> electrical interference problem, but if your card is an old one from Sun, keep
> in mind that they also have some problems with DMA on machines with large
> amounts of RAM:
>
> "sunhme" experiences corrupt packets if machine has more than 2GB of memory
> https://bugzilla.kernel.org/show_bug.cgi?id=10790
>
> It's not hard to envision a horror-story scenario where a rogue network card
> shreds your filesystem buffer cache with network packets DMAed all over it,
> like bullets from a machine gun :) But in reality, afaik, the IOMMU is
> supposed to protect against this.

As I said in my reply to Chris, it was definitely an electrical issue. Back in
the days when cat5 ethernet was a novelty, I learned a simple lesson the hard
way - don't be a skimp, always separate with a switch. I learned it on
networks where the parties were not necessarily powered from the same circuit
or even the same supply phase. Since this setup is limited to my home, I
violated my own old rule - and it backfired on me.

Anyway, thanks for the info on "sunhme" - WOW...