I noticed a WARN_ON going off when adding csums because we were going over
the amount of csum bytes that should have been allowed for an ordered
extent.  This is a leftover from when we used to hold the csums privately
for direct io, but now we use the normal ordered sum stuff so we need to
make sure and check if we've moved on to another extent so that the csums
are added to the right extent.  Without this we could end up with csums for
bytenrs that don't have extents to cover them yet.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
---
 fs/btrfs/file-item.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 93ddc2e..f21b490 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -460,8 +460,8 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
 		if (!contig)
 			offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
-		if (!contig && (offset >= ordered->file_offset + ordered->len ||
-		    offset < ordered->file_offset)) {
+		if (offset >= ordered->file_offset + ordered->len ||
+		    offset < ordered->file_offset) {
 			unsigned long bytes_left;
 			sums->len = this_sum_bytes;
 			this_sum_bytes = 0;
-- 
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
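For readers following along, the condition changed in the patch above can be sketched in isolation. This is only an illustration, not the kernel code: `struct ordered_extent`, `outside_extent`, `must_switch_old` and `must_switch_new` are hypothetical names, and only the two fields the patch reads (`file_offset`, `len`) are modelled. It shows that the old test skipped the range check whenever `contig` was set, while the patched test always runs it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ordered-extent fields the patch reads. */
struct ordered_extent {
	uint64_t file_offset;	/* first byte covered by the extent */
	uint64_t len;		/* number of bytes covered */
};

/* True when offset lies outside [file_offset, file_offset + len). */
static bool outside_extent(const struct ordered_extent *oe, uint64_t offset)
{
	return offset >= oe->file_offset + oe->len ||
	       offset < oe->file_offset;
}

/* Old condition: the range check is skipped whenever contig is set. */
static bool must_switch_old(const struct ordered_extent *oe,
			    uint64_t offset, bool contig)
{
	return !contig && outside_extent(oe, offset);
}

/* Patched condition: the range check runs regardless of contig. */
static bool must_switch_new(const struct ordered_extent *oe,
			    uint64_t offset, bool contig)
{
	(void)contig;	/* no longer part of the decision */
	return outside_extent(oe, offset);
}
```

With an extent covering bytes 4096 through 12287 and `contig` set, an offset of 12288 is the first byte past the extent: the old check would keep adding csums to the current extent, and the patched check reports that the code has to move on to the next one.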
David Sterba
2013-Jan-23 11:01 UTC
Re: [PATCH] Btrfs: put csums on the right ordered extent
On Tue, Jan 22, 2013 at 03:45:35PM -0500, Josef Bacik wrote:
> I noticed a WARN_ON going off when adding csums because we were going over
> the amount of csum bytes that should have been allowed for an ordered
> extent.  This is a leftover from when we used to hold the csums privately
> for direct io, but now we use the normal ordered sum stuff so we need to
> make sure and check if we've moved on to another extent so that the csums
> are added to the right extent.  Without this we could end up with csums for
> bytenrs that don't have extents to cover them yet.  Thanks,
>
> Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Survived a few hours of Chris' test and I'm running the full xfstests
again now.

david
David Sterba
2013-Jan-23 15:21 UTC
Re: [PATCH] Btrfs: put csums on the right ordered extent
On Wed, Jan 23, 2013 at 12:01:08PM +0100, David Sterba wrote:
> Survived a few hours of Chris' test and I'm running the full xfstests
> again now.

After 4.5 hours of the same testsuite as before, this popped up in
syslog:

[53489.998888] btrfs csum failed ino 63793 off 106496 csum 2411266714 private 3462448526

the file is $TEST/junk . Running md5sum on that file produces a
different checksum error messages:

[54187.259285] btrfs csum failed ino 63793 off 344064 csum 2566472073 private 1022717503
(3 times)

I've rerun chris' script again, no errors, and rechecked if I'm running
the kernel, yes.

relevant part of testlog:

091 41s ... [15:49:07] [15:49:08] [failed, exit status 1] - output mismatch (see 091.out.bad)
--- 091.out 2011-11-01 10:31:12.000000000 +0100
+++ 091.out.bad 2013-01-23 15:49:08.000000000 +0100
@@ -1,7 +1,64 @@
 QA output created by 091
 fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 32768 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
-fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -W
+fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W
+mapped writes DISABLED
+truncating to largest ever: 0x12a00
+truncating to largest ever: 0x75400
+dowrite: write: Input/output error
+LOG DUMP (54 total operations):
+1( 1 mod 256): SKIPPED (no operation)
+2( 2 mod 256): WRITE 0x62600 thru 0x6bdff (0x9800 bytes) HOLE
+3( 3 mod 256): FALLOC 0x2e0f2 thru 0x3134a (0x3258 bytes) INTERIOR
+4( 4 mod 256): TRUNCATE DOWN from 0x6be00 to 0x12a00
+5( 5 mod 256): READ 0x0 thru 0xdfff (0xe000 bytes)
+6( 6 mod 256): FALLOC 0x7048 thru 0x9f54 (0x2f0c bytes) INTERIOR
+7( 7 mod 256): WRITE 0x5ea00 thru 0x6e7ff (0xfe00 bytes) HOLE
+8( 8 mod 256): READ 0x16000 thru 0x17fff (0x2000 bytes)
+9( 9 mod 256): FALLOC 0x4957f thru 0x5298e (0x940f bytes) INTERIOR
+10( 10 mod 256): SKIPPED (no operation)
+11( 11 mod 256): WRITE 0x10a00 thru 0x173ff (0x6a00 bytes)
+12( 12 mod 256): WRITE 0x53800 thru 0x5a7ff (0x7000 bytes)
+13( 13 mod 256): WRITE 0x5ae00 thru 0x5afff (0x200 bytes)
+14( 14 mod 256): READ 0x5d000 thru 0x66fff (0xa000 bytes)
+15( 15 mod 256): SKIPPED (no operation)
+16( 16 mod 256): READ 0x21000 thru 0x2bfff (0xb000 bytes)
+17( 17 mod 256): SKIPPED (no operation)
+18( 18 mod 256): READ 0x47000 thru 0x4ffff (0x9000 bytes)
+19( 19 mod 256): WRITE 0x17600 thru 0x25bff (0xe600 bytes)
+20( 20 mod 256): READ 0x3f000 thru 0x48fff (0xa000 bytes)
+21( 21 mod 256): FALLOC 0xea89 thru 0x19800 (0xad77 bytes) INTERIOR
+22( 22 mod 256): FALLOC 0x569aa thru 0x586ea (0x1d40 bytes) INTERIOR
+23( 23 mod 256): WRITE 0x35c00 thru 0x453ff (0xf800 bytes)
+24( 24 mod 256): SKIPPED (no operation)
+25( 25 mod 256): SKIPPED (no operation)
+26( 26 mod 256): READ 0x21000 thru 0x26fff (0x6000 bytes)
+27( 27 mod 256): READ 0x5e000 thru 0x61fff (0x4000 bytes)
+28( 28 mod 256): WRITE 0x6f600 thru 0x6f7ff (0x200 bytes) HOLE
+29( 29 mod 256): READ 0x13000 thru 0x19fff (0x7000 bytes)
+30( 30 mod 256): TRUNCATE UP from 0x6f800 to 0x75400
+31( 31 mod 256): READ 0x4000 thru 0xafff (0x7000 bytes)
+32( 32 mod 256): SKIPPED (no operation)
+33( 33 mod 256): FALLOC 0x31d49 thru 0x3c520 (0xa7d7 bytes) INTERIOR
+34( 34 mod 256): FALLOC 0x2bbb3 thru 0x37ad8 (0xbf25 bytes) INTERIOR
+35( 35 mod 256): READ 0x68000 thru 0x73fff (0xc000 bytes)
+36( 36 mod 256): FALLOC 0x2a075 thru 0x36518 (0xc4a3 bytes) INTERIOR
+37( 37 mod 256): WRITE 0x24800 thru 0x275ff (0x2e00 bytes)
+38( 38 mod 256): READ 0x2f000 thru 0x3cfff (0xe000 bytes)
+39( 39 mod 256): FALLOC 0x25e59 thru 0x345eb (0xe792 bytes) INTERIOR
+40( 40 mod 256): WRITE 0x1a600 thru 0x225ff (0x8000 bytes)
+41( 41 mod 256): READ 0x11000 thru 0x13fff (0x3000 bytes)
+42( 42 mod 256): READ 0x72000 thru 0x73fff (0x2000 bytes)
+43( 43 mod 256): READ 0x4f000 thru 0x5bfff (0xd000 bytes)
+44( 44 mod 256): FALLOC 0x114aa thru 0x1818b (0x6ce1 bytes) INTERIOR
+45( 45 mod 256): READ 0x1f000 thru 0x28fff (0xa000 bytes)
+46( 46 mod 256): WRITE 0x54600 thru 0x609ff (0xc400 bytes)
+47( 47 mod 256): WRITE 0xb600 thru 0xe7ff (0x3200 bytes)
+48( 48 mod 256): WRITE 0x68c00 thru 0x73fff (0xb400 bytes)
+49( 49 mod 256): WRITE 0x78600 thru 0x79dff (0x1800 bytes) HOLE
+50( 50 mod 256): READ 0x49000 thru 0x4cfff (0x4000 bytes)
+51( 51 mod 256): WRITE 0x17a00 thru 0x223ff (0xaa00 bytes)
+52( 52 mod 256): READ 0x9000 thru 0xafff (0x2000 bytes)
+53( 53 mod 256): SKIPPED (no operation)
+54( 54 mod 256): WRITE 0x1a600 thru 0x209ff (0x6400 bytes)
Chris Mason
2013-Jan-23 23:57 UTC
Re: [PATCH] Btrfs: put csums on the right ordered extent
On Wed, Jan 23, 2013 at 08:21:07AM -0700, David Sterba wrote:
> On Wed, Jan 23, 2013 at 12:01:08PM +0100, David Sterba wrote:
> > Survived a few hours of Chris' test and I'm running the full xfstests
> > again now.
>
> After 4.5 hours of the same testsuite as before, this popped up in
> syslog:
>
> [53489.998888] btrfs csum failed ino 63793 off 106496 csum 2411266714 private 3462448526
>
> the file is $TEST/junk . Running md5sum on that file produces a
> different checksum error messages:
>
> [54187.259285] btrfs csum failed ino 63793 off 344064 csum 2566472073 private 1022717503
> (3 times)
>
> I've rerun chris' script again, no errors, and rechecked if I'm running
> the kernel, yes.

Looks like Josef saw the same, so we have two different bugs (missing
crcs vs incorrect crcs).  Josef mentioned a much faster test for the
incorrect crcs...Josef what was that test?

Thanks to everyone that ran my script, sorry it ended up a distraction.

-chris
David Sterba
2013-Jan-24 17:38 UTC
Re: [PATCH] Btrfs: put csums on the right ordered extent
On Wed, Jan 23, 2013 at 06:57:09PM -0500, Chris Mason wrote:
> Looks like Josef saw the same, so we have two different bugs (missing
> crcs vs incorrect crcs).  Josef mentioned a much faster test for the
> incorrect crcs...Josef what was that test?

Yeah, 2 bugs, this patch fixed the incorrect csums with DIO, your
scriptlet was able to trigger it reliably.

I don't see a mail from Josef about the 2nd problem, so let me tell the
story:

I've tested plain 3.7 with the script and also with the repeated
xfstests (that reproduced the 1st csum problem) -- all clear after 5.5
hours. So we took it as a first good. All suspects are in the post-3.7
queue.

Josef had a quick reproducer and bisected it to

commit 31e502298d80e2af9001d17dc419a3fd4b0bebef
Author: Liu Bo <bo.li.liu@oracle.com>

    Btrfs: put raid properties into global table

    Raid properties can be shared among raid calculation code, we can put
    them into a global table to keep it simple.

https://patchwork.kernel.org/patch/1781061/
---

Which mistakenly specified 0 as the maximum number of devices in the
single profile and was fixed by Miao later on, but the fix was not
merged due to Josef debugging the 1st csum problem.

https://patchwork.kernel.org/patch/1987481/

-	{ 1, 1, 0, 1, 1, 1 /* single */ },
+	{ 1, 1, 1, 1, 1, 1 /* single */ },

does indeed fix the 2nd bug.

The reproducer that worked for me was to create a raid0/raid0
filesystem, run the 50x fsx DIO load and do a balance convert to
single/single. Then fsx fails.

cheers,
david
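To make the one-character fix easier to read, here is a rough sketch of a table row with named columns. The thread only states that the third column is the maximum number of devices; the struct name `raid_attr`, the other field names, and the helper `devices_used` are assumptions made for illustration, as is the rule that a `devs_max` of 0 is read as "no limit". Under that assumption, a single-profile row with 0 in the third column would let an allocation span every available device, which would fit the reproducer of converting a multi-device raid0 filesystem to single.

```c
/*
 * Illustrative only: field names are assumed, the thread identifies
 * just the third column (maximum number of devices).
 */
struct raid_attr {
	int sub_stripes;
	int dev_stripes;
	int devs_max;		/* the column changed by the fix */
	int devs_min;
	int devs_increment;
	int ncopies;
};

/* Row as it appeared in the bisected commit, and as corrected. */
static const struct raid_attr single_broken = { 1, 1, 0, 1, 1, 1 };
static const struct raid_attr single_fixed  = { 1, 1, 1, 1, 1, 1 };

/*
 * Hypothetical helper: number of devices an allocation may span,
 * assuming devs_max == 0 means "no upper limit".
 */
static int devices_used(const struct raid_attr *attr, int available)
{
	if (attr->devs_max && available > attr->devs_max)
		return attr->devs_max;
	return available;
}
```

On a two-device filesystem this sketch gives `devices_used(&single_broken, 2) == 2` but `devices_used(&single_fixed, 2) == 1`, so only the corrected row keeps a single-profile allocation on one device.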