SAMSUNG SSD 830 Series
CPU0: Intel® Core(TM) i7-2820QM CPU @ 2.30GHz (fam: 06, model: 2a, stepping: 07)
8GB RAM (quite heavily tested, not recently, with several days of memtest)
kernel 3.11.1-200.fc19.x86_64 running on baremetal
btrfs-progs-0.20.rc1.20130308git704a08c-1.fc19.x86_64

Today I did a scrub on a btrfs volume, with no message or errors in console or dmesg or journal. Immediately after the scrub I did a balance on the volume which resulted in:

ERROR: error during balancing '/' - Input/output error

In dmesg for the time of that error, this is reported:

[ 567.921661] btrfs: relocating block group 6463422464 flags 1
[ 568.282371] btrfs: found 200 extents
[ 568.800974] btrfs: found 200 extents
[ 568.868567] btrfs: relocating block group 5389680640 flags 1
[ 571.929662] btrfs: found 4410 extents
[ 572.896410] btrfs: found 4410 extents
[ 572.962479] btrfs: relocating block group 4315938816 flags 1
[ 574.681576] BTRFS info (device sda6): csum failed ino 259 off 428470272 csum 2566472073 private 2181120065
[ 574.692047] BTRFS info (device sda6): csum failed ino 259 off 428470272 csum 2566472073 private 2181120065

Upon reboot with kernel 3.11.1-200.fc19.x86_64 and also kernel-3.10.4-300.fc19.x86_64 the following is reported in dmesg:

[ 6.053511] btrfs no csum found for inode 37693 start 25538560
[ 6.054463] BTRFS info (device sda6): csum failed ino 37693 off 25538560 csum 3474434693 private 0
[ 6.055299] btrfs no csum found for inode 37693 start 26218496
[ 6.056086] BTRFS info (device sda6): csum failed ino 37693 off 26218496 csum 2772176352 private 0
[ 6.085993] btrfs no csum found for inode 37693 start 22286336
[ 6.086093] btrfs no csum found for inode 37693 start 22368256
[ 6.087636] BTRFS info (device sda6): csum failed ino 37693 off 22286336 csum 396494483 private 0
[ 6.087741] BTRFS info (device sda6): csum failed ino 37693 off 22368256 csum 2249156591 private 0

[root@f19l chris]# btrfs fi show
failed to open /dev/sr0: No medium found
Label: 'fedora' uuid: d505bdee-ba7c-4a64-9481-d5cd76ab8b3e
Total devices 1 FS bytes used 3.64GB
devid 1 size 12.99GB used 6.51GB path /dev/sda6

The file system is on an SSD, so single profile for both data and metadata:

[root@f19l chris]# btrfs fi df /
Data: total=6.01GB, used=3.39GB
System: total=4.00MB, used=4.00KB
Metadata: total=512.00MB, used=258.93MB

If this is not the result of a known bug, let me know if there's more information I should provide, I do have a ~22MB btrfs-image -c9 -t4 of the file system. This fs is disposable, but I might try btrfsck --repair --init-csum-tree with a slightly newer btrfs-progs.

Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at vger.kernel.org/majordomo-info.html
Result of btrfsck (without --repair) on the fs.

Checking filesystem on /dev/sda6
UUID: d505bdee-ba7c-4a64-9481-d5cd76ab8b3e
checking extents
checking fs roots
root 257 inode 37693 errors 1800
found 3938304000 bytes used err is 1
total csum bytes: 3557972
total tree bytes: 271794176
total fs tree bytes: 253009920
btree space waste bytes: 79371605
file data blocks allocated: 4546076672
 referenced 3631865856
Btrfs v0.20-rc1

Console result of subsequent scrub on the mounted fs:

scrub status for d505bdee-ba7c-4a64-9481-d5cd76ab8b3e
scrub started at Mon Sep 23 16:23:33 2013 and finished after 8 seconds
total bytes scrubbed: 3.67GB with 10 errors
error details: csum=10
corrected errors: 0, uncorrectable errors: 10, unverified errors: 0

dmesg result of a subsequent scrub on the mounted file system:

[ 30.682058] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 30.682095] btrfs: unable to fixup (regular) error at logical 461914112 on dev /dev/sda6
[ 30.682141] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 30.682174] btrfs: unable to fixup (regular) error at logical 460079104 on dev /dev/sda6
[ 30.689792] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[ 30.689823] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[ 30.689824] btrfs: unable to fixup (regular) error at logical 456085504 on dev /dev/sda6
[ 30.689896] btrfs: unable to fixup (regular) error at logical 457531392 on dev /dev/sda6
[ 30.743222] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[ 30.743260] btrfs: unable to fixup (regular) error at logical 460230656 on dev /dev/sda6
[ 30.970989] btrfs: checksum error at logical 462082048 on dev /dev/sda6, sector 902504, root 257, inode 37693, offset 22282240, length 4096, links 1 (path: var/log/journal/180d14c18233452d9918c3aec1c6c68b/system.journal)
[ 30.970993] btrfs: checksum error at logical 464195584 on dev /dev/sda6, sector 906632, root 257, inode 37693, offset 22638592, length 4096, links 1 (path: var/log/journal/180d14c18233452d9918c3aec1c6c68b/system.journal)
[ 30.970997] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[ 30.970998] btrfs: unable to fixup (regular) error at logical 464195584 on dev /dev/sda6
[ 30.971270] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
[ 30.971300] btrfs: unable to fixup (regular) error at logical 462082048 on dev /dev/sda6
[ 31.047120] btrfs: checksum error at logical 462123008 on dev /dev/sda6, sector 902584, root 257, inode 37693, offset 22360064, length 4096, links 1 (path: var/log/journal/180d14c18233452d9918c3aec1c6c68b/system.journal)
[ 31.047206] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
[ 31.047235] btrfs: unable to fixup (regular) error at logical 462123008 on dev /dev/sda6
[ 36.290269] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
[ 36.290305] btrfs: unable to fixup (regular) error at logical 4744409088 on dev /dev/sda6
[ 37.882830] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[ 37.882867] btrfs: unable to fixup (regular) error at logical 6730518528 on dev /dev/sda6

Also, there have been no crashes, panics, or power cuts to this system.

Thus far it seems like the balance itself is what has caused the csum corruption. Prior to balance, scrub finds no problems. After balance there is some corruption. But isn't it ambiguous whether the data or the metadata have been corrupted since there is only a single copy of each? In which case is it wise to init-csum-tree?

Chris Murphy
I'm able to reproduce this on a different drive, HDD (WDC WD5000BEVT-22ZAT0), also with data and metadata set to single. There are no problems reported when scrubbing before balance, and then there is corruption reported after balance.

File system is created with:
kernel-3.9.5-301.fc19.x86_64
btrfs-progs-0.20.rc1.20130308git704a08c-1.fc19.x86_64

Then updated to:
kernel-3.11.1-200.fc19.x86_64
btrfs-progs-0.20.rc1.20130308git704a08c-1.fc19.x86_64

Then scrubbed with no errors.
Then balanced with no errors (unlike the previous report with SSD which stopped)
Then scrubbed with errors, see below the balance followed by 2nd scrub.

[ 226.333352] btrfs: relocating block group 2168455168 flags 1
[ 233.032499] btrfs: found 2816 extents
[ 234.522162] btrfs: found 2816 extents
[ 234.818501] btrfs: relocating block group 1094713344 flags 1
[ 261.631679] btrfs: found 13255 extents
[ 266.133269] btrfs: found 13254 extents
[ 266.464119] btrfs: relocating block group 20971520 flags 4
[ 268.665678] btrfs: found 2018 extents
[ 268.976324] btrfs: relocating block group 12582912 flags 1
[ 269.397991] btrfs: found 246 extents
[ 269.931383] btrfs: found 246 extents
[ 270.209504] btrfs: relocating block group 4194304 flags 4
[ 270.642570] btrfs: found 378 extents
[ 318.029771] btrfs: checksum error at logical 2209439744 on dev /dev/sda4, sector 6412464, root 256, inode 25764, offset 6746112, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 318.029793] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 318.029827] btrfs: unable to fixup (regular) error at logical 2209439744 on dev /dev/sda4
[ 318.045206] btrfs: checksum error at logical 2207895552 on dev /dev/sda4, sector 6409448, root 256, inode 25764, offset 6668288, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 318.045211] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 318.045224] btrfs: unable to fixup (regular) error at logical 2207895552 on dev /dev/sda4
[ 318.172649] btrfs: checksum error at logical 2211979264 on dev /dev/sda4, sector 6417424, root 256, inode 25764, offset 7389184, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 318.172657] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[ 318.172671] btrfs: unable to fixup (regular) error at logical 2211979264 on dev /dev/sda4
[ 318.175607] btrfs: checksum error at logical 2212261888 on dev /dev/sda4, sector 6417976, root 256, inode 25764, offset 7065600, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 318.175611] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[ 318.175619] btrfs: unable to fixup (regular) error at logical 2212261888 on dev /dev/sda4
[ 318.175979] btrfs: checksum error at logical 2212278272 on dev /dev/sda4, sector 6418008, root 256, inode 25764, offset 7081984, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 318.175984] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[ 318.175992] btrfs: unable to fixup (regular) error at logical 2212278272 on dev /dev/sda4
[ 318.200069] btrfs: checksum error at logical 2213355520 on dev /dev/sda4, sector 6420112, root 256, inode 25764, offset 7581696, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 318.200074] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[ 318.200083] btrfs: unable to fixup (regular) error at logical 2213355520 on dev /dev/sda4
[ 318.207868] btrfs: checksum error at logical 2214825984 on dev /dev/sda4, sector 6422984, root 256, inode 25764, offset 7806976, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 318.207872] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
[ 318.207881] btrfs: unable to fixup (regular) error at logical 2214825984 on dev /dev/sda4
[ 323.564405] btrfs: checksum error at logical 2650460160 on dev /dev/sda4, sector 7273832, root 256, inode 25764, offset 4247552, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 323.564422] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
[ 323.564456] btrfs: unable to fixup (regular) error at logical 2650460160 on dev /dev/sda4
[ 325.307954] btrfs: checksum error at logical 2792796160 on dev /dev/sda4, sector 7551832, root 256, inode 25764, offset 5857280, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 325.307975] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
[ 325.308088] btrfs: unable to fixup (regular) error at logical 2792796160 on dev /dev/sda4
[ 325.317607] btrfs: checksum error at logical 2791784448 on dev /dev/sda4, sector 7549856, root 256, inode 25764, offset 5378048, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 325.317621] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[ 325.317648] btrfs: unable to fixup (regular) error at logical 2791784448 on dev /dev/sda4
[ 325.431833] btrfs: checksum error at logical 2791858176 on dev /dev/sda4, sector 7550000, root 256, inode 25764, offset 5455872, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 325.431849] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 11, gen 0
[ 325.431877] btrfs: unable to fixup (regular) error at logical 2791858176 on dev /dev/sda4
[ 325.432530] btrfs: checksum error at logical 2791907328 on dev /dev/sda4, sector 7550096, root 256, inode 25764, offset 5505024, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 325.432543] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 12, gen 0
[ 325.432567] btrfs: unable to fixup (regular) error at logical 2791907328 on dev /dev/sda4
[ 327.321507] btrfs: checksum error at logical 2792157184 on dev /dev/sda4, sector 7550584, root 256, inode 25764, offset 5758976, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 327.321525] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 13, gen 0
[ 327.321560] btrfs: unable to fixup (regular) error at logical 2792157184 on dev /dev/sda4
[ 329.996557] btrfs: checksum error at logical 3165069312 on dev /dev/sda4, sector 8278928, root 256, inode 25764, offset 6369280, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 329.996575] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 14, gen 0
[ 329.996604] btrfs: unable to fixup (regular) error at logical 3165069312 on dev /dev/sda4
[ 329.997239] btrfs: checksum error at logical 3165126656 on dev /dev/sda4, sector 8279040, root 256, inode 25764, offset 6430720, length 4096, links 1 (path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)
[ 329.997253] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 15, gen 0
[ 329.997279] btrfs: unable to fixup (regular) error at logical 3165126656 on dev /dev/sda4

Also on reboot now it is reported:

[root@f19s ~]# dmesg | grep -i btrfs
[ 1.835049] Btrfs loaded
[ 1.966412] btrfs: disk space caching is enabled
[ 1.980436] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 15, gen 0
[ 3.233524] SELinux: initialized (dev sda4, type btrfs), uses xattr
[ 4.316491] btrfs: disk space caching is enabled
[ 9.715052] btrfs no csum found for inode 25764 start 6402048
[ 9.715503] btrfs no csum found for inode 25764 start 6754304
[ 9.827785] BTRFS info (device sda4): csum failed ino 25764 off 6402048 csum 3000251694 private 0
[ 10.204708] BTRFS info (device sda4): csum failed ino 25764 off 6754304 csum 1612034066 private 0
[ 11.187301] btrfs no csum found for inode 25764 start 7393280
[ 11.187578] btrfs no csum found for inode 25764 start 7585792
[ 11.187866] btrfs no csum found for inode 25764 start 7819264
[ 11.403389] BTRFS info (device sda4): csum failed ino 25764 off 7393280 csum 3889482771 private 0
[ 11.427616] BTRFS info (device sda4): csum failed ino 25764 off 7819264 csum 4086456643 private 0
[ 11.486405] BTRFS info (device sda4): csum failed ino 25764 off 7585792 csum 3911271769 private 0

Chris Murphy
OK so now I'm able to reproduce this with Fedora 20 alpha RC4 on a HDD, which uses:
kernel-3.11.1-300.fc20.x86_64
btrfs-progs-0.20.rc1.20130917git194aa4a-1.fc20.x86_64

Since it's HDD, metadata profile DUP is used. But I still get munged checksums with balance, and the corruption isn't fixable by a subsequent scrub. So even though the data is probably OK and this is just a checksum problem, it's apparently not fixable (?).

[root@oldlaptop ~]# btrfs balance start /
Done, had to relocate 5 out of 5 chunks

[root@oldlaptop ~]# dmesg (snippet)
[ 390.770699] btrfs: relocating block group 1103101952 flags 1
[ 406.639113] btrfs: found 10341 extents
[ 414.172873] btrfs: found 10331 extents
[ 414.530059] btrfs: relocating block group 29360128 flags 36
[ 418.761208] btrfs: found 9281 extents
[ 419.136338] btrfs: relocating block group 20971520 flags 34
[ 419.536539] btrfs: found 1 extents
[ 419.880757] btrfs: relocating block group 12582912 flags 1
[ 420.380511] btrfs: found 282 extents
[ 421.080667] btrfs: found 282 extents
[ 421.426891] btrfs: relocating block group 4194304 flags 4

[root@oldlaptop ~]# btrfs scrub start /
scrub started on /, fsid 1463a31b-472a-47cd-a8c8-86bf09f978fa (pid=894)

[root@oldlaptop ~]# dmesg (snippet)
[ 460.533990] btrfs: checksum error at logical 2607853568 on dev /dev/sda5, sector 7207000, root 256, inode 24622, offset 4247552, length 4096, links 1 (path: var/log/journal/d212cf4a840f4e78a33781c56189a7da/system.journal)
[ 460.534045] btrfs: bdev /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 460.534082] btrfs: unable to fixup (regular) error at logical 2607853568 on dev /dev/sda5
[ 460.534581] btrfs: checksum error at logical 2607869952 on dev /dev/sda5, sector 7207032, root 256, inode 24622, offset 4263936, length 4096, links 1 (path: var/log/journal/d212cf4a840f4e78a33781c56189a7da/system.journal)
[ 460.534594] btrfs: bdev /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 460.534614] btrfs: unable to fixup (regular) error at logical 2607869952 on dev /dev/sda5
[ 460.535128] btrfs: checksum error at logical 2607886336 on dev /dev/sda5, sector 7207064, root 256, inode 24622, offset 4280320, length 4096, links 1 (path: var/log/journal/d212cf4a840f4e78a33781c56189a7da/system.journal)
[ 460.535140] btrfs: bdev /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[ 460.535161] btrfs: unable to fixup (regular) error at logical 2607886336 on dev /dev/sda5
[ 460.535607] btrfs: checksum error at logical 2607902720 on dev /dev/sda5, sector 7207096, root 256, inode 24622, offset 4296704, length 4096, links 1 (path: var/log/journal/d212cf4a840f4e78a33781c56189a7da/system.journal)
[ 460.535619] btrfs: bdev /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[ 460.535639] btrfs: unable to fixup (regular) error at logical 2607902720 on dev /dev/sda5
[ 460.536421] btrfs: checksum error at logical 2608025600 on dev /dev/sda5, sector 7207336, root 256, inode 24622, offset 4313088, length 4096, links 1 (path: var/log/journal/d212cf4a840f4e78a33781c56189a7da/system.journal)
[ 460.536437] btrfs: bdev /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[ 460.536457] btrfs: unable to fixup (regular) error at logical 2608025600 on dev /dev/sda5
[ 460.779192] btrfs: checksum error at logical 2626674688 on dev /dev/sda5, sector 7243760, root 256, inode 24622, offset 4595712, length 4096, links 1 (path: var/log/journal/d212cf4a840f4e78a33781c56189a7da/system.journal)
[ 460.779210] btrfs: bdev /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
[ 460.779245] btrfs: unable to fixup (regular) error at logical 2626674688 on dev /dev/sda5
[ 460.779822] btrfs: checksum error at logical 2626715648 on dev /dev/sda5, sector 7243840, root 256, inode 24622, offset 4231168, length 4096, links 1 (path: var/log/journal/d212cf4a840f4e78a33781c56189a7da/system.journal)
[ 460.779834] btrfs: bdev /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
[ 460.779854] btrfs: unable to fixup (regular) error at logical 2626715648 on dev /dev/sda5

And now on reboot:

[root@f20s ~]# dmesg | grep -i btrfs
[ 1.725224] Btrfs loaded
[ 1.980491] btrfs: disk space caching is enabled
[ 2.001684] btrfs: bdev /dev/sda5 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
[ 3.011628] SELinux: initialized (dev sda5, type btrfs), uses xattr
[ 5.092593] btrfs: disk space caching is enabled
[ 8.703883] btrfs no csum found for inode 24622 start 4235264
[ 8.844562] btrfs no csum found for inode 24622 start 4251648
[ 8.844589] btrfs no csum found for inode 24622 start 4272128
[ 8.844611] btrfs no csum found for inode 24622 start 4288512
[ 8.844632] btrfs no csum found for inode 24622 start 4304896
[ 8.844658] btrfs no csum found for inode 24622 start 4321280
[ 8.856069] BTRFS info (device sda5): csum failed ino 24622 off 4251648 csum 1113579642 private 0
[ 8.856084] BTRFS info (device sda5): csum failed ino 24622 off 4272128 csum 2433646103 private 0
[ 8.856092] BTRFS info (device sda5): csum failed ino 24622 off 4288512 csum 2276263411 private 0
[ 8.857248] BTRFS info (device sda5): csum failed ino 24622 off 4304896 csum 1156822344 private 0
[ 8.857424] BTRFS info (device sda5): csum failed ino 24622 off 4321280 csum 3967991073 private 0
[ 8.867242] BTRFS info (device sda5): csum failed ino 24622 off 4235264 csum 172180530 private 0

Other info:

[root@oldlaptop ~]# btrfs fi show
failed to open /dev/sr0: No medium found
Label: 'fedora' uuid: 1463a31b-472a-47cd-a8c8-86bf09f978fa
Total devices 1 FS bytes used 700.04MB
devid 1 size 432.62GB used 3.04GB path /dev/sda5

Btrfs v0.20-rc1

[root@oldlaptop ~]# btrfs fi df /
Data: total=1.01GB, used=662.47MB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=37.57MB
Metadata: total=8.00MB, used=0.00

Chris Murphy
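A note on reading the "relocating block group ... flags N" lines in these logs: the number is a bitmask of block group type and profile bits. The sketch below decodes it using the bit values from the kernel's btrfs headers (DATA=1, SYSTEM=2, METADATA=4, RAID0=8, RAID1=16, DUP=32, RAID10=64). The function name decode_flags and the "single" label for "no profile bit set" are illustrative choices, not part of any btrfs tool.

```python
# Block group type and profile bits (values as in the kernel's btrfs headers).
BLOCK_GROUP_BITS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
}

# Any of these bits means a redundancy/striping profile is in use.
PROFILE_MASK = (1 << 3) | (1 << 4) | (1 << 5) | (1 << 6)


def decode_flags(flags):
    """Return the names of the bits set in a block group 'flags' value.

    When no profile bit is set, 'single' is appended (illustrative label).
    """
    names = [name for bit, name in BLOCK_GROUP_BITS.items() if flags & bit]
    if not flags & PROFILE_MASK:
        names.append("single")
    return names


# Flag values that appear in the dmesg snippets of this thread.
for value in (1, 4, 34, 36):
    print(value, decode_flags(value))
```

Read this way, flags 36 and 34 on the HDD are the DUP metadata and DUP system block groups that btrfs fi df lists, while flags 1 and 4 are single data and single metadata.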
Since I can now reproduce this bug on two different computers, one with SSD, the other HDD, and scrub does not fix the csum errors, I've filed a bug. It's reproducible with:
kernel-3.11.1-300.fc20.x86_64
btrfs-progs-0.20.rc1.20130917git194aa4a-1.fc20.x86_64

Bug is at:
bugzilla.redhat.com/show_bug.cgi?id=1011714

Chris Murphy
OK so I think I'm narrowing this down to just the systemd journal, and it's not checksums that are corrupted, it's the journal itself.

[ 19.354354] systemd-journald[210]: /var/log/journal/8e4cbfea404512ae70096c6202c9a3bf/system.journal: Journal file corrupted, rotating.

If I set systemd journald.conf Storage=volatile so that it stores journals only in memory, the problem is not reproducible.

However, even after deleting all corrupt journal files, and a subsequent scrub reporting no errors, on each reboot (and mount of the filesystem) I get:

[ 3.646448] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 17, gen 0

So somehow the corrupt counter isn't being reset?

And how would I go about setting /var/log/journal contents to inherit nodatacow? Possible?

Chris Murphy
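To check whether the per-device counters in the "bdev ... errs:" line are actually growing from one boot to the next, rather than just being carried over, the line can be parsed and two snapshots compared. This is a minimal sketch that assumes the exact message format shown in this thread; parse_errs and grew are hypothetical helper names.

```python
import re

# Matches the per-device counter line as it appears in the dmesg output above.
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_errs(line):
    """Return (device, {counter: value}) for a 'bdev ... errs:' line, else None."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    device = fields.pop("dev")
    return device, {name: int(value) for name, value in fields.items()}


def grew(old, new):
    """Return the sorted names of counters that are larger in new than in old."""
    return sorted(name for name in new if new[name] > old.get(name, 0))


line = "[ 3.646448] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt 17, gen 0"
device, counters = parse_errs(line)
print(device, counters)
```

If grew() returns an empty list across reboots, the nonzero value is history being reported again, not new corruption.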
On Sep 24, 2013, at 10:34 PM, Chris Murphy <lists@colorremedies.com> wrote:

> And how would I go about setting /var/log/journal contents to inherit nodatacow? Possible?

chattr +C /var/log/journal

Resolved the problem. Whether this is an appropriate long term fix that systemd should apply to this directory, I don't know.

Chris Murphy
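One caveat, as far as the chattr(1) man page describes it: +C on a directory is inherited by files created there afterwards, while journal files that already contain data are not reliably converted. A way to see which files really carry the attribute is to read the inode flags directly. The sketch below is only that, a sketch: it assumes Linux on a 64-bit platform for the FS_IOC_GETFLAGS request number, and has_nocow and inode_flags are hypothetical helper names.

```python
import fcntl
import os
import struct

# Assumption: request number for FS_IOC_GETFLAGS on 64-bit Linux.
FS_IOC_GETFLAGS = 0x80086601
# Inode flag behind the 'C' (No_COW) attribute shown by lsattr.
FS_NOCOW_FL = 0x00800000


def has_nocow(flags):
    """True if the No_COW bit is set in an inode flags word."""
    return bool(flags & FS_NOCOW_FL)


def inode_flags(path):
    """Read the inode flags of path via ioctl; raises OSError if unsupported."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
        return struct.unpack("I", raw)[0]
    finally:
        os.close(fd)


def report(directory):
    """Print whether each entry in directory carries the No_COW attribute."""
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        try:
            state = "nocow" if has_nocow(inode_flags(full)) else "cow"
        except OSError as exc:
            state = "unknown (%s)" % exc.strerror
        print(state, full)
```

Calling report() on the machine-id directory under /var/log/journal should then show which journal files were created after the chattr and which predate it.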
Chris Murphy posted on Tue, 24 Sep 2013 22:34:20 -0600 as excerpted:

> However, even after deleting all corrupt journal files, and a subsequent
> scrub reporting no errors, on each reboot (and mount of the filesystem)
> I get:
>
> [ 3.646448] btrfs: bdev /dev/sda6 errs: wr 0, rd 0, flush 0, corrupt
> 17, gen 0
>
> So somehow the corrupt counter isn't being reset?

AFAIK, it's deliberate that errors aren't reset automatically so there's some historical record and it's possible to see if they start to accumulate. But there is of course a manual reset available, should a sysadmin wish to use it...

<quick lookup, quoting the commandline help output> ...

btrfs device stats [-z] <path>|<device>
Show current device IO stats. -z to reset stats afterwards.

What the (brief) help output doesn't say but the (longer) manpage does... for multi-device filesystems <path> will list (and zero with -z) stats for all devices (listing one device's stats after another) composing the filesystem, <device> will list/zero them for just that single component device.

The -z does reset things here.

(FWIW I have a device that's occasionally slow enough to stabilize on power-up, that at least with 3.11, btrfs would occasionally drop it on resume after a suspend, forcing a hard reboot soon after, with resulting corruption. Fortunately I'm running raid1 mode both data/metadata, and a scrub has always fixed things as verified by a further scrub and balance, but the stats errors of course stuck around until I did a -z/reset. So I have personal knowledge of this one. But with last nite's 3.12-rc2 git kernel pull and build I changed the kernel commandline option I was using from rootdelay=N to rootwait, and between that and the btrfs fixes in 3.12, I'm hoping I won't see that problem again. I guess I'll find out over the coming couple weeks or so, at which point I'll declare the issue gone if I've not seen it again.)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
On Tue, Sep 24, 2013 at 11:44:15PM -0600, Chris Murphy wrote:
>
> On Sep 24, 2013, at 10:34 PM, Chris Murphy <lists@colorremedies.com> wrote:
>
> > And how would I go about setting /var/log/journal contents to inherit nodatacow? Possible?
>
> chattr +C /var/log/journal
>
> Resolved the problem. Whether this is an appropriate long term fix that systemd should apply to this directory, I don't know.
>

That just disables cow which in turn disables csumming so it is a good solution for you right now and gives me time to figure out wtf is going on here. Looking at the systemd code it isn't doing O_DIRECT, which is how you usually end up with this sort of situation. So it is likely a bug on our side, I will try and track it down today. Thanks for narrowing this down,

Josef
On Sep 25, 2013, at 6:30 AM, Josef Bacik <jbacik@fusionio.com> wrote:
>
> That just disables cow which in turn disables csumming so it is a good solution
> for you right now and gives me time to figure out wtf is going on here.

I think it's preventing the corruption of the journal logs, because I'm also no longer getting messages from systemd saying a log is corrupt. So I don't think the problem is solved just by not having csums. I'm thinking the csums were always correct, it was the data that was corrupting… or both data and csums were wrong.

It seems that the way systemd-journal is writing to disk is handled differently only during balance operations. The corruption has never happened with days of normal usage (no balance). But happens within tens of seconds upon balance. Naturally something or other is always being written to the systemd-journal logs during a balance (someone logs in=journal entry, kernel reports extent found=journal entry, kernel reports moved chunk=journal entry).

Chris Murphy
On Wed, Sep 25, 2013 at 08:56:52AM -0600, Chris Murphy wrote:
>
> On Sep 25, 2013, at 6:30 AM, Josef Bacik <jbacik@fusionio.com> wrote:
>
> >
> > That just disables cow which in turn disables csumming so it is a good solution
> > for you right now and gives me time to figure out wtf is going on here.
>
> I think it's preventing the corruption of the journal logs, because I'm also no longer getting messages from systemd saying a log is corrupt. So I don't think the problem is solved just by not having csums. I'm thinking the csums were always correct, it was the data that was corrupting… or both data and csums were wrong.
>
> It seems that the way systemd-journal is writing to disk is handled differently only during balance operations. The corruption has never happened with days of normal usage (no balance). But happens within tens of seconds upon balance. Naturally something or other is always being written to the systemd-journal logs during a balance (someone logs in=journal entry, kernel reports extent found=journal entry, kernel reports moved chunk=journal entry).
>

I've reproduced it locally so I'll hopefully figure out what is going on soon. Thanks,

Josef
On Tue, 24 Sep 2013 22:34:20 -0600
Chris Murphy <lists@colorremedies.com> wrote:

> OK so I think I'm narrowing this down to just the systemd journal,
> and it's not checksums that are corrupted, it's the journal itself.

I doubt it's systemd-dependent, because I've seen similar behaviour on a Gentoo system without systemd. Before balance the filesystem was ok; afterwards I get

root 257 inode 2875 errors 1800
root 257 inode 2881 errors 1800
root 257 inode 2969 errors 1800
root 257 inode 3063 errors 1800
root 257 inode 3120 errors 1800
root 257 inode 12407 errors 1800
root 257 inode 19496 errors 1800
root 257 inode 19500 errors 1800
root 257 inode 19564 errors 1800
root 257 inode 19643 errors 1800
root 257 inode 19693 errors 1800
root 257 inode 19949 errors 1800
root 257 inode 20178 errors 1800
root 257 inode 20320 errors 1800
root 257 inode 20406 errors 1800
root 257 inode 20512 errors 1800
root 257 inode 20586 errors 1800
root 257 inode 20654 errors 1800
root 257 inode 20727 errors 1800
root 257 inode 20728 errors 1800
root 257 inode 20821 errors 1800
root 257 inode 20843 errors 1800
root 257 inode 21062 errors 1800
root 257 inode 21078 errors 1800
root 257 inode 21222 errors 1800
root 257 inode 21356 errors 1800
root 257 inode 21437 errors 1800
root 257 inode 55082 errors 1800
root 257 inode 65343 errors 1800
root 257 inode 72413 errors 1800

on a fsck, and scrub tells me that there are unfixable csum errors. Kernel is 3.12.0-rc2-00083-g4b97280. I've observed this two times, and every time only the first subvolume (root 257) was affected.

regards,
Johannes
On Sep 27, 2013, at 9:07 AM, Johannes Hirte <johannes.hirte@datenkhaos.de> wrote:

> On Tue, 24 Sep 2013 22:34:20 -0600
> Chris Murphy <lists@colorremedies.com> wrote:
>
>> OK so I think I'm narrowing this down to just the systemd journal,
>> and it's not checksums that are corrupted, it's the journal itself.
>
> I doubt it's systemd-dependent,

I did not intend to indicate only systemd journal can trigger this, but rather on my system those appear to be the only affected files. Anything that has the same write behavior as systemd-journald probably has the same problem.

> on a fsck and scrub tells me that there are unfixable csum errors.

The scrub should cause messages to appear in dmesg that include a pathname to the affected files, which might hint at what has the same write behavior. Even though a fix has been sent to stable for the systemd journal triggered issue, you should still find out what's being corrupted in your situation in case the write behaviors are different yet are still triggering corruption.

Chris Murphy
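For pulling the affected pathnames out of a scrub's dmesg output, something like the following may help. It is a rough sketch that assumes the "checksum error at logical ..." message format seen earlier in this thread (other kernel versions may word it differently); affected_paths is a hypothetical helper name, and the two sample lines are copied from the HDD report above.

```python
import re

# Matches the scrub checksum error line as it appears earlier in this thread.
CSUM_RE = re.compile(
    r"btrfs: checksum error at logical (?P<logical>\d+) on dev (?P<dev>[^,\s]+), "
    r"sector (?P<sector>\d+), root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset (?P<offset>\d+), length (?P<length>\d+), links \d+ "
    r"\(path: (?P<path>.+?)\)"
)


def affected_paths(dmesg_text):
    """Return {path: number of checksum errors reported for that path}."""
    counts = {}
    for match in CSUM_RE.finditer(dmesg_text):
        path = match.group("path")
        counts[path] = counts.get(path, 0) + 1
    return counts


SAMPLE = (
    "[ 318.029771] btrfs: checksum error at logical 2209439744 on dev /dev/sda4, "
    "sector 6412464, root 256, inode 25764, offset 6746112, length 4096, links 1 "
    "(path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)\n"
    "[ 318.029793] btrfs: bdev /dev/sda4 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0\n"
    "[ 318.045206] btrfs: checksum error at logical 2207895552 on dev /dev/sda4, "
    "sector 6409448, root 256, inode 25764, offset 6668288, length 4096, links 1 "
    "(path: var/log/journal/10db2764a11a4829bf82a94c6559d121/system.journal)\n"
)

for path, count in affected_paths(SAMPLE).items():
    print(count, path)
```

Feeding it the full dmesg text after a scrub gives one line per damaged file, which makes it easier to spot whether anything other than the journal is involved.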