On Wed, Apr 24, 2013 at 1:24 PM, Tom Gundersen <teg@jklm.no> wrote:
> I'm having lots of problems with wrong checksums on the most recent
> kernels. Note that this is not a regression as far as I know, just
> more pronounced now than before (the increase in severity might be due
> to changes in my setup).
>
> I see that this was discussed on the ML a few months back, but it was
> not clear to me if the problem is still open or if a solution should
> have landed upstream.
>
>
>
> This is what I'm seeing:
>
> Pretty much on every reboot some (but not all) of the files written to
> or created before the reboot are broken. If the offending files are
> deleted / overwritten the problem goes away (at least until next
> reboot when other files are affected). A random selection of "dmesg |
> grep btrfs" is attached below.
>
> As I can easily reproduce, please let me know how I can help debugging
> further. For instance, how can I tell btrfs to ignore the checksum
> error and give me the file it has anyway (to see if the file is
> garbled, or just the checksum is wrong)?
>
> My btrfs volume is made up of two partitions, and is split into three
> subvolumes. When mounting the rootfs I see this in dmesg:
>
> Apr 24 01:31:47 toms-air kernel: device fsid
> 0d7a2474-3523-413e-8611-1f489b8a9891 devid 1 transid 141284 /dev/sda4
> Apr 24 01:31:47 toms-air kernel: device fsid
> 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2
> Apr 24 01:31:47 toms-air kernel: device fsid
> 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2
> Apr 24 01:31:47 toms-air kernel: device fsid
> 0d7a2474-3523-413e-8611-1f489b8a9891 devid 1 transid 141284 /dev/sda4
> Apr 24 01:31:47 toms-air kernel: device fsid
> 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2
> Apr 24 01:31:47 toms-air kernel: btrfs: use ssd allocation scheme
> Apr 24 01:31:47 toms-air kernel: btrfs: use lzo compression
> Apr 24 01:31:47 toms-air kernel: btrfs: disk space caching is enabled
> Apr 24 01:31:47 toms-air kernel: btrfs: bdev /dev/sda2 errs: wr 0, rd
> 0, flush 0, corrupt 2056270, gen 6
> Apr 24 01:31:47 toms-air kernel: btrfs: bdev /dev/sda4 errs: wr 0, rd
> 0, flush 0, corrupt 2049061, gen 6
> Apr 24 01:31:47 toms-air kernel: device fsid
> 0d7a2474-3523-413e-8611-1f489b8a9891 devid 2 transid 141284 /dev/sda2
>
> And the output of findmnt is:
>
> TARGET SOURCE                                    FSTYPE OPTIONS
> /home  UUID=0d7a2474-3523-413e-8611-1f489b8a9891 btrfs  subvol=home,ssd,compress=lzo,x-systemd.automount,nofail
> /var   UUID=0d7a2474-3523-413e-8611-1f489b8a9891 btrfs  subvol=var,ssd,compress=lzo
> /usr   UUID=0d7a2474-3523-413e-8611-1f489b8a9891 btrfs  subvol=usr,ssd,compress=lzo
>
>
>
> Errors reported in dmesg:
>
> [10520.530437] btrfs csum failed ino 1988603 off 1277952 csum
> 2566472073 private 2887162790
> [10520.535299] btrfs csum failed ino 1988542 off 172032 csum
> 1032373158 private 2555710917
> [10520.535489] btrfs csum failed ino 1988542 off 172032 csum
> 1032373158 private 2555710917
> [10520.536448] btrfs csum failed ino 1988542 off 307200 csum
> 2566472073 private 4189934277
> [10521.404738] btrfs csum failed ino 1988603 off 1277952 csum
> 2566472073 private 2887162790
> [10521.406514] btrfs csum failed ino 1988542 off 192512 csum
> 2359321615 private 259683409
> [10521.407797] btrfs csum failed ino 1988542 off 372736 csum
> 2566472073 private 1399566794
> [10521.620012] btrfs csum failed ino 1988603 off 1277952 csum
> 2566472073 private 2887162790
> [10521.621371] btrfs csum failed ino 1988542 off 192512 csum
> 2359321615 private 259683409
> [10521.622048] btrfs csum failed ino 1988542 off 372736 csum
> 2566472073 private 1399566794
> [10546.115794] btrfs_readpage_end_io_hook: 26 callbacks suppressed
> [10546.115806] btrfs csum failed ino 1988548 off 28672 csum 2066685480
> private 49363816
> [10546.116811] btrfs csum failed ino 1988548 off 28672 csum 2066685480
> private 49363816
> [10546.117847] btrfs csum failed ino 1988548 off 28672 csum 2066685480
> private 49363816
> [10546.118527] btrfs csum failed ino 1988548 off 28672 csum 2066685480
> private 49363816
> [10546.118910] btrfs csum failed ino 1988548 off 28672 csum 2066685480
> private 49363816
> [10546.119436] btrfs csum failed ino 1988548 off 28672 csum 2066685480
> private 49363816
> [10546.119856] btrfs csum failed ino 1988548 off 28672 csum 2066685480
> private 49363816
> [10546.120292] btrfs csum failed ino 1988548 off 28672 csum 2066685480
> private 49363816
> [10546.120683] btrfs csum failed ino 1988548 off 28672 csum 2066685480
> private 49363816
> [10546.121086] btrfs csum failed ino 1988548 off 28672 csum 2066685480
> private 49363816
> [10553.246253] btrfs_readpage_end_io_hook: 2 callbacks suppressed
> [10553.246269] btrfs csum failed ino 114348 off 45056 csum 1787155441
> private 2298707641
> [10553.246541] btrfs csum failed ino 114348 off 45056 csum 1787155441
> private 2298707641
> [10554.761105] btrfs csum failed ino 1988542 off 372736 csum
> 2566472073 private 1399566794
> [10554.762052] btrfs csum failed ino 1988603 off 1204224 csum
> 4217002373 private 516821494
> [10605.966575] btrfs csum failed ino 1988548 off 28672 csum 1496083883
> private 49363816
> [10681.761222] btrfs csum failed ino 1988542 off 217088 csum 652086749
> private 371373290
> [10707.199412] btrfs csum failed ino 1988548 off 28672 csum 1496083883
> private 49363816
> [10711.777982] btrfs csum failed ino 1988542 off 217088 csum 652086749
> private 371373290
> [10711.778511] btrfs csum failed ino 1988543 off 4096 csum 1242025980
> private 1116748566
> [10711.778786] btrfs csum failed ino 1988543 off 4096 csum 1242025980
> private 1116748566
> [10743.754821] btrfs csum failed ino 1988547 off 12288 csum 1555062722
> private 1166323098
> [10743.755264] btrfs csum failed ino 1988542 off 569344 csum
> 1587824662 private 1165253717
> [10743.755549] btrfs csum failed ino 1988543 off 4096 csum 1242025980
> private 1116748566
> [10743.755660] btrfs csum failed ino 1988543 off 4096 csum 1242025980
> private 1116748566
> [10743.761723] btrfs csum failed ino 1988542 off 569344 csum
> 1587824662 private 1165253717
> [10743.761968] btrfs csum failed ino 1988547 off 12288 csum 1555062722
> private 1166323098
> [10743.877909] btrfs csum failed ino 1988547 off 12288 csum 1555062722
> private 1166323098
> [10743.878433] btrfs csum failed ino 1988547 off 12288 csum 1555062722
> private 1166323098
> [10743.878824] btrfs csum failed ino 1988547 off 12288 csum 1555062722
> private 1166323098
> [10743.879210] btrfs csum failed ino 1988547 off 12288 csum 1555062722
> private 1166323098
> [10773.121616] btrfs_readpage_end_io_hook: 2 callbacks suppressed
> [10773.121628] btrfs csum failed ino 1988548 off 28672 csum 1496083883
> private 49363816
> [10774.871002] btrfs csum failed ino 1988603 off 1204224 csum
> 4217002373 private 516821494
>
>
>
>
> Cheers,
>
> Tom
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

This is definitely not normal. Are you sure your hardware is okay, both
the disks and the RAM? Also, the filesystem looks corrupted to me. You
can check it (but don't attempt a repair) with btrfsck /dev/sdX. If it
is corrupt, you should recreate it, copy the files back onto the new
filesystem, and see whether the corruption returns.
Keep a copy of the old fs in any case, since someone may want a
btrfs-image for debugging!
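
To see how many distinct blocks are actually affected, the "csum failed"
lines from dmesg can be grouped by inode and offset. The following is only
a rough sketch in Python: the function name and the sample input are made
up for illustration, and the regex assumes the message format quoted in
this thread (other kernel versions may print it differently).

```python
import re
from collections import Counter

# Assumed message format, taken from the dmesg output quoted in this thread:
#   btrfs csum failed ino 1988603 off 1277952 csum 2566472073 private 2887162790
CSUM_RE = re.compile(
    r"btrfs csum failed ino (\d+) off (\d+) csum (\d+) private (\d+)"
)

# Illustrative sample: three entries copied from the report, still carrying
# the mail quoting and line wrapping.
SAMPLE = """\
> [10520.530437] btrfs csum failed ino 1988603 off 1277952 csum
> 2566472073 private 2887162790
> [10520.535299] btrfs csum failed ino 1988542 off 172032 csum
> 1032373158 private 2555710917
> [10520.535489] btrfs csum failed ino 1988542 off 172032 csum
> 1032373158 private 2555710917
"""


def summarize_csum_failures(log_text):
    """Count csum failures per (inode, offset) pair (hypothetical helper)."""
    # Drop any leading "> " mail-quote markers, then join the wrapped lines
    # and collapse whitespace so each entry is matched as one piece.
    unquoted = (line.lstrip("> ") for line in log_text.splitlines())
    flat = " ".join(" ".join(unquoted).split())
    return Counter(
        (int(m.group(1)), int(m.group(2))) for m in CSUM_RE.finditer(flat)
    )


if __name__ == "__main__":
    for (ino, off), count in summarize_csum_failures(SAMPLE).most_common():
        print(f"ino {ino} off {off}: {count} failure(s)")
```

Feeding it the full output of "dmesg | grep btrfs" should show whether the
same few blocks fail repeatedly or whether new ones keep appearing. If your
btrfs-progs provides it, "btrfs inspect-internal inode-resolve" can map an
inode number back to a path.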