since my last debian kernel update to 2.6.38-2-amd64 i've had trouble with csum failures. it's a volume full of huge kvm images on md-RAID1 and LVM, so i used the mount options 'noatime,nodatasum' to maximize performance.

it happened two weeks ago for the first time, and now a kvm image is unreadable again. i have to use an older snapshot to substitute the virtual machine.

these are the entries in dmesg/the kernel log on any access:
...
[2412668.409442] btrfs csum failed ino 258 off 2331529216 csum 3632892464 private 2115348581
...

it's a production machine, so i cannot experiment on it too much. do you see an obvious way to solve this problem?

thanks!
martin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 03, 2011 at 11:56:32PM +0200, Martin Schitter wrote:
> since my last debian kernel update to 2.6.38-2-amd64 i've had trouble with
> csum failures. it's a volume full of huge kvm images on md-RAID1 and
> LVM, so i used the mount options 'noatime,nodatasum' to maximize
> performance.
>
> it happened two weeks ago for the first time, and now a kvm image
> is unreadable again. i have to use an older snapshot to substitute the
> virtual machine.
>
> these are the entries in dmesg/the kernel log on any access:
> ...
> [2412668.409442] btrfs csum failed ino 258 off 2331529216 csum
> 3632892464 private 2115348581
> ...
>
> it's a production machine, so i cannot experiment on it too much.
> do you see an obvious way to solve this problem?

Wait, why are you running btrfs in production? What OS is in this vm image?

Thanks,

Josef
Am 2011-05-04 02:28, schrieb Josef Bacik:
> Wait, why are you running btrfs in production?

do you know a better alternative for continuous snapshots? :)

it has worked surprisingly well for more than a year. the performance could be better for vm image hosting, but it works.

we used cache='writeback' for a long time, but now all virtual instances are set to cache='none'.

> What OS is in this vm image?

2.6.30-bpo.1-amd64 with the virtio driver.

could you give me some advice on how to debug/report this specific problem more precisely?

thanks
martin
On Wed, May 4, 2011 at 7:44 AM, Martin Schitter <ms@mur.at> wrote:
> Am 2011-05-04 02:28, schrieb Josef Bacik:
>> Wait, why are you running btrfs in production?
>
> do you know a better alternative for continuous snapshots? :)

zfs :D

> it has worked surprisingly well for more than a year.
> the performance could be better for vm image hosting, but it works.
>
> we used cache='writeback' for a long time, but now all virtual instances
> are set to cache='none'.
>
>> What OS is in this vm image?
>
> 2.6.30-bpo.1-amd64 with the virtio driver.
>
> could you give me some advice on how to debug/report this specific problem
> more precisely?

If it's not reproducible then I'd suspect it'd be hard to do. Checksum errors are usually an early sign of hardware failure (most commonly disk or power supply).

--
Fajar
Am 2011-05-04 04:18, schrieb Fajar A. Nugraha:
>> could you give me some advice on how to debug/report this specific
>> problem more precisely?
>
> If it's not reproducible then I'd suspect it'd be hard to do.

the last working snapshot is from 2011-05-02 17:13. i can reproduce this file system corruption on one specific file in any hourly snapshot after that. whenever i run a simple:

  cat snapshot-2011-05-02-18:13/sata-images/image_xy.raw > /dev/null

i get an "Input/output error" and the quoted debug messages in dmesg and the kernel log.

could this serve as a useful starting point for further investigation?

> Checksum errors are usually an early sign of hardware failure (most
> commonly disk or power supply).

that looks very implausible to me. there is a RAID1 layer beneath btrfs in our setup and i don't see any errors there. and shouldn't the 'nodatasum' option also ignore csum issues?

martin
On Wed, May 04, 2011 at 01:39:46PM +0200, Martin Schitter wrote:
> and shouldn't the 'nodatasum' option also ignore csum issues?

No, "nodatasum" will prevent newly-written data from being checksummed. However, if a checksum already exists (because the data was written to a filesystem mounted without the "nodatasum" option), btrfs will still verify the checksum, regardless of the current setting of nodatasum.

There is currently no way of preventing btrfs from verifying checksums if they exist; I don't believe that there's any way of removing an existing checksum, either.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Charting the inexorable advance of Western syphilisation... ---
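[Editor's note: the asymmetry Hugo describes — checksums are only *written* when datasum is enabled, but any checksum that already exists is verified on read regardless of the current mount option — can be sketched with a toy model. This is illustrative Python only, not btrfs code, and note that Jan Schmidt questions this behavior for data later in the thread.]

```python
import zlib

class ToyFs:
    """Toy model: csums stored per write depending on the mount option,
    but verified on read whenever a stored csum exists."""

    def __init__(self):
        self.blocks = {}   # offset -> bytes
        self.csums = {}    # offset -> checksum stored at write time

    def write(self, off, data, nodatasum):
        self.blocks[off] = data
        if not nodatasum:
            # csum is only recorded when mounted without nodatasum
            self.csums[off] = zlib.crc32(data)

    def read(self, off, nodatasum):
        data = self.blocks[off]
        # verification ignores the *current* mount option: if a csum
        # exists from an earlier mount, it is still checked
        if off in self.csums and self.csums[off] != zlib.crc32(data):
            raise IOError("csum failed off %d" % off)
        return data

fs = ToyFs()
fs.write(0, b"old data", nodatasum=False)   # csum recorded
fs.blocks[0] = b"corrupted!"                # simulate silent corruption
try:
    fs.read(0, nodatasum=True)              # fails despite nodatasum
except IOError as e:
    print(e)                                # prints: csum failed off 0
```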
On Wed, May 4, 2011 at 5:39 AM, Martin Schitter <ms@mur.at> wrote:
> the last working snapshot is from 2011-05-02 17:13. i can reproduce this
> file system corruption on one specific file in any hourly snapshot after
> that.

That's not surprising; any later snapshots will be sharing the same corrupted block.

> that looks very implausible to me. there is a RAID1 layer beneath btrfs in
> our setup and i don't see any errors there.

That doesn't rule out the possibility of corruption when it was written in the first place, or some similar problem that the raid1 faithfully reproduced on both mirrors. That's not to say that it's impossible that the problem is in btrfs, just that it's not the only plausible possibility.

> and shouldn't the 'nodatasum' option also ignore csum issues?

No, it only affects writing new checksums; any existing checksums are still checked.
Am 2011-05-04 13:51, schrieb cwillu:
>> that looks very implausible to me. there is a RAID1 layer beneath btrfs in
>> our setup and i don't see any errors there.
>
> That doesn't rule out the possibility of corruption when it was
> written in the first place, or some similar problem that the raid1
> faithfully reproduced on both mirrors. That's not to say that it's
> impossible that the problem is in btrfs, just that it's not the only
> plausible possibility.

well, i do a backup of all images every night. this process should work like a simple "scrub", because all data (and its checksums) will be read. that's how i stumbled over this problem!

>> and shouldn't the 'nodatasum' option also ignore csum issues?
>
> No, it only affects writing new checksums; any existing checksums are
> still checked.

would it make sense to remount the volume with checksumming enabled and run additional tests to find similar suspect blocks, to prevent this kind of suddenly broken file?

martin
Hey Martin,

On 05/04/11 13:39, Martin Schitter wrote:
>> Checksum errors are usually an early sign of hardware failure (most
>> commonly disk or power supply).
>
> that looks very implausible to me. there is a RAID1 layer beneath btrfs
> in our setup and i don't see any errors there.

Is the btrfs RAID1 itself inside a virtual machine? I've had data corruption with virtio block devices > 1TB on early squeeze kernels.

Kaspar
Excerpts from Martin Schitter's message of 2011-05-03 17:56:32 -0400:
> since my last debian kernel update to 2.6.38-2-amd64 i've had trouble with
> csum failures. it's a volume full of huge kvm images on md-RAID1 and
> LVM, so i used the mount options 'noatime,nodatasum' to maximize
> performance.
>
> it happened two weeks ago for the first time, and now a kvm image
> is unreadable again. i have to use an older snapshot to substitute the
> virtual machine.
>
> these are the entries in dmesg/the kernel log on any access:
> ...
> [2412668.409442] btrfs csum failed ino 258 off 2331529216 csum
> 3632892464 private 2115348581
> ...
>
> it's a production machine, so i cannot experiment on it too much.
> do you see an obvious way to solve this problem?

What OS is inside these virtual machines? The btrfs unstable tree has some fixes for windows-based OSes. Is your kvm config using O_DIRECT?

I've also got patches here that force us to honor nodatasum even when the file has csums; that can help if the contents of the file are actually good.

-chris
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> owner@vger.kernel.org] On Behalf Of Martin Schitter
>
> well, i do a backup of all images every night. this process
> should work like a simple "scrub", because all data (and its checksums)
> will be read.

Sorry, not correct. When you read all the data using something in userland, the OS only needs to read one side of the data. It can accelerate reads by staggering the requests across multiple disks, so some sectors remain unread on some disks.

When you scrub, it reads all the data from all the redundant copies (mirrored or raid) on all the individual disks in the raid set.

For this reason, you always want to use JBOD, not hardware raid. If there's an undetected hardware error, hardware raid makes it impossible for the OS to examine the individual disks to identify the failing one.

I know all the above is true for reading & scrubbing in another filesystem; I don't actually know any of this for a fact in btrfs, but it seems so basic that I would be flabbergasted if I learned that wasn't the btrfs behavior.
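[Editor's note: the read-vs-scrub distinction above can be sketched with a toy model: a normal read is satisfied from one mirror, so corruption on the other copy goes unnoticed until a scrub compares every copy. Illustrative Python only; real md RAID1 mirror selection is more involved.]

```python
# Two mirrors of the same one-block device; both start in sync.
mirrors = [
    {0: b"good"},   # disk A
    {0: b"good"},   # disk B
]

def read(off):
    # The kernel is free to satisfy a read from either disk;
    # here it always picks disk A, so disk B is never inspected.
    return mirrors[0][off]

def scrub():
    # A scrub reads *every* copy and compares them.
    mismatches = []
    for off in mirrors[0]:
        if mirrors[0][off] != mirrors[1][off]:
            mismatches.append(off)
    return mismatches

mirrors[1][0] = b"bad!"              # silent corruption on disk B
assert read(0) == b"good"            # a nightly backup read sees nothing
print("scrub mismatches:", scrub())  # prints: scrub mismatches: [0]
```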
Am 2011-05-04 14:31, schrieb Kaspar Schleiser:
> Is the btrfs RAID1 itself inside a virtual machine? I've had data
> corruption with virtio block devices > 1TB on early squeeze kernels.

no, it's on the (native) host side, and we use a very recent kernel from debian 'testing' (2.6.38-2).

martin
Am 2011-05-04 14:39, schrieb Chris Mason:
> What OS is inside these virtual machines? The btrfs unstable tree has
> some fixes for windows-based OSes.

we have only linux guests of different flavors, no windows guests. both corruptions during these last weeks belong to different virtual block device images of the same guest instance.

> Is your kvm config using O_DIRECT?

yes, the kvm/qemu option cache="none" implies O_DIRECT.

> I've also got patches here that force us to honor nodatasum even when
> the file has csums; that can help if the contents of the file are
> actually good.

that sounds interesting! in our case it may be easier to use some recent backup data, but it could be very helpful in similar situations.

i would really like to help isolate the reasons for this failure and find a practical strategy to prevent additional breakdowns.

thanks
martin
On 04.05.2011 13:51, cwillu wrote:
> On Wed, May 4, 2011 at 5:39 AM, Martin Schitter <ms@mur.at> wrote:
>> and shouldn't the 'nodatasum' option also ignore csum issues?
>
> No, it only affects writing new checksums; any existing checksums are
> still checked.

From the report I assume this must be the case for metadata, but it does not hold true for data. I was just looking at btrfs_readpage_end_io_hook for some other reason and realized it skips checksum checking when the file system is mounted nodatasum.

-Jan
On 05/03/2011 08:44 PM, Martin Schitter wrote:
> Am 2011-05-04 02:28, schrieb Josef Bacik:
>> Wait, why are you running btrfs in production?
>
> do you know a better alternative for continuous snapshots? :)
>
> it has worked surprisingly well for more than a year.
> the performance could be better for vm image hosting, but it works.
>
> we used cache='writeback' for a long time, but now all virtual instances
> are set to cache='none'.
>
>> What OS is in this vm image?
>
> 2.6.30-bpo.1-amd64 with the virtio driver.
>
> could you give me some advice on how to debug/report this specific problem
> more precisely?

So there is a problem with DIO: since userspace can modify pages in flight, we can end up with the wrong checksums, because the data can change between checksum calculation and the write completing. I was trying to come up with a way to fix this, but there's really nothing to be done at the moment other than turning off checksumming per file. Windows was particularly bad about this, but I hadn't seen it with Linux guests (even though it should still be happening). So I'll come up with something to turn off checksumming per file to get around this for now; I'll try to get to that soonish. Thanks,

Josef
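[Editor's note: the in-flight modification race Josef describes can be illustrated with a toy model: with direct I/O the page being written still belongs to userspace, so the guest can rewrite it after the filesystem has computed the checksum but before the block lands on disk. Illustrative Python with crc32 as a stand-in checksum; this is not btrfs code.]

```python
import zlib

buf = bytearray(b"guest data v1   ")          # page handed to O_DIRECT write

csum_at_submit = zlib.crc32(bytes(buf))       # fs computes csum at submit time

buf[0:13] = b"guest data v2"                  # guest rewrites the page in flight

on_disk = bytes(buf)                          # the *modified* page is what lands
stored_csum = csum_at_submit                  # ...but the *old* csum was stored

# Later reads verify the stored csum against the on-disk data and fail:
print("csum mismatch:", stored_csum != zlib.crc32(on_disk))  # prints: csum mismatch: True
```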
Am 2011-05-04 15:23, schrieb Edward Ned Harvey:
>> well, i do a backup of all images every night. this
>> process should work like a simple "scrub", because all data (and its
>> checksums) will be read.
>
> Sorry, not correct. When you read all the data using something in
> userland, the OS only needs to read one side of the data. It can
> accelerate reads by staggering the requests across multiple disks, so
> some sectors remain unread on some disks.
>
> When you scrub, it reads all the data from all the redundant copies
> (mirrored or raid) on all the individual disks in the raid set.

ok, i see -- you're right!

i know there are some benefits in the way btrfs and zfs implement RAID / multiple-disk usage and checksumming, but i also want to stay on the safe side when it comes to real practical problems. so i decided to use 'classical' linux software RAID-1 as the base layer. that's a very old-fashioned solution, but it usually simply works... and you can change a broken disk without any regard to the filesystem(s) used. in general i try to use btrfs only for its snapshot features, in a very simple way.

it looks very strange to me that i don't see any SMART warnings on the hard disks or errors on other filesystems on the same raid array. there was also no reboot, power failure or similar when the corruption suddenly appeared. so i think a btrfs bug would be the most evident explanation.

martin
Excerpts from Martin Schitter's message of 2011-05-04 10:42:51 -0400:
> i know there are some benefits in the way btrfs and zfs implement RAID /
> multiple-disk usage and checksumming, but i also want to stay on the
> safe side when it comes to real practical problems. so i decided to use
> 'classical' linux software RAID-1 as the base layer. that's a very old-
> fashioned solution, but it usually simply works... and you can change a
> broken disk without any regard to the filesystem(s) used. in general i
> try to use btrfs only for its snapshot features, in a very simple way.
>
> it looks very strange to me that i don't see any SMART warnings on the
> hard disks or errors on other filesystems on the same raid array. there
> was also no reboot, power failure or similar when the corruption
> suddenly appeared. so i think a btrfs bug would be the most evident
> explanation.

That's the bad news: it can be very hard to tell. The disk could be returning garbage, or btrfs could be messing up the csums.

The btrfs unstable tree does have one fix that is related to O_DIRECT and kvm, but we've only ever seen it happen with a windows guest. This doesn't mean it is impossible for a linux guest to trigger it, though.

-chris