I came across the tidbit that ZFS has a contract guarantee that the data read back will either be correct (the checksum computed over the data read from the disk matches the checksum stored on disk), or you get an I/O error. Obviously, this greatly reduces the probability that the data is invalid. (Particularly when taken in combination with the disk firmware's own ECC and checksumming.)

With the default options, does btrfs make any similar guarantees? If not, then are there any options to force it to make such guarantees?

I'm interested in this both from a specification and an implementation point of view.

The last thing anyone wants is probably undetected bit rot, and with today's large drives, even with the quite low bit rot numbers it can be a real concern. If even the act of simply successfully reading a file guarantees, to the extent of the checksumming algorithm's ability to detect changes, that the data read is the same as was once written, that would be a major selling point for btrfs for me personally.

The closest I was able to find was that btrfs uses crc32c currently for data and metadata checksumming and that this can be turned off if so desired (using the "nodatasum" mount option), but nothing about what the file system code does or is supposed to do in the face of a checksum mismatch.

--
Michael Kjörling • http://michael.kjorling.se • michael@kjorling.se
“People who think they know everything really annoy those of us who know we don’t.” (Bjarne Stroustrup)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Sat, Oct 27, 2012 at 09:56:45PM +0000, Michael Kjörling wrote:
> I came across the tidbit that ZFS has a contract guarantee that the
> data read back will either be correct (the checksum computed over the
> data read from the disk matches the checksum stored on disk), or you
> get an I/O error. Obviously, this greatly reduces the probability that
> the data is invalid. (Particularly when taken in combination with the
> disk firmware's own ECC and checksumming.)
>
> With the default options, does btrfs make any similar guarantees? If
> not, then are there any options to force it to make such guarantees?

It does indeed do the same thing: if the checksum doesn't match the block, then the alternative block is read (if one exists, e.g. RAID-1, RAID-10). If that does not exist, or also has a checksum failure, then EIO is returned.

Hugo.

> I'm interested in this both from a specification and an implementation
> point of view.
>
> The last thing anyone wants is probably undetected bit rot, and with
> today's large drives, even with the quite low bit rot numbers it can
> be a real concern. If even the act of simply successfully reading a
> file guarantees, to the extent of the checksumming algorithm's ability
> to detect changes, that the data read is the same as was once written,
> that would be a major selling point for btrfs for me personally.
>
> The closest I was able to find was that btrfs uses crc32c currently
> for data and metadata checksumming and that this can be turned off if
> so desired (using the "nodatasum" mount option), but nothing about
> what the file system code does or is supposed to do in the face of a
> checksum mismatch.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- It used to take a lot of talent and a certain type of upbringing to be perfectly polite and have filthy manners at the same time. Now all it needs is a computer. ---
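
As an illustration of the read behaviour described above, here is a toy model in Python. It is only a sketch, not btrfs code: the function `read_block`, the idea of passing mirror contents as a list, and the use of `zlib.crc32` (plain CRC-32, standing in for the crc32c that btrfs uses) are all assumptions made for the example.

```python
import errno
import zlib


def read_block(copies, stored_checksum):
    """Return the first copy whose checksum matches, else raise EIO.

    `copies` is a hypothetical list holding the bytes of each mirror
    (one entry for a single-copy profile, two for RAID-1).
    zlib.crc32 stands in for crc32c, which the standard library lacks.
    """
    for data in copies:
        if zlib.crc32(data) == stored_checksum:
            return data
    # No copy verified: the caller gets an I/O error, never bad data.
    raise OSError(errno.EIO, "checksum mismatch on every copy")


good = b"hello btrfs"
corrupt = b"hellp btrfs"
checksum = zlib.crc32(good)

# RAID-1 style: the first copy is corrupt, the mirror is intact.
assert read_block([corrupt, good], checksum) == good

# Single corrupt copy: the read fails with EIO.
try:
    read_block([corrupt], checksum)
except OSError as exc:
    assert exc.errno == errno.EIO
```

From an application's point of view, the second case is what a failed `read()` would look like: an `OSError` with `errno.EIO` rather than silently wrong data.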

On 27 Oct 2012 23:02 +0100, from hugo@carfax.org.uk (Hugo Mills):
>> I came across the tidbit that ZFS has a contract guarantee that the
>> data read back will either be correct (the checksum computed over the
>> data read from the disk matches the checksum stored on disk), or you
>> get an I/O error. Obviously, this greatly reduces the probability that
>> the data is invalid. (Particularly when taken in combination with the
>> disk firmware's own ECC and checksumming.)
>>
>> With the default options, does btrfs make any similar guarantees? If
>> not, then are there any options to force it to make such guarantees?
>
> It does indeed do the same thing: if the checksum doesn't match the
> block, then the alternative block is read (if one exists, e.g. RAID-1,
> RAID-10). If that does not exist, or also has a checksum failure, then
> EIO is returned.

Great! This should perhaps be mentioned more clearly in the Wiki.

Also, thanks for the prompt reply.

--
Michael Kjörling

In a raid1 situation, it will also rewrite the affected data on the drive that failed the checksum.

On Sunday, 28 October 2012, Ronnie Collinson wrote:
> In a raid1 situation, it will also rewrite the affected data, on the
> drive that failed the checksum

Will it do so without an explicit scrub?

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

On Sun, Oct 28, 2012 at 02:23:51PM +0100, Martin Steigerwald wrote:
> On Sunday, 28 October 2012, Ronnie Collinson wrote:
> > In a raid1 situation, it will also rewrite the affected data, on the
> > drive that failed the checksum
>
> Will it do so without an explicit scrub?

If a failed checksum is detected, yes.

If there's a bad block, and the FS happens to read the good copy first, it won't fix it, because it hasn't tried reading the bad copy yet.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- ... one ping(1) to rule them all, and in the darkness bind(2) them. ---
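
The difference between the two cases can be shown with another toy model in Python. Again this is a sketch under assumptions, not btrfs code: `read_and_repair`, the mutable list of mirror contents, the `first` parameter that picks which mirror is tried first, and `zlib.crc32` as a stand-in for crc32c are all invented for illustration.

```python
import errno
import zlib


def read_and_repair(copies, stored_checksum, first=0):
    """Read starting at mirror `first`; repair copies that were read and failed.

    `copies` is a hypothetical mutable list of bytes, one entry per mirror.
    Copies that are never read are never checked, so they stay as they are.
    """
    order = list(range(first, len(copies))) + list(range(first))
    failed = []
    for idx in order:
        if zlib.crc32(copies[idx]) == stored_checksum:
            for bad in failed:
                copies[bad] = copies[idx]  # rewrite the bad copy from the good one
            return copies[idx]
        failed.append(idx)
    raise OSError(errno.EIO, "checksum mismatch on every copy")


good, corrupt = b"payload", b"pay1oad"
checksum = zlib.crc32(good)

# Good copy read first: the corrupt mirror is never looked at, so it stays corrupt.
mirrors = [good, corrupt]
read_and_repair(mirrors, checksum)
assert mirrors == [good, corrupt]

# Corrupt copy read first: the mismatch is seen and that copy is rewritten.
mirrors = [corrupt, good]
read_and_repair(mirrors, checksum)
assert mirrors == [good, good]
```

In this model a scrub corresponds to deliberately reading every copy, which is why it can find the bad copy that an ordinary read skipped.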

On Sunday, 28 October 2012, Hugo Mills wrote:
> On Sun, Oct 28, 2012 at 02:23:51PM +0100, Martin Steigerwald wrote:
> > On Sunday, 28 October 2012, Ronnie Collinson wrote:
> > > In a raid1 situation, it will also rewrite the affected data, on
> > > the drive that failed the checksum
> >
> > Will it do so without an explicit scrub?
>
> If a failed checksum is detected, yes.
>
> If there's a bad block, and the FS happens to read the good copy
> first, it won't fix it, because it hasn't tried reading the bad copy
> yet.

Ah, okay. I think I read a while ago that when a bad checksum is detected, it won't be repaired automatically. Has this been changed?

Anyway, a regular scrub still makes sense, as BTRFS only reads the files that applications request, and BTRFS may read from a good copy, as you pointed out.

Thanks,
--
Martin 'Helios' Steigerwald

On Sun, Oct 28, 2012 at 02:36:24PM +0100, Martin Steigerwald wrote:
> On Sunday, 28 October 2012, Hugo Mills wrote:
> > On Sun, Oct 28, 2012 at 02:23:51PM +0100, Martin Steigerwald wrote:
> > > On Sunday, 28 October 2012, Ronnie Collinson wrote:
> > > > In a raid1 situation, it will also rewrite the affected data, on
> > > > the drive that failed the checksum
> > >
> > > Will it do so without an explicit scrub?
> >
> > If a failed checksum is detected, yes.
> >
> > If there's a bad block, and the FS happens to read the good copy
> > first, it won't fix it, because it hasn't tried reading the bad copy
> > yet.
>
> Ah, okay. I think I read some while ago in a case of bad checksum detected
> it won't repair automatically. Has this been changed?

It was changed some time ago -- the kernel release after scrub went in, IIRC.

> Anyway, a regular scrub still makes sense, as BTRFS only reads files that
> applications demand and BTRFS may read from a good copy as you pointed
> out.

Indeed. I have a cron job in /etc/cron.monthly for my main FS to do just that.

Hugo.
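
A periodic scrub job of the kind mentioned above might look roughly like the following Python sketch. The thread does not show the actual cron job, so everything here is an assumption: the helper names `scrub_command` and `run_scrub` are hypothetical, and `/mnt/data` is a placeholder mount point. The only btrfs-specific part is the `btrfs scrub start -B <path>` command, where `-B` keeps the scrub in the foreground until it finishes.

```python
import subprocess


def scrub_command(mountpoint):
    """Build the command line for a foreground scrub of `mountpoint`."""
    # -B makes `btrfs scrub start` wait for the scrub to finish,
    # so its exit status is available to the caller.
    return ["btrfs", "scrub", "start", "-B", mountpoint]


def run_scrub(mountpoint):
    """Run the scrub and return the exit status (normally needs root)."""
    return subprocess.run(scrub_command(mountpoint), check=False).returncode


# Show the command that would be run; "/mnt/data" is a placeholder.
print(" ".join(scrub_command("/mnt/data")))
```

A script dropped into /etc/cron.monthly would call `run_scrub()` with the real mount point and exit with its return value, so that cron can report a failing scrub.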