Alexander Kolbasov
2005-Nov-30 23:56 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
I am wondering whether ZFS retries reads for blocks with a bad checksum. This may help when the corruption is caused by controller bit flips rather than by bad on-disk data. Also, is there a mode in which it verifies the checksum of blocks written to disk and rewrites them if the checksum is incorrect? Or should we rely on mirroring for that? My concern is the use of ZFS on smaller systems (like laptops and desktops) with a single disk, where mirroring is not very practical. - Alexander Kolbasov
Casper.Dik at Sun.COM
2005-Dec-01 19:09 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
>I am wondering whether ZFS re-tries reads for blocks having bad checksum? This
>may help in case the corruption is caused by controller bit flips rather than
>on-disk data. Also, is there a mode when it verifies checksum for blocks
>written on disk and rewrites if it is incorrect? Or we should rely on
>mirroring for that?

Well, if it reads an incorrect checksum, how can it know it's just the checksum which is incorrect, and not the data?

>My concern is the use of ZFS on smaller systems (like laptops and desktops)
>with a single disk where mirroring is not very practical.

I thought there was a way to read "corrupt" data, or is that a future direction? A file should return EIO if the checksum doesn't match.

The rereading is an interesting point, though; but where are controller bit flips most likely to happen: on the way to the disk's read cache (from which the data will be re-read) or somewhere else?

Casper
Torrey McMahon
2005-Dec-01 19:43 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
Casper.Dik at sun.com wrote:
> I thought there was a way to read "corrupt" data or is that a future
> direction? A file should return EIO if the checksum doesn't match.
>
> The rereading is an interesting point, though; but where are controller
> bit flips most likely to happen; on the way to the disk's read cache
> (from which the data will be re-read) or somewhere else?

Completely anecdotal, but I saw more in the drives themselves than in the controllers.
Richard Elling
2005-Dec-01 22:49 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
Torrey McMahon wrote:
> Casper.Dik at sun.com wrote:
>> The rereading is an interesting point, though; but where are controller
>> bit flips most likely to happen; on the way to the disk's read cache
>> (from which the data will be re-read) or somewhere else?
>
> Completely anecdotal but I saw more in the drives themselves than in the
> controllers.

This makes sense, as the drives themselves are very cost-sensitive, certainly more so than a RAID controller.

The problem with rereads is that we really don't want to spend much time (ok, any time) attempting to redo faulty commands when we have good redundant data. In other words, there is a policy opportunity here. For non-redundant data, a reread on checksum failure might be ok, as long as it doesn't take minutes. For redundant data, recover the data, mark the block bad, and move on, ASAP. Any reduction in MTTR is goodness.

-- richard
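A minimal sketch of the policy described in this message, not of anything ZFS actually implements. The function names, the `read_copy` callback, and the `max_rereads` bound are all hypothetical, and CRC32 is only a stand-in checksum.

```python
import zlib
from typing import Callable, Optional


def checksum_ok(data: bytes, expected: int) -> bool:
    # CRC32 is a stand-in here; it is not claimed to be the checksum ZFS uses.
    return zlib.crc32(data) == expected


def read_block(
    read_copy: Callable[[int], bytes],  # hypothetical: reads copy number i
    n_copies: int,
    expected: int,
    max_rereads: int = 2,               # hypothetical bound on reread attempts
) -> Optional[bytes]:
    """Return verified data, or None (a caller would turn None into EIO)."""
    if n_copies > 1:
        # Redundant data: try each copy once and move on quickly,
        # spending no time on rereads of a copy that already failed.
        for copy in range(n_copies):
            data = read_copy(copy)
            if checksum_ok(data, expected):
                return data
        return None

    # Non-redundant data: a bounded number of rereads is the only
    # chance of recovering from a transient error on the read path.
    for _attempt in range(1 + max_rereads):
        data = read_copy(0)
        if checksum_ok(data, expected):
            return data
    return None
```

The bound on rereads reflects the "as long as it doesn't take minutes" condition; whether a reread helps at all depends on where the bit flip happened, which the rest of the thread discusses.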
Alexander Kolbasov
2005-Dec-01 23:08 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
> Torrey McMahon wrote:
> > Casper.Dik at sun.com wrote:
> >> The rereading is an interesting point, though; but where are controller
> >> bit flips most likely to happen; on the way to the disk's read cache
> >> (from which the data will be re-read) or somewhere else?
> >
> > Completely anecdotal but I saw more in the drives themselves than in the
> > controllers.
>
> This makes sense as the drives themselves are very cost-sensitive,
> certainly more than a RAID controller.
>
> The problem with rereads is that we really don't want to spend
> much time (ok, any time) attempting to redo faulty commands when
> we have good redundant data. In other words, there is a policy
> opportunity here. For non-redundant data, reread for checksum
> might be ok, as long as it doesn't take minutes. For redundant
> data, recover the data, mark the block bad, and move on, ASAP.
> Any reduction in MTTR is goodness.

I agree that if there is enough redundancy to simply re-create the faulty data, there is no need to do extra work, but in cases where the data can't be re-created it is better to spend more time than to lose it.

BTW, does ZFS have a mechanism for marking bad blocks on disk?

- Alexander Kolbasov
Bill Sommerfeld
2005-Dec-02 04:18 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
On Thu, 2005-12-01 at 15:08 -0800, Alexander Kolbasov wrote:
> BTW, does ZFS have mechanism for marking bad blocks on disk?

I can't see that it would be needed. With modern disk drives (where, as best as I can tell, modern means "under 20 years old"), if you write over a sector you can't read, the disk drive firmware remaps it to a spare sector.
Chris Gerhard
2005-Dec-02 08:33 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
Casper.Dik at sun.com wrote:
>> I am wondering whether ZFS re-tries reads for blocks having bad checksum? This
>> may help in case the corruption is caused by controller bit flips rather than
>> on-disk data. Also, is there a mode when it verifies checksum for blocks
>> written on disk and rewrites if it is incorrect? Or we should rely on
>> mirroring for that?
>
> Well, if it reads an incorrect checksum how can it know it's just the
> checksum which is incorrect, and not the data.

It knows because the checksum is not stored with the data. The checksum is itself protected by a checksum, which is in turn protected by another checksum, all the way back to the uberblock. How the uberblock is protected I don't know.

> The rereading is an interesting point, though; but where are controller
> bit flips most likely to happen; on the way to the disk's read cache
> (from which the data will be re-read) or somewhere else?

On many (cheap) drives the disk's cache has no protection, not even parity, so it is fertile ground for bit flips, and re-reading does not help because you read from the cache again. However, there have been failures in host bus adapters where a re-read would have saved the data. With SANs, there is also the potential for other devices to inject errors when the SCSI CRC is recalculated.

-- Chris Gerhard
http://blogs.sun.com/chrisg
Roch Bourbonnais - Performance Engineering
2005-Dec-02 09:11 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
Casper.Dik at Sun.COM wrote:
>> I am wondering whether ZFS re-tries reads for blocks having bad checksum? This
>> may help in case the corruption is caused by controller bit flips rather than
>> on-disk data. Also, is there a mode when it verifies checksum for blocks
>> written on disk and rewrites if it is incorrect? Or we should rely on
>> mirroring for that?
>
> Well, if it reads an incorrect checksum how can it know it's just the
> checksum which is incorrect, and not the data.

I think the checksum for block B is stored in block A, which references it. So if block A itself checksummed fine, then we know that it's block B which is corrupt.

I guess that the uberblock is protected through some other means.

-r
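The parent-holds-the-checksum idea in this message can be sketched as follows. This is an illustration under stated assumptions, not ZFS's on-disk format: the `BlockPointer` layout, the dictionary standing in for a disk, and every function name are hypothetical, and SHA-256 is used only because it is convenient.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


class ChecksumError(Exception):
    """Raised when a block does not match the checksum held by its parent."""

    def __init__(self, address: int) -> None:
        super().__init__(f"checksum mismatch at block {address}")
        self.address = address


@dataclass
class BlockPointer:
    address: int      # where the child block lives
    checksum: bytes   # checksum of the child, stored in the parent


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def pack_pointer(ptr: BlockPointer) -> bytes:
    # Hypothetical serialization: 8-byte address followed by 32-byte checksum.
    return ptr.address.to_bytes(8, "big") + ptr.checksum


def unpack_pointer(raw: bytes) -> BlockPointer:
    return BlockPointer(int.from_bytes(raw[:8], "big"), raw[8:40])


def read_verified(disk: Dict[int, bytes], ptr: BlockPointer) -> bytes:
    data = disk[ptr.address]
    if digest(data) != ptr.checksum:
        raise ChecksumError(ptr.address)
    return data


def read_via_parent(disk: Dict[int, bytes], root: BlockPointer) -> bytes:
    # Block A is verified against the pointer held one level up.
    parent_raw = read_verified(disk, root)
    # Block A checked out, so the checksum it holds for block B is trusted.
    child_ptr = unpack_pointer(parent_raw)
    # A mismatch here therefore points at block B, not at its checksum.
    return read_verified(disk, child_ptr)
```

Because block A has already been verified before its contents are used, a failure on the second read can be attributed to block B itself, which is the reasoning given above.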
Joerg Schilling
2005-Dec-02 10:46 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
Casper.Dik at Sun.COM wrote:
> >I am wondering whether ZFS re-tries reads for blocks having bad checksum? This
> >may help in case the corruption is caused by controller bit flips rather than
> >on-disk data. Also, is there a mode when it verifies checksum for blocks
> >written on disk and rewrites if it is incorrect? Or we should rely on
> >mirroring for that?
>
> Well, if it reads an incorrect checksum how can it know it's just the
> checksum which is incorrect, and not the data.

I am not sure what you are replying to. If you implement read-after-write, you still have the correct checksum in the kernel buffer you just wrote.

> The rereading is an interesting point, though; but where are controller
> bit flips most likely to happen; on the way to the disk's read cache
> (from which the data will be re-read) or somewhere else?

If a block is flaky, flipped bits are very likely, and re-reading at the wrong level (usually done by the driver, which does not know about checksums) does not help.

Jörg

--
EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
      js at cs.tu-berlin.de (uni)
      schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Richard Elling
2005-Dec-02 17:11 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
[lots of good questions in this thread ;-)]

Bill Sommerfeld wrote:
> On Thu, 2005-12-01 at 15:08 -0800, Alexander Kolbasov wrote:
>
>>BTW, does ZFS have mechanism for marking bad blocks on disk?
>
> i can't see that it would be needed. With modern disk drives (where as
> best as I can tell, modern means "under 20 years old"), if you write
> over a sector you can't read, the disk drive firmware remaps it to a
> spare sector.

I don't think we gain anything by limiting our horizon to current disk technology. Such tactics often turn into deficiencies over the long term. One thing we do know is that as the areal density increases, the bit rot rate also increases. Disk-based media (sector) remapping solves only a limited set of the fault modes here.

-- richard-who-can't-wait-for-the-day-disk-drives-are-obsolete :-)
Tabriz Leman
2005-Dec-02 19:57 UTC
[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
Chris Gerhard wrote:
> Casper.Dik at sun.com wrote:
>>> I am wondering whether ZFS re-tries reads for blocks having bad
>>> checksum? This may help in case the corruption is caused by
>>> controller bit flips rather than on-disk data. Also, is there a mode
>>> when it verifies checksum for blocks written on disk and rewrites if
>>> it is incorrect? Or we should rely on mirroring for that?
>>
>> Well, if it reads an incorrect checksum how can it know it's just the
>> checksum which is incorrect, and not the data.
>
> It knows because the checksum is not stored with the data and is
> itself protected by a checksum which itself is protected by a checksum
> all the way back to the uber block. How the uber block is protected I
> don't know.

There are three types of blocks which are self-checksumming: the uberblock, ZFS intent log blocks, and gang blocks. In these three cases, the checksum is stored in the same block as its data. For these (self-checksummed) blocks, I suppose that it is unknown whether the checksum or the data is corrupt when faced with a checksum failure.

It should be noted that the uberblock is also protected by redundancy. ZFS stores 4 copies of the most recently updated uberblock (one in each label). Thus, if one copy does not checksum, we have 3 other copies to explore (in addition to any sort of redundancy that the pool might have).

Tabriz
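A minimal sketch of a self-checksummed block with redundant copies, along the lines described in this message. The layout (payload followed by a trailing SHA-256 digest) and the names `seal`, `open_sealed`, and `first_valid_copy` are hypothetical and are not the real uberblock or label format.

```python
import hashlib
from typing import Optional, Sequence

DIGEST_LEN = 32  # length of a SHA-256 digest in bytes


def seal(payload: bytes) -> bytes:
    # Self-checksummed block: the checksum travels in the same block as the data.
    return payload + hashlib.sha256(payload).digest()


def open_sealed(block: bytes) -> Optional[bytes]:
    """Return the payload if the embedded checksum matches, else None.

    On a mismatch there is no way to tell from this block alone whether
    the payload or the embedded checksum was damaged.
    """
    if len(block) < DIGEST_LEN:
        return None
    payload, stored = block[:-DIGEST_LEN], block[-DIGEST_LEN:]
    if hashlib.sha256(payload).digest() != stored:
        return None
    return payload


def first_valid_copy(copies: Sequence[bytes]) -> Optional[bytes]:
    # Redundancy covers for the ambiguity: try each stored copy in turn
    # and use the first one whose embedded checksum verifies.
    for block in copies:
        payload = open_sealed(block)
        if payload is not None:
            return payload
    return None
```

With four stored copies, a single damaged copy is skipped and one of the remaining three is used; only if every copy fails does the lookup return nothing.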