thr3ads.net - zfs discuss - [zfs-discuss] Does ZFS retry reads for blocks with bad checksum? [Nov 2005]

Alexander Kolbasov

2005-Nov-30 23:56 UTC

[zfs-discuss] Does ZFS retry reads for blocks with bad checksum?

I am wondering whether ZFS re-tries reads for blocks having bad checksum? This 
may help in case the corruption is caused by controller bit flips rather than 
on-disk data. Also, is there a mode when it verifies checksum for blocks 
written on disk and rewrites if it is incorrect? Or should we rely on 
mirroring for that?

My concern is the use of ZFS on smaller systems (like laptops and desktops) 
with a single disk where mirroring is not very practical.

- Alexander Kolbasov

Casper.Dik at Sun.COM

2005-Dec-01 19:09 UTC

>I am wondering whether ZFS re-tries reads for blocks having bad checksum? This
>may help in case the corruption is caused by controller bit flips rather than
>on-disk data. Also, is there a mode when it verifies checksum for blocks 
>written on disk and rewrites if it is incorrect? Or should we rely on 
>mirroring for that?
Well, if it reads an incorrect checksum, how can it know it's just the
checksum which is incorrect, and not the data?
>My concern is the use of ZFS on smaller systems (like laptops and desktops) 
>with a single disk where mirroring is not very practical.
I thought there was a way to read "corrupt" data, or is that a future
direction?  A file should return EIO if the checksum doesn't match.

The rereading is an interesting point, though; but where are controller
bit flips most likely to happen: on the way to the disk's read cache
(from which the data will be re-read) or somewhere else?

Casper

Torrey McMahon

2005-Dec-01 19:43 UTC

Casper.Dik at sun.com wrote:
> I thought there was a way to read "corrupt" data, or is that a future
> direction?  A file should return EIO if the checksum doesn't match.
>
> The rereading is an interesting point, though; but where are controller
> bit flips most likely to happen: on the way to the disk's read cache
> (from which the data will be re-read) or somewhere else?
>   
Completely anecdotal, but I saw more in the drives themselves than in the 
controllers.

Richard Elling

2005-Dec-01 22:49 UTC

Torrey McMahon wrote:
> Casper.Dik at sun.com wrote:
>> The rereading is an interesting point, though; but where are controller
>> bit flips most likely to happen: on the way to the disk's read cache
>> (from which the data will be re-read) or somewhere else?
> 
> Completely anecdotal, but I saw more in the drives themselves than in the 
> controllers.
This makes sense as the drives themselves are very cost-sensitive,
certainly more than a RAID controller.

The problem with rereads is that we really don't want to spend
much time (ok, any time) attempting to redo faulty commands when
we have good redundant data.  In other words, there is a policy
opportunity here.  For non-redundant data, reread for checksum
might be ok, as long as it doesn't take minutes.  For redundant
data, recover the data, mark the block bad, and move on, ASAP.
Any reduction in MTTR is goodness.
  -- richard
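Richard's policy can be sketched roughly as follows. This is an illustration, not ZFS code: `read_block`, `flaky`, SHA-256 as the checksum, and the reread count and time budget are all assumptions chosen for the example. Each copy of the block (one per mirror side, say) is read once. If any copy verifies against the expected checksum it is returned at once, together with the indices of the copies that failed, which a real system could rewrite. Only when no copy verifies does it spend a bounded effort on rereads, which is the single-disk case Alexander is worried about.

```python
import hashlib
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical callback that returns the raw bytes of one on-disk copy.
ReadFn = Callable[[], bytes]


def checksum(data: bytes) -> bytes:
    # SHA-256 stands in for whatever checksum the pool actually uses.
    return hashlib.sha256(data).digest()


def read_block(copies: List[ReadFn], expected: bytes,
               max_rereads: int = 3,
               budget_s: float = 0.5) -> Tuple[Optional[bytes], List[int]]:
    """Return (verified data or None, indices of copies that failed)."""
    failed: List[int] = []
    # Pass 1: read each copy once. With redundancy this usually succeeds
    # right away, so no time is spent redoing faulty reads.
    for i, read in enumerate(copies):
        data = read()
        if checksum(data) == expected:
            return data, failed  # caller may rewrite the failed copies
        failed.append(i)
    # Pass 2: nothing verified, so spend a bounded effort on rereads
    # rather than losing the data outright.
    deadline = time.monotonic() + budget_s
    for _ in range(max_rereads):
        for read in copies:
            data = read()
            if checksum(data) == expected:
                return data, failed
            if time.monotonic() >= deadline:
                return None, failed
    return None, failed  # caller would report EIO


def flaky(good: bytes, bad_reads: int) -> ReadFn:
    """Simulated transient fault: the first `bad_reads` reads flip one bit."""
    remaining = [bad_reads]

    def read() -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            return bytes([good[0] ^ 0x01]) + good[1:]
        return good

    return read


good = b"block contents"
print(read_block([flaky(good, 1)], checksum(good)))  # (b'block contents', [0])
```

With a single disk, the transient flip is recovered on the first reread; with a mirror whose second side is healthy, no reread happens at all.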

Alexander Kolbasov

2005-Dec-01 23:08 UTC

> Torrey McMahon wrote:
> > Casper.Dik at sun.com wrote:
> >> The rereading is an interesting point, though; but where are controller
> >> bit flips most likely to happen: on the way to the disk's read cache
> >> (from which the data will be re-read) or somewhere else?
> > 
> > Completely anecdotal, but I saw more in the drives themselves than in the
> > controllers.
> 
> This makes sense as the drives themselves are very cost-sensitive,
> certainly more than a RAID controller.
> 
> The problem with rereads is that we really don't want to spend
> much time (ok, any time) attempting to redo faulty commands when
> we have good redundant data.  In other words, there is a policy
> opportunity here.  For non-redundant data, reread for checksum
> might be ok, as long as it doesn't take minutes.  For redundant
> data, recover the data, mark the block bad, and move on, ASAP.
> Any reduction in MTTR is goodness.
I agree that if there is enough redundancy to simply re-create the faulty data,
there is no need to do extra work, but in cases where the data can't be
recreated, it is better to spend more time than to lose it.

BTW, does ZFS have a mechanism for marking bad blocks on disk?

- Alexander Kolbasov

Bill Sommerfeld

2005-Dec-02 04:18 UTC

On Thu, 2005-12-01 at 15:08 -0800, Alexander Kolbasov wrote:
> BTW, does ZFS have a mechanism for marking bad blocks on disk?
I can't see that it would be needed.  With modern disk drives (where, as
best as I can tell, modern means "under 20 years old"), if you write
over a sector you can't read, the disk drive firmware remaps it to a
spare sector.
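The remapping Bill describes can be pictured with a toy drive model. Everything here (`ToyDrive`, a spare pool placed just past the user-visible address range, 512-byte sectors) is a hypothetical simplification, not any vendor's firmware. A read of a defective sector fails, and a later write to the same logical block address is silently redirected to a spare. Note what the model leaves out: a sector that returns wrong data without reporting an error is never remapped, because the drive sees nothing to fix, which is one reason Richard argues below that remapping covers only some fault modes.

```python
from typing import Dict, List, Set

SECTOR_SIZE = 512  # assumed sector size for the toy model


class ToyDrive:
    """Toy model of firmware that remaps an LBA to a spare sector when a
    write lands on a physical sector known to be bad."""

    def __init__(self, nsectors: int, nspares: int) -> None:
        self.media: Dict[int, bytes] = {}     # physical sector -> contents
        self.bad: Set[int] = set()            # physical sectors that fail
        self.remap: Dict[int, int] = {}       # LBA -> spare physical sector
        # Spares sit just past the user-visible LBA range.
        self.spares: List[int] = list(range(nsectors, nsectors + nspares))

    def _phys(self, lba: int) -> int:
        return self.remap.get(lba, lba)

    def read(self, lba: int) -> bytes:
        p = self._phys(lba)
        if p in self.bad:
            raise OSError(f"unrecoverable read error at LBA {lba}")
        return self.media.get(p, bytes(SECTOR_SIZE))

    def write(self, lba: int, data: bytes) -> None:
        p = self._phys(lba)
        if p in self.bad:
            if not self.spares:
                raise OSError("no spare sectors left")
            p = self.spares.pop(0)
            self.remap[lba] = p    # future reads and writes go to the spare
        self.media[p] = data


drive = ToyDrive(nsectors=100, nspares=2)
drive.bad.add(7)                   # sector 7 develops a defect
try:
    drive.read(7)
except OSError as err:
    print(err)                     # the host sees a read error
drive.write(7, b"fresh data")      # overwriting triggers the remap
print(drive.read(7), drive.remap)
```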

Chris Gerhard

2005-Dec-02 08:33 UTC

Casper.Dik at sun.com wrote:
>> I am wondering whether ZFS re-tries reads for blocks having bad checksum? This
>> may help in case the corruption is caused by controller bit flips rather than
>> on-disk data. Also, is there a mode when it verifies checksum for blocks
>> written on disk and rewrites if it is incorrect? Or should we rely on 
>> mirroring for that?
> 
> Well, if it reads an incorrect checksum, how can it know it's just the
> checksum which is incorrect, and not the data?
It knows because the checksum is not stored with the data; it is itself 
protected by a checksum, which is itself protected by a checksum, all the 
way back to the uberblock.  How the uberblock is protected I don't
know.

> 
> The rereading is an interesting point, though; but where are controller
> bit flips most likely to happen: on the way to the disk's read cache
> (from which the data will be re-read) or somewhere else?
> 
On many (cheap) drives the disk's cache has no protection, not even 
parity, so that is fertile ground for bit flips, and re-reading does not 
help because you read from the cache.

However, there have been failures in host bus adapters that would have 
been saved by a re-read. With SANs the potential for other devices
injecting errors is there when the SCSI CRC is recalculated.
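Chris's SAN point is an instance of the end-to-end argument, and a small simulation makes it concrete. This is illustrative only: `hop` and `transfer` are hypothetical, CRC-32 stands in for the link-level CRC, and SHA-256 stands in for the filesystem checksum. Because each device recomputes the CRC after handling the data, damage introduced inside a device passes every later link check; only a checksum computed when the block was written, and stored apart from the data, notices.

```python
import hashlib
import zlib
from typing import Tuple


def hop(payload: bytes, crc: int, damage: bool) -> Tuple[bytes, int]:
    """One device on the path (HBA, switch, array port...). It checks the
    incoming link CRC, may damage the data internally, and then generates
    a fresh CRC for the next link."""
    if zlib.crc32(payload) != crc:
        raise ValueError("link CRC mismatch")
    if damage:
        payload = bytes([payload[0] ^ 0x80]) + payload[1:]
    return payload, zlib.crc32(payload)


def transfer(data: bytes, damage_at: int, hops: int = 3) -> bytes:
    payload, crc = data, zlib.crc32(data)
    for i in range(hops):
        payload, crc = hop(payload, crc, damage=(i == damage_at))
    return payload


block = b"filesystem block"
# The end-to-end checksum is held by the filesystem, not sent with the data.
expected = hashlib.sha256(block).digest()
received = transfer(block, damage_at=1)       # no ValueError is raised
print(hashlib.sha256(received).digest() == expected)  # False
```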


-- 
Chris Gerhard.                               __o __o __o
PTS in Europe                               _`\<,`\<,`\<,_
Sun Microsystems Limited                   (*)/---/---/ (*)
Phone: +44 (0) 1252 426033 (ext 26033)
-----------------------------------------------------------
http://blogs.sun.com/chrisg
-----------------------------------------------------------

Roch Bourbonnais - Performance Engineering

2005-Dec-02 09:11 UTC

From: Casper.Dik at Sun.COM
  Sender: zfs-discuss-bounces at opensolaris.org
  To: Alexander Kolbasov <akolb at eng.sun.com>
  Cc: zfs-discuss at opensolaris.org
  Subject: Re: [zfs-discuss] Does ZFS retry reads for blocks with bad checksum?
  Date: Thu, 01 Dec 2005 20:09:31 +0100

  >I am wondering whether ZFS re-tries reads for blocks having bad checksum? This
  >may help in case the corruption is caused by controller bit flips rather than
  >on-disk data. Also, is there a mode when it verifies checksum for blocks 
  >written on disk and rewrites if it is incorrect? Or should we rely on 
  >mirroring for that?

  Well, if it reads an incorrect checksum, how can it know it's just the
  checksum which is incorrect, and not the data?

I think the checksum for block B is stored in the block A that
references it. So if block A itself checksummed fine, then we
know that it's block B which is corrupt. I guess that the
uberblock is protected through some other means.

-r
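Roch's description amounts to a hash tree, which a short sketch can show. The `Block` and `BlockPtr` types, SHA-256, and the use of a root pointer in place of the uberblock are assumptions for illustration, not the ZFS on-disk format. Each pointer carries its child's checksum, and each parent's checksum covers those pointers. So when a child fails verification, the blame falls on the child: the checksum it was compared against sat in a parent that had already verified.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def cksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class BlockPtr:
    """Hypothetical block pointer: a child's address plus its checksum."""
    addr: int
    checksum: bytes


@dataclass
class Block:
    payload: bytes
    children: List[BlockPtr] = field(default_factory=list)

    def serialize(self) -> bytes:
        # A parent's checksum covers its child pointers, including
        # the checksums stored in them.
        out = self.payload
        for c in self.children:
            out += c.addr.to_bytes(8, "little") + c.checksum
        return out


disk: Dict[int, Block] = {}  # toy "disk": address -> block


def write(addr: int, block: Block) -> BlockPtr:
    disk[addr] = block
    return BlockPtr(addr, cksum(block.serialize()))


def find_corrupt(ptr: BlockPtr) -> List[int]:
    """Verify blocks top-down from a trusted pointer and return the
    addresses whose contents do not match the checksum in their parent."""
    block = disk[ptr.addr]
    if cksum(block.serialize()) != ptr.checksum:
        # The parent verified, so the stored checksum is trustworthy and
        # this block is the corrupt one. Its own pointers are not followed.
        return [ptr.addr]
    bad: List[int] = []
    for child in block.children:
        bad.extend(find_corrupt(child))
    return bad


leaf_a = write(10, Block(b"file data A"))
leaf_b = write(11, Block(b"file data B"))
root = write(1, Block(b"indirect", [leaf_a, leaf_b]))  # plays the uberblock's role
disk[11].payload = b"file data b"                       # silent on-disk corruption
print(find_corrupt(root))  # [11]
```

This answers Casper's question for ordinary blocks; how the root itself is protected is a separate matter, which Tabriz takes up at the end of the thread.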

Joerg Schilling

2005-Dec-02 10:46 UTC

Casper.Dik at Sun.COM wrote:
>
> >I am wondering whether ZFS re-tries reads for blocks having bad checksum? This
> >may help in case the corruption is caused by controller bit flips rather than
> >on-disk data. Also, is there a mode when it verifies checksum for blocks
> >written on disk and rewrites if it is incorrect? Or should we rely on 
> >mirroring for that?
>
> Well, if it reads an incorrect checksum, how can it know it's just the
> checksum which is incorrect, and not the data?
I am not sure what you are replying to... If you implement read-after-write,
you still have the right checksum in the kernel buffer you just wrote.

> The rereading is an interesting point, though; but where are controller
> bit flips most likely to happen: on the way to the disk's read cache
> (from which the data will be re-read) or somewhere else?
If a block is flaky, bit flips are very probable, and a re-read at the 
wrong level (usually done by the driver, but the driver does not know about
checksums) does not help.

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de		(uni)  
       schilling at fokus.fraunhofer.de	(work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
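Joerg's read-after-write idea could look something like the sketch below. It is a hypothetical helper (`verified_write`, the callback signatures, and SHA-256 are assumptions), not something the thread says ZFS does. The point taken from his message is that the expected checksum comes from the buffer still held in memory, so a mismatch on readback can only mean the written copy is bad. Chris's caveat still applies: on a drive that answers the readback from its cache, a passing check does not prove the data reached the platter.

```python
import hashlib
from typing import Callable, Dict


def verified_write(write: Callable[[int, bytes], None],
                   read: Callable[[int], bytes],
                   addr: int, buf: bytes, max_attempts: int = 3) -> int:
    """Write buf, read it back, and compare with the checksum of the copy
    still in memory; rewrite on mismatch. Returns the attempts used."""
    expected = hashlib.sha256(buf).digest()  # from the in-memory buffer
    for attempt in range(1, max_attempts + 1):
        write(addr, buf)
        if hashlib.sha256(read(addr)).digest() == expected:
            return attempt
    raise OSError(f"block {addr} failed verification after {max_attempts} writes")


# Simulated device whose first write flips a bit on the way to the media.
media: Dict[int, bytes] = {}
faulty_writes = [1]


def device_write(addr: int, data: bytes) -> None:
    if faulty_writes[0] > 0:
        faulty_writes[0] -= 1
        data = data[:-1] + bytes([data[-1] ^ 0x01])
    media[addr] = data


print(verified_write(device_write, media.__getitem__, 5, b"payload"))  # 2
```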

Richard Elling

2005-Dec-02 17:11 UTC

[lots of good questions in this thread ;-)]

Bill Sommerfeld wrote:
> On Thu, 2005-12-01 at 15:08 -0800, Alexander Kolbasov wrote:
> 
>>BTW, does ZFS have a mechanism for marking bad blocks on disk?
> 
> I can't see that it would be needed.  With modern disk drives (where, as
> best as I can tell, modern means "under 20 years old"), if you write
> over a sector you can't read, the disk drive firmware remaps it to a
> spare sector.
I don't think we gain anything by limiting our horizon to current
disk technology.  Such tactics often turn into deficiencies over the long term.

One thing we do know is that as the areal density increases, the bit rot
rate also increases.  Disk-based media (sector) remapping solves only a
limited set of the fault modes here.

  -- richard-who-can't-wait-for-the-day-disk-drives-are-obsolete :-)

Tabriz Leman

2005-Dec-02 19:57 UTC

Chris Gerhard wrote:
> Casper.Dik at sun.com wrote:
>>> I am wondering whether ZFS re-tries reads for blocks having bad 
>>> checksum? This may help in case the corruption is caused by 
>>> controller bit flips rather than on-disk data. Also, is there a mode
>>> when it verifies checksum for blocks written on disk and rewrites if
>>> it is incorrect? Or should we rely on mirroring for that?
>>
>> Well, if it reads an incorrect checksum, how can it know it's just the
>> checksum which is incorrect, and not the data?
>
> It knows because the checksum is not stored with the data; it is 
> itself protected by a checksum, which is itself protected by a checksum, 
> all the way back to the uberblock.  How the uberblock is protected I 
> don't know.
>
There are three types of blocks which are self-checksumming: the 
uberblock, zfs intent log blocks, and gang blocks.  In these three 
cases, the checksum is stored in the same block as its data.  For these 
(self-checksummed) blocks, I suppose that it is unknown whether the 
checksum or data is corrupt (when faced with a checksum failure).

It should be noted that the uberblock is also protected by redundancy.  
ZFS stores 4 copies of the most recently updated uberblock (one in each 
label).  Thus, if one does not checksum, we have 3 other copies to 
explore (in addition to any sort of redundancy that the pool might have).
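The use of the label copies can be sketched as a selection step: discard any copy whose embedded checksum fails, then use the most recent survivor. The `Uberblock` fields and the rule of choosing by the highest transaction-group number are simplifying assumptions for this sketch; the real on-disk structure holds considerably more than this.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Uberblock:
    txg: int            # transaction group: higher means more recent
    root: bytes         # stands in for the root block pointer
    stored_ck: bytes = b""

    def body(self) -> bytes:
        return self.txg.to_bytes(8, "little") + self.root

    def seal(self) -> "Uberblock":
        self.stored_ck = hashlib.sha256(self.body()).digest()
        return self

    def valid(self) -> bool:
        return hashlib.sha256(self.body()).digest() == self.stored_ck


def pick_uberblock(copies: List[Uberblock]) -> Optional[Uberblock]:
    """Return the most recent copy whose self-checksum verifies, if any."""
    good = [ub for ub in copies if ub.valid()]
    return max(good, key=lambda ub: ub.txg, default=None)


labels = [Uberblock(42, b"root@42").seal() for _ in range(4)]  # one per label
labels[0].root = b"garbage"        # damage the copy in the first label
chosen = pick_uberblock(labels)
print(chosen.txg if chosen else None)  # 42
```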

Tabriz
