Gary Mills
2009-Jan-14 22:39 UTC
[zfs-discuss] What are the usual suspects in data errors?
I realize that any error can occur in a storage subsystem, but most of these have an extremely low probability. I'm interested in discussing only those that do occur occasionally, and that are not catastrophic.

Consider the common configuration of two SCSI disks connected to the same HBA that are configured as a mirror in some manner. In this case, the data path in general consists of:

 o The application
 o The filesystem
 o The drivers
 o The HBA
 o The SCSI bus
 o The controllers
 o The heads and platters

Many of those components have their own error checking. Some have error correction. For example, parity checking is done on a SCSI bus, unless it's specifically disabled. Do SATA and PATA connections also do error checking? Disk sector I/O uses CRC error checking and correction. Memory buffers would often be protected by parity memory. Is there any more that I've missed?

Now, let's consider common errors. To me, the most frequent would be a bit error on a disk sector. In this case, the controller would report a CRC error and would not return bad data. The filesystem would obtain the data from its redundant copy. I assume that ZFS would also rewrite the bad sector to correct it. The application would not see an error. Similar events would happen for a parity error on the SCSI bus.

What can go wrong with the disk controller? A simple seek to the wrong track is not a problem because the track number is encoded on the platter. The controller will simply recalibrate the mechanism and retry the seek. If it computes the wrong sector, that would be a problem. Does this happen with any frequency? In this case, ZFS would detect a checksum error and obtain the data from its redundant copy.

A logic error in ZFS might result in incorrect metadata being written with a valid checksum. In this case, ZFS might panic on import or might correct the error. How is this sort of error prevented?

If the application wrote bad data to the filesystem, none of the error checking in the lower layers would detect it. This would be strictly an error in the application.

Some errors might result from a loss of power if some ZFS data was written to a disk cache but never was written to the disk platter. Again, ZFS might panic on import or might correct the error. How is this sort of error prevented?

After all of this discussion, what other errors can ZFS checksums reasonably detect? Certainly if some of the other error checking failed to detect an error, ZFS would still detect one. How likely are these other error checks to fail?

Is there anything else I've missed in this analysis?

-- 
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
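For readers who want to see the checksum-detection step above concretely, here is a minimal standalone sketch. It is not the actual ZFS code: the block size, the test pattern, and the simplified Fletcher-style running sums are all made up for the demo. The point is only that a checksum stored apart from the data (in ZFS, in the parent block pointer) reveals a single silently flipped bit that the block itself cannot.

    /*
     * Sketch only: a simplified Fletcher-style checksum over a data block,
     * showing how a separately stored checksum catches a single-bit flip.
     * Block size, seed pattern, and the checksum itself are illustrative.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct cksum { uint64_t a, b, c, d; };

    /* Fletcher-like running sums over 32-bit words; no overflow folding here. */
    static void block_checksum(const uint32_t *words, size_t nwords, struct cksum *ck)
    {
        uint64_t a = 0, b = 0, c = 0, d = 0;
        for (size_t i = 0; i < nwords; i++) {
            a += words[i];
            b += a;
            c += b;
            d += c;
        }
        ck->a = a; ck->b = b; ck->c = c; ck->d = d;
    }

    int main(void)
    {
        uint32_t block[128];                         /* pretend 512-byte "sector"   */
        for (size_t i = 0; i < 128; i++)
            block[i] = (uint32_t)(i * 2654435761u);  /* arbitrary test pattern      */

        struct cksum expected, actual;
        block_checksum(block, 128, &expected);       /* kept apart from the data    */

        block[37] ^= 0x00000400;                     /* silent single-bit flip "on disk" */

        block_checksum(block, 128, &actual);         /* recomputed on read          */
        if (memcmp(&expected, &actual, sizeof expected) != 0)
            printf("checksum mismatch: read from the redundant copy instead\n");
        else
            printf("block verified\n");
        return 0;
    }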
A Darren Dunham
2009-Jan-14 23:15 UTC
[zfs-discuss] What are the usual suspects in data errors?
On Wed, Jan 14, 2009 at 04:39:03PM -0600, Gary Mills wrote:
> I realize that any error can occur in a storage subsystem, but most
> of these have an extremely low probability. I'm interested in this
> discussion in only those that do occur occasionally, and that are
> not catastrophic.

What level is "extremely low" here?

> Many of those components have their own error checking. Some have
> error correction. For example, parity checking is done on a SCSI bus,
> unless it's specifically disabled. Do SATA and PATA connections also
> do error checking? Disk sector I/O uses CRC error checking and
> correction. Memory buffers would often be protected by parity memory.
> Is there any more that I've missed?

Reports suggest that bugs in drive firmware could account for errors at a level that is not insignificant.

> What can go wrong with the disk controller? A simple seek to the
> wrong track is not a problem because the track number is encoded on
> the platter. The controller will simply recalibrate the mechanism and
> retry the seek. If it computes the wrong sector, that would be a
> problem. Does this happen with any frequency?

Netapp documents certain rewrite bugs that they've specifically seen. I would imagine they have good data on the frequency that they see it in the field.

> In this case, ZFS would detect a checksum error and obtain the data
> from its redundant copy.

Correct.

> A logic error in ZFS might result in incorrect metadata being written
> with valid checksum. In this case, ZFS might panic on import or might
> correct the error. How is this sort of error prevented?

It's very difficult to protect yourself from software bugs with the same piece of software. You can create assertions that are hopefully simpler and less prone to errors, but they will not catch all bugs.

> Some errors might result from a loss of power if some ZFS data was
> written to a disk cache but never was written to the disk platter.
> Again, ZFS might panic on import or might correct the error. How is
> this sort of error prevented?

ZFS uses a multi-stage commit. It relies on the "disk" responding to a request to flush caches to the disk. If that assumption is correct, then there is no problem in general with power issues. The disk is consistent both before and after the cache is flushed.

-- 
Darren
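To make the multi-stage-commit idea concrete, here is a user-level analogue of the same ordering discipline: make the new data durable first, then publish the pointer to it, with an explicit flush between the two stages. This is a sketch only, not ZFS internals; it assumes a POSIX system, and the file names are invented for the example. The fsync() calls play the role of the cache-flush request described above.

    /*
     * User-level analogue of ordered commit: write the new copy, flush it,
     * then atomically switch the name that points at it, then flush the
     * directory. At every instant the "pointer" names either the old or the
     * new state, never a partially written one. File names are made up.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *tmp = "state.tmp", *final = "state.dat";
        const char *payload = "new application state\n";

        /* Stage 1: write the new copy and force it to stable storage. */
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, payload, strlen(payload)) < 0) { perror("write"); return 1; }
        if (fsync(fd) < 0) { perror("fsync"); return 1; }  /* analogous to a cache flush */
        close(fd);

        /* Stage 2: publish the pointer; readers see old state or new, never a mix. */
        if (rename(tmp, final) < 0) { perror("rename"); return 1; }

        /* Flush the directory so the rename itself survives power loss. */
        int dfd = open(".", O_RDONLY);
        if (dfd >= 0) { fsync(dfd); close(dfd); }

        puts("committed");
        return 0;
    }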
darn, Darren, learning fast!

best,
z

----- Original Message ----- 
From: "A Darren Dunham" <ddunham at taos.com>
To: <zfs-discuss at opensolaris.org>
Sent: Wednesday, January 14, 2009 6:15 PM
Subject: Re: [zfs-discuss] What are the usual suspects in data errors?

[snip]
folks, please, chatting on - don't make me stop you, we are all open folks.

[but darn]

ok, thank you much for the anticipation for something actually useful, here is another thing I shared with MS Storage but not with you folks yet --

we win with real advantages, not lies, not scales, but only real knowhow.

cheers,
z

----- Original Message ----- 
From: "JZ" <jz at excelsioritsolutions.com>
To: "A Darren Dunham" <ddunham at taos.com>; <zfs-discuss at opensolaris.org>
Sent: Wednesday, January 14, 2009 7:38 PM
Subject: Re: [zfs-discuss] What are the usual suspects in data errors?

[snip]
Folks, I am very sorry, for I don't know how to be not misleading.

I was not challenging the Ten Commandments. That is a good code. And maybe the first one we need to follow.

Vain and pride and arrogance and courage are all very different, but very similar. Before you can truly understand the code of love, you will have to be very careful.

And then, there are beyond.

Folks, this is a technology discussion, not a religious discussion. I love you all. I do not want to see you folks unable to make it to your dream states with your technology knowhow just because you don't even understand the basic code of love.

Folks, I love you all. OMG, I did not teach King and High anything beyond what I have said here in the open. Is that not enough to make the darn open discussion go on? Please.

[do you know if not because of the 400000 friends, I could be dead by now talking this much to an open list???]

best,
z

----- Original Message ----- 
From: "JZ" <jz at excelsioritsolutions.com>
To: "A Darren Dunham" <ddunham at taos.com>; <zfs-discuss at opensolaris.org>
Sent: Wednesday, January 14, 2009 7:48 PM
Subject: Re: [zfs-discuss] What are the usual suspects in data errors?

[snip]
ok, you open folks are really ????. just one more, and I hope someone replies so we can save some open time.

the code of ?. and what is the relationship between ? and love?

here, some public info - again, I am only saying this piece in Chinese is pretty readable in my taste, not too much to attack, but hey, whoever wrote this, don't be a hot head. [to that blog writer: "darn", if you also know what that means]
http://blog.ce.cn/html/04/101804-15445.html

and, can someone on the list please provide a translated url to save some open time? every darn hour multiplied by the number of readers here, the help better comes fast, before I darn all of you open folks.

[for the beloved ones, attached an even better code. There has been a tough nut, let me see if I can crack that nut with this public code. :-) ]

z, at home, wondering why Daisy baby is not calling... not interested in the list discussion anymore

----- Original Message ----- 
From: "JZ" <jz at excelsioritsolutions.com>
To: "A Darren Dunham" <ddunham at taos.com>; <zfs-discuss at opensolaris.org>
Sent: Wednesday, January 14, 2009 8:38 PM
Subject: Re: [zfs-discuss] What are the usual suspects in data errors?

[snip]

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Do Not Lie.wma
Type: audio/x-ms-wma
Size: 3553103 bytes
Desc: not available
URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090114/945d5568/attachment.bin>
Richard Elling
2009-Jan-15 04:00 UTC
[zfs-discuss] What are the usual suspects in data errors?
well, since this is part of how I make my living, or at least what is in my current job description...

Gary Mills wrote:
> I realize that any error can occur in a storage subsystem, but most
> of these have an extremely low probability. I'm interested in this
> discussion in only those that do occur occasionally, and that are
> not catastrophic.

excellent... fertile ground for research. One of the things that we see occur with ZFS is that it detects errors which were previously not detected. You can see this happen on this forum when people try to kill the canary (ZFS). I think a better analogy is astronomy: as our ability to see more of the universe gets better, we see more of the universe -- but that also raises the number of questions we can't answer... well... yet...

> Consider the common configuration of two SCSI disks connected to
> the same HBA that are configured as a mirror in some manner. In this
> case, the data path in general consists of:

Beware of the Decomposition Law, which says the part is more than a fraction of the whole. This is what trips people up when they think that if every part performs flawlessly, then the whole will perform flawlessly.

> o The application
> o The filesystem
> o The drivers
> o The HBA
> o The SCSI bus
> o The controllers
> o The heads and platters
>
> Many of those components have their own error checking. Some have
> error correction. For example, parity checking is done on a SCSI bus,
> unless it's specifically disabled. Do SATA and PATA connections also
> do error checking? Disk sector I/O uses CRC error checking and
> correction. Memory buffers would often be protected by parity memory.
> Is there any more that I've missed?

thousands more ;-)

> Now, let's consider common errors. To me, the most frequent would
> be a bit error on a disk sector. In this case, the controller would
> report a CRC error and would not return bad data. The filesystem
> would obtain the data from its redundant copy. I assume that ZFS
> would also rewrite the bad sector to correct it. The application
> would not see an error. Similar events would happen for a parity
> error on the SCSI bus.

Nit: modern disks can detect and correct multiple byte errors in a sector. If ZFS can correct it (depends on the ZFS configuration) then it will, but it will not rewrite the defective sector -- it will write to a different sector. While that seems better, it also introduces at least one new failure mode and can help to expose other, existing failure modes, such as phantom writes.

> What can go wrong with the disk controller? A simple seek to the
> wrong track is not a problem because the track number is encoded on
> the platter. The controller will simply recalibrate the mechanism and
> retry the seek. If it computes the wrong sector, that would be a
> problem. Does this happen with any frequency? In this case, ZFS
> would detect a checksum error and obtain the data from its redundant
> copy.
>
> A logic error in ZFS might result in incorrect metadata being written
> with valid checksum. In this case, ZFS might panic on import or might
> correct the error. How is this sort of error prevented?
>
> If the application wrote bad data to the filesystem, none of the
> error checking in lower layers would detect it. This would be
> strictly an error in the application.
>
> Some errors might result from a loss of power if some ZFS data was
> written to a disk cache but never was written to the disk platter.
> Again, ZFS might panic on import or might correct the error. How is
> this sort of error prevented?
>
> After all of this discussion, what other errors can ZFS checksums
> reasonably detect? Certainly if some of the other error checking
> failed to detect an error, ZFS would still detect one. How likely
> are these other error checks to fail?
>
> Is there anything else I've missed in this analysis?

Everything along the way. If you search the archives here you will find anecdotes of:
 + bad disks -- of all sorts
 + bad power supplies
 + bad FC switch firmware
 + flaky cables
 + bugs in NIC drivers
 + transient and permanent DRAM errors
 + and, of course, bugs in ZFS code

Basically, anywhere your data touches can fail. However, to make the problem tractable, we often divide failures into two classifications:
 1. mechanical, including quantum-mechanical
 2. design or implementation, including software defects, design deficiencies, and manufacturing

There is a lot of experience with measurements of mechanical failure modes, so we tend to have some ways to assign reliability budgets and predictions. For #2, the science we use for #1 doesn't apply.
 -- richard
Miles Nordin
2009-Jan-15 17:33 UTC
[zfs-discuss] What are the usual suspects in data errors?
>>>>> "gm" == Gary Mills <mills at cc.umanitoba.ca> writes:

    gm> Is there any more that I've missed?

1. Filesystem/RAID layer dispatches writes 'aaaaaaaaa' to iSCSI
   initiator.  iSCSI initiator accepts them, buffers them, returns
   success to RAID layer.

2. iSCSI initiator sends to iSCSI target.  iSCSI target writes
   'aaaaaaaa'.

3. Network connectivity is interrupted, target is rebooted, something
   like that.

4. Filesystem/RAID layer dispatches writes 'bbbbbbbb' to iSCSI
   initiator.  Initiator accepts, buffers, returns success.

5. iSCSI initiator can't write 'bbbbbbbb'.

6. iSCSI initiator goes through some cargo-cult error-recovery scheme:
   retry this 3 times, timeout, disconnect, reconnect, retry
   really-hard 5 times, timeout, return various errors to RAID layer,
   maybe.

7. OH! Target's back! good.

8. Filesystem/RAID layer writes 'ccccccccc' to iSCSI initiator.  Maybe
   gets an error, maybe flags 'ccccccccc' destination blocks bad,
   increments RAID-layer counters, tries to ``rewrite'' the
   'cccccccc', eventually gets success back from the initiator.

9. Filesystem/RAID layer issues SYNCHRONIZE CACHE to the iSCSI
   initiator.

10. Initiator flushes 'cccccccc' to the target, and waits for target
    to confirm 'ccccccc' and all previous writes are on physical media.

11. Initiator returns success for the SYNCHRONIZE CACHE command.

12. Filesystem/RAID layer writes 'd' commit sector updating pointers,
    aiming various important things at 'bbbbbbbbb'.

Now, the RAID layer thinks 'aaaaaaaaa' and 'bbbbbbbbb' and 'ccccccccc' and 'd' are all written, but in fact only 'aaaaaaaaa' and 'cccccccccc' and 'd' are written, and 'd' points at garbage.

NFS has a state machine designed to handle server reboots without breaking any consistency promises. Substitute ``the userland app'' for Filesystem/RAID, and ``NFSv3 client'' for iSCSI initiator. The NFSv3 client will keep track of which writes are actually committed to disk and batch them into commit blocks of which the userland app is entirely unaware. The NFS client won't free a commit block from its RAM write cache until it's on disk. If the server reboots, it will replay the open commit blocks. If the server AND client reboot, the commit block will be lost from RAM, but then 'd' is not written, so the datastore is not corrupt.

The iSCSI initiator probably needs to do something similar to NFSv3 to enforce that success from SYNCHRONIZE CACHE really means what ZFS thinks it means. It's a little trickier to do this with ZFS/iSCSI because the NFS cop-out was to use 'hard' mounts---you _never_ propagate write failures up the stack. You just freeze the application until you can finally complete the write, and if you can't write, you evade the consistency guarantees by killing the app. Then it's a solvable problem to design apps that won't corrupt their datastores when they're killed, so the overall system works. This world order won't work analogously for ZFS-on-iSCSI, which needs to see failures to handle redundancy. We may even need some new kind of failure code to solve the problem, but maybe something clever can be crammed into the old API.

Imagine the stream of writes to a disk as a bucket-brigade separated by SYNCHRONIZE CACHE commands. The writes within each bucket can be sloshed around (reordered) arbitrarily. And if the machine crashes, we might pour _part_ of the water in the last bucket on the fire, but then we stop and drop all the other buckets. So far, we can handle it.

But we've no way to handle the situation where someone in the _middle_ of the brigade spills the water in his bucket. There's no way to cleanly restart the brigade after this happens. ZFS needs to gracefully handle a SYNCHRONIZE CACHE command that returns _failure_, and needs to interpret such a failure really aggressively, as in:

    Any writes you issued since the last SYNCHRONIZE CACHE, *even if
    you got a Success return to your block-layer write() command*, may
    or may not be committed to disk, and waiting will NOT change the
    situation---they're just gone.  But, the disk is still here, and is
    working, meh, ~fine.  This failure is not ``retryable''.  If you
    issue a second SYNCHRONIZE CACHE command, and it Succeeds, that
    does NOT change what I've just told you.  That Success only refers
    to writes issued between this failing SYNCHRONIZE CACHE command and
    the next one.

Once the iSCSI initiator is fixed, probably we need to go back and add NFS-style commit batches to even SATA disk drivers, which can suffer the same problem if you hot-swap them, or maybe even if you don't hot-swap but the disk reports some error which invokes some convoluted sd/ssd exception handling involving ``resets''. The assumption doesn't hold that write, write, write, synchronize cache promises all those writes are on-disk once synchronize cache returns. The only way to make it hold is to promise to panic the kernel whenever any disk, controller, bus, or iscsi session is ``reset''---the simple, obvious ``SYNCHRONIZE CACHE is the final word of God'' assumption ought to handle cord-yanking just fine, but not smaller failures.
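Here is a toy model of that bucket-brigade failure, reduced to a few arrays. It is my own sketch, not iSCSI or ZFS code; the device functions and block layout are invented for illustration. One buffered write is silently lost when the "target" reboots, a later SYNCHRONIZE CACHE succeeds for the writes issued after the reboot, and the commit block still lands on media pointing at data that never did.

    /*
     * Toy model of the scenario above: a volatile write cache that is lost
     * mid-stream after earlier writes already returned success, followed by
     * a cache flush that genuinely succeeds -- but only for the later writes.
     */
    #include <stdio.h>
    #include <string.h>

    #define NBLK 8

    static char media[NBLK];   /* what is actually on the platter        */
    static char cache[NBLK];   /* volatile write cache on the "target"   */
    static int  dirty[NBLK];   /* cache entries not yet on media         */

    /* Accepts the write and returns immediately -- i.e. "success". */
    static void dev_write(int blk, char v) { cache[blk] = v; dirty[blk] = 1; }

    /* SYNCHRONIZE CACHE: push everything currently dirty onto media. */
    static void dev_sync(void)
    {
        for (int i = 0; i < NBLK; i++)
            if (dirty[i]) { media[i] = cache[i]; dirty[i] = 0; }
    }

    /* Target reboot: buffered-but-unflushed writes are silently lost. */
    static void dev_crash(void) { memset(dirty, 0, sizeof dirty); }

    int main(void)
    {
        memset(media, '.', sizeof media);

        dev_write(0, 'a');   /* data block  */
        dev_sync();          /* 'a' reaches media                         */
        dev_write(1, 'b');   /* data block, still only in the cache       */
        dev_crash();         /* target reboots: 'b' is gone               */
        dev_write(2, 'c');   /* more data after reconnect                 */
        dev_sync();          /* succeeds -- but only covers 'c'           */
        dev_write(3, 'd');   /* commit block pointing at a, b, c          */
        dev_sync();

        /* Prints "a.cd...." -- commit 'd' is durable, the 'b' it references is not. */
        printf("media: %.8s\n", media);
        return 0;
    }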
Kees Nuyt
2009-Jan-15 20:15 UTC
[zfs-discuss] a Min Wang person emailed me for free knowledge
On Wed, 14 Jan 2009 22:40:19 -0500, "JZ" <jz at excelsioritsolutions.com> wrote:

>ok, you open folks are really ????.
>just one more, and I hope someone replies so we can save some open time.

[snip]

JZ, would you please be so kind to refrain from including any attachments in your postings to our beloved zfs-discuss at opensolaris.org, especially large, binary ones?

They are not welcome here, and I'm pretty sure I'm not the only one with that opinion.

Thanks in advance for your cooperation.

Regards,
-- 
  (  Kees Nuyt
  )
c[_]
[last 5 minutes on my lunch, just to say thank you and sorry]

Yes, I was wondering how the first one even made it to the list. None of those emails with large attachments should have been approved by the mail server policy. And I feel bad that I tested the server with some bad text and those got through.

best,
z

----- Original Message ----- 
From: "Kees Nuyt" <k.nuyt at zonnet.nl>
To: <zfs-discuss at opensolaris.org>
Sent: Thursday, January 15, 2009 3:15 PM
Subject: Re: [zfs-discuss] a Min Wang person emailed me for free knowledge

[snip]
Sorry folks, the mail server is just too advanced for me. This email had been sitting in the server since 1/14/2009, and coming through now would be very misleading.

A side-method for Zhou is that (not for anyone to learn) when trouble comes and we are not ready to deal with the trouble, we just hide behind ladies, since that wall provides better protection than any private network (due to the noise of my wall).

This Daisy baby has no personal relationship with me. She is a professional. And Mohegan Sun is not my friend. [if you would like to play in that area, I say Foxwood, because Foxwood is my friend. Even though Daisy is my friend, I will not say Mohegan Sun is my friend because Mohegan Sun is not.]

I used Daisy's name for cover, and that was all. [may not be a correct thing to do in many views, but what is done is done; if I should be punished, I will take that punishment from the sky with happiness, not a big deal.]

So please, focus on the ? part, not the Daisy baby part, in my post on that day.

And again, I had limited respect for the Solaris mail server and tested the server policy with some ridiculous methods. If I have offended anyone in that process, I am sorry.

Best,
z

----- Original Message ----- 
From: "JZ" <jz at excelsioritsolutions.com>
To: "A Darren Dunham" <ddunham at taos.com>; <zfs-discuss at opensolaris.org>
Cc: <schen at mohegansun.com>; "Marvin Wang, Min" <mail2wm at yahoo.com>
Sent: Wednesday, January 14, 2009 10:40 PM
Subject: Re: [zfs-discuss] a Min Wang person emailed me for free knowledge

[snip]