Are there benchmarks somewhere showing a RAID10 implemented on an LSI card with, say, 128MB of cache being beaten in terms of performance by a similar zraid configuration with no cache on the drive controller?

Somehow I don't think they exist. I'm all for data scrubbing, but this anti-raid-card movement is puzzling.
> this anti-raid-card movement is puzzling.

I think you've misinterpreted my questions. I queried the necessity of paying extra for a seemingly unnecessary RAID card for zfs. I didn't doubt that it could perform better. Wasn't one of the design briefs of zfs that it would provide its feature set without expensive RAID hardware?

Of course, if you have the money then you can always go faster, but this is a zfs discussion thread (I know I've perpetuated the extravagant cross-posting of the OP).

Cheers.
MP wrote:
>> this anti-raid-card movement is puzzling.
>
> I think you've misinterpreted my questions.
> I queried the necessity of paying extra for a seemingly unnecessary RAID
> card for zfs. I didn't doubt that it could perform better.
> Wasn't one of the design briefs of zfs that it would provide its feature
> set without expensive RAID hardware?

In general, feature set != performance. For example, a VIA x86-compatible processor is not capable of beating the performance of a high-end Xeon, though the feature sets are largely the same. Additional examples abound.

 -- richard
> Additional examples abound.

Doubtless :)

More usefully, can you confirm whether Solaris works on this chassis without the RAID controller?
> Are there benchmarks somewhere showing a RAID10 implemented on an LSI
> card with, say, 128MB of cache being beaten in terms of performance by a
> similar zraid configuration with no cache on the drive controller?
>
> Somehow I don't think they exist. I'm all for data scrubbing, but this
> anti-raid-card movement is puzzling.

Oh, for joy - a chance for me to say something *good* about ZFS, rather than just try to balance out excessive enthusiasm.

Save for speeding up synchronous writes (if it has enough on-board NVRAM to hold them until it's convenient to destage them to disk), a RAID-10 card should not enjoy any noticeable performance advantage over ZFS mirroring.

By contrast, if extremely rare undetected and (other than via ZFS checksums) undetectable (or considerably more common undetected but detectable via disk ECC codes, *if* the data is accessed) corruption occurs, if the RAID card is used to mirror the data there's a good chance that even ZFS's validation scans won't see the problem (because the card happens to access the good copy for the scan rather than the bad one) - in which case you'll lose that data if the disk with the good data fails. And in the case of (extremely rare) otherwise-undetectable corruption, if the card *does* return the bad copy then IIRC ZFS (not knowing that a good copy also exists) will just claim that the data is gone (though I don't know if it will then flag it such that you'll never have an opportunity to find the good copy).

If the RAID card scrubs its disks, the difference (now limited to the extremely rare undetectable-via-disk-ECC corruption) becomes pretty negligible - but I'm not sure how many RAIDs below the near-enterprise category perform such scrubs.

In other words, if you *don't* otherwise scrub your disks then ZFS's checksums-plus-internal-scrubbing mechanisms assume greater importance: it's only the contention that other solutions that *do* offer scrubbing can't compete with ZFS in effectively protecting your data that's somewhat over the top.

- bill
On December 13, 2007 9:47:00 AM -0800 MP <gildenman at gmail.com> wrote:
>> Additional examples abound.
>
> Doubtless :)
>
> More usefully, can you confirm whether Solaris works on this chassis
> without the RAID controller?

way back, i had Solaris working with a promise j200s (jbod sas) chassis, to the extent that the sas driver at the time worked. i can't IMAGINE why this chassis would be any different from Solaris' perspective.

-frank
On December 13, 2007 11:34:54 AM -0800 "can you guess?"
<billtodd at metrocast.net> wrote:
> By contrast, if extremely rare undetected and (other than via ZFS
> checksums) undetectable (or considerably more common undetected but
> detectable via disk ECC codes, *if* the data is accessed) corruption
> occurs, if the RAID card is used to mirror the data there's a good chance
> that even ZFS's validation scans won't see the problem (because the card
> happens to access the good copy for the scan rather than the bad one) -
> in which case you'll lose that data if the disk with the good data fails.
> And in the case of (extremely rare) otherwise-undetectable corruption, if
> the card *does* return the bad copy then IIRC ZFS (not knowing that a
> good copy also exists) will just claim that the data is gone (though I
> don't know if it will then flag it such that you'll never have an
> opportunity to find the good copy).

i like this answer, except for what you are implying by "extremely rare".

> If the RAID card scrubs its disks the difference (now limited to the
> extremely rare undetectable-via-disk-ECC corruption) becomes pretty
> negligible - but I'm not sure how many RAIDs below the near-enterprise
> category perform such scrubs.
>
> In other words, if you *don't* otherwise scrub your disks then ZFS's
> checksums-plus-internal-scrubbing mechanisms assume greater importance:
> it's only the contention that other solutions that *do* offer scrubbing
> can't compete with ZFS in effectively protecting your data that's
> somewhat over the top.

the problem with your discounting of zfs checksums is that you aren't taking into account that "extremely rare" is relative to the number of transactions, which are "extremely high". in such a case even "extremely rare" errors do happen, and not just to "extremely few" folks, but i would say to all enterprises. hell it happens to home users.

when the difference between an unrecoverable single bit error is not just 1 bit but the entire file, or corruption of an entire database row (etc), those small and infrequent errors are an "extremely big" deal.

considering all the pieces, i would much rather run zfs on a jbod than on a raid, wherever i could. it gives better data protection, and it is ostensibly cheaper.

-frank
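To make the scaling argument above concrete: a rough Python sketch of how a fixed per-byte silent-error rate adds up with yearly I/O volume. The one-error-per-10-PB rate and the example workloads are illustrative assumptions, not figures taken from any study cited in this thread.

# Rough illustration (not from the thread's data) of how a fixed per-byte
# error rate scales with yearly I/O volume.  Both the assumed rate of one
# otherwise-undetectable error per 10 PB and the example workloads are
# made-up round numbers.

rate_per_byte = 1 / 10e15        # one silent error per ~10 PB transferred

workloads = [
    ("home user, ~10 TB/year", 10e12),
    ("small enterprise, ~500 TB/year", 500e12),
    ("large enterprise, ~5 PB/year", 5e15),
]

for name, bytes_per_year in workloads:
    expected = rate_per_byte * bytes_per_year
    print(f"{name}: ~{expected:.4f} expected silent errors per year "
          f"(about one every {1 / expected:.0f} years)")

At these assumed numbers a single home machine would essentially never see such an error, while a fleet pushing petabytes per year should expect one every couple of years - which is the sense in which "extremely rare" depends on how much data you move.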
On 13-Dec-07, at 6:28 PM, Frank Cusack wrote:

> On December 13, 2007 11:34:54 AM -0800 "can you guess?"
> <billtodd at metrocast.net> wrote:
>> By contrast, if extremely rare undetected and (other than via ZFS
>> checksums) undetectable (or considerably more common undetected but
>> detectable via disk ECC codes, *if* the data is accessed) corruption
>> occurs, if the RAID card is used to mirror the data there's a good chance
>> that even ZFS's validation scans won't see the problem (because the card
>> happens to access the good copy for the scan rather than the bad one) -
>> in which case you'll lose that data if the disk with the good data fails.

Which is exactly why ZFS should do the mirroring...

>> And in the case of (extremely rare) otherwise-undetectable corruption, if
>> the card *does* return the bad copy then IIRC ZFS (not knowing that a
>> good copy also exists) will just claim that the data is gone (though I
>> don't know if it will then flag it such that you'll never have an
>> opportunity to find the good copy).

Ditto.

> i like this answer, except for what you are implying by "extremely rare".
>
>> If the RAID card scrubs its disks

A scrub without checksum puts a huge burden on disk firmware and error reporting paths :-)

--Toby

>> the difference (now limited to the
>> extremely rare undetectable-via-disk-ECC corruption) becomes pretty
>> negligible - but I'm not sure how many RAIDs below the near-enterprise
>> category perform such scrubs.
>>
>> In other words, if you *don't* otherwise scrub your disks then ZFS's
>> checksums-plus-internal-scrubbing mechanisms assume greater importance:
>> it's only the contention that other solutions that *do* offer scrubbing
>> can't compete with ZFS in effectively protecting your data that's
>> somewhat over the top.
>
> the problem with your discounting of zfs checksums is that you aren't
> taking into account that "extremely rare" is relative to the number of
> transactions, which are "extremely high". ...
>
> considering all the pieces, i would much rather run zfs on a jbod than
> on a raid, wherever i could. it gives better data protection, and it
> is ostensibly cheaper.
>
> -frank
...

> when the difference between an unrecoverable single bit error is not just
> 1 bit but the entire file, or corruption of an entire database row (etc),
> those small and infrequent errors are an "extremely big" deal.

You are confusing unrecoverable disk errors (which are rare but orders of magnitude more common) with otherwise *undetectable* errors (the occurrence of which is at most once in petabytes by the studies I've seen, rather than once in terabytes), despite my attempt to delineate the difference clearly.

Conventional approaches using scrubbing provide as complete protection against unrecoverable disk errors as ZFS does: it's only the far rarer otherwise *undetectable* errors that ZFS catches and they don't.

- bill
...

>> If the RAID card scrubs its disks
>
> A scrub without checksum puts a huge burden on disk firmware and
> error reporting paths :-)

Actually, a scrub without checksum places far less burden on the disks and their firmware than ZFS-style scrubbing does, because it merely has to scan the disk sectors sequentially rather than follow a tree path to each relatively small leaf block. Thus it also compromises runtime operation a lot less (though in both cases doing it infrequently in the background should usually reduce any impact to acceptable levels).

- bill
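For readers comparing the two scrub styles discussed above, here is a purely conceptual Python sketch of the access patterns involved. The disk and block-pointer-tree interfaces are hypothetical stand-ins invented for illustration; this is not ZFS or RAID-firmware code.

# Conceptual sketch only: the two scrub styles being compared.  "disk" is a
# hypothetical block device with read(offset, length) that raises IOError on
# an unrecoverable read; "node" is a hypothetical block-pointer tree node.

CHUNK = 1 << 20  # a card-style scrub can stream the platter in big sequential reads

def device_scrub(disk, disk_size):
    """Sequential pass over raw sectors; relies on the drive's ECC to flag
    unreadable regions, and cannot notice silently corrupted ones."""
    bad_regions = []
    for offset in range(0, disk_size, CHUNK):
        try:
            disk.read(offset, min(CHUNK, disk_size - offset))
        except IOError:
            bad_regions.append(offset)
    return bad_regions

def checksum_scrub(node, disk, verify):
    """ZFS-style pass: follow the block-pointer tree to each (relatively
    small) block wherever it sits on disk and verify its stored checksum,
    which catches silent corruption at the cost of scattered, seek-heavy I/O."""
    bad_blocks = []
    data = disk.read(node.offset, node.length)
    if not verify(data, node.checksum):
        bad_blocks.append(node.offset)
    for child in getattr(node, "children", []):
        bad_blocks.extend(checksum_scrub(child, disk, verify))
    return bad_blocks

The trade-off in the messages above falls out of the shapes of these two loops: the sequential pass is cheap but can only see what the drive's ECC reports, while the tree walk pays for scattered I/O with the ability to catch silent corruption.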
On December 13, 2007 12:51:55 PM -0800 "can you guess?"
<billtodd at metrocast.net> wrote:
> ...
>
>> when the difference between an unrecoverable single bit error is not just
>> 1 bit but the entire file, or corruption of an entire database row (etc),
>> those small and infrequent errors are an "extremely big" deal.
>
> You are confusing unrecoverable disk errors (which are rare but orders of
> magnitude more common) with otherwise *undetectable* errors (the
> occurrence of which is at most once in petabytes by the studies I've
> seen, rather than once in terabytes), despite my attempt to delineate the
> difference clearly.

No I'm not. I know exactly what you are talking about.

> Conventional approaches using scrubbing provide as
> complete protection against unrecoverable disk errors as ZFS does: it's
> only the far rarer otherwise *undetectable* errors that ZFS catches and
> they don't.

yes. far rarer and yet home users still see them.

that the home user ever sees these extremely rare (undetectable) errors may have more to do with poor connection (cables, etc) to the disk, and less to do with disk media errors. enterprise users probably have better connectivity and see errors due to high i/o. just thinking out loud.

regardless, zfs on non-raid provides better protection than zfs on raid (well, depending on raid configuration) so just from the data integrity POV non-raid would generally be preferred. the fact that the type of error being prevented is rare doesn't change that and i was further arguing that even though it's rare the impact can be high so you don't want to write it off.

-frank
billtodd at metrocast.net said:
> You are confusing unrecoverable disk errors (which are rare but orders of
> magnitude more common) with otherwise *undetectable* errors (the occurrence
> of which is at most once in petabytes by the studies I've seen, rather than
> once in terabytes), despite my attempt to delineate the difference clearly.

I could use a little clarification on how these unrecoverable disk errors behave -- or maybe a lot, depending on one's point of view.

So, when one of these "once in around ten (or 100) terabytes read" events occurs, my understanding is that a read error is returned by the drive, and the corresponding data is lost as far as the drive is concerned. Maybe just a bit is gone, maybe a byte, maybe a disk sector, it probably depends on the disk, OS, driver, and/or the rest of the I/O hardware chain. Am I doing OK so far?

> Conventional approaches using scrubbing provide as complete protection
> against unrecoverable disk errors as ZFS does: it's only the far rarer
> otherwise *undetectable* errors that ZFS catches and they don't.

I found it helpful to my own understanding to try restating the above in my own words. Maybe others will as well.

If my assumptions are correct about how these unrecoverable disk errors are manifested, then a "dumb" scrubber will find such errors by simply trying to read everything on disk -- no additional checksum is required. Without some form of parity or replication, the data is lost, but at least somebody will know about it.

Now it seems to me that without parity/replication, there's not much point in doing the scrubbing, because you could just wait for the error to be detected when someone tries to read the data for real. It's only if you can repair such an error (before the data is needed) that such scrubbing is useful.

For those well-versed in this stuff, apologies for stating the obvious.

Regards, Marion
> I could use a little clarification on how these unrecoverable disk errors
> behave -- or maybe a lot, depending on one's point of view.
>
> So, when one of these "once in around ten (or 100) terabytes read" events
> occurs, my understanding is that a read error is returned by the drive,
> and the corresponding data is lost as far as the drive is concerned.

Yes -- the data being one or more disk blocks. (You can't lose a smaller amount of data, from the drive's point of view, since the error correction code covers the whole block.)

> If my assumptions are correct about how these unrecoverable disk errors
> are manifested, then a "dumb" scrubber will find such errors by simply
> trying to read everything on disk -- no additional checksum is required.
> Without some form of parity or replication, the data is lost, but at
> least somebody will know about it.

Right. Generally if you have replication and scrubbing, then you'll also re-write any data which was found to be unreadable, thus fixing the problem (and protecting yourself against future loss of the second copy).

> Now it seems to me that without parity/replication, there's not much
> point in doing the scrubbing, because you could just wait for the error
> to be detected when someone tries to read the data for real. It's
> only if you can repair such an error (before the data is needed) that
> such scrubbing is useful.

Pretty much, though if you're keeping backups, you could recover the data from backup at this point. Of course, backups could be considered a form of replication, but most of us in file systems don't think of them that way.

Anton
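A minimal sketch of the scrub-plus-replication repair loop described above, assuming a hypothetical two-way mirror whose sides expose read/write by block number and raise IOError on an unrecoverable (drive-ECC-detected) read; it is an illustration of the idea, not ZFS or array-firmware code.

# Minimal sketch, under the assumed interfaces noted above.

def _try_read(side, block):
    try:
        return side.read(block)
    except IOError:
        return None                      # drive reported the block unreadable

def scrub_mirror(side_a, side_b, n_blocks):
    lost = []
    for block in range(n_blocks):
        a = _try_read(side_a, block)
        b = _try_read(side_b, block)
        if a is None and b is None:
            lost.append(block)           # both copies gone; only a backup helps
        elif a is None:
            side_a.write(block, b)       # re-write the unreadable copy from the good one
        elif b is None:
            side_b.write(block, a)
        # If both copies are readable but differ, a checksum-less scrub has no
        # way to tell which one is right - the gap ZFS's block checksums close.
    return lost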
> On December 13, 2007 12:51:55 PM -0800 "can you guess?"
> <billtodd at metrocast.net> wrote:
>> ...
>>
>>> when the difference between an unrecoverable single bit error is not
>>> just 1 bit but the entire file, or corruption of an entire database row
>>> (etc), those small and infrequent errors are an "extremely big" deal.
>>
>> You are confusing unrecoverable disk errors (which are rare but orders of
>> magnitude more common) with otherwise *undetectable* errors (the
>> occurrence of which is at most once in petabytes by the studies I've
>> seen, rather than once in terabytes), despite my attempt to delineate the
>> difference clearly.
>
> No I'm not. I know exactly what you are talking about.

Then you misspoke in your previous post by referring to "an unrecoverable single bit error" rather than to "an undetected single-bit error", which I interpreted as a misunderstanding.

>> Conventional approaches using scrubbing provide as complete protection
>> against unrecoverable disk errors as ZFS does: it's only the far rarer
>> otherwise *undetectable* errors that ZFS catches and they don't.
>
> yes. far rarer and yet home users still see them.

I'd need to see evidence of that for current hardware.

> that the home user ever sees these extremely rare (undetectable) errors
> may have more to do with poor connection (cables, etc) to the disk,

Unlikely, since transfers over those connections have been protected by 32-bit CRCs since ATA busses went to 33 or 66 MB/sec. (SATA has even stronger protection), and SMART tracks the incidence of these errors (which result in retries when detected) such that very high error rates should be noticed before an error is likely to make it through the 2^-32 probability sieve (for that matter, you might well notice the performance degradation due to the frequent retries). I can certainly believe that undetected transfer errors occurred in noticeable numbers in older hardware, though: that's why they introduced the CRCs.

> and less to do with disk media errors. enterprise users probably have
> better connectivity and see errors due to high i/o.

As I said, at most once in petabytes transferred. It takes about 5 years for a contemporary ATA/SATA disk to transfer 10 PB if it's streaming data at top speed, 24/7; doing 8 KB random database accesses (the example that you used) flat out, 24/7, it takes about 500 years (though most such drives aren't speced for 24/7 operation, especially with such a seek-intensive workload) - and for a more realistic random-access database workload it would take many millennia. So it would take an extremely large (on the order of 1,000 disks) and very active database before you'd be likely to see one of these errors within the lifetime of the disks involved. (The underlying arithmetic is sketched just after this message.)

> just thinking out loud.
>
> regardless, zfs on non-raid provides better protection than zfs on raid
> (well, depending on raid configuration) so just from the data integrity
> POV non-raid would generally be preferred.

That was the point I made in my original post here - but *if* the hardware RAID is scrubbing its disks the difference in data integrity protection is unlikely to be of any real significance and one might reasonably elect to use the hardware RAID if it offered any noticeable performance advantage (e.g., by providing NVRAM that could expedite synchronous writes).
> the fact that the type of error being prevented is rare doesn't change
> that and i was further arguing that even though it's rare the impact can
> be high so you don't want to write it off.

All reliability involves trade-offs, and very seldom are "all other things equal". Extremely low probability risks are often worth taking if it costs *anything* to avoid them (but of course are never worth taking if it costs *nothing* to avoid them).

- bill
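The 5-year and 500-year figures in the preceding message can be sanity-checked with a short back-of-envelope calculation. The drive characteristics assumed below (roughly 60 MB/s sequential throughput, roughly 80 random IOPS) are plausible numbers for a 2007-era SATA drive, not values stated anywhere in the thread.

# Back-of-envelope check of the figures above, using assumed drive numbers;
# nothing here is measured, it just shows the arithmetic.

SECONDS_PER_YEAR = 365 * 24 * 3600
TOTAL_BYTES = 10e15                      # 10 PB, the "once per petabytes" scale

streaming_rate = 60e6                    # bytes/second, assumed sequential rate
years_streaming = TOTAL_BYTES / streaming_rate / SECONDS_PER_YEAR
print(f"streaming 10 PB flat out: ~{years_streaming:.1f} years")        # ~5

io_size = 8 * 1024                       # the 8 KB random-read example
iops = 80                                # assumed IOPS for a 7200 rpm drive
years_random = TOTAL_BYTES / (io_size * iops) / SECONDS_PER_YEAR
print(f"10 PB in 8 KB random reads: ~{years_random:.0f} years")         # ~480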
...

>> Now it seems to me that without parity/replication, there's not much
>> point in doing the scrubbing, because you could just wait for the error
>> to be detected when someone tries to read the data for real. It's
>> only if you can repair such an error (before the data is needed) that
>> such scrubbing is useful.
>
> Pretty much

I think I've read (possibly in the 'MAID' descriptions) the contention that at least some unreadable sectors get there in stages, such that if you catch them early they will be only difficult to read rather than completely unreadable. In such a case, scrubbing is worthwhile even without replication, because it finds the problem early enough that the disk itself (or higher-level mechanisms if the disk gives up but the higher level is more persistent) will revector the sector when it finds it difficult (but not impossible) to read.

- bill
On Dec 14, 2007 1:12 AM, can you guess? <billtodd at metrocast.net> wrote:
>> yes. far rarer and yet home users still see them.
>
> I'd need to see evidence of that for current hardware.

What would constitute "evidence"? Do anecdotal tales from home users qualify? I have two disks (and one controller!) that generate several checksum errors per day each. I've also seen intermittent checksum fails that go away once all the cables are wiggled.

> Unlikely, since transfers over those connections have been protected by
> 32-bit CRCs since ATA busses went to 33 or 66 MB/sec. (SATA has even
> stronger protection)

The ATA/7 spec specifies a 32-bit CRC (older ones used a 16-bit CRC) [1]. The serial ata protocol also specifies 32-bit CRCs beneath 8/10b coding (1.0a p. 159) [2]. That's not much stronger at all.

Will

[1] http://www.t10.org/t13/project/d1532v3r4a-ATA-ATAPI-7.pdf
[2] http://www.ece.umd.edu/courses/enee759h.S2003/references/serialata10a.pdf
> On Dec 14, 2007 1:12 AM, can you guess? <billtodd at metrocast.net> wrote:
>>> yes. far rarer and yet home users still see them.
>>
>> I'd need to see evidence of that for current hardware.
>
> What would constitute "evidence"? Do anecdotal tales from home users
> qualify? I have two disks (and one controller!) that generate several
> checksum errors per day each.

I assume that you're referring to ZFS checksum errors rather than to transfer errors caught by the CRC resulting in retries.

If so, then the next obvious question is, what is causing the ZFS checksum errors? And (possibly of some help in answering that question) is the disk seeing CRC transfer errors (which show up in its SMART data)?

If the disk is not seeing CRC errors, then the likelihood that data is being 'silently' corrupted as it crosses the wire is negligible (1 in 65,536 if you're using ATA disks, given your correction below, else 1 in 4.3 billion for SATA).

Controller or disk firmware bugs have been known to cause otherwise undetected errors (though I'm not familiar with any recent examples in normal desktop environments - e.g., the CERN study discussed earlier found a disk firmware bug that seemed only activated by the unusual demands placed on the disk by a RAID controller, and exacerbated by that controller's propensity just to ignore disk time-outs). So, for that matter, have buggy file systems. Flaky RAM can result in ZFS checksum errors (the CERN study found correlations there when it used its own checksum mechanisms).

> I've also seen intermittent checksum fails that go away once all the
> cables are wiggled.

Once again, a significant question is whether the checksum errors are accompanied by a lot of CRC transfer errors. If not, that would strongly suggest that they're not coming from bad transfers (and while they could conceivably be the result of commands corrupted on the wire, so much more data is transferred compared to command bandwidth that you'd really expect to see data CRC errors too if commands were getting mangled). When you wiggle the cables, other things wiggle as well (I assume you've checked that your RAM is solidly seated).

On the other hand, if you're getting a whole bunch of CRC errors, then with only a 16-bit CRC it's entirely conceivable that a few are sneaking by unnoticed.

> The ATA/7 spec specifies a 32-bit CRC (older ones used a 16-bit CRC) [1].

Yup - my error: the CRC was indeed introduced in ATA-4 (33 MB/sec. version), but was only 16 bits wide back then.

> The serial ata protocol also specifies 32-bit CRCs beneath 8/10b coding
> (1.0a p. 159) [2]. That's not much stronger at all.

The extra strength comes more from its additional coverage (commands as well as data).

- bill
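The CRC-width odds mentioned above reduce to a simple calculation. The sketch below assumes a corrupted transfer slips past a k-bit CRC with probability 2^-k (a simplification, since real CRC miss rates depend on the error pattern) and uses an invented count of detected errors for a very flaky cable.

# Rough sketch of the CRC-width point above.  The detected-error count is an
# assumption, not a measurement from anyone's SMART data.

detected_crc_errors = 100_000            # assumed CRC retries logged by SMART

for width in (16, 32):
    expected_silent = detected_crc_errors / 2**width
    print(f"{width}-bit CRC: ~{expected_silent:.6f} corrupted transfers "
          f"expected to sneak through")
# 16-bit: ~1.5 may sneak through; 32-bit: ~0.00002 - the 1-in-65,536 vs
# 1-in-4.3-billion odds mentioned in the message above.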
> ...
> though I'm not familiar with any recent examples in normal desktop
> environments

One example found during early use of zfs in Solaris engineering was a system with a flaky power supply.

It seemed to work just fine with ufs, but when zfs was installed the sata drives started to show many ZFS checksum errors.

After replacing the power supply, the system did not detect any more errors.

Flaky power supplies are an important contributor to PC unreliability; they also tend to fail a lot in various ways.

Casper
>> ...
>> though I'm not familiar with any recent examples in normal desktop
>> environments
>
> One example found during early use of zfs in Solaris engineering was
> a system with a flaky power supply.
>
> It seemed to work just fine with ufs, but when zfs was installed the
> sata drives started to show many ZFS checksum errors.
>
> After replacing the power supply, the system did not detect any more
> errors.
>
> Flaky power supplies are an important contributor to PC unreliability;
> they also tend to fail a lot in various ways.

Thanks - now that you mention it, I think I remember reading about that here somewhere.

But did anyone delve into these errors sufficiently to know that they were specifically due to controller or disk firmware bugs (since you seem to be suggesting by the construction of your response above that they were) rather than, say, to RAM errors (if the system in question didn't have ECC RAM, anyway) between checksum generation and disk access on either reads or writes (the CERN study found a correlation even using ECC RAM between detected RAM errors and silent data corruption)?

Not that the generation of such otherwise undetected errors due to a flaky PSU isn't interesting in its own right, but this specific sub-thread was about whether poor connections were a significant source of such errors (my comment about controller and disk firmware bugs having been a suggested potential alternative source) - so identifying the underlying mechanisms is of interest as well.

- bill
On Dec 14, 2007 4:23 AM, can you guess? <billtodd at metrocast.net> wrote:
> I assume that you're referring to ZFS checksum errors rather than to
> transfer errors caught by the CRC resulting in retries.

Correct.

> If so, then the next obvious question is, what is causing the ZFS
> checksum errors? And (possibly of some help in answering that question)
> is the disk seeing CRC transfer errors (which show up in its SMART data)?

The memory is ECC in this machine, and Memtest passed it for five days. The disk was indeed getting some pretty lousy SMART scores, but that doesn't explain the controller issue. This particular controller is a SIIG-branded silicon image 0680 chipset (which is, apparently, a piece of junk - if I'd done my homework I would've bought something else)... but the premise stands. I bought a piece of consumer-level hardware off the shelf, it had corruption issues, and ZFS told me about it when XFS had been silent.

> Once again, a significant question is whether the checksum errors are
> accompanied by a lot of CRC transfer errors. If not, that would strongly
> suggest that they're not coming from bad transfers (and while they could
> conceivably be the result of commands corrupted on the wire, so much more
> data is transferred compared to command bandwidth that you'd really
> expect to see data CRC errors too if commands were getting mangled). When
> you wiggle the cables, other things wiggle as well (I assume you've
> checked that your RAM is solidly seated).

I don't remember offhand if I got CRC errors with the working controller and drive and bad cabling, sorry. RAM was solid, as mentioned earlier.

> The extra strength comes more from its additional coverage (commands as
> well as data).

Ah, that explains it.

Will
>> the next obvious question is, what is causing the ZFS checksum errors?
>> And (possibly of some help in answering that question) is the disk
>> seeing CRC transfer errors (which show up in its SMART data)?
>
> The memory is ECC in this machine, and Memtest passed it for five
> days. The disk was indeed getting some pretty lousy SMART scores,

Seagate ATA disks (if that's what you were using) are notorious for this in a couple of specific metrics: they ship from the factory that way. This does not appear to be indicative of any actual problem but rather of error tabulation which they perform differently than other vendors do (e.g., I could imagine that they did something unusual in their burn-in exercising that generated nominal errors, but that's not even speculation, just a random guess).

> but that doesn't explain the controller issue. This particular controller
> is a SIIG-branded silicon image 0680 chipset (which is, apparently, a
> piece of junk - if I'd done my homework I would've bought something
> else)... but the premise stands. I bought a piece of consumer-level
> hardware off the shelf, it had corruption issues, and ZFS told me
> about it when XFS had been silent.

Then we've been talking at cross-purposes. Your original response was to my request for evidence that *platter errors that escape detection by the disk's ECC mechanisms* occurred sufficiently frequently to be a cause for concern - and that's why I asked specifically what was causing the errors you saw (to see whether they were in fact the kind for which I had requested evidence).

Not that detecting silent errors due to buggy firmware is useless: it clearly saved you from continuing corruption in this case.

My impression is that in conventional consumer installations (typical consumers never crack open their case at all, let alone to add a RAID card) controller and disk firmware is sufficiently stable (especially for the limited set of functions demanded of it) that ZFS's added integrity checks may not count for a great deal (save perhaps peace of mind, but typical consumers aren't sufficiently aware of potential dangers to suffer from deficits in that area) - but your experience indicates that when you stray from that mold ZFS's added protection may sometimes be as significant as it was for Robert's mid-range array firmware bugs.

And since there indeed was a RAID card involved in the original hypothetical situation under discussion, the fact that I was specifically referring to undetectable *disk* errors was only implied by my subsequent discussion of disk error rates, rather than explicit.

The bottom line appears to be that introducing non-standard components into the path between RAM and disk has, at least for some specific subset of those components, the potential to introduce silent errors of the form that ZFS can catch - quite possibly in considerably greater numbers than the kinds of undetected disk errors that I was talking about ever would (that RAID card you were using has a relatively popular low-end chipset, and Robert's mid-range arrays were hardly fly-by-night).

So while I'm still not convinced that ZFS offers significant features in the reliability area compared with other open-source *software* solutions, the evidence that it may do so in more sophisticated (but not quite high-end) hardware environments is becoming more persuasive.

- bill
On December 13, 2007 10:12:52 PM -0800 "can you guess?"
<billtodd at metrocast.net> wrote:
>> On December 13, 2007 12:51:55 PM -0800 "can you guess?"
>> <billtodd at metrocast.net> wrote:
>>> ...
>>>
>>>> when the difference between an unrecoverable single bit error is not
>>>> just 1 bit but the entire file, or corruption of an entire database
>>>> row (etc), those small and infrequent errors are an "extremely big"
>>>> deal.
>>>
>>> You are confusing unrecoverable disk errors (which are rare but orders
>>> of magnitude more common) with otherwise *undetectable* errors (the
>>> occurrence of which is at most once in petabytes by the studies I've
>>> seen, rather than once in terabytes), despite my attempt to delineate
>>> the difference clearly.
>>
>> No I'm not. I know exactly what you are talking about.
>
> Then you misspoke in your previous post by referring to "an unrecoverable
> single bit error" rather than to "an undetected single-bit error", which
> I interpreted as a misunderstanding.

I did misspeak. thanks.

-frank