Orvar's post over in opensol-discuss has me thinking:

After reading the paper and looking at the design docs, I'm wondering if there is some facility to allow for comparing data in the ARC to its corresponding checksum. That is, if I've got the data I want in the ARC, how can I be sure it's correct (and free of hardware memory errors)? I'd assume the way is to also store absolutely all the checksums for all blocks/metadata being read/written in the ARC (which, of course, means that only so much RAM corruption can be compensated for), and to do a validation every time that block is read from or written out of the ARC. You'd likely have to do constant metadata consistency checking, and likely have to hold multiple copies of metadata in-ARC to compensate for possible corruption. I'm assuming that this has at least been explored, right?

(The researchers used non-ECC RAM, so honestly, I think it's a bit unrealistic to expect that your car will win the Indy 500 if you put a Yugo engine in it.) Normally, this problem is exactly what you have hardware ECC and memory scrubbing for at the hardware level.

I'm not saying that ZFS should consider doing this - doing a validation for in-memory data is non-trivially expensive in performance terms, and there's only so much you can do and still expect your machine to survive. I mean, I've used the old NonStop stuff, and yes, you can shoot them with a .45 and it likely will still run, but whacking them with a bazooka is still guaranteed to make them, well, Non-NonStop.

-Erik

-------- Original Message --------
Subject: Re: [osol-discuss] Any news about 2010.3?
Date: Wed, 31 Mar 2010 01:06:45 PDT
From: Orvar Korvar <knatte_fnatte_tjatte at yahoo.com>
To: opensolaris-discuss at opensolaris.org

If you value your data, you should reconsider. But if your data is not important, then skip ZFS.

File system data corruption test by researchers:
http://blogs.zdnet.com/storage/?p=169

ZFS data corruption test by researchers:
http://www.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
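To make the question concrete, here is a rough sketch of what "keep the checksum next to the cached block and validate it on every access" would mean. This is not ZFS code; the struct, the function names, and the trivial Fletcher-style checksum are all made up purely for illustration:

/*
 * Illustration only -- not ZFS code.  Invented names, toy 64-bit
 * Fletcher-style checksum.  The point is just to show what "check the
 * cached buffer against its stored checksum on every access" means.
 */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef struct cached_buf {
    void        *data;      /* the cached block */
    size_t      size;
    uint64_t    cksum;      /* checksum stored alongside it */
} cached_buf_t;

static uint64_t
toy_fletcher(const void *buf, size_t size)
{
    const uint8_t *p = buf;
    uint64_t a = 0, b = 0;
    for (size_t i = 0; i < size; i++) {
        a += p[i];
        b += a;
    }
    return ((b << 32) | (a & 0xffffffffULL));
}

/* Called on every read from the cache; returns 0 if the copy still matches. */
static int
cached_buf_access(const cached_buf_t *cb, void *dst)
{
    if (toy_fletcher(cb->data, cb->size) != cb->cksum)
        return (-1);        /* in-memory corruption detected */
    memcpy(dst, cb->data, cb->size);
    return (0);
}

int
main(void)
{
    char block[512] = "some cached file data";
    cached_buf_t cb = { block, sizeof (block), 0 };
    char out[512];

    cb.cksum = toy_fletcher(block, sizeof (block));  /* stored at insert time */

    printf("clean access: %d\n", cached_buf_access(&cb, out));
    block[100] ^= 0x01;                              /* simulate a bit flip */
    printf("after bit flip: %d\n", cached_buf_access(&cb, out));
    return (0);
}

Detecting the flip is conceptually the easy part; the rest of the thread is really about whether an extra pass over the buffer on every access is an acceptable price.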
> I'm not saying that ZFS should consider doing this - doing a validation
> for in-memory data is non-trivially expensive in performance terms, and
> there's only so much you can do and still expect your machine to
> survive. I mean, I've used the old NonStop stuff, and yes, you can
> shoot them with a .45 and it likely will still run, but whacking them
> with a bazooka is still guaranteed to make them, well, Non-NonStop.

If we scrub the memory anyway, why not include a check of the ZFS checksums which are already in memory?

OTOH, ZFS gets a lot of mileage out of cheap hardware, and we know what the limitations are when you don't use ECC; the industry must start to require that all chipsets support ECC.

Casper
Casper.Dik at Sun.COM wrote:
>> I'm not saying that ZFS should consider doing this - doing a validation
>> for in-memory data is non-trivially expensive in performance terms, and
>> there's only so much you can do and still expect your machine to
>> survive. [...]
>
> If we scrub the memory anyway, why not include a check of the ZFS
> checksums which are already in memory?
>
> OTOH, ZFS gets a lot of mileage out of cheap hardware, and we know what
> the limitations are when you don't use ECC; the industry must start to
> require that all chipsets support ECC.
>
> Casper

Reading the paper was interesting, as it highlighted all the places where ZFS "skips" validation. There are a lot of them. In many ways, fixing this would likely make ZFS similar to AppleTalk, whose notorious performance (relative to Ethernet) was caused by what many called the "Are You Sure?" design. Double- and triple-checking absolutely everything has its costs.

And, yes, we really should just force computer manufacturers to use ECC in more places (not just RAM) - as densities and data volumes increase, we are more likely to see errors, and without proper hardware checking, we're really going out on a limb in trusting what the hardware says. And, let's face it - hardware error correction is /so/ much faster than doing it in software.

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
ECC-enabled RAM should become very cheap quickly if the industry embraces it in every computer. :-)

best regards,
hanzhu

On Wed, Mar 31, 2010 at 5:46 PM, Erik Trimble <erik.trimble at oracle.com> wrote:
> Reading the paper was interesting, as it highlighted all the places where
> ZFS "skips" validation. [...]
>
> And, yes, we really should just force computer manufacturers to use ECC in
> more places (not just RAM) - as densities and data volumes increase, we
> are more likely to see errors, and without proper hardware checking, we're
> really going out on a limb in trusting what the hardware says. And, let's
> face it - hardware error correction is /so/ much faster than doing it in
> software.
>
> --
> Erik Trimble
> Java System Support
On 31/03/2010 10:27, Erik Trimble wrote:
> Orvar's post over in opensol-discuss has me thinking:
>
> After reading the paper and looking at the design docs, I'm wondering if
> there is some facility to allow for comparing data in the ARC to its
> corresponding checksum. That is, if I've got the data I want in the ARC,
> how can I be sure it's correct (and free of hardware memory errors)? I'd
> assume the way is to also store absolutely all the checksums for all
> blocks/metadata being read/written in the ARC (which, of course, means
> that only so much RAM corruption can be compensated for), and to do a
> validation every time that block is read from or written out of the ARC.
> You'd likely have to do constant metadata consistency checking, and
> likely have to hold multiple copies of metadata in-ARC to compensate for
> possible corruption. I'm assuming that this has at least been explored,
> right?

A subset of this is already done. The ARC keeps its own in-memory checksum (because some buffers in the ARC are not yet on stable storage, so they don't have a block pointer checksum yet).

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c

arc_buf_freeze()
arc_buf_thaw()
arc_cksum_verify()
arc_cksum_compute()

It isn't done on every access, but it can detect in-memory corruption - I've seen it happen on several occasions, though all due to errors in my code, not bad physical memory.

Doing it more frequently could cause a significant performance problem.

-- 
Darren J Moffat
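For readers who haven't opened arc.c, the shape of the freeze/thaw idea Darren describes is roughly as follows. This is a heavily simplified sketch with invented names and a toy checksum, not the real implementation: a buffer is "thawed" while it is being modified (its checksum is meaningless then), and "frozen" once it is stable, at which point a checksum is computed that a later verify can compare against.

/*
 * Simplified illustration of the freeze/thaw idea -- not the real arc.c
 * code, and the names are invented.
 */
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct toy_buf {
    uint8_t     data[4096];
    uint64_t    cksum;
    int         frozen;     /* is cksum currently valid? */
} toy_buf_t;

static uint64_t
toy_cksum(const uint8_t *p, size_t n)
{
    uint64_t a = 0, b = 0;
    for (size_t i = 0; i < n; i++) {
        a += p[i];
        b += a;
    }
    return ((b << 32) ^ a);
}

static void
toy_buf_thaw(toy_buf_t *tb)     /* about to modify: checksum no longer valid */
{
    tb->frozen = 0;
}

static void
toy_buf_freeze(toy_buf_t *tb)   /* contents now stable: remember checksum */
{
    tb->cksum = toy_cksum(tb->data, sizeof (tb->data));
    tb->frozen = 1;
}

static int
toy_buf_verify(const toy_buf_t *tb)     /* e.g. before evict or write-out */
{
    if (!tb->frozen)
        return (0);     /* nothing to compare against yet */
    return (toy_cksum(tb->data, sizeof (tb->data)) == tb->cksum ? 0 : -1);
}

int
main(void)
{
    toy_buf_t tb = { { 0 }, 0, 0 };

    toy_buf_thaw(&tb);
    memset(tb.data, 0xab, sizeof (tb.data));    /* "write" into the buffer */
    toy_buf_freeze(&tb);
    assert(toy_buf_verify(&tb) == 0);

    tb.data[17] ^= 0x04;                        /* simulated memory corruption */
    assert(toy_buf_verify(&tb) == -1);
    return (0);
}

The real code is considerably more involved (and, as Darren notes, is not run on every access), but the state machine is the interesting part: only frozen buffers have a checksum worth checking.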
> On 31/03/2010 10:27, Erik Trimble wrote:
>> [...]
>
> A subset of this is already done. The ARC keeps its own in-memory
> checksum (because some buffers in the ARC are not yet on stable storage,
> so they don't have a block pointer checksum yet).
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c
>
> arc_buf_freeze()
> arc_buf_thaw()
> arc_cksum_verify()
> arc_cksum_compute()
>
> It isn't done on every access, but it can detect in-memory corruption -
> I've seen it happen on several occasions, though all due to errors in my
> code, not bad physical memory.
>
> Doing it more frequently could cause a significant performance problem.

Or there might be an extra zpool-level (or system-wide) property to enable checking checksums on every access from the ARC - there would be a significant performance impact, but it might be acceptable for really paranoid folks, especially with modern hardware.

-- 
Robert Milkowski
http://milek.blogspot.com
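A sketch of what Robert's opt-in knob might look like, with invented names throughout - no such property exists today, and a real version would presumably be a pool or dataset property rather than a global variable. The point is that the extra pass over the buffer is only paid when the knob is on; with it off, a flipped bit in the cached copy goes unnoticed:

/*
 * Invented names -- just sketching an opt-in "verify the cached checksum
 * on every access" tunable.  Not ZFS code.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int toy_verify_on_access = 0;    /* would be a pool/system property */

typedef struct toy_cached {
    uint8_t     data[512];
    uint64_t    cksum;
} toy_cached_t;

static uint64_t
toy_cksum(const uint8_t *p, size_t n)
{
    uint64_t a = 0, b = 0;
    for (size_t i = 0; i < n; i++) {
        a += p[i];
        b += a;
    }
    return ((b << 32) ^ a);
}

static int
toy_cache_read(const toy_cached_t *tc, uint8_t *dst, size_t n)
{
    /* Only the paranoid pay the extra pass over the buffer. */
    if (toy_verify_on_access &&
        toy_cksum(tc->data, sizeof (tc->data)) != tc->cksum)
        return (-1);        /* treat like a checksum error */
    memcpy(dst, tc->data, n);
    return (0);
}

int
main(void)
{
    toy_cached_t tc;
    uint8_t out[512];

    memset(tc.data, 0x5a, sizeof (tc.data));
    tc.cksum = toy_cksum(tc.data, sizeof (tc.data));

    tc.data[3] ^= 0x10;     /* silent in-memory corruption */
    printf("default:  %d\n", toy_cache_read(&tc, out, sizeof (out)));
    toy_verify_on_access = 1;
    printf("paranoid: %d\n", toy_cache_read(&tc, out, sizeof (out)));
    return (0);
}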
On Wed, 31 Mar 2010, Robert Milkowski wrote:
> or there might be an extra zpool-level (or system-wide) property to enable
> checking checksums on every access from the ARC - there would be a
> significant performance impact, but it might be acceptable for really
> paranoid folks, especially with modern hardware.

How would this checking take place for memory-mapped files?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On 31/03/2010 16:44, Bob Friesenhahn wrote:
> On Wed, 31 Mar 2010, Robert Milkowski wrote:
>> or there might be an extra zpool-level (or system-wide) property to
>> enable checking checksums on every access from the ARC - there would be
>> a significant performance impact, but it might be acceptable for really
>> paranoid folks, especially with modern hardware.
>
> How would this checking take place for memory-mapped files?

Well, and it wouldn't help if data were corrupted in an application's internal buffer after read() succeeded, or just before an application does a write().

So I wasn't saying that it would always work, or that it can work in all circumstances, but rather that it probably shouldn't be dismissed on the performance argument alone, as for some use cases with modern hardware the performance might well still be acceptable while providing better protection and a stronger data-correctness guarantee.

But even then, while the mmap() issue is probably solvable, the read() and write() cases probably are not.

-- 
Robert Milkowski
http://milek.blogspot.com
On 2010/03/31 05:13, Darren J Moffat wrote:
> On 31/03/2010 10:27, Erik Trimble wrote:
>> [...]
>
> A subset of this is already done. The ARC keeps its own in-memory
> checksum (because some buffers in the ARC are not yet on stable storage,
> so they don't have a block pointer checksum yet).
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c
>
> arc_buf_freeze()
> arc_buf_thaw()
> arc_cksum_verify()
> arc_cksum_compute()
>
> It isn't done on every access, but it can detect in-memory corruption -
> I've seen it happen on several occasions, though all due to errors in my
> code, not bad physical memory.
>
> Doing it more frequently could cause a significant performance problem.

Agreed. I think it's probably not a very good idea to check it everywhere. It would be great if we could do some checks occasionally, especially for critical data structures, but if it's the memory we cannot trust, how can we trust the checksum checker to behave correctly?

I had some questions about the FAST paper mentioned by Erik which were not answered during the conference, which makes me feel that the paper, while it pointed out some interesting issues, failed to prove that this is a real-world problem:

- How likely is a bit flip on a non-ECC system? Say, how many bits would be flipped per terabyte processed, or per transaction, or something similar?

- Among these flipped bits, how many would land in a file system buffer? What happens when, say, the application's own memory hits a flipped bit while the file system's buffers are fine?

- How much of a performance penalty would there be if we checked the checksums every time the data is accessed? How good would that check be compared to ECC in terms of correctness?

Cheers,
-- 
Xin LI <delphij at delphij.net>    http://www.delphij.net/
FreeBSD - The Power to Serve!    Live free or die
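Not an answer to the first question, but to make its back-of-envelope shape concrete: DRAM soft-error rates are usually quoted per capacity and time (FIT per Mbit, i.e. errors per 10^9 device-hours) rather than per terabyte processed, and the conversion is simple arithmetic. The rate used below is a placeholder assumption, not a measurement - published studies disagree by orders of magnitude, which is really the heart of the question:

/*
 * Back-of-envelope only.  ASSUMED_FIT_PER_MBIT is a placeholder, not a
 * measured number -- published DRAM error-rate studies differ by orders
 * of magnitude.
 */
#include <stdio.h>

int
main(void)
{
    const double ASSUMED_FIT_PER_MBIT = 1000.0; /* errors per 1e9 hours per Mbit (assumption!) */
    const double ram_gb = 8.0;                  /* machine under consideration */
    const double hours = 24.0;                  /* window of interest */

    double mbits = ram_gb * 1024.0 * 8.0;       /* GB -> Mbit */
    double expected_flips = ASSUMED_FIT_PER_MBIT * mbits * hours / 1e9;

    printf("assumed rate: %.0f FIT/Mbit, %.0f GB RAM, %.0f h\n",
        ASSUMED_FIT_PER_MBIT, ram_gb, hours);
    printf("expected bit flips in that window: %f\n", expected_flips);
    return (0);
}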
On Thu, Apr 01, 2010 at 12:38:29AM +0100, Robert Milkowski wrote:
> So I wasn't saying that it would always work, or that it can work in all
> circumstances, but rather that it probably shouldn't be dismissed on the
> performance argument alone, as for some use cases

It would be of great utility even if considered only as a diagnostic measure - i.e., for qualifying tests, or when something else raises suspicion and you want to eliminate or confirm sources of problems.

With a suitable pointer in a FAQ/troubleshooting guide, it could reduce the number and improve the quality of problem reports related to bad hardware.

--
Dan.