Inspired by the paper "End-to-end Data Integrity for File Systems: A ZFS
Case Study" [1], I've been wondering whether it is possible to devise a way
in which a minimal in-memory data corruption would cause massive data loss.
I could imagine a scenario where an entire directory branch drops off the
tree structure, for example. Since I know too little about ZFS's structure,
I'm also asking myself whether it is possible to make old snapshots
disappear via memory corruption, or to lose data blocks to leakage (blocks
not containing data, but not marked as available either).

I'd appreciate it if someone with a good understanding of ZFS's internals
and principles could comment on the possibility of such scenarios.

[1] http://www.usenix.org/event/fast10/tech/full_papers/zhang.pdf
2012-01-14 18:36, Stefan Ring wrote:
> Inspired by the paper "End-to-end Data Integrity for File Systems: A
> ZFS Case Study" [1], I've been thinking if it is possible to devise a way
> in which a minimal in-memory data corruption would cause massive data
> loss. I could imagine a scenario where an entire directory branch
> drops off the tree structure, for example. Since I know too little
> about ZFS's structure, I'm also asking myself if it is possible to
> make old snapshots disappear via memory corruption or lose data blocks
> to leakage (not containing data, but not marked as available).
>
> I'd appreciate it if someone with a good understanding of ZFS's
> internals and principles could comment on the possibility of such
> scenarios.
>
> [1] http://www.usenix.org/event/fast10/tech/full_papers/zhang.pdf

By no means am I an expert like the ones you seek, but I'm asking similar
questions, and more keep popping up ;) I do have some reported corruptions
on my non-ECC system despite raidz2 on disk, so I have a keen interest in
how stuff works and why it sometimes doesn't ;)

As for block leakage, according to the error messages I'm seeing now,
leaked blocks are at least expected and checked for: "allocating allocated
segment" and "freeing free segment". How my system got here -- that's the
puzzle...

It does seem possible that in-memory corruption of the data payload and/or
checksum of a block before writing it to disk would render it invalid on
read (data doesn't match checksum, ZFS returns EIO). Maybe even worse: if
the in-memory block is corrupted before the checksumming, seemingly valid
garbage gets stored on disk, read afterwards, and used with blind trust.
If it is a leaf block (userdata), you just get a corrupted file. If it is
a metadata block, and the corruption happened before it was ditto-written
to several disk locations, you're in trouble.
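The two corruption windows described above -- a flip after checksumming
(detected, EIO) versus a flip before checksumming ("valid garbage") -- can
be sketched in a few lines. This is a toy model only: SHA-256 stands in
for ZFS's actual fletcher/SHA-256 block checksums, and the buffer handling
is nothing like the real ARC write path.

```python
import hashlib

def checksum(data: bytes) -> bytes:
    # Stand-in for ZFS's per-block checksum (fletcher4/sha256 in reality)
    return hashlib.sha256(data).digest()

# A data block about to be written out
block = bytearray(b"important user data" * 8)

# Window 1: bit flips AFTER the checksum is computed -- detected on read
good_sum = checksum(bytes(block))
flipped = bytearray(block)
flipped[5] ^= 0x01                           # single in-memory bit flip
assert checksum(bytes(flipped)) != good_sum  # verification fails -> EIO

# Window 2: bit flips BEFORE the checksum is computed -- "valid garbage"
corrupted = bytearray(block)
corrupted[5] ^= 0x01
bad_sum = checksum(bytes(corrupted))         # checksum of corrupt data
# On later reads the corrupt block verifies cleanly and is blindly trusted
assert checksum(bytes(corrupted)) == bad_sum
```

In window 2 the checksum is self-consistent with the garbage, so no amount
of on-disk redundancy can flag it afterwards.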
It is likewise possible that data in RAM gets corrupted after reading from
disk and checksum-checking, but before being used as a metadata block or
whatever. If you're "lucky" enough to have irreparable (by ditto blocks)
corruption in a metadata block near the root of a tree, you can end up in
bad trouble.

In all these cases RAM is the SPOF (single point of failure), so all ZFS
recommendations involve using ECC systems. Alas, even though ECC chips and
chipsets are cheap nowadays, not all architectures use them (i.e.
desktops, laptops, etc.), and the tagline of running ZFS for "reliable
storage on consumer-grade hardware" is poisoned by this fact. Other
filesystems obviously suffer the same from bad components, but ZFS reports
the errors it detects, and unlike other systems that let you dismiss the
errors (i.e. free all blocks and files touched by a corrupt block, leaving
you with a smaller but consistent tree of data blocks), or don't even
notice them, ZFS tends to get really upset about many of them and ask for
recovery from backups (as if those are 100% reliable).

I do wonder, however, whether it is possible to make a software ECC to
detect and/or repair small memory corruptions on consumer-grade systems.
And where would such a part fit -- in ZFS (i.e. some ECC bits appended to
every zfs_*_t structure) or in the {Solaris} kernel for general VM
management? And even then there's the question whether this would solve
more problems than it creates: it could give the appearance of a solution
while hiding problems that still exist (because there would be some
non-ECC parts of the data path, and the GIGO principle can apply at any
point). In the bad case, you ECC an invalid piece of memory and afterwards
trust it because it matches the checksum. On the good side, there is a
smaller window in which data is exposed unprotected, so statistically this
solution should help.

HTH,
//Jim Klimov
On Sun, 15 Jan 2012, Jim Klimov wrote:
> It does seem possible that in-memory corruption of data payload
> and/or checksum of a block before writing it to disk would render
> it invalid on read (data doesn't match checksum, ZFS returns EIO).
> Maybe even worse if the in-memory block is corrupted before the
> checksumming, and seemingly valid garbage gets stored on disk,
> read afterwards, and used with blind trust.

Please don't under-state the actual issue. ZFS assumes that RAM is 100%
reliable. ZFS uses an in-memory cache called the ARC, which can span many
tens of gigabytes on busy large-memory systems. User data is stored in
this ARC, and the cached data becomes the reference copy of the data until
it is evicted. This means that user data can be silently and undetectably
corrupted by memory corruption. The effects that ZFS's checksums can
detect are just a small subset of the problems which may occur if memory
returns wrong values.

> In all these cases RAM is the SPOF (single point of failure)
> so all ZFS recommendations involve using ECC systems. Alas,
> even though ECC chips and chipsets are cheap nowadays, not all
> architectures use them anyway (i.e. desktops, laptops, etc.),
> and the tagline of running ZFS for "reliable storage on consumer
> grade hardware" is poisoned by this fact.

Feel free to blame Intel for this, since they seem to be primarily
responsible for delivering CPUs and chipsets which don't support ECC. AMD
has not been such a perpetrator, although it is possible to buy AMD-based
systems which don't provide ECC.

> I do wonder, however, if it is possible to make a software ECC
> to detect-and/or-repair small memory corruptions on consumer
> grade systems.

This could be done for part of the memory, but it would obviously result
in a huge performance loss. I/O to memory would have to become
block-oriented rather than random access.
It is still necessary for random access to be used in a large part of the
memory, since that is a requirement for running programs, and there would
be no way to defend that part of the memory.

> some ECC bits appended in every zfs_*_t structure) or in the
> {Solaris} kernel for general VM management. And even then
> there's a question whether this would solve more problems than
> create a greater one. In the bad case, you ECC an invalid
> piece of memory, and afterwards trust it as it matches the
> checksum. On the good side, there is a smaller window that
> data is exposed unprotected, so statistically this solution
> should help.

The problem is that with unreliable memory, the software-based ECC would
not be able to correct the content of the memory, since the ECC itself
might have been computed incorrectly (due to unreliable memory). You are
then faced with notifications of problems that the user can't fix. The
proper solution (regardless of filesystem used) is to ensure that ECC is
included in any computer that you buy.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
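For concreteness, the kind of software ECC being discussed would look
something like the classic Hamming(7,4) code below. This is a toy sketch,
not anything ZFS or Solaris actually implements: it corrects any
single-bit flip in a 7-bit codeword, but a double flip -- or a flip while
the parity computation itself runs through bad RAM -- silently "corrects"
to the wrong value, which is exactly the failure mode described above.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # parity over codeword positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]   # parity over positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]   # parity over positions 4,5,6,7
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]   # positions 1..7
    return sum(b << i for i, b in enumerate(bits))

def hamming74_decode(code: int) -> int:
    """Correct up to one flipped bit, then return the 4 data bits."""
    bits = [(code >> i) & 1 for i in range(7)]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)  # 1-based error position, 0 = clean
    if syndrome:
        bits[syndrome - 1] ^= 1            # repair the single flipped bit
    return bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)

# Every single-bit flip of every codeword is corrected...
for nibble in range(16):
    cw = hamming74_encode(nibble)
    for pos in range(7):
        assert hamming74_decode(cw ^ (1 << pos)) == nibble

# ...but a double flip decodes silently to WRONG data (no detection):
assert hamming74_decode(hamming74_encode(0b1011) ^ 0b11) != 0b1011
```

Real ECC DIMMs add an eighth parity bit per Hamming word (SECDED) so double
flips are at least detected; a pure software scheme would also pay the
block-oriented access cost Bob describes on every load and store.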
On Sun, 2012-01-15 at 16:28 +0400, Jim Klimov wrote:
> 2012-01-14 18:36, Stefan Ring wrote:
> > Inspired by the paper "End-to-end Data Integrity for File Systems: A
> > ZFS Case Study" [1], I've been thinking if it is possible to devise a
> > way in which a minimal in-memory data corruption would cause massive
> > data loss. I could imagine a scenario where an entire directory branch
> > drops off the tree structure, for example. Since I know too little
> > about ZFS's structure, I'm also asking myself if it is possible to
> > make old snapshots disappear via memory corruption or lose data blocks
> > to leakage (not containing data, but not marked as available).

I've never understood why these conclusions are considered so
interesting -- it's as though ZFS were analyzed as a system, but the
conclusions weren't drawn systematically. If you don't protect buffer
integrity elsewhere on the system, what would it be worth for ZFS to
provide in-core integrity for its kernel pages? The vast preponderance of
consumers of ZFS data have to use buffers outside of the ZFS kernel
subsystem, leaving you with only a trivial added assurance from protecting
against in-core corruption. Compare the effort of doing that to the cost
of using ECC, and there doesn't seem to be anything like a compelling case
for putting all that work into ZFS or accepting the overhead that would
result. Put into a more reasonable context, there may still be something
there, but it looks very different from how the authors seemed to pitch
it. Or have I missed something?

> Alas, even though ECC chips and chipsets are cheap nowadays, not all
> architectures use them anyway (i.e. desktops, laptops, etc.),
> and the tagline of running ZFS for "reliable storage on consumer
> grade hardware" is poisoned by this fact.

Yes, you can get reliable and probably performant ZFS storage without
having to buy enterprise-class components.
But you still have to treat midrange or consumer components as
differentiated on reliability and performance if you want to achieve those
things meaningfully. ZFS is good, but it's not magic.
On Jan 14, 2012, at 6:36 AM, Stefan Ring wrote:
> Inspired by the paper "End-to-end Data Integrity for File Systems: A
> ZFS Case Study" [1], I've been thinking if it is possible to devise a
> way in which a minimal in-memory data corruption would cause massive
> data loss.

For enterprise-class systems, you will find hardware protection such as
ECC and other mechanisms all the way up and down the datapath. For
example, if you build an ALU, you can add a few transistors to also detect
the various failure modes that afflict data flowing through an ALU. This
is one of the things that differentiates a mainframe or SPARC64 processor
from a run-of-the-mill PeeCee processor.

> I could imagine a scenario where an entire directory branch
> drops off the tree structure, for example. Since I know too little
> about ZFS's structure, I'm also asking myself if it is possible to
> make old snapshots disappear via memory corruption or lose data blocks
> to leakage (not containing data, but not marked as available).

Sure. If you'd like a fright, read the errata sheet for a modern
microprocessor :-)

> I'd appreciate it if someone with a good understanding of ZFS's
> internals and principles could comment on the possibility of such
> scenarios.

ZFS does expect that the processor, memory, and I/O systems work to some
degree. The only way to get beyond this sort of dependency is to implement
a system like we do for avionics.

> [1] http://www.usenix.org/event/fast10/tech/full_papers/zhang.pdf

Yes. NetApp has funded those researchers in the past. Looks like a FUD
piece to me. Look out, everyone, the memory system you bought from Intel
might suck!
-- richard
On Mon, January 16, 2012 01:19, Richard Elling wrote:
>> [1] http://www.usenix.org/event/fast10/tech/full_papers/zhang.pdf
>
> Yes. NetApp has funded those researchers in the past. Looks like a FUD
> piece to me. Look out everyone, the memory system you bought from Intel
> might suck!

From the paper:

> This material is based upon work supported by the National Science
> Foundation under the following grants: CCF-0621487, CNS-0509474,
> CNS-0834392, CCF-0811697, CCF-0811697, CCF-0937959, as well as by
> generous donations from NetApp, Inc, Sun Microsystems, and Google.

So Sun paid to FUD themselves? The conclusions are hardly unreasonable:

> While the reliability mechanisms in ZFS are able to provide reasonable
> robustness against disk corruptions, memory corruptions still remain a
> serious problem to data integrity.

I've heard the same thing said ("use ECC!") on this list many times over
the years.
On 01/16/12 11:08, David Magda wrote:
> The conclusions are hardly unreasonable:
>
>> While the reliability mechanisms in ZFS are able to provide reasonable
>> robustness against disk corruptions, memory corruptions still remain a
>> serious problem to data integrity.
>
> I've heard the same thing said ("use ECC!") on this list many times over
> the years.

I believe the whole paragraph quoted from the USENIX paper is important:

    While the reliability mechanisms in ZFS are able to provide
    reasonable robustness against disk corruptions, memory corruptions
    still remain a serious problem to data integrity. Our results for
    memory corruptions indicate cases where bad data is returned to the
    user, operations silently fail, and the whole system crashes. Our
    probability analysis shows that one single bit flip has small but
    non-negligible chances to cause failures such as reading/writing
    corrupt data and system crashing.

The authors provide probability calculations in section 6.3 for single
bit flips. ECC provides detection and correction of single-bit flips.
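The paper's own numbers live in its section 6.3, but the generic
back-of-envelope arithmetic behind "small but non-negligible" is easy to
show. The FIT rate below is purely hypothetical (published DRAM soft-error
estimates vary by orders of magnitude); the point is the shape of the
calculation, not the figure.

```python
# Hypothetical soft-error rate; NOT a figure from the paper.
fit_per_mbit = 1000            # failures per 10^9 device-hours, per Mbit
mem_gib = 16                   # example machine
hours = 24 * 30                # one month of uptime

mbits = mem_gib * 1024 * 8     # GiB -> MiB -> Mbit
expected_flips = fit_per_mbit * mbits / 1e9 * hours
print(f"expected bit flips per month: {expected_flips:.1f}")
# -> about 94 flips/month under these assumed numbers
```

Even if the real rate is a hundred times lower, a long-lived pool on a
large-memory non-ECC box accumulates a meaningful number of flips.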
On Jan 16, 2012, at 8:08 AM, David Magda wrote:
> On Mon, January 16, 2012 01:19, Richard Elling wrote:
>>> [1] http://www.usenix.org/event/fast10/tech/full_papers/zhang.pdf
>>
>> Yes. NetApp has funded those researchers in the past. Looks like a FUD
>> piece to me. Look out everyone, the memory system you bought from Intel
>> might suck!
>
> From the paper:
>
>> This material is based upon work supported by the National Science
>> Foundation under the following grants: CCF-0621487, CNS-0509474,
>> CNS-0834392, CCF-0811697, CCF-0811697, CCF-0937959, as well as by
>> generous donations from NetApp, Inc, Sun Microsystems, and Google.
>
> So Sun paid to FUD themselves?

Wouldn't be the first time...

> The conclusions are hardly unreasonable:
>
>> While the reliability mechanisms in ZFS are able to provide reasonable
>> robustness against disk corruptions, memory corruptions still remain a
>> serious problem to data integrity.
>
> I've heard the same thing said ("use ECC!") on this list many times over
> the years.

Agree with the ECC comment :-) If we can classify this as encouragement to
use ECC, then you don't need to drag ZFS into the conversation.
Interestingly, the only market that doesn't use ECC is the PeeCee market.
Embedded and enterprise markets use ECC.
-- richard
On Tue, 17 Jan 2012, Richard Elling wrote:
> Agree with the ECC comment :-)
>
> If we can classify this as encouragement to use ECC, then you don't need
> to drag ZFS into the conversation. Interestingly, the only market that
> doesn't use ECC is the PeeCee market. Embedded and enterprise markets
> use ECC.

The issue is definitely not specific to ZFS. For example, the whole OS
depends on reliable memory content in order to function. Likewise, no one
likes it if characters mysteriously change in their word-processing
documents.

Most of the blame seems to focus on Intel, with its objective to spew CPUs
with the highest-clocking performance at the lowest possible price point
for the desktop market. AMD CPUs seem to usually be slower but include ECC
as standard in the CPU or AMD-supplied chipset. If it can be believed (and
even if some may doubt it), Intel sells Xeon-branded CPUs which lack ECC
support.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
> The issue is definitely not specific to ZFS. For example, the whole OS
> depends on reliable memory content in order to function. Likewise, no
> one likes it if characters mysteriously change in their word processing
> documents.

I don't care too much if a single document gets corrupted -- there'll
always be a good copy in a snapshot. I do care, however, if a whole
directory branch or old snapshots were to disappear.

> Most of the blame seems to focus on Intel, with its objective to spew
> CPUs with the highest-clocking performance at the lowest possible price
> point for the desktop market. AMD CPUs seem to usually be slower but
> include ECC as standard in the CPU or AMD-supplied chipset.

Agreed. I originally bought an AMD-based system for that reason alone,
with the intention of running OpenSolaris on it. Alas, it performed
abysmally, so it was quickly swapped for an Intel-based one (without ECC).
Additionally, consider that Joyent's port of KVM supports only Intel
systems, AFAIK.
On Tue, 17 Jan 2012, Stefan Ring wrote:
> Additionally, consider that Joyent's port of KVM supports only Intel
> systems, AFAIK.

Hopefully that will be a short-term issue. 64-core AMD Opteron systems
are affordable now.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
2012-01-18 1:20, Stefan Ring wrote:
>> The issue is definitely not specific to ZFS. For example, the whole OS
>> depends on reliable memory content in order to function. Likewise, no
>> one likes it if characters mysteriously change in their word processing
>> documents.
>
> I don't care too much if a single document gets corrupted -- there'll
> always be a good copy in a snapshot. I do care however if a whole
> directory branch or old snapshots were to disappear.

Well, as far as this problem "relies" on random memory corruptions, you
don't get to choose whether your document gets broken or some low-level
part of the metadata tree ;)

Besides, what if that document you don't care about is your account's
entry in a banking system (as if they had no other redundancy and
double-checks)? And suddenly you "don't exist" because of some EIOIO, or
your balance is zeroed (or worse, highly negative)? ;)

//Jim
On Wed, Jan 18, 2012 at 4:53 AM, Jim Klimov <jimklimov at cos.ru> wrote:
> 2012-01-18 1:20, Stefan Ring wrote:
>> I don't care too much if a single document gets corrupted -- there'll
>> always be a good copy in a snapshot. I do care however if a whole
>> directory branch or old snapshots were to disappear.
>
> Well, as far as this problem "relies" on random memory corruptions,
> you don't get to choose whether your document gets broken or some
> low-level part of metadata tree ;)

Other filesystems tend to be much more tolerant of bit rot of all types
precisely because they have no block checksums.

But I'd rather have ZFS -- *with* redundancy, of course, and with ECC.

It might be useful to have a way to recover from checksum mismatches by
involving a human. I'm imagining a tool that tests whether accepting a
block's actual contents results in making data available that the human
thinks checks out, and if so, rewriting that block. Some bit errors might
simply result in meaningless metadata, but in some cases this can be
corrected (e.g., ridiculous block addresses). But if ECC takes care of the
problem, then why waste the effort? (Partial answer: because it'd be a
very neat GSoC-type project!)

> Besides, what if that document you don't care about is your account's
> entry in a banking system (as if they had no other redundancy and
> double-checks)? And suddenly you "don't exist" because of some EIOIO,
> or your balance is zeroed (or worse, highly negative)? ;)

This is why we have paper trails, logs, backups, redundancy at various
levels, ...

Nico
--
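The mechanical half of such a recovery tool is easy to sketch. Assuming
the stored checksum itself is intact, an exhaustive search over single-bit
flips of the failing block can propose candidate repairs for the human to
accept or reject. This is a toy sketch only: SHA-256 stands in for the
real block checksum, and nothing here knows about actual ZFS on-disk
structures.

```python
import hashlib

def checksum(data: bytes) -> bytes:
    # Stand-in for the block checksum stored in the parent block pointer
    return hashlib.sha256(data).digest()

def candidate_single_bit_repairs(block: bytes, expected: bytes):
    """Yield (byte_index, bit, repaired_block) for every single-bit flip
    of `block` whose checksum matches `expected` -- candidates for a
    human to review, per the tool idea above."""
    for byte_idx in range(len(block)):
        for bit in range(8):
            fixed = bytearray(block)
            fixed[byte_idx] ^= 1 << bit
            if checksum(bytes(fixed)) == expected:
                yield byte_idx, bit, bytes(fixed)

# Demo: corrupt one bit of a block, then recover it from the checksum
original = b"zfs metadata block" * 4
expected = checksum(original)
damaged = bytearray(original)
damaged[7] ^= 0x10                       # simulate a single bit flip
repairs = list(candidate_single_bit_repairs(bytes(damaged), expected))
assert repairs and repairs[0][2] == original
```

The search is O(8n) checksum computations per block, so it is cheap for
single flips; multi-bit damage would need the human-judgment half of the
tool (or ditto copies) instead.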
2012-01-18 20:36, Nico Williams wrote:
> On Wed, Jan 18, 2012 at 4:53 AM, Jim Klimov <jimklimov at cos.ru> wrote:
>> Well, as far as this problem "relies" on random memory corruptions,
>> you don't get to choose whether your document gets broken or some
>> low-level part of metadata tree ;)
>
> Other filesystems tend to be much more tolerant of bit rot of all
> types precisely because they have no block checksums.
>
> But I'd rather have ZFS -- *with* redundancy, of course, and with ECC.
>
> It might be useful to have a way to recover from checksum mismatches
> by involving a human. I'm imagining a tool that tests whether
> accepting a block's actual contents results in making data available
> that the human thinks checks out, and if so, then rewriting that
> block. Some bit errors might simply result in meaningless metadata,
> but in some cases this can be corrected (e.g., ridiculous block
> addresses). But if ECC takes care of the problem then why waste the
> effort?

Because RAM ECC only decreases the probability of one type of corruption?
You still have CPUs (i.e. overclocked and overheated ones, as is likely in
enthusiast systems or in laptops with blocked vents, sometimes generating
random garbage). Many other parts need not be a SPOF in a good design:
noise on the wire, or bugs in HBA and HDD firmware, can be mitigated by
hardware redundancy (multipathing, mixed vendors) in higher-end systems,
and by ZFS's own approaches in other systems, such as ditto copies for
metadata and vdev redundancy; but these can still corrupt copies=1 data
(i.e. on single-disk laptops without explicit copies=2).

> (Partial answer: because it'd be a very neat GSoC type project!)
Good point for at least one motivator ;) "I don't care how it is done --
but it should be! This time you may even use sorcery, I'll not ask
questions!" ;)

>> Besides, what if that document you don't care about is your account's
>> entry in a banking system (as if they had no other redundancy and
>> double-checks)? And suddenly you "don't exist" because of some EIOIO,
>> or your balance is zeroed (or worse, highly negative)? ;)
>
> This is why we have paper trails, logs, backups, redundancy at various
> levels, ...

As if any of them is 100% good and reliable and readily
accessible-available ;)

//Jim
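As an aside on the copies=2 setting mentioned earlier in the thread: on a
single-disk machine you can ask ZFS to keep extra ditto copies of user
data too, trading space for redundancy. The dataset name below is just an
example.

```shell
# Keep two copies of every user-data block on this dataset
# (metadata already gets ditto copies by default).
# "rpool/export/home" is a hypothetical dataset name.
zfs set copies=2 rpool/export/home

# Verify the property took effect:
zfs get copies rpool/export/home
```

Note that the property applies only to blocks written after it is set, and
it cannot help against corruption that happens in RAM before the copies
are written out.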