Hey there,

I was thinking about laptops and zfs. I support a fair number of laptops, and if their drives go bad, it's almost always some sort of bit-rot: sectors that were written to a while ago can't be read any more. I haven't seen a laptop where the drive just wouldn't work at all any more.

I was considering what could be done to improve reliability from a zfs perspective, given only 1 drive.

Since it's bit-rot, once-reliable sectors can break, so just using a lot of snapshots won't help.

First off, how about splitting the drive in 2 partitions and mirroring across them? This keeps the two sets of data nicely separated in case of a head crash or bad magnetic patches. Only, it means that the drive head will have to move around a lot, making things slower. On top of that, you lose half your disk space.

Ok, so how about splitting it in 3..8 partitions and running RAID-Z across that? Now the head has to move even more, but it has fewer distinct places to travel between when writing data, and the I/O scheduler can probably arrange I/O so that the head goes back and forth in graceful, swooping arcs. Plus, you don't lose as much disk space, and you can even choose how much redundancy you want.

Hmmm, how about giving raidz the same device 8 times? I know right now this doesn't make sense, but what if raidz could be told to use only one vdev, and provide 12.5% (1/8) redundant bits? It would still split a given transaction up into stripes, but it could choose where on the disk to lay them out. For instance, if some research shows that most drive errors occur in 128KB blocks, it would just have to make sure that the stripe blocks are at least 256KB apart.

I'm really curious to read your thoughts about this...

Wout.
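For reference, the two partition-based layouts sketched above can already be expressed with today's zpool syntax; a minimal sketch, assuming the laptop disk is the hypothetical c0d0 and has been sliced into equally sized slices s3-s5 (the pool name laptank is made up too):

    # mirror across two slices of the same disk
    zpool create laptank mirror c0d0s3 c0d0s4

    # or raid-z across three slices, giving up 1/3 of the space
    # to parity instead of 1/2
    zpool create laptank raidz c0d0s3 c0d0s4 c0d0s5

The last variant (handing raidz the same device 8 times with intra-disk stripe placement) has no equivalent today and would need new layout code in raidz itself.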
Wouldn't it make more sense to plug in a firewire drive, mirror to that, and then break the mirror? It wouldn't be live all the time, but it would give you a backup of your data/system.

Regarding the 8 slices, laptop drives (and non-enterprise-class [PS]ATA) aren't built to do heavy i/o. By slicing it 8 times (with 8 writes to the same drive), you're going to be stressing the hardware much harder than otherwise. I think that would reduce the usable lifetime of your drive significantly. Striping across the same drive really seems like a false sense of security.

-----
Gregory Shaw, IT Architect                    Phone: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive MS 4382                    greg.shaw at sun.com (work)
Louisville, CO 80028-4382                     shaw at fmsoft.com (home)
"When Microsoft writes an application for Linux, I've Won." - Linus Torvalds
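The mirror-then-break cycle Greg suggests maps onto existing zpool commands; a rough sketch, assuming a pool named laptank on c0d0s0 and a firewire disk that shows up as c2t0d0 (all names hypothetical), and leaving aside whether the detached half is directly importable on its own:

    # turn the single-disk pool into a mirror with the external drive
    zpool attach laptank c0d0s0 c2t0d0

    # wait until the resilver completes
    zpool status laptank

    # then "break" the mirror again before unplugging the drive
    zpool detach laptank c2t0d0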
On 06 Apr 2006, at 15:45, Gregory Shaw wrote:

> Wouldn't it make more sense to plug in a firewire drive, mirror to that, and then break the mirror? It wouldn't be live all the time, but it would give you a backup of your data/system.

Well, it would certainly improve things, but then you need to automate the heck out of it with hotplugging and so on, and even then people will often not do it. For instance, to back up my personal data on my laptop, I need to plug in my iPod and run 1 command. I almost never do it.

I realize with only 1 drive you're not going to win any data-availability prizes, but it's a solution that zfs enables through its checksumming, and it's fully automated.

> Regarding the 8 slices, laptop drives (and non-enterprise-class [PS]ATA) aren't built to do heavy i/o. By slicing it 8 times (with 8 writes to the same drive), you're going to be stressing the hardware much harder than otherwise. I think that would reduce the usable lifetime of your drive significantly.

But the total amount of data written would only be 12.5% more? And with zfs compression turned on, it's less! :)

As for usable lifetime... I don't know. I agree that it probably increases mechanical stress on the drive, but I really wonder what the net result is. I was hoping someone on this mailing list knows more about drive failure modes...

So the idea is to automatically improve the reliability of laptop harddisks by leveraging zfs features and paying a price in CPU time and disk space. You bring up good points, but I don't think they invalidate this concept. No?

Cheers,

Wout.
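As an aside, the compression Wout leans on here is just a per-dataset property; a one-line sketch (pool name hypothetical):

    # enable compression and see how much it actually saves
    zfs set compression=on laptank
    zfs get compressratio laptank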
On Thu, 2006-04-06 at 11:50 +0200, Wout Mertens wrote:

> I was thinking about laptops and zfs. I support a fair number of laptops, and if their drives go bad, it's almost always some sort of bit-rot: sectors that were written to a while ago can't be read any more. I haven't seen a laptop where the drive just wouldn't work at all any more.
>
> I was considering what could be done to improve reliability from a zfs perspective, given only 1 drive.

Mirror.

> Since it's bit-rot, once-reliable sectors can break, so just using a lot of snapshots won't help.

Correct. Though the on-disk format is highly redundant in a single vdev, you are still susceptible to multiple block loss. Anecdotally, we do see disk blocks fail in clusters, and I've got some data on this lying around here somewhere...

> First off, how about splitting the drive in 2 partitions and mirroring across them? This keeps the two sets of data nicely separated in case of a head crash or bad magnetic patches. Only, it means that the drive head will have to move around a lot, making things slower. On top of that, you lose half your disk space.

Reliable, fast, or inexpensive: pick one.

> Ok, so how about splitting it in 3..8 partitions and running RAID-Z across that?

Reliable, fast, or inexpensive: pick one :-)

> Now the head has to move even more, but it has fewer distinct places to travel between when writing data, and the I/O scheduler can probably arrange I/O so that the head goes back and forth in graceful, swooping arcs. Plus, you don't lose as much disk space, and you can even choose how much redundancy you want.

I've asked this question over the past year of a number of drive reliability experts. The consensus seems to be that head movement does not significantly impact reliability. Media seems to stay at the top of the failure Pareto, and positioning faults are down in the tail. The field data I have also confirms this.

> Hmmm, how about giving raidz the same device 8 times? I know right now this doesn't make sense, but what if raidz could be told to use only one vdev, and provide 12.5% (1/8) redundant bits? It would still split a given transaction up into stripes, but it could choose where on the disk to lay them out. For instance, if some research shows that most drive errors occur in 128KB blocks, it would just have to make sure that the stripe blocks are at least 256KB apart.
>
> I'm really curious to read your thoughts about this...

I've been doing some work on this sort of analysis. There is definitely a strong case for mirroring on one disk, when you have but one disk. I don't think end-user acceptance is altogether an unsolvable problem -- the PC space already does this with snapshots (e.g. IBM's laptop recovery). Further, there are many failures that can occur on a disk before it becomes cost effective to replace it.

My gut reaction is that people have lost data for years and never known it, so they think everything is ok. ZFS will expose this sort of data loss (was undetected errors, now detected errors), so we need to be able to show a good reason for data protection, with its associated cost trade-offs.

 -- richard
On Thu, Apr 06, 2006 at 11:50:35AM +0200, Wout Mertens wrote:

> Hmmm, how about giving raidz the same device 8 times? I know right now this doesn't make sense, but what if raidz could be told to use only one vdev, and provide 12.5% (1/8) redundant bits? It would still split a given transaction up into stripes, but it could choose where on the disk to lay them out. For instance, if some research shows that most drive errors occur in 128KB blocks, it would just have to make sure that the stripe blocks are at least 256KB apart.

Unfortunately, I don't believe ZFS notices when two slices are on the same device (zfs folks, feel free to chime in if I'm wrong), so the N partitions will each have I/Os scheduled independently, causing all kinds of thrashing. Seems poor.

Cheers,
- jonathan

--
Jonathan Adams, Solaris Kernel Development
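The independent per-slice scheduling is easy to watch, since the pool treats each slice as its own vdev; a minimal sketch (pool name hypothetical):

    # per-vdev I/O statistics every 5 seconds: each slice of the same
    # spindle gets its own ops and bandwidth columns
    zpool iostat -v laptank 5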
In my experience, I've seen the following failures on laptop drives:

- Total failure due to abuse (such as dropping the laptop)
- Controller failure (flaky controller)
- Spin-up problems (stiction/age?)
- Bad block count failure. Laptop drives have a limited number of block reallocations available. Once those have been used, it is no longer possible to fix bad blocks.

When you mention bit-rot, I'm assuming you mean bad blocks. I haven't seen files degrade on disk without an underlying cause, such as bad blocks or turning off the laptop in the middle of a write operation.

Only the bad block count would be helped by doing RAID-Z. File corruption through power-off should be addressed by ZFS as a whole.

One question: if you encounter a bad block that can't be fixed and it has invalidated one of the 8 partitions, what do you do to fix it?

-----
Gregory Shaw, IT Architect, ITCTO Group, Sun Microsystems Inc.
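If you want to know how much of that reallocation budget a drive has already burned through, smartmontools can usually report it; a rough sketch, assuming smartctl is installed on the box and with a hypothetical raw device path:

    # dump the SMART attribute table; Reallocated_Sector_Ct and
    # Reallocated_Event_Count track how many spare blocks are used up
    smartctl -A /dev/rdsk/c1d0p0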
On Thu, 2006-04-06 at 14:30 -0600, Gregory Shaw wrote:

> In my experience, I've seen the following failures on laptop drives:
>
> - Total failure due to abuse (such as dropping the laptop)
> - Controller failure (flaky controller)
> - Spin-up problems (stiction/age?)
> - Bad block count failure. Laptop drives have a limited number of block reallocations available. Once those have been used, it is no longer possible to fix bad blocks.

The data from the bad block may not be recoverable. In the field data I have, this is always the #1 most common fault. The "repaired" block is zero-filled to the spare block. This is one manifestation of bit rot.

NB: some flash devices also use block sparing.

> When you mention bit-rot, I'm assuming you mean bad blocks. I haven't seen files degrade on disk without an underlying cause, such as bad blocks or turning off the laptop in the middle of a write operation.

ZFS is immune to the power-off problem. The data on disk is always consistent. Whether it was the data you wanted on disk is another question... reliable, fast, or inexpensive: pick one.

> Only the bad block count would be helped by doing RAID-Z. File corruption through power-off should be addressed by ZFS as a whole.

Bad blocks can be detected on both write and read operations. In the case of writes, ZFS could write the data somewhere else [*]. In the case of reads, ZFS knows the data is bad because it checksums, and it can recreate the data if you use some sort of redundancy. This is very different from other file systems, which assume the data is good and, frankly, don't handle the bad cases well.

[*] not a panacea, see the earlier thread on slowness while a disk is failed

> One question: if you encounter a bad block that can't be fixed and it has invalidated one of the 8 partitions, what do you do to fix it?

Someone from the ZFS team can correct me here; I'm not sure how or when ZFS gives up on the whole vdev. The VTOC is stored in the first block of a partition. If you lose that, you lose the partitions. But this will be a very different failure mode from losing enough blocks in the partition that you would want to fail the whole partition. I presume FMA will be used for such diagnosis and leveraged into ZFS rather than reinventing it (the DE) for ZFS.

 -- richard
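The read-path detection Richard describes is exactly what a scrub exercises on demand; a minimal sketch (pool name hypothetical):

    # read and verify every allocated block in the pool
    zpool scrub laptank

    # afterwards: checksum error counters per vdev, plus any files
    # with unrecoverable errors
    zpool status -v laptank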
On Thu, Apr 06, 2006 at 03:01:19PM -0700, Richard Elling wrote:

> Someone from the ZFS team can correct me here; I'm not sure how or when ZFS gives up on the whole vdev. The VTOC is stored in the first block of a partition. If you lose that, you lose the partitions. But this will be a very different failure mode from losing enough blocks in the partition that you would want to fail the whole partition. I presume FMA will be used for such diagnosis and leveraged into ZFS rather than reinventing it (the DE) for ZFS.

We do nothing - that block will be bad forever. In the next phase of ZFS/FMA, we will be able to detect persistently bad blocks and mark the entire vdev degraded (auto-initiating a replace if hot spares are configured). This is our only choice, because at that point you'll have no fault tolerance (or N-1 tolerance) for that particular block. Losing another drive would result in permanent loss of data.

We have some ideas about block reallocation, but in general it's very hard because ZFS is COW, as well as having snapshots.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
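For context, hot spares are configured as just another vdev type (this obviously needs more than one physical disk, and the spare syntax may postdate the build under discussion; device names are hypothetical):

    # create a raid-z pool with one standby disk
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 spare c1t3d0

    # or add a spare later, and kick off a replacement by hand
    zpool add tank spare c1t3d0
    zpool replace tank c1t1d0 c1t3d0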
Richard Elling <Richard.Elling at Sun.COM> wrote:

> On Thu, 2006-04-06 at 14:30 -0600, Gregory Shaw wrote:
> > In my experience, I've seen the following failures on laptop drives:
> >
> > - Total failure due to abuse (such as dropping the laptop)
> > - Controller failure (flaky controller)
> > - Spin-up problems (stiction/age?)
> > - Bad block count failure. Laptop drives have a limited number of block reallocations available. Once those have been used, it is no longer possible to fix bad blocks.
>
> The data from the bad block may not be recoverable. In the field data I have, this is always the #1 most common fault. The "repaired" block is zero-filled to the spare block. This is one manifestation of bit rot.
>
> NB: some flash devices also use block sparing.

The main problem with bad block reallocation is that S.M.A.R.T. only works if you read the whole disk frequently enough. If you read the blocks frequently enough, then the data is recovered by the disk, because the problem is detected before the block becomes completely unreadable. Otherwise, the content is lost.

Jörg

--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling  D-13353 Berlin
       js at cs.tu-berlin.de                (uni)
       schilling at fokus.fraunhofer.de     (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
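ZFS can force that kind of regular full read itself; a minimal sketch of a root crontab entry, assuming a pool named laptank and a weekly schedule (both made up):

    # scrub every Sunday at 03:00, so latent bad blocks are hit while
    # a redundant copy (mirror, raid-z, or ditto block) can still repair them
    0 3 * * 0 /usr/sbin/zpool scrub laptank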
On Thu, Apr 06, 2006 at 11:50:35AM +0200, Wout Mertens wrote:

> I was considering what could be done to improve reliability from a zfs perspective, given only 1 drive.

...

> First off, how about splitting the drive in 2 partitions and mirroring across them?

You may be interested in the following bug, which we expect to be integrated soon:

6410698 ZFS metadata needs to be more highly replicated (ditto blocks)

This will cause us to store multiple copies of metadata on a single-drive pool. It also provides some infrastructure that will allow us to eventually store multiple copies of user data, even in a non-replicated pool.

--matt
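For what it's worth, the user-data half of this eventually surfaced in later ZFS releases as the per-dataset copies property; a minimal sketch (dataset name hypothetical):

    # keep two copies of every data block, even in a single-disk pool
    zfs set copies=2 laptank/home
    zfs get copies,used laptank/home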