Hi. I'm looking for the best solution to create an expandable, heterogeneous pool of drives. In an ideal world there would be a RAID variant that could cleverly handle both multiple drive sizes and the addition of new drives to a group (so one could drop in a new drive of arbitrary size, maintain some redundancy, and gain most of that drive's capacity), but my impression is that we're far from there.

Absent that, I was considering using ZFS and just having a single pool. My main question is this: what is the failure mode of ZFS if one of those drives either fails completely or has errors? Do I permanently lose access to the entire pool? Can I attempt to read other data? Can I "zfs replace" the bad drive and get some level of data recovery? Otherwise, by pooling drives am I simply increasing the probability of a catastrophic data loss? I apologize if this is addressed elsewhere -- I've read a bunch about ZFS but have not come across this particular answer.

As a side question, does anyone have a suggestion for an intelligent way to approach this goal? This is not mission-critical data, but I'd prefer not to make data loss _more_ probable. Perhaps some volume manager (like LVM on Linux) has appropriate features?

Thanks for any help.

-puk
Jef Pearlman wrote:
> Hi. I'm looking for the best solution to create an expandable, heterogeneous pool of drives. In an ideal world there would be a RAID variant that could cleverly handle both multiple drive sizes and the addition of new drives to a group (so one could drop in a new drive of arbitrary size, maintain some redundancy, and gain most of that drive's capacity), but my impression is that we're far from there.

Mirroring (aka RAID-1, though technically more like RAID-1+0) in ZFS will do this.

> Absent that, I was considering using ZFS and just having a single pool. My main question is this: what is the failure mode of ZFS if one of those drives either fails completely or has errors? Do I permanently lose access to the entire pool? Can I attempt to read other data? Can I "zfs replace" the bad drive and get some level of data recovery? Otherwise, by pooling drives am I simply increasing the probability of a catastrophic data loss? I apologize if this is addressed elsewhere -- I've read a bunch about ZFS but have not come across this particular answer.

We generally recommend a single pool, as long as the use case permits. But I think you are confused about what a zpool is. I suggest you look at the examples or docs. A good overview is the slide show
http://www.opensolaris.org/os/community/zfs/docs/zfs_last.pdf

> As a side question, does anyone have a suggestion for an intelligent way to approach this goal? This is not mission-critical data, but I'd prefer not to make data loss _more_ probable. Perhaps some volume manager (like LVM on Linux) has appropriate features?

A ZFS mirrored pool will be the most performant and easiest to manage, with better RAS than a raidz pool.
 -- richard
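To make the mirror-based approach Richard describes concrete, here is a minimal sketch; the device names are hypothetical examples only:

    # create a pool from one mirrored pair
    zpool create tank mirror c0t0d0 c0t1d0

    # later, grow the pool by adding another mirrored pair of whatever
    # size you have on hand; ZFS stripes new data across both mirrors
    zpool add tank mirror c1t0d0 c1t1d0

Each pair contributes the capacity of its smaller member, and any single-disk failure leaves the pool intact.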
> Jef Pearlman wrote:
> > Absent that, I was considering using zfs and just having a single pool. My main question is this: what is the failure mode of zfs if one of those drives either fails completely or has errors? Do I permanently lose access to the entire pool? Can I attempt to read other data? Can I "zfs replace" the bad drive and get some level of data recovery? Otherwise, by pooling drives am I simply increasing the probability of a catastrophic data loss? I apologize if this is addressed elsewhere -- I've read a bunch about zfs, but not come across this particular answer.
>
> We generally recommend a single pool, as long as the use case permits. But I think you are confused about what a zpool is. I suggest you look at the examples or docs. A good overview is the slide show
> http://www.opensolaris.org/os/community/zfs/docs/zfs_last.pdf

Perhaps I'm not asking my question clearly. I've already experimented a fair amount with ZFS, including creating and destroying a number of pools with and without redundancy, replacing vdevs, etc. Maybe asking by example will clarify what I'm looking for or where I've missed the boat. The key is that I want a grow-as-you-go heterogeneous set of disks in my pool:

Let's say I start with a 40g drive and a 60g drive. I create a non-redundant pool (which will be 100g). At some later point, I run across an unused 30g drive, which I add to the pool. Now my pool is 130g. At some point after that, the 40g drive fails, either by producing read errors or by failing to spin up at all. What happens to my pool? Can I mount and access it at all (for the data not on or striped across the 40g drive)? Can I "zfs replace" the 40g drive with another drive and have it attempt to copy as much data over as it can? Or am I just out of luck? ZFS seems like a great way to use old/unutilized drives to expand capacity, but sooner or later one of those drives will fail, and if it takes out the whole pool (which it might reasonably do), then it doesn't work out in the end.

> > As a side question, does anyone have a suggestion for an intelligent way to approach this goal? This is not mission-critical data, but I'd prefer not to make data loss _more_ probable. Perhaps some volume manager (like LVM on Linux) has appropriate features?
>
> A ZFS mirrored pool will be the most performant and easiest to manage, with better RAS than a raidz pool.

The problem I've come across with using mirror or raidz for this setup is that (as far as I know) you can't add disks to mirror/raidz groups, and if you just add the disk to the pool, you end up in the same situation as above (with more space but no redundancy).

Thanks for your help.

-Jef
> Perhaps I'm not asking my question clearly. I've already experimented a fair amount with ZFS, including creating and destroying a number of pools with and without redundancy, replacing vdevs, etc. Maybe asking by example will clarify what I'm looking for or where I've missed the boat. The key is that I want a grow-as-you-go heterogeneous set of disks in my pool:
>
> Let's say I start with a 40g drive and a 60g drive. I create a non-redundant pool (which will be 100g). At some later point, I run across an unused 30g drive, which I add to the pool. Now my pool is 130g. At some point after that, the 40g drive fails, either by producing read errors or by failing to spin up at all. What happens to my pool?

Since you have created a non-redundant pool (or, more specifically, a pool with non-redundant members), the pool will fail.

> The problem I've come across with using mirror or raidz for this setup is that (as far as I know) you can't add disks to mirror/raidz groups, and if you just add the disk to the pool, you end up in the same situation as above (with more space but no redundancy).

You can't add to an existing mirror, but you can add new mirror (or raidz) vdevs to the pool. If so, there's no loss of redundancy.

--
Darren Dunham                                   ddunham at taos.com
Senior Technical Consultant    TAOS             http://www.taos.com/
Got some Dr Pepper?                             San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
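As a sketch with made-up device names, the same idea applies to raidz: you cannot widen an existing raidz vdev, but you can add another one alongside it:

    # create a pool from a three-disk raidz vdev
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0

    # grow it later by adding another raidz vdev; the pool keeps
    # redundancy because each top-level vdev is itself redundant
    zpool add tank raidz c2t0d0 c2t1d0 c2t2d0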
Darren Dunham wrote:
>> The problem I've come across with using mirror or raidz for this setup is that (as far as I know) you can't add disks to mirror/raidz groups, and if you just add the disk to the pool, you end up in the same situation as above (with more space but no redundancy).
>
> You can't add to an existing mirror, but you can add new mirror (or raidz) vdevs to the pool. If so, there's no loss of redundancy.

Maybe I'm missing some context, but you can add to an existing mirror - see zpool attach.

Neil.
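A quick sketch of what zpool attach does, with hypothetical device names:

    # attach c1t2d0 to the vdev containing c1t0d0: a single disk becomes
    # a two-way mirror, or a two-way mirror becomes a three-way mirror;
    # existing data is resilvered onto the new disk
    zpool attach tank c1t0d0 c1t2d0

This adds redundancy to that vdev but, as the follow-ups note, does not add capacity.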
> Darren Dunham wrote:
> >> The problem I've come across with using mirror or raidz for this setup is that (as far as I know) you can't add disks to mirror/raidz groups, and if you just add the disk to the pool, you end up in the same situation as above (with more space but no redundancy).
> >
> > You can't add to an existing mirror, but you can add new mirror (or raidz) vdevs to the pool. If so, there's no loss of redundancy.
>
> Maybe I'm missing some context, but you can add to an existing mirror - see zpool attach.

It depends on what you mean by "add". :-)

The original message was about increasing storage allocation. You can add redundancy to an existing mirror with attach, but you cannot increase the allocatable storage.

--
Darren Dunham                                   ddunham at taos.com
Senior Technical Consultant    TAOS             http://www.taos.com/
Got some Dr Pepper?                             San Francisco, CA bay area
< This line left intentionally blank to confuse you. >
On Wed, 2007-06-27 at 14:50 -0700, Darren Dunham wrote:
> > Darren Dunham wrote:
> > >> The problem I've come across with using mirror or raidz for this setup is that (as far as I know) you can't add disks to mirror/raidz groups, and if you just add the disk to the pool, you end up in the same situation as above (with more space but no redundancy).
> > >
> > > You can't add to an existing mirror, but you can add new mirror (or raidz) vdevs to the pool. If so, there's no loss of redundancy.
> >
> > Maybe I'm missing some context, but you can add to an existing mirror - see zpool attach.
>
> It depends on what you mean by "add". :-)
>
> The original message was about increasing storage allocation. You can add redundancy to an existing mirror with attach, but you cannot increase the allocatable storage.

With mirrors, there is currently more flexibility than with raid-Z[2]. You can increase the allocatable storage size by replacing each disk in the mirror with a larger one (assuming you wait for a resync ;-P )

Thus, the _safe_ way to increase a mirrored vdev's size is:

Disk A: 100GB
Disk B: 100GB
Disk C: 250GB
Disk D: 250GB

zpool create tank mirror A B
(yank out A, put in C)
(wait for resync)
(yank out B, put in D)
(wait for resync)

and voila! tank goes from 100GB to 250GB of space.

I believe this should also work if LUNs are used instead of actual disks - but I don't believe that resizing a LUN currently in a mirror will work (please, correct me on this), so for a SAN-backed ZFS mirror it would be:

Assuming A = B < C, and after resizing A, A = C > B:

zpool create tank mirror A B
zpool attach tank A C      (where C is a new LUN of the new size desired)
(wait for sync of C)
zpool detach tank A
(unmap LUN A from the host, resize A to be the same as C, then map it back)
zpool attach tank C A
(wait for sync of A)
zpool detach tank B

I believe that will now result in a mirror of the full size of C, not of B.

I'd be interested to know if you could do this:

zpool create tank mirror A B
(resize LUN A and B to the new size)

without requiring a system reboot after resizing A & B (that is, the reboot would be needed to update the new LUN size on the host).

--
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
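For the physical-disk case, a rough sketch of the same growth using zpool replace (which, as noted later in the thread, behaves like attach followed by detach); disk names as in the example above:

    zpool create tank mirror A B
    zpool replace tank A C     # resilver A's data onto the larger disk C
    # wait for the resilver to finish (check with: zpool status tank)
    zpool replace tank B D     # then resilver B's data onto the larger disk D
    # once both resilvers complete, the mirror can use the smaller of C and D;
    # depending on the release, the extra space may only show up after the
    # pool is exported and re-imported or the system is rebooted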
Jef Pearlman wrote:
> Perhaps I'm not asking my question clearly. I've already experimented a fair amount with zfs, including creating and destroying a number of pools with and without redundancy, replacing vdevs, etc. Maybe asking by example will clarify what I'm looking for or where I've missed the boat. The key is that I want a grow-as-you-go heterogeneous set of disks in my pool:

The short answer:

zpool add -- add a top-level vdev as a dynamic stripe column
  + available space is increased

zpool attach -- add a mirror to an existing vdev
  + only works when the new mirror is the same size or larger than the existing vdev
  + available space is unchanged
  + redundancy (RAS) is increased

zpool detach -- remove a mirror from an existing vdev
  + available space increases if the removed mirror is smaller than the vdev
  + redundancy (RAS) is decreased

zpool replace -- functionally equivalent to attach followed by detach

> Let's say I start with a 40g drive and a 60g drive. I create a non-redundant pool (which will be 100g). At some later point, I run across an unused 30g drive, which I add to the pool. Now my pool is 130g. At some point after that, the 40g drive fails, either by producing read errors or by failing to spin up at all. What happens to my pool? Can I mount and access it at all (for the data not on or striped across the 40g drive)? Can I "zfs replace" the 40g drive with another drive and have it attempt to copy as much data over as it can? Or am I just out of luck? zfs seems like a great way to use old/unutilized drives to expand capacity, but sooner or later one of those drives will fail, and if it takes out the whole pool (which it might reasonably do), then it doesn't work out in the end.

For non-redundant zpools, a device failure *may* cause the zpool to be unavailable. The actual availability depends on the nature of the failure. A more common scenario might be to add a 400 GByte drive, which you can use to replace the older drives, or keep online for redundancy.

The zfs copies feature is a little bit harder to grok. It is difficult to predict how the system will be affected if you have copies=2 in your above scenario, because it depends on how the space is allocated. For more info, see my notes at:
http://blogs.sun.com/relling/entry/zfs_copies_and_data_protection
 -- richard
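A small sketch of the copies property Richard mentions; the dataset name is just an example:

    # ask ZFS to keep two copies of every block written to this filesystem;
    # it only affects data written after the property is set, and it guards
    # against bad sectors rather than the loss of a whole non-redundant disk
    zfs set copies=2 tank/data
    zfs get copies tank/data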
On Wed, 2007-06-27 at 12:03 -0700, Jef Pearlman wrote:
> > Jef Pearlman wrote:
> > > Absent that, I was considering using zfs and just having a single pool. My main question is this: what is the failure mode of zfs if one of those drives either fails completely or has errors? Do I permanently lose access to the entire pool? Can I attempt to read other data? Can I "zfs replace" the bad drive and get some level of data recovery? Otherwise, by pooling drives am I simply increasing the probability of a catastrophic data loss? I apologize if this is addressed elsewhere -- I've read a bunch about zfs, but not come across this particular answer.

Pooling devices in a non-redundant mode (i.e., without a raidz or mirror vdev) increases your chance of losing data, just like every other RAID system out there. However, since ZFS doesn't do concatenation (it stripes), losing one drive in a non-redundant stripe effectively corrupts the entire dataset, as virtually all files should have some portion of their data on the dead drive.

> > We generally recommend a single pool, as long as the use case permits. But I think you are confused about what a zpool is. I suggest you look at the examples or docs. A good overview is the slide show
> > http://www.opensolaris.org/os/community/zfs/docs/zfs_last.pdf
>
> Perhaps I'm not asking my question clearly. I've already experimented a fair amount with zfs, including creating and destroying a number of pools with and without redundancy, replacing vdevs, etc. Maybe asking by example will clarify what I'm looking for or where I've missed the boat. The key is that I want a grow-as-you-go heterogeneous set of disks in my pool:
>
> Let's say I start with a 40g drive and a 60g drive. I create a non-redundant pool (which will be 100g). At some later point, I run across an unused 30g drive, which I add to the pool. Now my pool is 130g. At some point after that, the 40g drive fails, either by producing read errors or by failing to spin up at all. What happens to my pool? Can I mount and access it at all (for the data not on or striped across the 40g drive)? Can I "zfs replace" the 40g drive with another drive and have it attempt to copy as much data over as it can? Or am I just out of luck? zfs seems like a great way to use old/unutilized drives to expand capacity, but sooner or later one of those drives will fail, and if it takes out the whole pool (which it might reasonably do), then it doesn't work out in the end.

Nope. Your zpool is a stripe. As mentioned above, losing one disk in a stripe effectively destroys all data, just as with any other RAID system.

> > > As a side question, does anyone have a suggestion for an intelligent way to approach this goal? This is not mission-critical data, but I'd prefer not to make data loss _more_ probable. Perhaps some volume manager (like LVM on Linux) has appropriate features?
> >
> > A ZFS mirrored pool will be the most performant and easiest to manage, with better RAS than a raidz pool.
>
> The problem I've come across with using mirror or raidz for this setup is that (as far as I know) you can't add disks to mirror/raidz groups, and if you just add the disk to the pool, you end up in the same situation as above (with more space but no redundancy).
>
> Thanks for your help.
> -Jef

To answer the original question, you _have_ to create mirrors, which, if you have odd-sized disks, will end up with unused space. An example:

Disk A: 20GB
Disk B: 30GB
Disk C: 40GB
Disk D: 60GB

Start with disks A & B:

zpool create tank mirror A B

This results in a 20GB pool. Later, add disks C & D:

zpool add tank mirror C D

This results in a 2-wide stripe of 2 mirrors, which means a total capacity of 60GB (20GB for A & B, 40GB for C & D). 10GB of the 30GB drive and 20GB of the 60GB drive are currently unused. You can lose one drive from both pairs (i.e. A and C, A and D, B and C, or B and D) before any data loss.

If you had known about the drive sizes beforehand, then you could have done something like this. Partition the drives as follows:

A: 1 20GB partition
B: 1 20GB & 1 10GB partition
C: 1 40GB partition
D: 1 40GB partition & 2 10GB partitions

then you do:

zpool create tank mirror Ap0 Bp0 mirror Cp0 Dp0 mirror Bp1 Dp1

and you get a total of 70GB of space. However, the performance of this is going to be bad (as you frequently need to write to both partitions on B & D, causing head seek), though you can still lose up to 2 drives before experiencing data loss.

--
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Erik Trimble wrote:
> If you had known about the drive sizes beforehand, then you could have done something like this. Partition the drives as follows:
>
> A: 1 20GB partition
> B: 1 20GB & 1 10GB partition
> C: 1 40GB partition
> D: 1 40GB partition & 2 10GB partitions
>
> then you do:
>
> zpool create tank mirror Ap0 Bp0 mirror Cp0 Dp0 mirror Bp1 Dp1
>
> and you get a total of 70GB of space. However, the performance of this is going to be bad (as you frequently need to write to both partitions on B & D, causing head seek), though you can still lose up to 2 drives before experiencing data loss.

It is not clear to me that we can say performance will be bad for stripes on single disks. The reason is that ZFS dynamic striping does not use a fixed interleave. In other words, if I write a block of N bytes to an M-way dynamic stripe, it is not guaranteed that each device will get an I/O of N/M size. I've only done a few measurements of this, and I've not completed my analysis, but my data does not show the sort of thrashing one might expect from a fixed stripe with a small interleave.
 -- richard
Richard Elling wrote:
> Erik Trimble wrote:
>> If you had known about the drive sizes beforehand, then you could have done something like this. Partition the drives as follows:
>>
>> A: 1 20GB partition
>> B: 1 20GB & 1 10GB partition
>> C: 1 40GB partition
>> D: 1 40GB partition & 2 10GB partitions
>>
>> then you do:
>>
>> zpool create tank mirror Ap0 Bp0 mirror Cp0 Dp0 mirror Bp1 Dp1
>>
>> and you get a total of 70GB of space. However, the performance of this is going to be bad (as you frequently need to write to both partitions on B & D, causing head seek), though you can still lose up to 2 drives before experiencing data loss.
>
> It is not clear to me that we can say performance will be bad for stripes on single disks. The reason is that ZFS dynamic striping does not use a fixed interleave. In other words, if I write a block of N bytes to an M-way dynamic stripe, it is not guaranteed that each device will get an I/O of N/M size. I've only done a few measurements of this, and I've not completed my analysis, but my data does not show the sort of thrashing one might expect from a fixed stripe with a small interleave.
> -- richard

That is correct, Richard. However, it applies to relatively small reads/writes, which do not exceed the maximum stripe size. That is probably the common case, but there is another issue here: even given that not every disk will see an I/O on a stripe access, there is still a relatively good chance that both partitions on the same disk get an I/O request. On average, I'd assume you don't really improve much over a full-stripe I/O, and in either case it would be worse than a zpool which did not have multiple partitions on the same disk. Also, for large-file access - where full-stripe access is guaranteed - you are certainly going to thrash the disk.

Numbers would be nice, of course. :-)

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA