As mentioned last night, we've been reviewing a proposal for hot spare
support in ZFS. Below you can find a current draft of the proposed
interfaces. This has not yet been submitted for ARC review, but comments
are welcome. Note that this does not include any enhanced FMA diagnosis to
determine when a device is "faulted". That will come in a follow-on
project, for which some preliminary designs have been sketched out, but
not enough to draft a coherent proposal.

- Eric

A. DESCRIPTION

ZFS, as an integrated volume manager and filesystem, has the ability to
replace disks within an active pool. This allows administrators to replace
failing or faulted drives to keep the system functioning with the required
level of replication. Most other volume managers also support the ability
to perform this replacement automatically through the use of "hot spares".
This case will add this functionality to ZFS.

This case will increment the on-disk version number in accordance with
PSARC 2006/206, as the resulting labels introduce a new pool state that
older pools will not understand, and exported pools containing hot spares
will not be importable on earlier versions.

B. POOL MANAGEMENT

Hot spares are stored with each pool, although they can be shared between
different pools. This allows administrators to reserve system-wide hot
spares, as well as per-pool hot spares, according to their policies.

1. Creating a pool with hot spares

A pool can be created with hot spares by using the new 'spare' vdev:

    # zpool create test mirror c0d0 c1d0 spare c2d0 c3d0

This will create a pool with a single mirror and two spares. Only a single
'spare' vdev can be specified, though it can appear anywhere within the
command line. The resulting pool looks like the following:

    # zpool status
      pool: test
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            test        ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c0d0    ONLINE       0     0     0
                c1d0    ONLINE       0     0     0
            spares
              c2d0      ONLINE
              c3d0      ONLINE

    errors: No known data errors

2. Adding hot spares to a pool

Hot spares can be added to a pool in the same manner by using 'zpool add':

    # zpool add test spare c4d0 c5d0

This will add two disks to the set of available spares in the pool.

3. Removing hot spares from a pool

Hot spares can be removed from a pool with the new 'zpool remove'
subcommand. This subcommand suggests the ability to remove arbitrary
devices, and that is certainly a feature that will be supported in a
future release, but for now it will only allow removing hot spares. For
example:

    # zpool remove test c2d0

If the hot spare is currently spared in, the command will print an error
and exit.

4. Activating a hot spare

Hot spares can be used for replacement just like any other device using
'zpool replace'. If ZFS detects that the device is a hot spare within the
same pool, it will create a 'spare' vdev instead of a 'replacing' vdev:

    # zpool replace test c0d0 c2d0
    # zpool status
    ...
    config:

            NAME          STATE     READ WRITE CKSUM
            test          ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                spare     ONLINE       0     0     0
                  c0d0    ONLINE       0     0     0  35.5K resilvered
                  c2d0    ONLINE       0     0     0  35.5K resilvered
                c1d0      ONLINE       0     0     0
            spares
              c2d0        SPARED    currently in use
              c3d0        ONLINE

The difference between a 'replacing' and a 'spare' vdev is that the former
automatically removes the original drive once the replace completes. With
spares, the vdev remains until the original device is removed from the
system, at which point the hot spare is returned to the pool of available
spares. Note that in this example we have replaced an online device. Under
normal circumstances, the device in question would be faulted, or the
administrator would have proactively offlined the device.

5. Deactivating a hot spare

There are three ways in which a hot spare can be deactivated: cancelling
the hot spare, replacing the original drive, or permanently swapping in
the hot spare.

To cancel a hot spare attempt, the user can simply 'zpool detach' the hot
spare in question, at which point it will be returned to the set of
available spares, and the original drive will remain in its current
position (faulted or not):

    # zpool detach test c2d0
    # zpool status
    ...
    config:

            NAME        STATE     READ WRITE CKSUM
            test        ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c0d0    ONLINE       0     0     0  35.5K resilvered
                c1d0    ONLINE       0     0     0
            spares
              c2d0      ONLINE
              c3d0      ONLINE

If the original device is replaced, then the spare is automatically
removed once the replace completes:

    # zpool replace test c0d0 c4d0
    # zpool status
    ...
    config:

            NAME            STATE     READ WRITE CKSUM
            test            ONLINE       0     0     0
              mirror        ONLINE       0     0     0
                spare       ONLINE       0     0     0
                  replacing ONLINE       0     0     0
                    c0d0    ONLINE       0     0     0  38K resilvered
                    c4d0    ONLINE       0     0     0  38K resilvered
                  c2d0      ONLINE       0     0     0  38K resilvered
                c1d0        ONLINE       0     0     0
            spares
              c2d0          SPARED    currently in use
              c3d0          ONLINE

    <wait for replace to complete>

    # zpool status
    ...
    config:

            NAME        STATE     READ WRITE CKSUM
            test        ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c4d0    ONLINE       0     0     0  35.5K resilvered
                c1d0    ONLINE       0     0     0
            spares
              c2d0      ONLINE
              c3d0      ONLINE

If the user instead wants the hot spare to permanently assume the place of
the original device, the original device can be removed with 'zpool
detach'. At this point the hot spare becomes a functioning device and is
automatically removed from the list of available hot spares (for all
pools, if it is shared):

    # zpool detach test c0d0
    # zpool status
    ...
    config:

            NAME        STATE     READ WRITE CKSUM
            test        ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c2d0    ONLINE       0     0     0  35K resilvered
                c1d0    ONLINE       0     0     0
            spares
              c3d0      ONLINE

6. Determining device usage

A hot spare is considered 'in use' for the purposes of libdiskmgt and
zpool(1M) if it is labelled as a spare and is currently in one or more
pools' lists of active spares. If a spare is part of an exported pool, it
is not considered in use, due largely to the fact that distinguishing this
case from a recently destroyed pool is difficult and not solvable in the
general case.

C. AUTOMATED REPLACEMENT

In order to perform automated replacement, a ZFS FMA agent will be added
that subscribes to 'fault.zfs.vdev.*' faults. When a fault is received,
the agent will examine the pool to see if it has any available hot spares.
If so, it will perform a 'zpool replace' with an available spare. The
initial algorithm for this will be 'first come, first served', which may
not be ideal for all circumstances (such as when not all spares are the
same size). It is anticipated that these circumstances will be rare, and
that the algorithm can be improved in the future.

This is currently limited by the fact that the ZFS diagnosis engine only
emits faults when a device has disappeared from the system. When the DE is
enhanced to proactively fault drives based on error rates, the agent will
automatically leverage this feature.

In addition, note that there is no automated response capable of bringing
the original drive back online. The user must explicitly take one of the
actions described above. A future enhancement will allow ZFS to subscribe
to hotplug events and automatically replace the affected drive when it is
replaced on the system.

D. MANPAGE DIFFS

XXX
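A minimal sketch of the 'first come, first served' selection described in
section C, purely for illustration. The Pool/Spare structures, field names,
and the fault handler below are hypothetical, not the actual FMA agent or
libzfs interfaces; the only grounded piece is that the agent ultimately
issues the documented 'zpool replace' command:

    from dataclasses import dataclass, field
    from typing import List, Optional
    import subprocess

    @dataclass
    class Spare:
        name: str
        state: str = "ONLINE"      # "ONLINE" (available), "SPARED", or "UNAVAIL"

    @dataclass
    class Pool:
        name: str
        spares: List[Spare] = field(default_factory=list)

    def choose_spare(pool: Pool) -> Optional[Spare]:
        """First come, first served: pick the first available spare in config order."""
        for spare in pool.spares:
            if spare.state == "ONLINE":
                return spare
        return None                # no usable spare; the pool stays degraded

    def on_vdev_fault(pool: Pool, faulted_vdev: str) -> None:
        """Hypothetical handler for a fault.zfs.vdev.* event naming faulted_vdev."""
        spare = choose_spare(pool)
        if spare is not None:
            # Equivalent of the documented command: zpool replace <pool> <old> <new>
            subprocess.run(["zpool", "replace", pool.name, faulted_vdev, spare.name],
                           check=True)

A smarter policy (for example, preferring the smallest spare that is still
large enough) is exactly the kind of future improvement the proposal
anticipates for mixed-size spare configurations.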
I didn't catch a mention of RAID-Z in your note.

How would hot spares play in a RAID-Z type configuration? (Especially with
the "auto-return-home" (or predictive replacement) feature you mention.)
[In traditional RAID arrays, hot-spare rebuilds and "go-home" transitions
are handled differently to cut down on exposure windows and resource
utilization; not sure if/how that applies here...]

If I read/interpreted the last part of your note correctly, I think it's
OK to use a max-size LUN to hot-spare any LUN <= HOT_SPARE, returning
"home" once its job is done, ready to spare for any other pool/LUN.
(Obviously not the entire hot spare will be used if it's "sparing" for a
smaller failed LUN.)

Maybe there is no difference between mirrored and RAID-Z configurations
(ZFS masking all this?), but even in this case a note stating that this
works for both mirrored and RAID-Z configurations might make sense?

Thanks,

 -- MikeE
On Thu, Mar 30, 2006 at 01:20:20PM -0500, Ellis, Mike wrote:
> I didn't catch a mention of RAID-Z in your note.
>
> How would hot spares play in a RAID-Z type configuration? (Especially
> with the "auto-return-home" (or predictive replacement) feature you
> mention.) [In traditional RAID arrays, hot-spare rebuilds and
> "go-home" transitions are handled differently to cut down on exposure
> windows and resource utilization; not sure if/how that applies here...]
>
> If I read/interpreted the last part of your note correctly, I think it's
> OK to use a max-size LUN to hot-spare any LUN <= HOT_SPARE, returning
> "home" once its job is done, ready to spare for any other pool/LUN.
> (Obviously not the entire hot spare will be used if it's "sparing" for a
> smaller failed LUN.)

Yep. The initial concern raised was "what if I have a pool with half 36G
disks and half 72G disks?" If you then have both 36G and 72G spares, then
using a 72G spare for a 36G disk could potentially deprive you of a needed
hot spare should a 72G disk fail. In general, this is a misconfigured
system, since it gives you a false sense of security when examining your
available hot spares. Hence not worrying about it in the initial version.

> Maybe there is no difference between mirrored and RAID-Z configurations
> (ZFS masking all this?), but even in this case a note stating that this
> works for both mirrored and RAID-Z configurations might make sense?

Yes, mirror and RAID-Z replacements are handled identically, and use the
same resilvering code. There is no need to do any special-casing or worry
about "exposure windows" or anything like that. I can add statements to
that effect.

Note that it may also be possible to hot-spare unreplicated pools with the
arrival of predictive analysis and pro-active replacement. The usefulness
of this feature is rather questionable, however.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
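As a concrete illustration of that point, the commands from the proposal
apply unchanged to a RAID-Z pool. The device names below are placeholders
and the flow is a sketch rather than captured output:

    # zpool create tank raidz c0d0 c1d0 c2d0 spare c3d0
    <c1d0 faults; sparing is done by hand here, or by the FMA agent>
    # zpool replace tank c1d0 c3d0
    <optionally, make the spare permanent>
    # zpool detach tank c1d0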
On Mar 30, 2006, at 12:03, Eric Schrock wrote:
> A hot spare is considered 'in use' for the purposes of libdiskmgt and
> zpool(1M) if it is labelled as a spare and is currently in one or more
> pools' lists of active spares. If a spare is part of an exported pool,
> it is not considered in use, due largely to the fact that distinguishing
> this case from a recently destroyed pool is difficult and not solvable
> in the general case.

Would it be possible (or useful) to have a 'pool' of spares available to a
couple of ZFS pools?

Instead of associating the disks with a particular pool, you would be able
to say "if a disk fails in ZFS pool X, Y, or Z, grab disk 1, 2, or 3; if a
disk fails in ZFS pool A, B, or C, grab disk 4 or 5; all other ZFS pools
should grab disk 6".
David Magda wrote:
> Would it be possible (or useful) to have a 'pool' of spares available
> to a couple of ZFS pools?
>
> Instead of associating the disks with a particular pool, you would be
> able to say "if a disk fails in ZFS pool X, Y, or Z, grab disk 1, 2,
> or 3; if a disk fails in ZFS pool A, B, or C, grab disk 4 or 5; all
> other ZFS pools should grab disk 6".

From the proposal:

  B. POOL MANAGEMENT

  Hot spares are stored with each pool, although they can be shared
  between different pools. This allows administrators to reserve
  system-wide hot spares, as well as per-pool hot spares, according to
  their policies.

So spares can belong to multiple pools, I take it.
On Thu, Mar 30, 2006 at 08:33:32PM -0500, David Magda wrote:
> Would it be possible (or useful) to have a 'pool' of spares available
> to a couple of ZFS pools?
>
> Instead of associating the disks with a particular pool, you would be
> able to say "if a disk fails in ZFS pool X, Y, or Z, grab disk 1, 2,
> or 3; if a disk fails in ZFS pool A, B, or C, grab disk 4 or 5; all
> other ZFS pools should grab disk 6".

We kicked this idea around for a while, but there are two main reasons for
not doing it:

1. You need to invent a new grammar for describing arbitrary relations
   between spares and pools. We can't leverage any existing ZFS CLI to do
   this for us.

2. The information about which spares are allocated to your pool is no
   longer associated with your disks. With ZFS, we've tried very hard to
   keep all information about your data, including how to mount it, share
   it, and manage redundancy, with the data itself. Having a separate pool
   means that 'zpool export' no longer takes information about my hot
   spares with it, which is not too appealing.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Fri, Mar 31, 2006 at 01:31:49PM +1100, Barry Robison wrote:
> So spares can belong to multiple pools, I take it.

Yep. Here's an example:

    # zpool create test mirror c0t0d0 c0t1d0 spare c1t0d0 c1t1d0
    # zpool create test2 mirror c4t0d0 c4t1d0 spare c1t0d0 c1t1d0
    # zpool status
      pool: test
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            test        ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c0t0d0  ONLINE       0     0     0
                c0t1d0  ONLINE       0     0     0
            spares
              c1t0d0    ONLINE
              c1t1d0    ONLINE

    errors: No known data errors

      pool: test2
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            test2       ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c4t0d0  ONLINE       0     0     0
                c4t1d0  ONLINE       0     0     0
            spares
              c1t0d0    ONLINE
              c1t1d0    ONLINE

    errors: No known data errors

    # zpool replace test c0t0d0 c1t0d0
    # zpool status
      pool: test
     state: ONLINE
     scrub: resilver completed with 0 errors on Thu Mar 30 21:42:37 2006
    config:

            NAME          STATE     READ WRITE CKSUM
            test          ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                spare     ONLINE       0     0     0
                  c0t0d0  ONLINE       0     0     0  35.5K resilvered
                  c1t0d0  ONLINE       0     0     0  35.5K resilvered
                c0t1d0    ONLINE       0     0     0
            spares
              c1t0d0      SPARED    currently in use
              c1t1d0      ONLINE

    errors: No known data errors

      pool: test2
     state: ONLINE
     scrub: none requested
    config:

            NAME        STATE     READ WRITE CKSUM
            test2       ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c4t0d0  ONLINE       0     0     0
                c4t1d0  ONLINE       0     0     0
            spares
              c1t0d0    SPARED    in use by pool 'test'
              c1t1d0    ONLINE

    errors: No known data errors

It's probably a bug that the 'test' pool is reported as ONLINE. By
definition, a 'spare' vdev should probably be treated as DEGRADED. I can
fix that...

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Eric Schrock wrote:
> ...
> # zpool replace test c0t0d0 c1t0d0
> # zpool status
>   pool: test
>  state: ONLINE
>  scrub: resilver completed with 0 errors on Thu Mar 30 21:42:37 2006
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         test          ONLINE       0     0     0
>           mirror      ONLINE       0     0     0
>             spare     ONLINE       0     0     0
>               c0t0d0  ONLINE       0     0     0  35.5K resilvered
>               c1t0d0  ONLINE       0     0     0  35.5K resilvered
>             c0t1d0    ONLINE       0     0     0
>         spares
>           c1t0d0      SPARED    currently in use
>           c1t1d0      ONLINE
> ...
> It's probably a bug that the 'test' pool is reported as ONLINE. By
> definition, a 'spare' vdev should probably be treated as DEGRADED. I
> can fix that...

To me the output here is a little confusing. Shouldn't the status of
c0t0d0 in mirror's spare output say something other than "ONLINE"? Perhaps
also that for c1t0d0? I'd expect c1t0d0 to be ONLINE (in the mirror/spare
output) after the replacement is complete, and at some other state in the
meantime.

Darren
> >         spares
> >           c1t0d0    SPARED    currently in use
> >           c1t1d0    ONLINE

> To me the output here is a little confusing. Shouldn't the status
> of c0t0d0 in mirror's spare output say something other than "ONLINE"?
> Perhaps also that for c1t0d0?

I agree. I'd expect ONLINE to mean in use, and OFFLINE to mean not in use
(and thus available). But that's still somewhat indirect.

How about TAKEN and AVAILABLE?

Jeff
On Thu, Mar 30, 2006 at 09:03:30AM -0800, Eric Schrock wrote:
> 3. Removing hot spares from a pool
>
> Hot spares can be removed from a pool with the new 'zpool remove'
> subcommand. This subcommand suggests the ability to remove arbitrary
> devices, and that is certainly a feature that will be supported in a
> future release, but for now it will only allow removing hot spares.
> For example:
>
> # zpool remove test c2d0
>
> If the hot spare is currently spared in, the command will print an
> error and exit.

I am not sure whether shrinking a pool is being considered for the future,
but if it is, wouldn't it be better to use a different syntax:

[SPARES]
# zpool remove test spare c2d0

[SHRINKING]
# zpool remove test c2d0

This way I can distinguish between removing a spare and _shrinking_ the
pool. Without that I could easily make a mistake.

przemol
Hello Eric,

This is great!

However, it would be really useful if you could specify that some of the
spares are global - so that if I create a new pool, these spares will be
assigned to it automatically.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
On Mar 31, 2006, at 00:45, Eric Schrock wrote:
> # zpool create test mirror c0t0d0 c0t1d0 spare c1t0d0 c1t1d0
> # zpool create test2 mirror c4t0d0 c4t1d0 spare c1t0d0 c1t1d0

Yes, I must have read over section B too quickly, since this is more or
less what I meant in my question. Thanks for clearing things up.

Regards,
David
On Thu, Mar 30, 2006 at 11:57:35PM -0800, Jeff Bonwick wrote:
> > >         spares
> > >           c1t0d0    SPARED    currently in use
> > >           c1t1d0    ONLINE
>
> > To me the output here is a little confusing. Shouldn't the status
> > of c0t0d0 in mirror's spare output say something other than "ONLINE"?
> > Perhaps also that for c1t0d0?
>
> I agree. I'd expect ONLINE to mean in use, and OFFLINE to mean
> not in use (and thus available). But that's still somewhat indirect.
>
> How about TAKEN and AVAILABLE?

I'm all for AVAILABLE. It's still possible to have UNAVAIL spares as well,
as the kernel verifies that they can be opened and correspond to a known
device.

Of course, this makes me wonder about replacing hot spares. If we validate
the GUID is known, how does one replace a hot spare? If I swap in a
different drive, it'll complain that the disk doesn't match the known
spare. Perhaps 'zpool replace' needs to support hot spares, and the future
hotplug work can replace them automatically. I'll need to think about that
for a bit...

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Fri, Mar 31, 2006 at 12:59:47PM +0200, Robert Milkowski wrote:
> Hello Eric,
>
> This is great!
>
> However, it would be really useful if you could specify that some of
> the spares are global - so that if I create a new pool, these spares
> will be assigned to it automatically.

I'm hesitant to do this for two reasons:

1. We're creating auxiliary ZFS state that is independent of the pool
   data. This means that we need to invent a new syntax for managing
   system-wide global spares, as well as for how to assign them to pools.

2. Creating pools is not a common operation. Most systems will have only
   one or two pools on them. It's easy enough to simply add the same
   spares to both pools, and more configurable.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Fri, Mar 31, 2006 at 11:02:59AM +0200, przemolicc at poczta.fm wrote:
> I am not sure whether shrinking a pool is being considered for the
> future, but if it is, wouldn't it be better to use a different syntax:
>
> [SPARES]
> # zpool remove test spare c2d0
>
> [SHRINKING]
> # zpool remove test c2d0
>
> This way I can distinguish between removing a spare and _shrinking_
> the pool. Without that I could easily make a mistake.

For future pool removal, I anticipate having labelled mirrors and RAID-Z
vdevs, so that you can identify them by name, such as:

    mirror-1    c0d0 c1d0
    mirror-2    c2d0 c3d0

Then you can remove a toplevel vdev by saying 'zpool remove mirror-1'. The
only way this could become confusing is if you have an unreplicated pool
with hot spares, but I don't see that being a useful configuration.

Note that another possibility would be:

    # zpool remove mirror c0d0

which means "remove the mirror containing disk c0d0", but that has other
issues (especially if we support mirrors of RAID-Z and more complicated
configurations). This is definitely a reason not to have 'zpool remove'
behave like 'zpool detach' for the single-drive case.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Jeff Bonwick wrote:
> > >         spares
> > >           c1t0d0    SPARED    currently in use
> > >           c1t1d0    ONLINE
>
> > To me the output here is a little confusing. Shouldn't the status
> > of c0t0d0 in mirror's spare output say something other than "ONLINE"?
> > Perhaps also that for c1t0d0?
>
> I agree. I'd expect ONLINE to mean in use, and OFFLINE to mean
> not in use (and thus available). But that's still somewhat indirect.
>
> How about TAKEN and AVAILABLE?

I agree with those suggestions.

Darren
Hello Eric,

Friday, March 31, 2006, 7:41:57 PM, you wrote:

ES> On Fri, Mar 31, 2006 at 12:59:47PM +0200, Robert Milkowski wrote:
>> However, it would be really useful if you could specify that some of
>> the spares are global - so that if I create a new pool, these spares
>> will be assigned to it automatically.

ES> I'm hesitant to do this for two reasons:

ES> 1. We're creating auxiliary ZFS state that is independent of the pool
ES>    data. This means that we need to invent a new syntax for managing
ES>    system-wide global spares, as well as for how to assign them to pools.

ES> 2. Creating pools is not a common operation. Most systems will have
ES>    only one or two pools on them. It's easy enough to simply add the
ES>    same spares to both pools, and more configurable.

I don't know - ZFS was mainly targeted at large systems (I mean, it is on
those systems that you will see the biggest difference with ZFS), and, for
example, here we add quite a lot of storage on a regular basis (I won't
make just one large pool, rather many small pools), so creating global hot
spares at the beginning would be a welcome improvement - the same way we
have it on HW arrays.

btw: I guess hot spares in ZFS won't make it into U2...?

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
On Sat, Apr 01, 2006 at 01:59:38AM +0200, Robert Milkowski wrote:
> I don't know - ZFS was mainly targeted at large systems (I mean, it is
> on those systems that you will see the biggest difference with ZFS),
> and, for example, here we add quite a lot of storage on a regular basis
> (I won't make just one large pool, rather many small pools), so
> creating global hot spares at the beginning would be a welcome
> improvement - the same way we have it on HW arrays.

Why won't you make just one large pool, rather than many small pools? The
only reasons not to do so are:

    a. Different performance characteristics
or
    b. Different fault tolerance characteristics

I can see a server with just two or three pools (one for the root disk,
one for customer data, etc.), but I don't see why you would create lots of
new pools on a regular basis. Can you explain your use case and reasons in
a little more detail? "Because we can do it on product X" doesn't really
help, especially when a HW array is so fundamentally different from a ZFS
storage pool.

Supposing we were to adopt the idea of "global spares", where would this
information be stored? What would the zpool(1M) interface look like? Could
I still do per-pool spares? What would happen when I exported and imported
a pool? If a spare is swapped in permanently (an asynchronous event in the
kernel), does it then remove it from the global list of spares for
subsequent pools? I'm still having trouble envisioning the details of how
this would actually work...

> btw: I guess hot spares in ZFS won't make it into U2...?

Yes, that is correct.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Hello Eric,

Saturday, April 1, 2006, 2:11:09 AM, you wrote:

ES> Why won't you make just one large pool, rather than many small pools?
ES> The only reasons not to do so are:

ES>     a. Different performance characteristics
ES> or
ES>     b. Different fault tolerance characteristics

ES> I can see a server with just two or three pools (one for the root disk,
ES> one for customer data, etc.), but I don't see why you would create lots
ES> of new pools on a regular basis. Can you explain your use case and
ES> reasons in a little more detail? "Because we can do it on product X"
ES> doesn't really help, especially when a HW array is so fundamentally
ES> different from a ZFS storage pool.

The answers to a) and b) are no and no.

In one solution we're thinking of putting ZFS on, we've got, let's say, 8x
3511 JBODs connected to two hosts in a cluster. Right now we have an
additional head unit (with HW controllers), and we build a RAID-5 group on
every enclosure using 11 disks, leaving the last disk as a global hot
spare. With ZFS I was thinking of doing something similar - RAID-Z for
every JBOD (so in this case I would end up with 8 pools and 8 hot spares).

Now, I could make just one large RAID-Z pool (+ some hot spares), but that
could be risky. Or I could make one large pool which is actually a
concatenation/stripe of many RAID-Z groups - in essence a
stripe/concatenation of RAID-Z groups where each RAID-Z group is built
from 11 disks in one enclosure. That way availability is better than with
one large RAID-Z pool, and probably performance is better too, as Bill
pointed out (though I don't understand why). In that configuration I would
end up with a ~40TB logical data pool.

Now, what happens if two disks in one RAID-Z group fail? I lose the whole
40TB of data.

What happens if there's a problem with one disk (very long I/Os, but it's
still working - it happens) in the entire pool? Instead of having a
problem with one smaller pool, I've now got a performance problem with the
entire 40TB pool.

Also, if I want to serve some data from the other cluster node, I can just
switch some pools to the other node - something I can't do with one pool.

ES> Supposing we were to adopt the idea of "global spares", where would this
ES> information be stored? What would the zpool(1M) interface look like?
ES> Could I still do per-pool spares? What would happen when I exported and
ES> imported a pool? If a spare is swapped in permanently (an asynchronous
ES> event in the kernel), does it then remove it from the global list of
ES> spares for subsequent pools? I'm still having trouble envisioning the
ES> details of how this would actually work...

Maybe just another pool with hot spares? Then by default all new pools
would have a variable use_global_hotspares set to on.

Something like:

    zpool create global_hotspares hotspare c1t0d0 c2t0d0 c3t0d0

If you don't want to use global_hotspares in a given pool, you could do:

    zfs set use_global_hs=off pool

Now, if a (normal) pool is exported and then imported, it just looks for a
pool with a specific ID, name, or other tag marking it as a pool of global
hot spares (only if use_global_hs is set to on for the pool being
imported). If no such pool is available, it can only use local hot spares
directly attached to it (if there are any). And if you import a pool with
global hot spares, all currently active (or later imported) pools which
have use_global_hs set to on will automatically use it.

??
On Sat, Apr 01, 2006 at 02:41:39AM +0200, Robert Milkowski wrote:
> Now, I could make just one large RAID-Z pool (+ some hot spares), but
> that could be risky. Or I could make one large pool which is actually a
> concatenation/stripe of many RAID-Z groups - in essence a
> stripe/concatenation of RAID-Z groups where each RAID-Z group is built
> from 11 disks in one enclosure. That way availability is better than
> with one large RAID-Z pool, and probably performance is better too, as
> Bill pointed out (though I don't understand why). In that configuration
> I would end up with a ~40TB logical data pool.
>
> Now, what happens if two disks in one RAID-Z group fail? I lose the
> whole 40TB of data.

This won't be the case with metadata replication, which should be coming
soon. You will only lose the plain file contents of the objects contained
within that toplevel vdev. Of course, if you're measuring "time to restore
from backup", then it doesn't matter whether we survive the failure, since
you'll still have to restore all your data from backup. Although I could
imagine some creative ways of using zfs send/receive to make this faster.

> What happens if there's a problem with one disk (very long I/Os, but
> it's still working - it happens) in the entire pool? Instead of having
> a problem with one smaller pool, I've now got a performance problem
> with the entire 40TB pool.

This should be handled by the ZFS I/O scheduler automatically. We have
some work to do in this area, but I wouldn't design a feature around a
lack of current performance.

> Also, if I want to serve some data from the other cluster node, I can
> just switch some pools to the other node - something I can't do with
> one pool.

Yes, this is definitely true.

> Maybe just another pool with hot spares? Then by default all new pools
> would have a variable use_global_hotspares set to on.
>
> Something like:
>
>     zpool create global_hotspares hotspare c1t0d0 c2t0d0 c3t0d0
>
> If you don't want to use global_hotspares in a given pool, you could do:
>
>     zfs set use_global_hs=off pool
>
> Now, if a (normal) pool is exported and then imported, it just looks
> for a pool with a specific ID, name, or other tag marking it as a pool
> of global hot spares (only if use_global_hs is set to on for the pool
> being imported). If no such pool is available, it can only use local
> hot spares directly attached to it (if there are any). And if you
> import a pool with global hot spares, all currently active (or later
> imported) pools which have use_global_hs set to on will automatically
> use it.

OK, so this is just a "magic pool" that behaves differently? This starts
to get very nasty very quickly. The name "global_hotspares" is reserved,
and all of a sudden all the operations I can do on it are different. You
can only add individual disks, you can't remove certain disks, the output
of "zpool status" has to be different, importing a hot spare pool has to
be handled specially, renames (when supported) will have to be handled
carefully, I can't create ZFS filesystems in it, and the edge conditions
continue...

Based on my observations, it seems to me that:

1. This introduces an order of magnitude more edge conditions that alter
   normal interaction with the system.

2. It requires work (particularly "zpool set") that we haven't yet done.

3. It does not replace the need for per-pool spares.

4. It is not the common use case.

5. The behavior can be replicated with a small amount of manual work given
   the current proposal.
We can implement this as a future RFE, but right now we should implement
the straightforward solution, and deal with the complexities of this
proposal at a later date.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
The main reason to have different ZFS pools is to implement redundancy
ACROSS JBOD enclosures.

I'm assuming that you can't add new disks to a vdev afterwards - you can
only add new vdevs to a pool. Or is this incorrect?

In Robert's case, the best thing to do is this (assuming he wants maximum
disk space usage while still retaining some redundancy):

(for simplicity's sake, I'm showing a 3-array (3 drives/array) config)

    zpool create tank raidz c0t0d0s2 c1t0d0s2 c2t0d0s2 \
                      raidz c0t1d0s2 c1t1d0s2 c2t1d0s2 \
                      raidz c0t2d0s2 c1t2d0s2 c2t2d0s2

That is, create a stripe of RAID-Z vdevs. This insulates you against the
loss of any one JBOD. You can then add the remaining disks as hot spares
to the pool.

(Of course, using the 3511s, you would probably be best off creating each
RAID-5 subarray using the HW controller, then simply striping them using
ZFS.)

--
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Jeff Bonwick wrote:
> > >         spares
> > >           c1t0d0    SPARED    currently in use
> > >           c1t1d0    ONLINE
>
> > To me the output here is a little confusing. Shouldn't the status
> > of c0t0d0 in mirror's spare output say something other than "ONLINE"?
> > Perhaps also that for c1t0d0?
>
> I agree. I'd expect ONLINE to mean in use, and OFFLINE to mean
> not in use (and thus available). But that's still somewhat indirect.
>
> How about TAKEN and AVAILABLE?

I forgot to mention, I think that the "ONLINE" status of the disk being
spared out should be something different. I think this is what is meant by
the "35.5K resilvered"? To me this is the only obscure part of the output.

I'd rather see something like:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            spare     ONLINE       0     0     0
              c0t0d0  RESYNC       0     0     0  35.5K
              c1t0d0  RESYNC       0     0     0  35.5K
            c0t1d0    ONLINE       0     0     0
        spares
          c1t0d0      TAKEN     currently in use
          c1t1d0      AVAILABLE

I'm tempted to suggest that "RESYNC" should be different for the incoming
disk and the outgoing disk, maybe:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            spare     ONLINE       0     0     0
              c0t0d0  OUTSYNC      0     0     0  35.5K
              c1t0d0  INSYNC       0     0     0  35.5K
            c0t1d0    ONLINE       0     0     0
        spares
          c1t0d0      TAKEN     currently in use
          c1t1d0      AVAILABLE

The idea is that the "spares" section under "mirror" is now
self-explanatory. I'm not too enamoured of "OUTSYNC" or "INSYNC" as useful
words here, but hopefully they convey the idea. "SYNCUP" and "SYNCDOWN"
are some other alternatives I can think of right now.

Darren
On Fri, Mar 31, 2006 at 05:17:25PM -0800, Erik Trimble wrote:
> The main reason to have different ZFS pools is to implement redundancy
> ACROSS JBOD enclosures.

I'm a little confused. To implement redundancy across anything, doesn't
that mean they have to be in the same pool? How do I get redundancy across
multiple pools?

> zpool create tank raidz c0t0d0s2 c1t0d0s2 c2t0d0s2 \
>                   raidz c0t1d0s2 c1t1d0s2 c2t1d0s2 \
>                   raidz c0t2d0s2 c1t2d0s2 c2t2d0s2

But isn't this just one pool?

> (Of course, using the 3511s, you would probably be best off creating
> each RAID-5 subarray using the HW controller, then simply striping them
> using ZFS.)

It depends. If you want better performance, this might be true (though
benchmarks would be in order). If you want better fault tolerance, then
it's better to expose them as JBODs and have ZFS deal with them. Then you
get the self-healing capabilities of ZFS that you simply cannot get from a
hardware RAID solution. For sure, you would want to RAID the subarrays, or
else you're putting all your reliability entirely in the hands of the
hardware...

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Fri, Mar 31, 2006 at 05:23:28PM -0800, Darren Reed wrote:
> I forgot to mention, I think that the "ONLINE" status of the disk being
> spared out should be something different.

Well, the example I gave is pretty contrived. Under normal circumstances,
the device you're sparing out is faulted. It's really important that we
show the actual state of that device, not just some faked-up value. For
example, the following all imply very different capabilities of the pool:

    spare     DEGRADED
      diskA   ONLINE
      diskB   ONLINE

    spare     DEGRADED
      diskA   FAULTED
      diskB   ONLINE

    spare     DEGRADED
      diskA   DEGRADED
      diskB   ONLINE

Note that this is the same as with replacing. If you go to replace an
online device, we don't go and change its state. We kicked around the idea
of trying to fake up something to visually represent which device was
being replaced, but changing the 'state' definitely didn't work, for the
above reasons. Even though a device is being replaced and/or spared, it
has a state that is distinct from its current role in the vdev tree.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
In our case, we are predominantly using iSCSI with multiple raw LUNs being
exposed for RAID-Z. Each backend unit has mostly uniform disk sizes, but
disk sizes differ between units, as disks are purchased over time and
generally targeted to maximize storage space. Thus, we will likely be
seeing a large heterogeneous disk farm that, according to what ZFS best
practices state, should be in separate, uniform RAID-Z zpools. So, a spare
pool that may fit multiple zpools can come in handy there.

On 3/31/06, Eric Schrock <eric.schrock at sun.com> wrote:
> Why won't you make just one large pool, rather than many small pools?
> The only reasons not to do so are:
>
>     a. Different performance characteristics
> or
>     b. Different fault tolerance characteristics
> It depends. If you want better performance, this might be true (though
> benchmarks would be in order). If you want better fault tolerance, then
> it's better to expose them as JBODs and have ZFS deal with them. Then
> you get the self-healing capabilities of ZFS that you simply cannot get
> from a hardware RAID solution.

Another option is to get the best of both worlds by letting the arrays do
RAID-5, and then mirroring or RAID-Z-ing the arrays.

A RAID-Z group of RAID-5 arrays can tolerate at least three whole-disk
failures before losing data. It can also tolerate the failure of an entire
array *plus* one whole-disk failure on each of the remaining arrays. Using
RAID-Z (or mirroring) means that you get self-healing data: if an array
returns bad data, ZFS will detect it and reconstruct good data from the
other arrays.

Jeff
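To make the layering concrete, here is a sketch using the proposal's
command form. The device names are placeholders, with each one assumed to
be a single LUN exported by a hardware RAID-5 group on a different array:

    # zpool create tank raidz c2t0d0 c3t0d0 c4t0d0

Each LUN already survives a single disk failure inside its own array, and
the RAID-Z layer across the arrays adds the whole-array tolerance and
checksum-driven self-healing described above.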
> I don't know - ZFS was mainly targeted at large systems

Actually, our goal is to run the gamut. I want ZFS not just on large disk
farms, but also on my laptop. Eventually I'd also like to get ZFS onto
iPods and Compact Flash cards, so that a power outage doesn't mean losing
your music or your pictures.

Jeff
On Thu, 2006-03-30 at 12:03, Eric Schrock wrote:
> C. AUTOMATED REPLACEMENT
>
> In order to perform automated replacement, a ZFS FMA agent will be
> added that subscribes to 'fault.zfs.vdev.*' faults. When a fault is
> received, the agent will examine the pool to see if it has any
> available hot spares. If so, it will perform a 'zpool replace' with an
> available spare.

I've seen automated replacement go bad...

For a while we had an E420R and its connected A5100 JBOD on a UPS. The UPS
battery went bad. We discovered this the hard way when a series of
brownouts caused the UPS to reach into the battery and find nothing there.

The E420R sailed right through as if nothing had happened (who knows --
maybe proportionally bigger capacitors in the power supply?), but the
A5100 really didn't like this. I believe all the drives took a little
while to reset and spin back up.

In the meantime, SVM concluded that a bunch of drives in the array had
gone bad, and decided to replace as many of them as it had hot spares for.
Once the array came all the way back online, mirroring to the replacements
started.

In reality, all the drives were fine; it just took the better part of a
day to unwind all the premature replacements.

Not quite sure what heuristics you'd use to avoid this sort of thing,
though....

- Bill
On Sat, Apr 01, 2006 at 09:25:55PM -0500, Bill Sommerfeld wrote:
> I've seen automated replacement go bad...

Well, this is certainly what would happen with the current bits. The good
news is that this is all done through FMA by subscribing to the
fault.fs.zfs.vdev.* fault. In the future, as we make the diagnosis engine
smarter, this hot spare support will be able to automatically leverage
whatever we come up with.

I don't know what the "right answer" is in the case you described, but
we'll certainly be gathering lots of data (via FMA error/fault logs) as
well as hooking into SMART and the rest of the I/O subsystem to make more
intelligent diagnoses in the future. I've got some stuff scoped out for
the next phase (SERD on I/O and checksum errors) as well as the next
advancements beyond that (processing SMART data and subscribing to hotplug
events). Expect to see more info soon.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
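For readers unfamiliar with the term, SERD (soft-error-rate discrimination)
is the FMA idea of declaring a fault only when enough errors accumulate
within a time window, which is exactly the kind of threshold that would
have suppressed the false replacements in the UPS story above. A toy
sketch of the idea follows; it is not the fmd SERD engine API, and the
thresholds in the comment are made up:

    import collections
    import time

    class Serd:
        """Toy SERD engine: fire once N or more events fall within T seconds."""
        def __init__(self, n: int, t: float):
            self.n = n
            self.t = t
            self.events = collections.deque()

        def record(self, now: float = None) -> bool:
            """Record one error event; return True if the threshold is crossed."""
            if now is None:
                now = time.time()
            self.events.append(now)
            # Drop events that have aged out of the sliding window.
            while self.events and now - self.events[0] > self.t:
                self.events.popleft()
            return len(self.events) >= self.n

    # e.g. a hypothetical policy of "10 checksum errors within 10 minutes":
    #   cksum_serd = Serd(n=10, t=600)
    #   if cksum_serd.record(): ...fault the vdev; the hot spare agent reacts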
Veritas VM has a flag for this. If you set it on the disk volumes, it
won't try to use them as reallocation targets. I found this out the hard
way. We were mirroring on a 9176 (early precursor to the D178) between
datacenters and between two arrays. When one array went away in a power
failure, it 'mirrored' everything to the same array. Performance on the
box went away in a hurry.

When mirroring, we'll need to have a similar flag, as it gets even more
interesting when you've got high-performance disk (database storage) and
low-performance disk (used for DB exports). Mix up the mirroring there,
and things will get ugly. I would assume that you'd put different tiers of
storage into different pools to reduce the chance of this happening, but
it's still a possibility.

On Sat, 2006-04-01 at 21:25 -0500, Bill Sommerfeld wrote:
> I've seen automated replacement go bad...
>
> In reality, all the drives were fine; it just took the better part of a
> day to unwind all the premature replacements.
>
> Not quite sure what heuristics you'd use to avoid this sort of thing,
> though....
Jeff Bonwick wrote:
> > I don't know - ZFS was mainly targeted at large systems
>
> Actually, our goal is to run the gamut. I want ZFS not just on
> large disk farms, but also on my laptop. Eventually I'd also
> like to get ZFS onto iPods and Compact Flash cards, so that a
> power outage doesn't mean losing your music or your pictures.

And where "power outage" includes the spontaneous popping out of said
devices from their "holder" too :)

I can't remember how many Amiga floppies I burnt because they weren't
always consistent on disk.

Darren
Eric Schrock wrote:
> On Fri, Mar 31, 2006 at 05:23:28PM -0800, Darren Reed wrote:
> > I forgot to mention, I think that the "ONLINE" status of the disk
> > being spared out should be something different.
>
> Well, the example I gave is pretty contrived. Under normal
> circumstances, the device you're sparing out is faulted. It's really
> important that we show the actual state of that device, not just some
> faked-up value. For example, the following all imply very different
> capabilities of the pool:
>
>     spare     DEGRADED
>       diskA   ONLINE
>       diskB   ONLINE
>
>     spare     DEGRADED
>       diskA   FAULTED
>       diskB   ONLINE
>
>     spare     DEGRADED
>       diskA   DEGRADED
>       diskB   ONLINE
>
> Note that this is the same as with replacing.

Looking at those three, the "DEGRADED" for the first spare set seems like
a bug to me. My assumption is that:

    ONLINE(spare) = ONLINE(diskA) + ONLINE(diskB)

and I think this is the intuitive way to read the above output. If that
isn't the story, then something needs to not say "ONLINE".

> If you go to replace an online device, we don't go and change its
> state. We kicked around the idea of trying to fake up something to
> visually represent which device was being replaced, but changing the
> 'state' definitely didn't work, for the above reasons. Even though a
> device is being replaced and/or spared, it has a state that is distinct
> from its current role in the vdev tree.

It would be very worthwhile if something could be faked, visually, to
represent what is going on inside, if only to avoid the first case of
output (above), which seems nonsensical.

Darren
Hello Eric,

Saturday, April 1, 2006, 3:15:10 AM, you wrote:

ES> On Sat, Apr 01, 2006 at 02:41:39AM +0200, Robert Milkowski wrote:
>> Now, what happens if two disks in one RAID-Z group fail? I lose the
>> whole 40TB of data.

ES> This won't be the case with metadata replication, which should be
ES> coming soon. You will only lose the plain file contents of the
ES> objects contained within that toplevel vdev.

Yeah, that would be better. But there's still the problem of how to
correct the situation - as you mentioned, you will probably be forced to
restore the whole 40TB of data instead of 5TB.

>> What happens if there's a problem with one disk (very long I/Os, but
>> it's still working - it happens) in the entire pool? Instead of having
>> a problem with one smaller pool, I've now got a performance problem
>> with the entire 40TB pool.

ES> This should be handled by the ZFS I/O scheduler automatically. We
ES> have some work to do in this area, but I wouldn't design a feature
ES> around a lack of current performance.

That's good to hear.

ES> Based on my observations, it seems to me that:

ES> 1. This introduces an order of magnitude more edge conditions that
ES>    alter normal interaction with the system.
ES> 2. It requires work (particularly "zpool set") that we haven't yet
ES>    done.
ES> 3. It does not replace the need for per-pool spares.
ES> 4. It is not the common use case.

I can't agree with #4. IMHO, in most RAID environments, especially with a
lot of disks, you just create some global hot spares and don't think about
it later when adding new disks, etc.

ES> We can implement this as a future RFE, but right now we should
ES> implement the straightforward solution, and deal with the
ES> complexities of this proposal at a later date.

That's reasonable.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Hello Jeff,

Saturday, April 1, 2006, 10:05:29 AM, you wrote:

>> It depends. If you want better performance, this might be true (though
>> benchmarks would be in order). If you want better fault tolerance,
>> then it's better to expose them as JBODs and have ZFS deal with them.
>> Then you get the self-healing capabilities of ZFS that you simply
>> cannot get from a hardware RAID solution.

JB> Another option is to get the best of both worlds by letting the
JB> arrays do RAID-5, and then mirroring or RAID-Z-ing the arrays.

JB> A RAID-Z group of RAID-5 arrays can tolerate at least three
JB> whole-disk failures before losing data. It can also tolerate
JB> the failure of an entire array *plus* one whole-disk failure
JB> on each of the remaining arrays. Using RAID-Z (or mirroring)
JB> means that you get self-healing data: if an array returns bad data,
JB> ZFS will detect it and reconstruct good data from the other arrays.

I hadn't considered this one - sounds interesting. However, less storage
will be available, but it could still be interesting.

Thanks.

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
Hello Eric,

Saturday, April 1, 2006, 3:15:10 AM, you wrote:

ES> This won't be the case with metadata replication, which should be
ES> coming soon. You will only lose the plain file contents of the
ES> objects contained within that toplevel vdev.

It just occurred to me - if there were a zfs command to get a list of
"broken" (data missing) files due to the failure of some disks, then with
such a list one could restore only the bad files and not the whole pool
(assuming that you can overwrite these files).

Most backup software lets you restore only the files listed in a file.

What do you think?

--
Best regards,
 Robert                          mailto:rmilkowski at task.gda.pl
                                 http://milek.blogspot.com
On Mon, Apr 03, 2006 at 08:24:45PM +0200, Robert Milkowski wrote:
> It just occurred to me - if there were a zfs command to get a list of
> "broken" (data missing) files due to the failure of some disks, then
> with such a list one could restore only the bad files and not the whole
> pool (assuming that you can overwrite these files).
>
> Most backup software lets you restore only the files listed in a file.
>
> What do you think?

Starting in build 36, we get 50% of the way there. If you do a scrub of a
pool and then run 'zpool status -v', you'll get a detailed list of all the
unrecoverable (logical) blocks in the pool found during the scrub. The
problem is that they are currently only identified by dataset name and
object number - not exactly conducive to repair procedures.

There is a future RFE to translate the object number to a filename (when
available), but it's non-trivial when the filesystem is currently mounted.
We can't grok around the internal DMU state without going through the
"front door" of the ZPL. Matt or Mark may be able to shed some light on
how much investigation they've done in this area, if any.

The result, of course, would be _very_ cool. With background scrubbing
(also coming in the future), you will always have an up-to-date list of
damaged data in your pool, or hopefully lack thereof :-)

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
On Sat, 2006-04-01 at 01:11, Eric Schrock wrote:
> Why won't you make just one large pool, rather than many small pools?
> The only reasons not to do so are:
>
> a. Different performance characteristics
> or
> b. Different fault tolerance characteristics

Or:

c. Different administrative boundaries.

By which I mean that pools are the unit that is imported and exported.

If different projects (groups - possibly with separate funding) buy
storage, I would expect to align the pools with what they purchased. That
way I can split the storage up later without breaking up the data.

Or I allocate storage off a SAN. In that case I would want to import and
export pools to move data around on the SAN - i.e. between machines. Say a
machine becomes busy; I would want to be able to export a pool and import
it on another machine attached to the SAN and run the service there.

I'm not sure what the model for global spares is here. I can see that for
a spare local to a pool, when I export the pool I lose the spare (the
spare is physically associated with the pool and should remain so).

--
-Peter Tribble
L.I.S., University of Hertfordshire - http://www.herts.ac.uk/
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
On Mon, Apr 03, 2006 at 08:10:01PM +0100, Peter Tribble wrote:
> Or:
>
> c. Different administrative boundaries.
>
> By which I mean that pools are the unit that is imported and exported.

Yep. This is the use case that Robert pointed out that I had failed to
consider.

> If different projects (groups - possibly with separate funding) buy
> storage, I would expect to align the pools with what they purchased.
> That way I can split the storage up later without breaking up the data.
>
> Or I allocate storage off a SAN. In that case I would want to import
> and export pools to move data around on the SAN - i.e. between
> machines. Say a machine becomes busy; I would want to be able to export
> a pool and import it on another machine attached to the SAN and run the
> service there.
>
> I'm not sure what the model for global spares is here. I can see that
> for a spare local to a pool, when I export the pool I lose the spare
> (the spare is physically associated with the pool and should remain
> so).

Yep. Global spares are likely per-system, rather than per-pool. For
example, exporting a pool will not touch any globally configured hot
spares. As a result of Robert's suggestion, we'll be examining how to
expose this in an administratively meaningful way in the future.

As usual, the difficulty is all about the administrative interface. The
actual FMA agent that goes off and does the replacement is trivial, and
can get the suggested replacement from anywhere.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
At one point there was talk of implementing "hot space" rather than hot
spares. Is this a precursor to that step? Or is hot space a different
notion?

-Sanjay
On Mon, Apr 10, 2006 at 03:13:17PM -0700, Sanjay G. Nadkarni wrote:
> At one point there was talk of implementing "hot space" rather than
> hot spares. Is this a precursor to that step? Or is hot space a
> different notion?

They serve similar purposes, but are not 100% replacements for each other.
We will still be working on hot space, but it will not be a short-term
project.

--Bill