Dear managers,

one of our servers (X4240) shows a faulty disk:

------------------------------------------------------------------------
-bash-3.00# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c1t1d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
          mirror    DEGRADED     0     0     0
            c1t6d0  FAULTED      0    19     0  too many errors
            c1t7d0  ONLINE       0     0     0

errors: No known data errors
------------------------------------------------------------------------

I derived the following possible approaches to solve the problem:

1) One way to re-establish redundancy would be to use the command

        zpool attach tank c1t7d0 c1t15d0

to add c1t15d0 to the virtual device "c1t6d0 + c1t7d0". We would still
have the faulty disk in the virtual device. We could then detach the
faulty disk with the command

        zpool detach tank c1t6d0

2) Another approach would be to add a spare disk to tank

        zpool add tank spare c1t15d0

and then use replace to replace the faulty disk:

        zpool replace tank c1t6d0 c1t15d0

In theory that is easy, but since I have never done this and since this
is a production server, I would appreciate it if someone with more
experience would look over my plan before I issue these commands.

What is the difference between the two approaches? Which one do you
recommend? And is that really all that has to be done, or am I missing
a step? I mean, can c1t6d0 be physically replaced after issuing "zpool
detach tank c1t6d0" or "zpool replace tank c1t6d0 c1t15d0"? I also
found the command

        zpool offline tank ...

but am not sure whether it should be used in my case. Hints are greatly
appreciated!

Thanks a lot,

  Andreas
Cindy.Swearingen at Sun.COM
2009-Aug-06 20:04 UTC
[zfs-discuss] Replacing faulty disk in ZFS pool
Hi Andreas,

Good job for using a mirrored configuration. :-)

Your various approaches would work.

My only comment about #2 is that it might take some time for the spare
to kick in for the faulted disk.

Both 1 and 2 would take a bit more time than just replacing the faulted
disk with a spare disk, like this:

# zpool replace tank c1t6d0 c1t15d0

Then you could physically replace c1t6d0 and add it back to the pool as
a spare, like this:

# zpool add tank spare c1t6d0

For a production system, the steps above might be the most efficient.
Get the faulted disk replaced with a known good disk so the pool is no
longer degraded, then physically replace the bad disk when you have the
time and add it back to the pool as a spare.

It is also good practice to run a zpool scrub to ensure the replacement
is operational and use zpool clear to clear the previous errors on the
pool. If the system is used heavily, then you might want to run the
zpool scrub when system use is reduced.

If you were going to physically replace c1t6d0 while it was still
attached to the pool, then you might offline it first.

Cindy
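Pulled together, the sequence Cindy describes might look roughly like
this (a sketch only; the device names are the ones from Andreas's pool,
and the scrub/clear steps are the optional good-practice checks she
mentions above):

        # zpool replace tank c1t6d0 c1t15d0     (resilvers the mirror onto c1t15d0)
        # zpool status tank                     (wait for the resilver to complete)
        # zpool scrub tank                      (optional: verify the replacement disk)
        # zpool clear tank                      (optional: reset the old error counts)
        <physically swap out c1t6d0 when convenient>
        # zpool add tank spare c1t6d0           (hand the fresh disk back as a spare)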
I believe there are a couple of ways that work. The commands I've
always used are to attach the new disk as a spare (if not already) and
then replace the failed disk with the spare. I don't know if there are
advantages or disadvantages, but I have also never had a problem doing
it this way.
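In zpool terms, that is presumably just the two commands from Andreas's
approach #2 (device names taken from his mail):

        # zpool add tank spare c1t15d0
        # zpool replace tank c1t6d0 c1t15d0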
Hi Cindy,

> Good job for using a mirrored configuration. :-)

Thanks!

> Your various approaches would work.
>
> My only comment about #2 is that it might take some time for the spare
> to kick in for the faulted disk.
>
> Both 1 and 2 would take a bit more time than just replacing the faulted
> disk with a spare disk, like this:
>
> # zpool replace tank c1t6d0 c1t15d0

You mean I can execute

        zpool replace tank c1t6d0 c1t15d0

without having made c1t15d0 a spare disk first with

        zpool add tank spare c1t15d0

? After doing that, is c1t6d0 offline and ready to be physically replaced?

> Then you could physically replace c1t6d0 and add it back to the pool as
> a spare, like this:
>
> # zpool add tank spare c1t6d0
>
> For a production system, the steps above might be the most efficient.
> Get the faulted disk replaced with a known good disk so the pool is
> no longer degraded, then physically replace the bad disk when you have
> the time and add it back to the pool as a spare.
>
> It is also good practice to run a zpool scrub to ensure the
> replacement is operational

That would be

        zpool scrub tank

in my case!?

> and use zpool clear to clear the previous
> errors on the pool.

I assume the complete command for my case is

        zpool clear tank

Why do we have to do that? Couldn't zfs realize that everything is fine
again after executing "zpool replace tank c1t6d0 c1t15d0"?

> If the system is used heavily, then you might want to run the zpool
> scrub when system use is reduced.

That would be now! :-)

> If you were going to physically replace c1t6d0 while it was still
> attached to the pool, then you might offline it first.

Ok, this sounds like approach 3)

        zpool offline tank c1t6d0
        <physically replace c1t6d0 with a new one>
        zpool online tank c1t6d0

Would that be it?

Thanks a lot!

Regards,

  Andreas
Cindy.Swearingen at Sun.COM
2009-Aug-06 20:38 UTC
[zfs-discuss] Replacing faulty disk in ZFS pool
Andreas,

More comments below.

Cindy

On 08/06/09 14:18, Andreas Höschler wrote:

> You mean I can execute
>
>         zpool replace tank c1t6d0 c1t15d0
>
> without having made c1t15d0 a spare disk first with

Yes, that is correct.

>         zpool add tank spare c1t15d0
>
> ? After doing that, is c1t6d0 offline and ready to be physically replaced?

Yes, that is correct.

>> It is also good practice to run a zpool scrub to ensure the
>> replacement is operational
>
> That would be
>
>         zpool scrub tank
>
> in my case!?

Yes.

>> and use zpool clear to clear the previous
>> errors on the pool.
>
> I assume the complete command for my case is
>
>         zpool clear tank
>
> Why do we have to do that? Couldn't zfs realize that everything is fine
> again after executing "zpool replace tank c1t6d0 c1t15d0"?

Yes, sometimes the clear is not necessary but it will also clear the
error counts if need be.

> Ok, this sounds like approach 3)
>
>         zpool offline tank c1t6d0
>         <physically replace c1t6d0 with a new one>
>         zpool online tank c1t6d0
>
> Would that be it?

Those steps would be like this:

        zpool offline tank c1t6d0
        <physically replace c1t6d0 with a new one>
        zpool replace tank c1t6d0
        zpool online tank c1t6d0

On some hardware, you must unconfigure the disk before replacing it,
such as after taking it offline. I'm not sure if the X4240 is in that
category. If you do the replacement with another known good disk
(c1t15d0), then you do not have to unconfigure the failed disk first.

See Example 11-1 for more information:

http://docs.sun.com/app/docs/doc/819-5461/gbbvf?a=view
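For the in-place variant (new disk into the same slot), the unconfigure
step Cindy mentions is normally done with cfgadm. The attachment point
name below is a guess for this controller, so check the output of
cfgadm -al for the real one before relying on this sketch:

        # zpool offline tank c1t6d0
        # cfgadm -c unconfigure c1::dsk/c1t6d0    (only if the hardware requires it)
        <physically replace c1t6d0 in the same slot>
        # cfgadm -c configure c1::dsk/c1t6d0
        # zpool replace tank c1t6d0
        # zpool online tank c1t6d0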
If he adds the spare and then manually forces a replace, it will take
no more time than any other way. I do this quite frequently and without
needing the scrub, which does take quite a lot of time.
Hi all,

I have done

        zpool add tank spare c1t15d0
        zpool replace tank c1t6d0 c1t15d0

now and waited for the completion of the resilvering process. "zpool
status" now gives me

 scrub: resilver completed after 0h22m with 0 errors on Thu Aug  6 22:55:37 2009
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          mirror       ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
          mirror       DEGRADED     0     0     0
            spare      DEGRADED     0     0     0
              c1t6d0   FAULTED      0    19     0  too many errors
              c1t15d0  ONLINE       0     0     0
            c1t7d0     ONLINE       0     0     0
        spares
          c1t15d0      INUSE     currently in use

errors: No known data errors

This does look like a final step is missing. Can I simply physically
replace c1t6d0 now, or do I have to do

        zpool offline tank c1t6d0

first? Moreover, it seems I have to run a

        zpool clear

in my case to get rid of the DEGRADED message!? What is the missing bit
here?

> zpool offline tank c1t6d0
> <physically replace c1t6d0 with a new one>
> zpool replace tank c1t6d0
> zpool online tank c1t6d0

Just out of curiosity (since I took the other road this time), how does
the replace command know what exactly to do here? In my case I told the
system specifically to replace c1t6d0 with c1t15d0 by doing "zpool
replace tank c1t6d0 c1t15d0", but if I simply issue

        zpool replace tank c1t6d0

it ...!??

Thanks a lot,

  Andreas
Cindy.Swearingen at Sun.COM
2009-Aug-06 21:13 UTC
[zfs-discuss] Replacing faulty disk in ZFS pool
Andreas,

I think you can still offline the faulted disk, c1t6d0.

The difference between these two replacements:

        zpool replace tank c1t6d0 c1t15d0
        zpool replace tank c1t6d0

is that in the second case, you are telling ZFS that c1t6d0 has been
physically replaced in the same location. This would be equivalent but
unnecessary syntax:

        zpool replace tank c1t6d0 c1t6d0

Another option is to set the autoreplace pool property to on, which
will do the replacement steps (zpool replace) after you physically
replace the disk in the same physical location as the faulted disk.
This is also described in Example 11-1, here:

http://docs.sun.com/app/docs/doc/819-5461/gbbvf?a=view

After you physically replace c1t6d0, then you might have to detach the
spare, c1t15d0, back to the spare pool, like this:

# zpool detach tank c1t15d0

I'm not sure this step is always necessary...

cs
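If the autoreplace route sounds preferable, the property Cindy refers
to is set on the pool itself. A sketch (the final detach is only needed
if the spare does not return to the spare list on its own):

        # zpool set autoreplace=on tank
        <physically replace the faulted disk in the same slot>
        # zpool detach tank c1t15d0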
Hi Cindy,

> I think you can still offline the faulted disk, c1t6d0.

OK, here it gets tricky. I have

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          mirror       ONLINE       0     0     0
            c1t2d0     ONLINE       0     0     0
            c1t3d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c1t5d0     ONLINE       0     0     0
            c1t4d0     ONLINE       0     0     0
          mirror       DEGRADED     0     0     0
            spare      DEGRADED     0     0     0
              c1t6d0   FAULTED      0    19     0  too many errors
              c1t15d0  ONLINE       0     0     0
            c1t7d0     ONLINE       0     0     0
        spares
          c1t15d0      INUSE     currently in use

now. When I issue the command

        zpool offline tank c1t6d0

I get

        cannot offline c1t6d0: no valid replicas

?? However,

        zpool detach tank c1t6d0

seems to work!

  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h22m with 0 errors on Thu Aug  6 22:55:37 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t5d0   ONLINE       0     0     0
            c1t4d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t15d0  ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0

errors: No known data errors

This looks like I can remove and physically replace c1t6d0 now! :-)

Thanks,

  Andreas
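Following Cindy's earlier advice, the one remaining step once the bad
disk has been physically swapped would presumably be to hand the fresh
disk back to the pool as a hot spare:

        # zpool add tank spare c1t6d0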
Cindy.Swearingen at Sun.COM
2009-Aug-06 21:42 UTC
[zfs-discuss] Replacing faulty disk in ZFS pool
Dang. This is a bug we talked about recently that is fixed in Nevada
and an upcoming Solaris 10 release.

Okay, so you can't offline the faulted disk, but you were able to
replace it and detach the spare.

Cool beans...

Cindy