Ryan Schwartz
2010-Jul-08 20:55 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
I've got an x4500 with a zpool in a weird state. The two spares are listed twice each, once as AVAIL and once as FAULTED.

[IDGSUN02:/opt/src] root# zpool status
  pool: idgsun02
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        idgsun02    ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
        spares
          c0t6d0    AVAIL
          c5t5d0    AVAIL
          c0t6d0    FAULTED   corrupted data
          c5t5d0    FAULTED   corrupted data

errors: No known data errors

I've been working with Sun support, but wanted to toss it out to the community as well. I found and compiled the zpconfig util from here: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSGuids and found that the spares in question have different GUIDs, but the same vdev path:

spares[0]
        type='disk'
        guid=7826011125406290675
        path='/dev/dsk/c0t6d0s0'
        devid='id1,sd@SATA_____HITACHI_HUA7210S______GTF000PAHJMLXF/a'
        phys_path='/pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@6,0:a'
        whole_disk=1
        is_spare=1
        stats:
                state=7
                aux=0
        ...
spares[1]
        type='disk'
        guid=870554111467930413
        path='/dev/dsk/c5t5d0s0'
        devid='id1,sd@SATA_____HITACHI_HUA7210S______GTF000PAHJ5NLF/a'
        phys_path='/pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@5,0:a'
        whole_disk=1
        is_spare=1
        stats:
                state=7
                aux=0
        ...
spares[2]
        type='disk'
        guid=5486341412008712208
        path='/dev/dsk/c0t6d0s0'
        devid='id1,sd@SATA_____HITACHI_HUA7210S______GTF000PAHJMLXF/a'
        phys_path='/pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@6,0:a'
        whole_disk=1
        stats:
                state=4
                aux=2
        ...
spares[3]
        type='disk'
        guid=16971039974506843020
        path='/dev/dsk/c5t5d0s0'
        devid='id1,sd@SATA_____HITACHI_HUA7210S______GTF000PAHJ5NLF/a'
        phys_path='/pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@5,0:a'
        whole_disk=1
        stats:
                state=4
                aux=2
        ...

I've exported/imported the pool and the spares are still listed as above. The regular 'zpool remove idgsun02 c0t6d0s0' (and c5t5d0s0) commands also do not work, but do not produce any error output either.

This sounds remarkably like http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6893472 but as I said, the export/import does not correct the issue. Any suggestions on how I can remove the "FAULTED" spares from the pool? Can I use the GUID with zpool remove somehow?
-- 
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 · ryan.schwartz@ingramdigital.com
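[For anyone decoding the state/aux numbers in the zpconfig dump above: they index into the vdev_state and vdev_aux enums in the ZFS headers. The mapping below is taken from the OpenSolaris sys/fs/zfs.h of that era, so treat it as an assumption and verify against your own build:

        # state/aux in the zpconfig dump map onto vdev_state/vdev_aux in
        # sys/fs/zfs.h (values assumed from OpenSolaris-era headers; verify):
        #   state=7 -> VDEV_STATE_HEALTHY    (the AVAIL spares)
        #   state=4 -> VDEV_STATE_CANT_OPEN  (the FAULTED duplicates)
        #   aux=0   -> VDEV_AUX_NONE
        #   aux=2   -> VDEV_AUX_CORRUPT_DATA ("corrupted data" in zpool status)
        egrep 'VDEV_STATE|VDEV_AUX' /usr/include/sys/fs/zfs.h

In other words, the FAULTED entries appear to be stale config entries whose labels can no longer be opened, while the AVAIL entries describe the same physical disks and are healthy.]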
Cindy Swearingen
2010-Jul-08 22:25 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Hi Ryan,

What events led up to this situation? I've seen a similar problem when a system upgrade caused the controller numbers of the spares to change. In that case, the workaround was to export the pool, correct the spare device names, and import the pool. I'm not sure if this workaround applies to your case. Do you know if the spare device names changed?

My hunch is that you could export this pool, reconnect the spare devices, and reimport the pool, but I'd rather test this on my own pool first and I can't reproduce this problem.

I don't think you can remove the spares by their GUID. At least, I couldn't.

You said you tried to remove the spares with zpool remove. Did you try this command:

# zpool remove idgsun02 c0t6d0

Or this command, which I don't think would work, but would get you a message like this:

# zpool remove idgsun02 c0t6d0s0
cannot remove c0t6d0s0: no such device in pool

Thanks,

Cindy

On 07/08/10 14:55, Ryan Schwartz wrote:
> I've got an x4500 with a zpool in a weird state. The two spares are listed twice each, once as AVAIL, and once as FAULTED.
> [...]
> Any suggestions on how I can remove the "FAULTED" spares from the pool? Can I use the GUID with zpool remove somehow?
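[A minimal sketch of the export/reimport workaround Cindy describes, assuming the spare device names really did change and the pool can be taken offline briefly; the devfsadm step is an assumption about settling the device tree, not something from this thread:

        zpool export idgsun02
        devfsadm -Cv          # clean up stale /dev links (assumed helpful)
        zpool import idgsun02
        zpool status idgsun02 # check whether each spare now shows up once]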
Ryan Schwartz
2010-Jul-09 16:38 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Hi Cindy,

Not sure exactly when the drives went into this state, but it likely happened when I added a second pool, added the same spares to the second pool, and then later destroyed the second pool. There have been no controller or any other hardware changes to this system - it is all original parts. The device names are valid; the issue is that they are listed twice - once for a spare which is AVAIL and another time for the spare which is FAULTED.

I've tried zpool remove, zpool offline, zpool clear, and zpool export/import, and I've unconfigured the drives via cfgadm and tried a remove; nothing works to remove the FAULTED spares.

I was just able to remove the AVAIL spares, but only since they were listed first in the spares list:

[IDGSUN02:/dev/dsk] root# zpool remove idgsun02 c0t6d0
[IDGSUN02:/dev/dsk] root# zpool remove idgsun02 c5t5d0
[IDGSUN02:/dev/dsk] root# zpool status
  pool: idgsun02
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        idgsun02    ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
        spares
          c0t6d0    FAULTED   corrupted data
          c5t5d0    FAULTED   corrupted data

errors: No known data errors

What's interesting is that running the zpool remove commands a second time has no effect (presumably because zpool is using GUIDs internally).

I may have, at one point, tried to re-add the drives after seeing the FAULTED state and not being able to remove it, which is probably where the second set of entries came from. (Pretty much exactly what's described here: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFaultedSpares)

What I really need is to be able to remove the two bogus faulted spares, and I think the only way I'll be able to do that is via the GUIDs, since the (valid) vdev path is shown as the same for each; I would guess zpool is attempting to remove the first device that matches the path it is given. I've got a support case open, but no traction on that as of yet.
-- 
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 · ryan.schwartz@ingramdigital.com

On Jul 8, 2010, at 5:25 PM, Cindy Swearingen wrote:

> Hi Ryan,
>
> What events led up to this situation? I've seen a similar problem when a system upgrade caused the controller numbers of the spares to change. In that case, the workaround was to export the pool, correct the spare device names, and import the pool. I'm not sure if this workaround applies to your case. Do you know if the spare device names changed?
> [...]
> Thanks,
>
> Cindy
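[Two hedged things worth trying at this point: zdb -l dumps the on-disk vdev labels, including the guid, for each spare, which would show which of the two GUIDs per path actually lives on disk; and, as a long shot, some ZFS builds accept a vdev GUID anywhere a device name is expected, though on this build it may simply report "no such device in pool":

        # Dump the on-disk labels (zdb is an undocumented tool; output
        # format varies by release):
        zdb -l /dev/dsk/c0t6d0s0
        zdb -l /dev/dsk/c5t5d0s0

        # Long shot: pass the FAULTED entries' GUIDs from the zpconfig dump
        # in place of a device name (not all builds accept this form):
        zpool remove idgsun02 5486341412008712208
        zpool remove idgsun02 16971039974506843020]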
Cindy Swearingen
2010-Jul-09 18:00 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Hi Ryan,

Which Solaris release is this?

Thanks,

Cindy

On 07/09/10 10:38, Ryan Schwartz wrote:
> Hi Cindy,
>
> Not sure exactly when the drives went into this state, but it likely happened when I added a second pool, added the same spares to the second pool, and then later destroyed the second pool. There have been no controller or any other hardware changes to this system - it is all original parts. The device names are valid; the issue is that they are listed twice - once for a spare which is AVAIL and another time for the spare which is FAULTED.
> [...]
> What I really need is to be able to remove the two bogus faulted spares, and I think the only way I'll be able to do that is via the GUIDs, since the (valid) vdev path is shown as the same for each. I've got a support case open, but no traction on that as of yet.
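[For completeness, the release and pool-version details a support case typically asks for can be gathered like this; a generic sketch, not commands from this thread:

        cat /etc/release            # OS release string
        zpool upgrade               # ZFS pool format version(s) in use
        zpool get version idgsun02  # version of this particular pool]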
Ryan Schwartz
2010-Jul-09 18:06 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Ok, so after removing the spares marked as AVAIL and re-adding them, I put myself back in the "you're effed, dude" boat. What I should have done at that point was a zpool export/import, which would have resolved it.

So what I did was recreate the steps that got me into the state where the AVAIL spares were listed first, rather than the FAULTED ones (which allowed me to remove them as demonstrated in my previous email). I created another pool sharing the same spares, removed the spares, and then destroyed it, then exported and imported the main pool again. Once that operation completed, I was able to remove the spares again, export/import the pool, and the problem is now resolved.

zpool create cleanup c5t3d0 c4t3d0 spare c0t6d0 c5t5d0
zpool remove cleanup c0t6d0 c5t5d0
zpool destroy cleanup
zpool export idgsun02
zpool import idgsun02
zpool remove idgsun02 c0t6d0
zpool remove idgsun02 c5t5d0
zpool export idgsun02
zpool import idgsun02

And the resultant zpool status is this:

[IDGSUN02:/] root# zpool status
  pool: idgsun02
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        idgsun02    ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
        spares
          c0t6d0    AVAIL
          c5t5d0    AVAIL

errors: No known data errors

Hopefully this might help someone in the future if they get into this situation.
-- 
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 · ryan.schwartz@ingramdigital.com

On Jul 9, 2010, at 11:38 AM, Ryan Schwartz wrote:
> Hi Cindy,
>
> Not sure exactly when the drives went into this state, but it likely happened when I added a second pool, added the same spares to the second pool, and then later destroyed the second pool.
> [...]
> What I really need is to be able to remove the two bogus faulted spares, and I think the only way I'll be able to do that is via the GUIDs, since the (valid) vdev path is shown as the same for each. I've got a support case open, but no traction on that as of yet.
Ryan Schwartz
2010-Jul-09 18:08 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Cindy,

[IDGSUN02:/] root# cat /etc/release
                       Solaris 10 10/08 s10x_u6wos_07b X86
           Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 27 October 2008

But as noted in my recent email, I've resolved this with an export/import with only 2 of the 4 spares listed (they were listed as FAULTED, but the export/import fixed that right up).
-- 
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 · ryan.schwartz@ingramdigital.com

On Jul 9, 2010, at 1:00 PM, Cindy Swearingen wrote:

> Hi Ryan,
>
> Which Solaris release is this?
>
> Thanks,
>
> Cindy
Cindy Swearingen
2010-Jul-09 18:46 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
I was going to suggest the export/import step next. :-) I'm glad you were able to resolve it.

We are working on making spare behavior more robust. In the meantime, my advice is to keep life simple and not share spares, logs, caches, or even disks between pools.

Thanks,

Cindy

On 07/09/10 12:08, Ryan Schwartz wrote:
> Cindy,
>
> [IDGSUN02:/] root# cat /etc/release
>                        Solaris 10 10/08 s10x_u6wos_07b X86
> [...]
>
> But as noted in my recent email, I've resolved this with an export/import with only 2 of the 4 spares listed (they were listed as FAULTED, but the export/import fixed that right up).
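[For example, rather than sharing one pair of spares between two pools, something like this keeps each pool self-contained; a sketch, where 'tank2' and the disk assignments are hypothetical:

        # One dedicated spare per pool instead of sharing spares across pools:
        zpool add idgsun02 spare c0t6d0
        zpool add tank2 spare c5t5d0   # hypothetical second pool, own spare]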