Ryan Schwartz
2010-Jul-08 20:55 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
I've got an x4500 with a zpool in a weird state. The two spares are listed twice each, once as AVAIL and once as FAULTED.

[IDGSUN02:/opt/src] root# zpool status
  pool: idgsun02
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        idgsun02    ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
        spares
          c0t6d0    AVAIL
          c5t5d0    AVAIL
          c0t6d0    FAULTED   corrupted data
          c5t5d0    FAULTED   corrupted data

errors: No known data errors

I've been working with Sun support, but wanted to toss it out to the community as well. I found and compiled the zpconfig util from here: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSGuids and found that the spares in question have different GUIDs, but the same vdev path:

spares[0]
        type='disk'
        guid=7826011125406290675
        path='/dev/dsk/c0t6d0s0'
        devid='id1,sd@SATA_____HITACHI_HUA7210S______GTF000PAHJMLXF/a'
        phys_path='/pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@6,0:a'
        whole_disk=1
        is_spare=1
        stats:
                state=7
                aux=0
        ...
spares[1]
        type='disk'
        guid=870554111467930413
        path='/dev/dsk/c5t5d0s0'
        devid='id1,sd@SATA_____HITACHI_HUA7210S______GTF000PAHJ5NLF/a'
        phys_path='/pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@5,0:a'
        whole_disk=1
        is_spare=1
        stats:
                state=7
                aux=0
        ...
spares[2]
        type='disk'
        guid=5486341412008712208
        path='/dev/dsk/c0t6d0s0'
        devid='id1,sd@SATA_____HITACHI_HUA7210S______GTF000PAHJMLXF/a'
        phys_path='/pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@6,0:a'
        whole_disk=1
        stats:
                state=4
                aux=2
        ...
spares[3]
        type='disk'
        guid=16971039974506843020
        path='/dev/dsk/c5t5d0s0'
        devid='id1,sd@SATA_____HITACHI_HUA7210S______GTF000PAHJ5NLF/a'
        phys_path='/pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@5,0:a'
        whole_disk=1
        stats:
                state=4
                aux=2
        ...

I've exported/imported the pool and the spares are still listed as above. The regular 'zpool remove idgsun02 c0t6d0s0' (and c5t5d0s0) commands also do not work, but do not produce any error output either.

This sounds remarkably like http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6893472 but as I said, the export/import does not correct the issue. Any suggestions on how I can remove the "FAULTED" spares from the pool? Can I use the GUID with zpool remove somehow?
-- 
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 · ryan.schwartz@ingramdigital.com
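[For anyone decoding the state/aux numbers in the zpconfig dump above: they index into the vdev_state and vdev_aux enums in the ZFS headers. The mapping below is taken from the OpenSolaris sys/fs/zfs.h of that era, so treat it as an assumption and verify against your own build:

        # state/aux in the zpconfig dump map onto vdev_state/vdev_aux in
        # sys/fs/zfs.h (values assumed from OpenSolaris-era headers; verify):
        #   state=7 -> VDEV_STATE_HEALTHY    (the AVAIL spares)
        #   state=4 -> VDEV_STATE_CANT_OPEN  (the FAULTED duplicates)
        #   aux=0   -> VDEV_AUX_NONE
        #   aux=2   -> VDEV_AUX_CORRUPT_DATA ("corrupted data" in zpool status)
        egrep 'VDEV_STATE|VDEV_AUX' /usr/include/sys/fs/zfs.h

In other words, the FAULTED entries appear to be stale config entries whose labels can no longer be opened, while the AVAIL entries describe the same physical disks and are healthy.]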
Cindy Swearingen
2010-Jul-08 22:25 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Hi Ryan,

What events led up to this situation? I've seen a similar problem when a system upgrade caused the controller numbers of the spares to change. In that case, the workaround was to export the pool, correct the spare device names, and import the pool. I'm not sure if this workaround applies to your case. Do you know if the spare device names changed?

My hunch is that you could export this pool, reconnect the spare devices, and reimport the pool, but I'd rather test this on my own pool first and I can't reproduce this problem.

I don't think you can remove the spares by their GUID. At least, I couldn't.

You said you tried to remove the spares with zpool remove. Did you try this command:

# zpool remove idgsun02 c0t6d0

Or this command, which I don't think would work, but would get you a message like this:

# zpool remove idgsun02 c0t6d0s0
cannot remove c0t6d0s0: no such device in pool

Thanks,

Cindy

On 07/08/10 14:55, Ryan Schwartz wrote:
> I've got an x4500 with a zpool in a weird state. The two spares are listed twice each, once as AVAIL, and once as FAULTED.
> [...]
> Any suggestions on how I can remove the "FAULTED" spares from the pool? Can I use the GUID with zpool remove somehow?
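[A minimal sketch of the export/reimport workaround Cindy describes, assuming the spare device names really did change and the pool can be taken offline briefly; the devfsadm step is an assumption about settling the device tree, not something from this thread:

        zpool export idgsun02
        devfsadm -Cv          # clean up stale /dev links (assumed helpful)
        zpool import idgsun02
        zpool status idgsun02 # check whether each spare now shows up once]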
Ryan Schwartz
2010-Jul-09 16:38 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Hi Cindy,

Not sure exactly when the drives went into this state, but it likely happened when I added a second pool, added the same spares to the second pool, and then later destroyed the second pool. There have been no controller or any other hardware changes to this system - it is all original parts. The device names are valid; the issue is that they are listed twice - once for a spare which is AVAIL and another time for the spare which is FAULTED.

I've tried zpool remove, zpool offline, zpool clear, and zpool export/import, and I've unconfigured the drives via cfgadm and tried a remove; nothing works to remove the FAULTED spares.

I was just able to remove the AVAIL spares, but only since they were listed first in the spares list:

[IDGSUN02:/dev/dsk] root# zpool remove idgsun02 c0t6d0
[IDGSUN02:/dev/dsk] root# zpool remove idgsun02 c5t5d0
[IDGSUN02:/dev/dsk] root# zpool status
  pool: idgsun02
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        idgsun02    ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
        spares
          c0t6d0    FAULTED   corrupted data
          c5t5d0    FAULTED   corrupted data

errors: No known data errors

What's interesting is that running the zpool remove commands a second time has no effect (presumably because zpool is using GUIDs internally).

I may have, at one point, tried to re-add the drives after seeing the FAULTED state and not being able to remove it, which is probably where the second set of entries came from. (Pretty much exactly what's described here: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFaultedSpares)

What I really need is to be able to remove the two bogus faulted spares, and I think the only way I'll be able to do that is via the GUIDs, since the (valid) vdev path is shown as the same for each; I would guess zpool is attempting to remove the first device that matches the path it is given. I've got a support case open, but no traction on that as of yet.
-- 
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 · ryan.schwartz@ingramdigital.com

On Jul 8, 2010, at 5:25 PM, Cindy Swearingen wrote:

> Hi Ryan,
>
> What events led up to this situation? I've seen a similar problem when a system upgrade caused the controller numbers of the spares to change. In that case, the workaround was to export the pool, correct the spare device names, and import the pool. I'm not sure if this workaround applies to your case. Do you know if the spare device names changed?
> [...]
> Thanks,
>
> Cindy
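[Two hedged things worth trying at this point: zdb -l dumps the on-disk vdev labels, including the guid, for each spare, which would show which of the two GUIDs per path actually lives on disk; and, as a long shot, some ZFS builds accept a vdev GUID anywhere a device name is expected, though on this build it may simply report "no such device in pool":

        # Dump the on-disk labels (zdb is an undocumented tool; output
        # format varies by release):
        zdb -l /dev/dsk/c0t6d0s0
        zdb -l /dev/dsk/c5t5d0s0

        # Long shot: pass the FAULTED entries' GUIDs from the zpconfig dump
        # in place of a device name (not all builds accept this form):
        zpool remove idgsun02 5486341412008712208
        zpool remove idgsun02 16971039974506843020]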
Cindy Swearingen
2010-Jul-09 18:00 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Hi Ryan,

Which Solaris release is this?

Thanks,

Cindy

On 07/09/10 10:38, Ryan Schwartz wrote:
> Hi Cindy,
>
> Not sure exactly when the drives went into this state, but it likely happened when I added a second pool, added the same spares to the second pool, and then later destroyed the second pool. There have been no controller or any other hardware changes to this system - it is all original parts. The device names are valid; the issue is that they are listed twice - once for a spare which is AVAIL and another time for the spare which is FAULTED.
> [...]
> What I really need is to be able to remove the two bogus faulted spares, and I think the only way I'll be able to do that is via the GUIDs, since the (valid) vdev path is shown as the same for each. I've got a support case open, but no traction on that as of yet.
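[For completeness, the release and pool-version details a support case typically asks for can be gathered like this; a generic sketch, not commands from this thread:

        cat /etc/release            # OS release string
        zpool upgrade               # ZFS pool format version(s) in use
        zpool get version idgsun02  # version of this particular pool]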
Ryan Schwartz
2010-Jul-09 18:06 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Ok, so after removing the spares marked as AVAIL and re-adding them, I put myself back in the "you're effed, dude" boat. What I should have done at that point was a zpool export/import, which would have resolved it.

So what I did was recreate the steps that got me into the state where the AVAIL spares were listed first, rather than the FAULTED ones (which allowed me to remove them as demonstrated in my previous email). I created another pool sharing the same spares, removed the spares, and then destroyed it, then exported and imported the main pool again. Once that operation completed, I was able to remove the spares again, export/import the pool, and the problem is now resolved.

zpool create cleanup c5t3d0 c4t3d0 spare c0t6d0 c5t5d0
zpool remove cleanup c0t6d0 c5t5d0
zpool destroy cleanup
zpool export idgsun02
zpool import idgsun02
zpool remove idgsun02 c0t6d0
zpool remove idgsun02 c5t5d0
zpool export idgsun02
zpool import idgsun02

And the resultant zpool status is this:

[IDGSUN02:/] root# zpool status
  pool: idgsun02
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        idgsun02    ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
        spares
          c0t6d0    AVAIL
          c5t5d0    AVAIL

errors: No known data errors

Hopefully this might help someone in the future if they get into this situation.
-- 
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 · ryan.schwartz@ingramdigital.com

On Jul 9, 2010, at 11:38 AM, Ryan Schwartz wrote:
> Hi Cindy,
>
> Not sure exactly when the drives went into this state, but it likely happened when I added a second pool, added the same spares to the second pool, and then later destroyed the second pool.
> [...]
> What I really need is to be able to remove the two bogus faulted spares, and I think the only way I'll be able to do that is via the GUIDs, since the (valid) vdev path is shown as the same for each. I've got a support case open, but no traction on that as of yet.
Ryan Schwartz
2010-Jul-09 18:08 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
Cindy,

[IDGSUN02:/] root# cat /etc/release
                       Solaris 10 10/08 s10x_u6wos_07b X86
           Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 27 October 2008

But as noted in my recent email, I've resolved this with an export/import with only 2 of the 4 spares listed (they were listed as FAULTED, but the export/import fixed that right up).
-- 
Ryan Schwartz, UNIX Systems Administrator, VitalSource Technologies, Inc. - An Ingram Digital Company
Mob: (608) 886-3513 · ryan.schwartz@ingramdigital.com

On Jul 9, 2010, at 1:00 PM, Cindy Swearingen wrote:

> Hi Ryan,
>
> Which Solaris release is this?
>
> Thanks,
>
> Cindy
Cindy Swearingen
2010-Jul-09 18:46 UTC
[zfs-discuss] zpool spares listed twice, as both AVAIL and FAULTED
I was going to suggest the export/import step next. :-) I'm glad you were able to resolve it.

We are working on making spare behavior more robust. In the meantime, my advice is to keep life simple and not share spares, logs, caches, or even disks between pools.

Thanks,

Cindy

On 07/09/10 12:08, Ryan Schwartz wrote:
> Cindy,
>
> [IDGSUN02:/] root# cat /etc/release
>                        Solaris 10 10/08 s10x_u6wos_07b X86
> [...]
>
> But as noted in my recent email, I've resolved this with an export/import with only 2 of the 4 spares listed (they were listed as FAULTED, but the export/import fixed that right up).
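[For example, rather than sharing one pair of spares between two pools, something like this keeps each pool self-contained; a sketch, where 'tank2' and the disk assignments are hypothetical:

        # One dedicated spare per pool instead of sharing spares across pools:
        zpool add idgsun02 spare c0t6d0
        zpool add tank2 spare c5t5d0   # hypothetical second pool, own spare]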