Eric Hill
2007-Jan-05 18:14 UTC
[zfs-discuss] zfs pool in degraded state, zpool offline fails with no valid replicas
I have a pool of 48 500GB disks across four SCSI channels (12 per channel). One
of the disks failed, and was replaced. The pool is now in a degraded state, but
I can't seem to get the pool to be happy with the replacement. I did a
resilver and the pool is error free with the exception of this dead target.
vault:/#zpool status
pool: pool
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-D3
scrub: resilver completed with 0 errors on Fri Jan 5 11:58:31 2007
config:
NAME STATE READ WRITE CKSUM
pool DEGRADED 0 0 0
raidz2 ONLINE 0 0 0
c3t0d0 ONLINE 0 0 0
c3t4d0 ONLINE 0 0 0
c3t8d0 ONLINE 0 0 0
c4t0d0 ONLINE 0 0 0
c4t4d0 ONLINE 0 0 0
c4t8d0 ONLINE 0 0 0
c5t0d0 ONLINE 0 0 0
c5t4d0 ONLINE 0 0 0
c5t8d0 ONLINE 0 0 0
c6t0d0 ONLINE 0 0 0
c6t4d0 ONLINE 0 0 0
c6t8d0 ONLINE 0 0 0
raidz2 ONLINE 0 0 0
c3t1d0 ONLINE 0 0 0
c3t5d0 ONLINE 0 0 0
c3t9d0 ONLINE 0 0 0
c4t1d0 ONLINE 0 0 0
c4t5d0 ONLINE 0 0 0
c4t9d0 ONLINE 0 0 0
c5t1d0 ONLINE 0 0 0
c5t5d0 ONLINE 0 0 0
c5t9d0 ONLINE 0 0 0
c6t1d0 ONLINE 0 0 0
c6t5d0 ONLINE 0 0 0
c6t9d0 ONLINE 0 0 0
raidz2 DEGRADED 0 0 0
c3t2d0 ONLINE 0 0 0
c3t6d0 ONLINE 0 0 0
c3t10d0 ONLINE 0 0 0
c4t2d0 ONLINE 0 0 0
c4t6d0 ONLINE 0 0 0
c4t10d0 ONLINE 0 0 0
c5t2d0 ONLINE 0 0 0
c5t6d0 UNAVAIL 0 0 0 cannot open
c5t10d0 ONLINE 0 0 0
c6t2d0 ONLINE 0 0 0
c6t6d0 ONLINE 0 0 0
c6t10d0 ONLINE 0 0 0
raidz2 ONLINE 0 0 0
c3t3d0 ONLINE 0 0 0
c3t7d0 ONLINE 0 0 0
c3t11d0 ONLINE 0 0 0
c4t3d0 ONLINE 0 0 0
c4t7d0 ONLINE 0 0 0
c4t11d0 ONLINE 0 0 0
c5t3d0 ONLINE 0 0 0
c5t7d0 ONLINE 0 0 0
c5t11d0 ONLINE 0 0 0
c6t3d0 ONLINE 0 0 0
c6t7d0 ONLINE 0 0 0
c6t11d0 ONLINE 0 0 0
errors: No known data errors
vault:/#zpool offline pool c5t6d0
cannot offline c5t6d0: no valid replicas
vault:/#
So WTF? This is a RAIDZ2 pool that is functional, and I can't offline
a disk? FWIW, I can see the disk:
vault:/#ls -l /dev/dsk/c5t6d0
lrwxrwxrwx 1 root root 74 Dec 29 20:39 /dev/dsk/c5t6d0 ->
../../devices/pci@0,0/pci10de,5d@e/pci10b5,8114@0/pci1000,10b0@8/sd@6,0:wd
vault:/#
And furthermore I can see it with the format command:
vault:/#format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
...
30. c5t6d0 <IFT-A24U-G2421-1-347G-465.51GB>
/pci@0,0/pci10de,5d@e/pci10b5,8114@0/pci1000,10b0@8/sd@6,0
...
vault:/#
So why in the world does zfs think that the disk isn't available, and
why can't I re-enable it?
Bill Moore
2007-Jan-05 18:44 UTC
[zfs-discuss] zfs pool in degraded state, zpool offline fails with no valid replicas
On Fri, Jan 05, 2007 at 10:14:21AM -0800, Eric Hill wrote:
> I have a pool of 48 500GB disks across four SCSI channels (12 per
> channel). One of the disks failed, and was replaced. The pool is now
> in a degraded state, but I can't seem to get the pool to be happy with
> the replacement. I did a resilver and the pool is error free with the
> exception of this dead target.
>
> ...
>
> So why in the world does zfs think that the disk isn't available and
> why can't I re-enable it?

ZFS can see the disk, but it's a different disk than the one that used
to be there. Have you tried:

    zpool replace pool c5t6d0

The difference between replace and online is that you can only online a
disk that was previously taken offline (that is, it has to be the same
exact disk with the same exact data on it). If it is a different disk,
then you have to do "zpool replace" to tell ZFS you wish to reconstruct
the contents of the old disk onto the new one and forget the old one
ever existed.

--Bill
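To put that distinction in command form, a minimal sketch using the pool
and device names from this thread:

    # disk was only temporarily out of service and is back, unchanged:
    zpool offline pool c5t6d0
    zpool online pool c5t6d0

    # disk was physically swapped for a new one in the same slot:
    # rebuild the old disk's contents onto the replacement
    zpool replace pool c5t6d0

With no new_device argument, zpool replace reuses the same device path,
which is the usual case when the replacement sits on the same controller
and target.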
Hi Bill,

vault:/#zpool replace pool c5t6d0
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c5t6d0s0 is part of active ZFS pool pool. Please see zpool(1M).
vault:/#zpool replace -f pool c5t6d0
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c5t6d0s0 is part of active ZFS pool pool. Please see zpool(1M).
vault:/#
And to add more fuel to the fire, an fmdump -eV shows the following:
Jan 05 2007 11:30:38.030057310 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
class = ereport.fs.zfs.vdev.open_failed
ena = 0x88c01b571200801
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x66dd422b2d14d75b
vdev = 0x1750a5751459ad65
(end detector)
pool = pool
pool_guid = 0x66dd422b2d14d75b
pool_context = 0
vdev_guid = 0x1750a5751459ad65
vdev_type = disk
vdev_path = /dev/dsk/c5t6d0s0
vdev_devid = id1,sd@n600d0230006ecdd50ec7ef5e4aac4e00/a
parent_guid = 0x33b0223eb6c89eac
parent_type = raidz
prev_state = 0x1
__ttl = 0x1
__tod = 0x459e8b3e 0x1caa35e
Based on this, the drive already has a pool signature on it; I did test the
replacement disk in another machine before installing it here... crap...
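As an aside, one quick way to confirm a leftover signature like this,
assuming zdb's label-dump option behaves here the way it does on current
builds, is to read the labels straight off the disk:

    # dump the ZFS vdev labels (if any) from the replacement disk
    zdb -l /dev/dsk/c5t6d0s0

If that prints one or more label nvlists with a pool name and GUID, the
disk is still carrying the old pool's signature.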
Based on the number of blocks (0x3a3037ff) and the block size (0x200), I
calculated that the end of the drive, less a couple of megabytes, is at
offset 959854591. Then I ran the following command:
vault:/#dd if=/dev/zero of=/dev/dsk/c5t6d0 bs=512 count=32000 oseek=959854591
32000+0 records in
32000+0 records out
That wiped the last couple of megabytes on the disk (ZFS keeps copies of
the vdev label at both the beginning and the end of a device). I also ran
the same command without the oseek to clear the front couple of megabytes.
Then I did a "zpool status" to re-list the pool. It still showed the
drive as unavailable. I ran "zpool replace pool c5t6d0" and this time
it said "cannot replace c5t6d0 with c5t6d0: c5t6d0 is busy".
Grumble.
Ok, now I'm getting somewhere.
vault:/#dd if=/dev/zero of=/dev/dsk/c5t6d0 bs=512 count=64000
64000+0 records in
64000+0 records out
vault:/#dd if=/dev/zero of=/dev/dsk/c5t6d0 bs=512 count=64000 oseek=976174591
64000+0 records in
64000+0 records out
vault:/#zpool replace pool c5t6d0
vault:/#
Looks like I didn't do the steps in the right order before. Zeroing
out the first and last 32MB and immediately running a zpool replace worked.
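Sanity-checking that oseek value, assuming 512-byte sectors and the
0x3a3037ff sector count from the earlier calculation (ksh93/bash
arithmetic):

    # last 64000 sectors of the disk = total sectors minus the count zeroed
    echo $((0x3a3037ff - 64000))    # 976174591

which matches the offset passed to dd above.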
vault:/#zpool status
pool: pool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress, 54.48% done, 0h2m to go
config:
NAME STATE READ WRITE CKSUM
pool DEGRADED 0 0 0
raidz2 ONLINE 0 0 0
...
c4t10d0 ONLINE 0 0 0
c5t2d0 ONLINE 0 0 0
replacing DEGRADED 0 0 0
c5t6d0s0/o UNAVAIL 0 0 0 cannot open
c5t6d0 ONLINE 0 0 0
c5t10d0 ONLINE 0 0 0
...
[2 minutes later]
vault:/#zpool status
pool: pool
state: ONLINE
scrub: resilver completed with 0 errors on Fri Jan 5 15:11:20 2007
Woohoo!
For the record, I got the block size from the verify command in format:
format> verify
Volume name = < >
ascii name = <IFT-A24U-G2421-1-347G-465.51GB>
bytes/sector = 512
sectors = 976238590
accessible sectors = 976238557