Hi,

I hope someone can help, because at the moment ZFS's logic seems a little askew.

I just swapped a failing 200 GB drive that was one half of a 400 GB gstripe device, which I was using as one of the devices in a three-device raidz1. When the OS came back up after the drive had been changed, the necessary metadata was of course not on the new drive, so the stripe didn't exist. ZFS understandably complained that it couldn't open the stripe, however it did not show the array as degraded. I didn't save the output, but it was just like the behaviour described in this thread:

http://www.nabble.com/Shooting-yourself-in-the-foot-with-ZFS:-is-quite-easy-t4512790.html

I recreated the gstripe device under the same name, stripe/str1, and assumed I could just:

 # zpool replace pool stripe/str1
 invalid vdev specification
 stripe/str1 is in use (r1w1e1)

It also told me to try -f, which I did, but I was greeted with the same error. Why can I not replace a device with itself? As the man page describes exactly this procedure, I'm a little confused.

Try as I might (online, offline, scrub), I could not get the array to rebuild, just like the guy in the thread above described. I eventually resorted to recreating the stripe under a different name, stripe/str2. I could then perform a:

 # zpool replace pool stripe/str1 stripe/str2

Is there a reason I have to jump through these seemingly pointless hoops to replace a device with itself?

Many thanks.
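For reference, the workaround that finally worked is sketched below: rebuild the stripe under a new GEOM name and hand that name to zpool replace. The underlying disk names (da1, da2) are illustrative assumptions, not the actual devices:

 # gstripe label -v str2 da1 da2               (disk names assumed; str2 appears as /dev/stripe/str2)
 # zpool replace pool stripe/str1 stripe/str2
 # zpool status pool                           (watch the resilver run against the new stripe)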
Richard Elling
2007-Oct-03 17:36 UTC
[zfs-discuss] replacing a device with itself doesn't work
MP wrote:
> Hi,
> I hope someone can help, because at the moment ZFS's logic seems a little askew.
> [...]
> I recreated the gstripe device under the same name, stripe/str1, and assumed I could just:
>
> # zpool replace pool stripe/str1
> invalid vdev specification
> stripe/str1 is in use (r1w1e1)
>
> It also told me to try -f, which I did, but I was greeted with the same error.
> Why can I not replace a device with itself?
> As the man page describes exactly this procedure, I'm a little confused.
> [...]
> Is there a reason I have to jump through these seemingly pointless hoops to replace a device with itself?
> Many thanks.

Yes. From the fine manual on zpool:

     zpool replace [-f] pool old_device [new_device]

         Replaces old_device with new_device. This is equivalent
         to attaching new_device, waiting for it to resilver, and
         then detaching old_device.
         ...
         If new_device is not specified, it defaults to
         old_device. This form of replacement is useful after an
         existing disk has failed and has been physically
         replaced. In this case, the new disk may have the same
         /dev/dsk path as the old device, even though it is
         actually a different disk. ZFS recognizes this.

For a stripe, you don't have redundancy, so you cannot replace the
disk with itself. You would have to specify the [new_device].

I've submitted CR 6612596 for a better error message and CR 6612605
to mention this in the man page.
 -- richard
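To make the two forms of the command concrete, here is a minimal sketch of both invocations; the pool and device names (tank, c1t2d0, c3t1d0) are illustrative only:

 # zpool replace tank c1t2d0             (no new_device: the disk at the same path was physically swapped)
 # zpool replace tank c1t2d0 c3t1d0      (explicit new_device: resilver onto a different disk, then detach the old one)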
Richard Elling
2007-Oct-03 19:10 UTC
[zfs-discuss] replacing a device with itself doesn't work
more below...

MP wrote:
> On 03/10/2007, Richard Elling <Richard.Elling at sun.com> wrote:
> > Yes. From the fine manual on zpool:
> > [...]
> > For a stripe, you don't have redundancy, so you cannot replace the
> > disk with itself.
>
> I don't see how a stripe makes a difference. It's just two drives joined
> together logically to make a new device. It can be used by the system just
> like a normal hard drive. Just like a normal hard drive, it too has no
> redundancy?

Correct. It would be redundant if it were a mirror, raidz, or raidz2.
In the case of stripes of mirrors, raidz, or raidz2 vdevs, they are
redundant.

> > You would have to specify the [new_device].
> > I've submitted CR 6612596 for a better error message and CR 6612605
> > to mention this in the man page.
>
> Perhaps I was a little unclear. ZFS did a few things during this whole
> escapade which seemed wrong.
>
> # mdconfig -a -tswap -s64m
> md0
> # mdconfig -a -tswap -s64m
> md1
> # mdconfig -a -tswap -s64m
> md2

I presume you're not running Solaris, so please excuse me if I take a
Solaris view of this problem.

> # zpool create tank raidz md0 md1 md2
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             md0     ONLINE       0     0     0
>             md1     ONLINE       0     0     0
>             md2     ONLINE       0     0     0
>
> errors: No known data errors
> # zpool offline tank md0
> Bringing device md0 offline
> # dd if=/dev/zero of=/dev/md0 bs=1m
> dd: /dev/md0: end of device
> 65+0 records in
> 64+0 records out
> 67108864 bytes transferred in 0.044925 secs (1493798602 bytes/sec)
> # zpool status -v tank
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        DEGRADED     0     0     0
>           raidz1    DEGRADED     0     0     0
>             md0     OFFLINE      0     0     0
>             md1     ONLINE       0     0     0
>             md2     ONLINE       0     0     0
>
> errors: No known data errors
>
> --------------------
> At this point, where the drive is offline, a 'zpool replace tank md0' will
> fix the array.

Correct. The pool is redundant.

> However, if instead the other advice given, 'zpool online tank md0', is
> used, then problems start to occur:
> --------------------
>
> # zpool online tank md0
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices could not be used because the label is missing or
>         invalid. Sufficient replicas exist for the pool to continue
>         functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: resilver completed with 0 errors on Wed Oct  3 18:44:22 2007
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             md0     UNAVAIL      0     0     0  corrupted data
>             md1     ONLINE       0     0     0
>             md2     ONLINE       0     0     0
>
> errors: No known data errors
>
> -------------
> Surely this is wrong? zpool shows the pool as 'ONLINE' and not DEGRADED,
> whereas the status explanation says that it is degraded and 'zpool replace'
> is required. That's just confusing.

I agree, I would expect the STATE to be DEGRADED.

> -------------
>
> # zpool scrub tank
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices could not be used because the label is missing or
>         invalid. Sufficient replicas exist for the pool to continue
>         functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: resilver completed with 0 errors on Wed Oct  3 18:45:06 2007
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             md0     UNAVAIL      0     0     0  corrupted data
>             md1     ONLINE       0     0     0
>             md2     ONLINE       0     0     0
>
> errors: No known data errors
> # zpool replace tank md0
> invalid vdev specification
> use '-f' to override the following errors:
> md0 is in use (r1w1e1)
> # zpool replace -f tank md0
> invalid vdev specification
> the following errors must be manually repaired:
> md0 is in use (r1w1e1)
>
> -----------------
> Well, the advice of 'zpool replace' doesn't work. At this point the user is
> now stuck. There seems to be just no way to now use the existing device md0.

In Solaris NV b72, this works as you expect.

 # zpool replace zwimming /dev/ramdisk/rd1
 # zpool status -v zwimming
   pool: zwimming
  state: DEGRADED
  scrub: resilver completed with 0 errors on Wed Oct  3 11:55:36 2007
 config:

         NAME                        STATE     READ WRITE CKSUM
         zwimming                    DEGRADED     0     0     0
           raidz1                    DEGRADED     0     0     0
             replacing               DEGRADED     0     0     0
               /dev/ramdisk/rd1/old  FAULTED      0     0     0  corrupted data
               /dev/ramdisk/rd1      ONLINE       0     0     0
             /dev/ramdisk/rd2        ONLINE       0     0     0
             /dev/ramdisk/rd3        ONLINE       0     0     0

 errors: No known data errors
 # zpool status -v zwimming
   pool: zwimming
  state: ONLINE
  scrub: resilver completed with 0 errors on Wed Oct  3 11:55:36 2007
 config:

         NAME                  STATE     READ WRITE CKSUM
         zwimming              ONLINE       0     0     0
           raidz1              ONLINE       0     0     0
             /dev/ramdisk/rd1  ONLINE       0     0     0
             /dev/ramdisk/rd2  ONLINE       0     0     0
             /dev/ramdisk/rd3  ONLINE       0     0     0

 errors: No known data errors

> -----------------
> # mdconfig -a -tswap -s64m
> md3
> # zpool replace -f tank md0 md3
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
>  scrub: resilver completed with 0 errors on Wed Oct  3 18:45:52 2007
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         tank           ONLINE       0     0     0
>           raidz1       ONLINE       0     0     0
>             replacing  ONLINE       0     0     0
>               md0      UNAVAIL      0     0     0  corrupted data
>               md3      ONLINE       0     0     0
>             md1        ONLINE       0     0     0
>             md2        ONLINE       0     0     0
>
> errors: No known data errors
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
>  scrub: resilver completed with 0 errors on Wed Oct  3 18:45:52 2007
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             md3     ONLINE       0     0     0
>             md1     ONLINE       0     0     0
>             md2     ONLINE       0     0     0
>
> errors: No known data errors
>
> --------------------
>
> Only changing the device name of the failed component can get ZFS to
> rebuild the array. That seems wrong to me.
>
> 1. Why does zpool status say 'ONLINE' when the pool is obviously degraded?

IMHO, bug.
> 2. Why is the first advice given, 'zpool online', which does not work?

In Solaris I see:

 # zpool online zwimming /dev/ramdisk/rd1
 warning: device '/dev/ramdisk/rd1' onlined, but remains in faulted state
 use 'zpool replace' to replace devices that are no longer present

> 3. Why is the second advice given, 'zpool replace', when that doesn't work
> after the first advice has been performed?

Works in Solaris. Hopefully it is in the pipeline for *BSD.

> 4. Why do I have to use a device with a different name to get this to work?
> Surely what I did above mimics exactly what happens when a drive fails, and
> the manual says that 'zpool replace <pool> <failed-device>' will fix it?

In such cases, I would not try this while online; I would have offlined the
device before attempting the replace. But I see your point, it is confusing.
Given that Solaris seems to handle this differently, I think it is just a
matter of your release catching up.

> 5. If ZFS can access all the necessary devices in the pool, then why
> doesn't scrub fix the array?

You destroyed all of the data on the device, including the uberblocks.
AFAIK, scrub does not attempt to recreate uberblocks, which is why the
replace command exists.

I think you've identified a user interface problem that can be corrected
more automatically. What do others think? Should a scrub perform a replace
if the uberblocks are nonexistent?
 -- richard
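For anyone following along, the path that avoids the "in use" trap is the offline-first sequence shown earlier in the thread. A minimal sketch, reusing the md-based test pool from above (device numbering is illustrative):

 # zpool offline tank md0                (take the vdev out of service first)
   ... swap or recreate the underlying device ...
 # zpool replace tank md0                (resilver onto the replacement at the same path)
 # zpool status -v tank                  (the pool should return to ONLINE once the resilver completes)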
Pawel Jakub Dawidek
2007-Oct-03 20:02 UTC
[zfs-discuss] replacing a device with itself doesn't work
On Wed, Oct 03, 2007 at 12:10:19PM -0700, Richard Elling wrote:
> > # zpool replace tank md0
> > invalid vdev specification
> > use '-f' to override the following errors:
> > md0 is in use (r1w1e1)
> > # zpool replace -f tank md0
> > invalid vdev specification
> > the following errors must be manually repaired:
> > md0 is in use (r1w1e1)
> >
> > -----------------
> > Well, the advice of 'zpool replace' doesn't work. At this point the user
> > is now stuck. There seems to be just no way to now use the existing
> > device md0.
>
> In Solaris NV b72, this works as you expect.
> [...]

Good to know, but I think it's still a bit of a ZFS fault. The error message
'md0 is in use (r1w1e1)' means that something (I'm quite sure it's ZFS) keeps
the device open. Why does it keep it open when it doesn't recognize it? Or
maybe it tries to open it twice for write (exclusively) when replacing, which
is not allowed in GEOM on FreeBSD.

I can take a look at whether this is the former or the latter, but it should
be fixed in ZFS itself, IMHO.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                          http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
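One way to see who is holding the provider open on the FreeBSD side is to look at the GEOM access counts directly. A rough sketch, with output abbreviated and the exact fields possibly differing between releases:

 # geom md list md0
 ...
 Providers:
 1. Name: md0
    Mediasize: 67108864 (64M)
    Mode: r1w1e1                         (one reader, one writer, one exclusive consumer still attached)
 ...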
I think I might have run into the same problem. At the time I assumed I was
doing something wrong, but...

I made a b72 raidz out of three new 1 GB virtual disks in VMware. I shut the
VM off and replaced one of the disks with a new 1.5 GB virtual disk. No matter
what command I tried, I couldn't get the new disk into the array. The docs
said that replacing the vdev with itself would work, but it didn't. Nor did
setting the 'automatic replace' feature on the pool and plugging a new device
in. I recall most of the errors being "device in use".

Maybe I wasn't the problem after all? 0_o
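The 'automatic replace' feature mentioned above is the pool-level autoreplace property; a quick sketch of checking and enabling it, with the pool name being illustrative:

 # zpool get autoreplace tank
 NAME  PROPERTY     VALUE    SOURCE
 tank  autoreplace  off      default
 # zpool set autoreplace=on tank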
Pawel,
Is this a problem with ZFS trying to open the device twice?

Richard,
Yes, a scrub should fix the device. One of ZFS's features is ease of
administration. It seems to defy logic that a scrub does not fix all devices
where possible. Why make it any harder for the admin?

Cheers.
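Whether a scrub could ever repair the device comes down to whether any of the four vdev labels (and the uberblocks they carry) survive. A rough way to check from the command line, assuming the md-based test setup from earlier in the thread:

 # zdb -l /dev/md0
 --------------------------------------------
 LABEL 0
 --------------------------------------------
 failed to unpack label 0
 ...                                     (if all four labels fail to unpack, there is nothing left for a scrub to work from)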
Pawel Jakub Dawidek
2007-Oct-08 13:07 UTC
[zfs-discuss] replacing a device with itself doesn't work
On Wed, Oct 03, 2007 at 10:02:03PM +0200, Pawel Jakub Dawidek wrote:
> On Wed, Oct 03, 2007 at 12:10:19PM -0700, Richard Elling wrote:
> > [...]
> > In Solaris NV b72, this works as you expect.
> > [...]
>
> Good to know, but I think it's still a bit of a ZFS fault. The error message
> 'md0 is in use (r1w1e1)' means that something (I'm quite sure it's ZFS) keeps
> the device open. Why does it keep it open when it doesn't recognize it? Or
> maybe it tries to open it twice for write (exclusively) when replacing, which
> is not allowed in GEOM on FreeBSD.
>
> I can take a look at whether this is the former or the latter, but it should
> be fixed in ZFS itself, IMHO.

Ok, it seems that it was fixed in ZFS itself already:

	/*
	 * If we are setting the vdev state to anything but an open state, then
	 * always close the underlying device. Otherwise, we keep accessible
	 * but invalid devices open forever. We don't call vdev_close() itself,
	 * because that implies some extra checks (offline, etc) that we don't
	 * want here. This is limited to leaf devices, because otherwise
	 * closing the device will affect other children.
	 */
	if (vdev_is_dead(vd) && vd->vdev_ops->vdev_op_leaf)
		vd->vdev_ops->vdev_op_close(vd);

The ZFS version from FreeBSD-CURRENT doesn't have this code yet; it's only in
my perforce branch for now. I'll verify later today whether it really fixes
the problem, and I'll report back if not.
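A minimal sketch of the verification I have in mind, reusing the md-based test pool from earlier in the thread (device numbering is illustrative, and the expected outcome is an assumption based on Richard's Solaris b72 transcript):

 # zpool offline tank md0
 # dd if=/dev/zero of=/dev/md0 bs=1m     (wipe the labels, as in the original test)
 # zpool online tank md0
 # zpool replace tank md0                (with the vdev-close fix, md0 should no longer be reported as in use)
 # zpool status -v tank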
-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                          http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!