First off, I don't have the exact failure messages here, and I did not take
good notes of the failures, so I will do the best I can. Please try and give
me advice anyway.

I have a 7 drive raidz1 pool with 500GB drives, and I wanted to replace them
all with 2TB drives. Immediately I ran into trouble. If I tried:

    zpool offline brick <device>

I got a message like: insufficient replicas

I tried:

    zpool replace brick <old device> <new device>

and I got something like: <new device> must be a single disk

I finally got replace and offline to work by:

    zpool export brick
    [reboot]
    zpool import brick

and then:

    zpool offline brick <old device>
    zpool replace brick <old device> <new device>

This worked. zpool status showed the replace in progress, and after about 26
hours of resilvering everything looked fine: the <old device> was gone, and
there were no errors in the pool.

Then I tried to do it again with the next device, but this time I missed the
"zpool offline" step. Immediately I started getting disk errors on both the
drive I was replacing and the first drive I had replaced. At this point I was
starting to panic, so I shut down the machine to make sure the drive cables
were plugged in properly. When the machine came back up, the replace that had
just started was done: a resilver was in progress, but the replace no longer
showed up. Now I am getting disk errors on two drives, and I have more than
200,000 data errors (zpool status -v).

I still have the two original drives; they are in good shape and should still
have all the data on them. Can I somehow put my original zpool back? How?
Please help!

Also, what do you think went wrong here?
Mark J Musante
2010-Aug-10 19:26 UTC
[zfs-discuss] zfs replace problems please please help
On Tue, 10 Aug 2010, seth keith wrote:

> first off I don't have the exact failure messages here, and I did not
> take good notes of the failures, so I will do the best I can. Please
> try and give me advice anyway.
>
> I have a 7 drive raidz1 pool with 500GB drives, and I wanted to replace
> them all with 2TB drives. Immediately I ran into trouble. If I tried:
>
> zpool offline brick <device>

Were you doing an in-place replace? i.e. pulling out the old disk and
putting in the new one?

> I got a message like: insufficient replicas

This means that there was a problem with the pool already. When ZFS opens
a pool, it looks at the disks that are part of that pool. For raidz1, if
more than one disk is unopenable, then the pool will report that there are
"no valid replicas", which is probably the error message you saw.

If that's the case, then your pool already had one failed drive in it, and
you were attempting to disable a second drive. Do you have a copy of the
output from "zpool status brick" from before you tried your experiment?

> I tried to
>
> zpool replace brick <old device> <new device>
>
> and I got something like: <new device> must be a single disk

Unfortunately, this just means that we got back an EINVAL from the kernel,
which could mean any one of a number of things, but probably there was an
issue with calculating the drive size. I'd try plugging it in separately
and using 'format' to see how big Solaris thinks the drive is.

> I finally got replace and offline to work by:
>
> zpool export brick
> [reboot]
> zpool import brick

Probably didn't need to reboot there.

> now
>
> zpool offline brick <old device>
> zpool replace brick <old device> <new device>

If you use this form of the replace command, you don't need to offline the
old disk first. You only need to offline a disk if you're going to pull it
out, and then you can do an in-place replace just by issuing "zpool replace
brick <device-you-swapped>".

> This worked. zpool status showed replacing in progress, and then after
> about 26 hours of resilvering, everything looked fine. The <old device>
> was gone, and no errors in the pool. Now I tried to do it again with
> the next device. I missed the "zpool offline" part however.
> Immediately, I started getting disk errors on both the drive I was
> replacing and the first drive I replaced.

Read errors? Write errors? Checksum errors? Sounds like a full scrub would
have been a good idea prior to replacing the second disk.

> I have the two original drives, they are in good shape and should still
> have all the data on them, can I somehow put my original zpool back.
> How? Please help!

You can try exporting the pool, plugging in the original drives, and then
doing a recovery on it. See the zpool manpage under "zpool import" for the
recovery options and what the flags mean.
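For example, assuming your build supports the import recovery option (a
sketch only; double-check the flags against the zpool manpage on your
system):

    # zpool export brick
      (swap the original 500GB drives back in)
    # zpool import -F -n brick   # dry run: reports whether recovery would work
    # zpool import -F brick      # recovery import; may rewind to an older txg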
First off, double thanks for replying to my post. I tried your advice, but
something is way wrong. I have all the 2TB drives disconnected and the 7
500GB drives connected. All 7 show up in the BIOS and in format. Here they
are, the original 7 500GB drives:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c3d0 <DEFAULT cyl 4859 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,3a40@1c/pci-ide@0/ide@1/cmdk@0,0
       1. c4d0 <Maxtor 7-H81AYZ5-0001-465.76GB>
          /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
       2. c4d1 <WDC WD50- WD-WCAS8323204-0001-465.76GB>
          /pci@0,0/pci-ide@1f,2/ide@0/cmdk@1,0
       3. c6d0 <WDC WD50- WD-WCAS8510568-0001-465.76GB>
          /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0
       4. c6d1 <WDC WD50- WD-WCAUF149175-0001-465.76GB>
          /pci@0,0/pci-ide@1f,2/ide@1/cmdk@1,0
       5. c7d0 <Maxtor 7-H81DM5X-0001-465.76GB>
          /pci@0,0/pci-ide@1f,5/ide@0/cmdk@0,0
       6. c12d0 <WDC WD50- WD-WCAUH024469-0001-465.76GB>
          /pci@0,0/pci8086,244e@1e/pci-ide@1/ide@1/cmdk@0,0
       7. c13d0 <WDC WD50- WD-WCAS8415731-0001-465.76GB>
          /pci@0,0/pci8086,244e@1e/pci-ide@1/ide@0/cmdk@0,0

Now clear out brick:

# zpool export brick
# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c3d0s0    ONLINE       0     0     0

errors: No known data errors

Then an error on the import:

# zpool import -F brick
cannot open 'brick': I/O error

Now there is a pool, but the drives are wrong:

# zpool status
  pool: brick
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
        or invalid.  There are insufficient replicas for the pool to
        continue functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        brick          UNAVAIL      0     0     0  insufficient replicas
          raidz1       UNAVAIL      0     0     0  insufficient replicas
            c13d0      ONLINE       0     0     0
            c4d0       ONLINE       0     0     0
            c7d0       ONLINE       0     0     0
            c4d1       ONLINE       0     0     0
            replacing  UNAVAIL      0     0     0  insufficient replicas
              c15t0d0  UNAVAIL      0     0     0  cannot open
              c11t0d0  UNAVAIL      0     0     0  cannot open
            c12d0      FAULTED      0     0     0  corrupted data
            c6d0       ONLINE       0     0     0

What I want is to remove c15t0d0 and c11t0d0 and replace them with the
original c6d1. Suggestions?
Mark J Musante
2010-Aug-11 12:03 UTC
[zfs-discuss] zfs replace problems please please help
On Tue, 10 Aug 2010, seth keith wrote:

> # zpool status
>   pool: brick
>  state: UNAVAIL
> status: One or more devices could not be used because the label is
>         missing or invalid.  There are insufficient replicas for the
>         pool to continue functioning.
> action: Destroy and re-create the pool from a backup source.
>    see: http://www.sun.com/msg/ZFS-8000-5E
>  scrub: none requested
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         brick          UNAVAIL      0     0     0  insufficient replicas
>           raidz1       UNAVAIL      0     0     0  insufficient replicas
>             c13d0      ONLINE       0     0     0
>             c4d0       ONLINE       0     0     0
>             c7d0       ONLINE       0     0     0
>             c4d1       ONLINE       0     0     0
>             replacing  UNAVAIL      0     0     0  insufficient replicas
>               c15t0d0  UNAVAIL      0     0     0  cannot open
>               c11t0d0  UNAVAIL      0     0     0  cannot open
>             c12d0      FAULTED      0     0     0  corrupted data
>             c6d0       ONLINE       0     0     0
>
> What I want is to remove c15t0d0 and c11t0d0 and replace them with the
> original c6d1. Suggestions?

Do the labels still exist on c6d1? e.g. what do you get from
"zdb -l /dev/rdsk/c6d1s0"?

If the label still exists, and the pool guid is the same as the labels on
the other disks, you could try doing a "zpool detach brick c15t0d0" (or
c11t0d0), then export & try re-importing. ZFS may find c6d1 at that point.
There's no way to guarantee that'll work.
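Spelled out, that sequence would look something like this (a sketch only;
as noted, there is no guarantee it will work):

    # zdb -l /dev/rdsk/c6d1s0 | grep pool_guid  # should match the other disks
    # zpool detach brick c15t0d0                # or c11t0d0
    # zpool export brick
    # zpool import brick                        # ZFS may pick up c6d1 here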
> -----Original Message-----
> From: Mark J Musante [mailto:Mark.Musante@oracle.com]
> Sent: Wednesday, August 11, 2010 5:03 AM
> To: Seth Keith
> Cc: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] zfs replace problems please please help
>
> On Tue, 10 Aug 2010, seth keith wrote:
>
> > # zpool status
> >   pool: brick
> >  state: UNAVAIL
> > status: One or more devices could not be used because the label is
> >         missing or invalid.  There are insufficient replicas for the
> >         pool to continue functioning.
> > action: Destroy and re-create the pool from a backup source.
> >    see: http://www.sun.com/msg/ZFS-8000-5E
> >  scrub: none requested
> > config:
> >
> >         NAME           STATE     READ WRITE CKSUM
> >         brick          UNAVAIL      0     0     0  insufficient replicas
> >           raidz1       UNAVAIL      0     0     0  insufficient replicas
> >             c13d0      ONLINE       0     0     0
> >             c4d0       ONLINE       0     0     0
> >             c7d0       ONLINE       0     0     0
> >             c4d1       ONLINE       0     0     0
> >             replacing  UNAVAIL      0     0     0  insufficient replicas
> >               c15t0d0  UNAVAIL      0     0     0  cannot open
> >               c11t0d0  UNAVAIL      0     0     0  cannot open
> >             c12d0      FAULTED      0     0     0  corrupted data
> >             c6d0       ONLINE       0     0     0
> >
> > What I want is to remove c15t0d0 and c11t0d0 and replace them with
> > the original c6d1. Suggestions?
>
> Do the labels still exist on c6d1? e.g. what do you get from
> "zdb -l /dev/rdsk/c6d1s0"?
>
> If the label still exists, and the pool guid is the same as the labels
> on the other disks, you could try doing a "zpool detach brick c15t0d0"
> (or c11t0d0), then export & try re-importing. ZFS may find c6d1 at that
> point. There's no way to guarantee that'll work.

When I do a zdb -l /dev/rdsk/<any device>, I get the same output for all
my drives in the pool, but I don't think it looks right:

# zdb -l /dev/rdsk/c4d0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3

If I try this zpool detach action, can it be reversed if there is a
problem?
Mark J Musante
2010-Aug-11 18:44 UTC
[zfs-discuss] zfs replace problems please please help
On Wed, 11 Aug 2010, Seth Keith wrote:

> When I do a zdb -l /dev/rdsk/<any device>, I get the same output for
> all my drives in the pool, but I don't think it looks right:
>
> # zdb -l /dev/rdsk/c4d0

What about /dev/rdsk/c4d0s0?
This is for newbies like myself: I was using 'zdb -l' wrong. Just giving it
the drive name from 'zpool status' or format, like c6d1, didn't work. I
needed to add s0 to the end:

    zdb -l /dev/dsk/c6d1s0

gives me a good looking label (I think). The pool_guid values are the same
for all the drives. I see the first 500GB drive I replaced has "children"
that are all 500GB drives. The second 500GB drive I replaced has one 2TB
child. All the other drives have two 2TB children.

I managed to detach one of the drives being replaced, but I could not
detach the other two 2TB drives. I exported and imported, and now my pool
looks like this:

  pool: brick
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        brick                     DEGRADED     0     0     0
          raidz1                  DEGRADED     0     0     0
            c13d0                 ONLINE       0     0     0
            c4d0                  ONLINE       0     0     0
            c7d0                  ONLINE       0     0     0
            c4d1                  ONLINE       0     0     0
            14607330800900413650  UNAVAIL      0     0     0  was /dev/dsk/c15t0d0s0
            c11t1d0               ONLINE       0     0     0
            c6d0                  ONLINE       0     0     0

errors: 352808 data errors, use '-v' for a list

Is there some way I can take the original zpool label from the first 500GB
drive I replaced and use it to fix up the other drives in the pool? What
are my options here...
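Also for the newbies: one way to check that the pool_guid matches on every
drive is a little loop like this (adjust the device names to your own pool):

    # for d in c13d0 c4d0 c7d0 c4d1 c6d0 c6d1; do
    >   echo $d; zdb -l /dev/dsk/${d}s0 | grep pool_guid
    > done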
Mark J Musante
2010-Aug-11 19:45 UTC
[zfs-discuss] zfs replace problems please please help
On Wed, 11 Aug 2010, seth keith wrote:

>         NAME                      STATE     READ WRITE CKSUM
>         brick                     DEGRADED     0     0     0
>           raidz1                  DEGRADED     0     0     0
>             c13d0                 ONLINE       0     0     0
>             c4d0                  ONLINE       0     0     0
>             c7d0                  ONLINE       0     0     0
>             c4d1                  ONLINE       0     0     0
>             14607330800900413650  UNAVAIL      0     0     0  was /dev/dsk/c15t0d0s0
>             c11t1d0               ONLINE       0     0     0
>             c6d0                  ONLINE       0     0     0

OK, that's good - your missing disk can be replaced with a brand new disk
using "zpool replace brick 14607330800900413650 <disk name>". Then wait for
the resilver to complete, and do a full scrub to be on the safe side.

> errors: 352808 data errors, use '-v' for a list
>
> Is there some way I can take the original zpool label from the first
> 500GB drive I replaced and use it to fix up the other drives in the
> pool?

No. The files with errors can only be restored from any backups you made.

If there is an original disk that's not part of your pool, you might want
to try making a backup of it, plugging it in, and seeing if a zpool
export/zpool import will find it. But it will only find it if zdb -l shows
four valid labels.
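For example, assuming the new 2TB disk shows up as c15t0d0 (substitute
whatever name 'format' reports for your disk):

    # zpool replace brick 14607330800900413650 c15t0d0
    # zpool status brick    # watch until the resilver completes
    # zpool scrub brick     # then run the full scrub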