I've got a 9 sata drive raidz1 array, started at version 6, upgraded to
version 13.  I had an apparent drive failure, and then at some point, a
kernel panic (unrelated to ZFS.)  The reboot caused the device numbers
to shuffle, so I did an 'export/import' to re-read the metadata and get
the array back up.

Once I swapped drives, I issued a 'zpool replace'.

That was 4 days ago now.  The progress in a 'zpool status' looks like
this, as of right now:

    scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go

... which is a little concerning, since a) it appears to have not moved
since I started it, and b) I'm in a DEGRADED state until it finishes...
if it finishes.

So, I reach out to the list!

 - Is the resilver progress notification in a known weird state under
   FreeBSD?

 - Anything I can do to kick this in the pants?  Tuning params?

 - This was my first drive failure under ZFS -- anything I should have
   done differently?  Such as NOT doing the export/import?  (Not sure
   what else I could have done there.)

Some additional info is below.  Drives are at about 20% busy, according
to vmstat.  Seem to have bandwidth to spare.  This is a FreeBSD
7.2-STABLE system from the end of May -- 32 bit, 2G of RAM.

I have the luxury of this being a test machine (for exactly stuff like
this), so I'm willing to try whatever without worrying about production
data or SLA.  :)

--
Mahlon E. Smith
http://www.martini.nu/contact.html

-----------------------------------------------------------------------

% zfs list store
NAME    USED  AVAIL  REFER  MOUNTPOINT
store  1.22T  2.36T  32.0K  none

-----------------------------------------------------------------------

% cat /boot/loader.conf
vm.kmem_size_max="768M"
vm.kmem_size="768M"
vfs.zfs.arc_max="256M"

-----------------------------------------------------------------------

% zpool status store
  pool: store
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go
config:

        NAME                       STATE     READ WRITE CKSUM
        store                      DEGRADED     0     0     0
          raidz1                   DEGRADED     0     0     0
            da0                    ONLINE       0     0     0  274K resilvered
            da1                    ONLINE       0     0     0  282K resilvered
            replacing              DEGRADED     0     0     0
              2025342973333799752  UNAVAIL      3 4.11K     0  was /dev/da2
              da8                  ONLINE       0     0     0  418K resilvered
            da2                    ONLINE       0     0     0  280K resilvered
            da3                    ONLINE       0     0     0  269K resilvered
            da4                    ONLINE       0     0     0  266K resilvered
            da5                    ONLINE       0     0     0  270K resilvered
            da6                    ONLINE       0     0     0  270K resilvered
            da7                    ONLINE       0     0     0  267K resilvered

errors: No known data errors

-----------------------------------------------------------------------

% zpool iostat -v
                             capacity     operations    bandwidth
pool                       used  avail   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
store                      1.37T  2.72T     49    106   138K   543K
  raidz1                   1.37T  2.72T     49    106   138K   543K
    da0                        -      -     15     62  1017K  79.9K
    da1                        -      -     15     62  1020K  80.3K
    replacing                  -      -      0    103      0  88.3K
      2025342973333799752      -      -      0      0  1.45K    261
      da8                      -      -      0     79  1.45K  98.2K
    da2                        -      -     14     62   948K  80.3K
    da3                        -      -     13     62   894K  80.0K
    da4                        -      -     14     63   942K  80.3K
    da5                        -      -     15     62   992K  80.4K
    da6                        -      -     15     62  1000K  80.1K
    da7                        -      -     15     62  1022K  80.1K
-------------------------  -----  -----  -----  -----  -----  -----
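One way to tell whether the resilver is actually moving, rather than relying
on the stuck percentage alone, is to sample the pool counters over time.  A
minimal sketch, assuming the pool name 'store' from the output above; the
interval is arbitrary:

    # per-vdev counters, refreshed every 30 seconds -- the write column of
    # the 'replacing' vdev should keep climbing if the resilver is moving
    % zpool iostat -v store 30

    # the resilver summary line itself, re-run periodically by hand
    % zpool status store | grep scrub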
On Tue, Jul 7, 2009 at 12:56 PM, Mahlon E. Smith <mahlon@martini.nu> wrote:

> I've got a 9 sata drive raidz1 array, started at version 6, upgraded to
> version 13.  I had an apparent drive failure, and then at some point, a
> kernel panic (unrelated to ZFS.)  The reboot caused the device numbers
> to shuffle, so I did an 'export/import' to re-read the metadata and get
> the array back up.

This is why we've started using glabel(8) to label our drives, and then
add the labels to the pool:

  # zpool create store raidz1 label/disk01 label/disk02 label/disk03

That way, it doesn't matter where the kernel detects the drives or what
the physical device node is called: GEOM picks up the label, and ZFS
uses the label.

> Once I swapped drives, I issued a 'zpool replace'.

See comment at the end: what's the replace command that you used?

> That was 4 days ago now.  The progress in a 'zpool status' looks like
> this, as of right now:
>
>   scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go
>
> ... which is a little concerning, since a) it appears to have not moved
> since I started it, and b) I'm in a DEGRADED state until it finishes...
> if it finishes.

There's something wrong here.  It definitely should be incrementing.
Even when we did the foolish thing of creating a 24-drive raidz2 vdev
and had to replace a drive, the progress bar did change.  It never got
above 39%, as it kept restarting, but it did increment.

> So, I reach out to the list!
>
>  - Is the resilver progress notification in a known weird state under
>    FreeBSD?
>
>  - Anything I can do to kick this in the pants?  Tuning params?

I'd redo the replace command, and check the output of "zpool status" to
make sure it's showing the proper device node and not some random string
of numbers like it is now (see the command sketch after this message).

>  - This was my first drive failure under ZFS -- anything I should have
>    done differently?  Such as NOT doing the export/import?  (Not sure
>    what else I could have done there.)

If you knew which drive it was, I'd have shut down the server and
replaced it, so that the drives came back up numbered correctly.  This
happened to us once when I was playing around with simulating dead
drives (pulling drives) and rebooting.  That's when I moved over to
using glabels.

> % zpool status store
>   pool: store
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         store                      DEGRADED     0     0     0
>           raidz1                   DEGRADED     0     0     0
>             da0                    ONLINE       0     0     0  274K resilvered
>             da1                    ONLINE       0     0     0  282K resilvered
>             replacing              DEGRADED     0     0     0
>               2025342973333799752  UNAVAIL      3 4.11K     0  was /dev/da2
>               da8                  ONLINE       0     0     0  418K resilvered
>             da2                    ONLINE       0     0     0  280K resilvered
>             da3                    ONLINE       0     0     0  269K resilvered
>             da4                    ONLINE       0     0     0  266K resilvered
>             da5                    ONLINE       0     0     0  270K resilvered
>             da6                    ONLINE       0     0     0  270K resilvered
>             da7                    ONLINE       0     0     0  267K resilvered
>
> errors: No known data errors
>
> -----------------------------------------------------------------------
>
> % zpool iostat -v
>                              capacity     operations    bandwidth
> pool                       used  avail   read  write   read  write
> -------------------------  -----  -----  -----  -----  -----  -----
> store                      1.37T  2.72T     49    106   138K   543K
>   raidz1                   1.37T  2.72T     49    106   138K   543K
>     da0                        -      -     15     62  1017K  79.9K
>     da1                        -      -     15     62  1020K  80.3K
>     replacing                  -      -      0    103      0  88.3K
>       2025342973333799752      -      -      0      0  1.45K    261
>       da8                      -      -      0     79  1.45K  98.2K
>     da2                        -      -     14     62   948K  80.3K
>     da3                        -      -     13     62   894K  80.0K
>     da4                        -      -     14     63   942K  80.3K
>     da5                        -      -     15     62   992K  80.4K
>     da6                        -      -     15     62  1000K  80.1K
>     da7                        -      -     15     62  1022K  80.1K
> -------------------------  -----  -----  -----  -----  -----  -----

That definitely doesn't look right.  It should be showing the device
name there in the "replacing" section.  What's the exact "zpool replace"
command that you used?

--
Freddie Cash
fjwcash@gmail.com
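For reference, a minimal sketch of the label-and-reissue sequence suggested
above, using the GUID from the status output; the label name disk08 is made
up for the example, and this is an illustration of the commands involved,
not a verified recovery recipe.  The detach step is only relevant if the new
disk is already attached under the 'replacing' vdev:

    # free the new disk from the half-finished replace (only if attached)
    # zpool detach store da8

    # write a GEOM label onto the new disk; glabel stores its metadata in
    # the provider's last sector, so label/disk08 is one sector smaller
    # glabel label disk08 /dev/da8

    # re-issue the replace against the dead member's GUID, pointing ZFS
    # at the labeled provider instead of the raw device node
    # zpool replace store 2025342973333799752 label/disk08

    # 'zpool status store' should now list label/disk08 under 'replacing'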
On Tue, Jul 07, 2009 at 12:56:14PM -0700, Mahlon E. Smith wrote:

> I've got a 9 sata drive raidz1 array, started at version 6, upgraded to
> version 13.  I had an apparent drive failure, and then at some point, a
> kernel panic (unrelated to ZFS.)  The reboot caused the device numbers
> to shuffle, so I did an 'export/import' to re-read the metadata and get
> the array back up.
>
> Once I swapped drives, I issued a 'zpool replace'.
>
> That was 4 days ago now.  The progress in a 'zpool status' looks like
> this, as of right now:
>
>   scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go
>
> ... which is a little concerning, since a) it appears to have not moved
> since I started it, and b) I'm in a DEGRADED state until it finishes...
> if it finishes.
>
> So, I reach out to the list!
>
>  - Is the resilver progress notification in a known weird state under
>    FreeBSD?
>
>  - Anything I can do to kick this in the pants?  Tuning params?
>
>  - This was my first drive failure under ZFS -- anything I should have
>    done differently?  Such as NOT doing the export/import?  (Not sure
>    what else I could have done there.)

I'm seeing essentially the same thing on an 8.0-BETA1 box with an 8-disk
raidz1 pool.  Every once in a while the system makes it to 0.05% done and
gives a vaguely reasonable rebuild time, but it quickly drops back to
reporting 0.00%, and it's basically not making any forward progress.  In
my case the pool is a copy of a mirror, so while it would be a bit
annoying, the system could be rebuilt fairly easily.

One thing I did just notice is that my zpool version is 13, but my file
systems are all v1 rather than the latest (v3).  I don't know if this is
relevant or not.

-- Brooks
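For anyone wanting to make the same comparison, the pool and file-system
versions can be read directly.  A quick sketch, with the pool name assumed
to be 'store' for illustration:

    # pool (SPA) version, plus the versions this kernel supports
    % zpool upgrade
    % zpool upgrade -v

    # file-system (ZPL) versions; plain 'zfs upgrade' lists any datasets
    # that are below the newest version the system supports
    % zfs upgrade
    % zfs get -r version store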
> On Tue, Jul 7, 2009 at 3:26 PM, Mahlon E. Smith <mahlon@martini.nu> wrote:
>> On Tue, Jul 07, 2009, Freddie Cash wrote:
>>>
>>> This is why we've started using glabel(8) to label our drives, and then
>>> add the labels to the pool:
>>>   # zpool create store raidz1 label/disk01 label/disk02 label/disk03
>>>
>>> That way, it doesn't matter where the kernel detects the drives or what
>>> the physical device node is called, GEOM picks up the label, and ZFS
>>> uses the label.
>>
>> Ah, slick.  I'll definitely be doing that moving forward.  Wonder if I
>> could do it piecemeal now via a shell game, labeling and replacing each
>> individual drive?  Will put that on my "try it" list.
>
> Yes, this can be done piecemeal, after the fact, on an already configured
> pool.  That's how I did it on one of our servers.  It was originally
> configured using the device node names (da0, da1, etc).  Then I set up
> the second server, but used labels.  Then I went back to the first
> server, labelled the drives, and did "zpool replace storage da0
> label/disk01" for each drive.  Doesn't take long to resilver, as it
> knows that it's the same device.

It seems like a very good practice, but how did you actually do it on
your first, already-configured server?  I am struggling with the
following:

1. On an online disk that is a member of a configured raidz pool,
   "glabel label" fails with the error message "operation not permitted".

2. If I take the disk offline, glabel succeeds, but bringing the disk
   back online clears the label.

3. If I take the disk offline, glabel succeeds, but "zpool replace
   <pool> <dev> <label>" fails with "/dev/label/<label> is part of
   active pool <pool>".

4. Exporting the pool, labeling, and importing the pool doesn't work
   either.

A detailed guide in "for dummies" style would be appreciated.

Thanks,
Gabor
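For completeness, the piecemeal conversion Freddie describes above boils
down to the following per-disk loop, reconstructed from his description
rather than from a verified run ('storage' and 'disk01' are the names from
his example).  The errors listed in points 1-4 suggest the glabel step only
succeeds while the disk is not actively in use, so the exact ordering may
need experimentation on a given system:

    # for each member in turn, e.g. da0 becoming label/disk01:
    # zpool offline storage da0
    # glabel label disk01 /dev/da0
    # zpool replace storage da0 label/disk01
    # ... wait for the short resilver to finish, then repeat for the next disk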