Hi,

I've searched without luck, so I'm asking instead.

I have a Solaris 10 box:

# cat /etc/release
                       Solaris 10 11/06 s10s_u3wos_10 SPARC
           Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
                        Use is subject to license terms.
                            Assembled 14 November 2006

This box was rebooted this morning, and after the boot I noticed a resilver was in progress. But the estimated time seemed a bit long, so is this a problem which can be patched or remediated in another way?

# zpool status -x
  pool: zonedata
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.04% done, [b]4398h43m[/b] to go
config:

        NAME                                       STATE     READ WRITE CKSUM
        zonedata                                   ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t60060E8004282B000000282B000010A0d0  ONLINE       0     0     0
            c6t60060E800428330000002833000010A0d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t60060E8004282B000000282B000010A1d0  ONLINE       0     0     0
            c6t60060E800428330000002833000010A1d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t60060E8004282B000000282B000010A2d0  ONLINE       0     0     0
            c6t60060E800428330000002833000010A2d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t60060E8004282B000000282B000010A4d0  ONLINE       0     0     0
            c6t60060E800428330000002833000010A4d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t60060E8004282B000000282B000010A5d0  ONLINE       0     0     0
            c6t60060E800428330000002833000010A5d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t60060E8004282B000000282B000010A6d0  ONLINE       0     0     0
            c6t60060E800428330000002833000010A6d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t60060E8004282B000000282B00002022d0  ONLINE       0     0     0
            c6t60060E80042833000000283300002022d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t60060E8004282B000000282B00002023d0  ONLINE       0     0     0
            c6t60060E80042833000000283300002024d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t60060E8004282B000000282B00002024d0  ONLINE       0     0     0
            c6t60060E80042833000000283300002023d0  ONLINE       0     0     0

I also have a question about sharing a ZFS filesystem from the global zone to a local zone. Are there any issues with this? We had an unfortunate sysadmin who did this and our systems hung. We have no logs that show anything at all, but I thought I'd ask just to be sure.

cheers,

//Mike
--
This message posted from opensolaris.org
On Fri, Sep 26, 2008 at 1:27 AM, Mikael Kjerrman
<mikael.kjerrman at gmail.com> wrote:
> Hi,
>
> I've searched without luck, so I'm asking instead.
>
> I have a Solaris 10 box (Solaris 10 11/06 s10s_u3wos_10 SPARC).
>
> This box was rebooted this morning, and after the boot I noticed a
> resilver was in progress. But the estimated time seemed a bit long, so
> is this a problem which can be patched or remediated in another way?
>
> # zpool status -x
>   pool: zonedata
>  state: ONLINE
>  scrub: resilver in progress, 0.04% done, [b]4398h43m[/b] to go
[snip]
> I also have a question about sharing a ZFS filesystem from the global
> zone to a local zone. Are there any issues with this? We had an
> unfortunate sysadmin who did this and our systems hung. We have no
> logs that show anything at all, but I thought I'd ask just to be sure.
>
> cheers,
>
> //Mike
> --
> This message posted from opensolaris.org
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Do you have a lot of competing I/Os on the box which would slow down
the resilver?

--
Brent Jones
brent at servuhome.net
define a lot :-)

We are doing about 7-8 MB per second, which I don't think is a lot, but perhaps it is enough to throw off the estimates? Anyhow, the resilvering completed about 4386h earlier than expected, so everything is OK now, but I still feel that the way it figures out the number is wrong.

Any thoughts on my other issue?

cheers,

//Mike
--
This message posted from opensolaris.org
On Fri, Sep 26, 2008 at 4:02 PM, <jonathan at kc8onw.net> wrote:
> Note the progress so far: "0.04%." In my experience the time estimate
> has no basis in reality until it's about 1% done or so. I think there
> is some bookkeeping or something ZFS does at the start of a scrub or
> resilver that throws off the time estimate for a while. That's just my
> experience with it, but it's been like that pretty consistently for me.
>
> Jonathan Stewart

I agree here. I've watched iostat -xnc 5 while starting a scrub a few times, and the first minute or so is spent doing very little I/O. Thereafter the transfers shoot up to near what I think is the maximum the drives can do, and stay there until the scrub is completed.
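For anyone who wants to watch the estimate converge themselves, a rough loop on a Solaris box might look like the following. This is only a sketch: the pool name is taken from Mike's post and the 10-second interval is arbitrary.

```shell
#!/bin/sh
# Print the scrub/resilver progress line every 10 seconds while one is
# running, so you can watch the ETA settle once real I/O starts.
pool=zonedata    # pool name from this thread; adjust to taste

while zpool status "$pool" | grep -q 'in progress'; do
    date '+%H:%M:%S'
    zpool status "$pool" | grep 'in progress'
    sleep 10
done
echo "scrub/resilver on $pool finished"
```

Run it in a spare terminal next to iostat -xnc 5 and you can see the point where the throughput jumps and the estimate starts meaning something.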
> On Fri, Sep 26, 2008 at 1:27 AM, Mikael Kjerrman
> <mikael.kjerrman at gmail.com> wrote:
[snip]
>> this box was rebooted this morning and after the boot I noticed a
>> resilver was in progress. But the suggested time seemed a bit long, so
>> is this a problem which can be patched or remediated in another way?
>>
>> # zpool status -x
>>   pool: zonedata
>>  state: ONLINE
>> status: One or more devices is currently being resilvered. The pool
>>         will continue to function, possibly in a degraded state.
>> action: Wait for the resilver to complete.
>>  scrub: resilver in progress, 0.04% done, [b]4398h43m[/b] to go
[snip]
> Do you have a lot of competing I/Os on the box which would slow down
> the resilver?

Note the progress so far: "0.04%." In my experience the time estimate has no basis in reality until it's about 1% done or so. I think there is some bookkeeping or something ZFS does at the start of a scrub or resilver that throws off the time estimate for a while. That's just my experience with it, but it's been like that pretty consistently for me.

Jonathan Stewart
Mikael Kjerrman wrote:
> define a lot :-)
>
> We are doing about 7-8 MB per second, which I don't think is a lot, but
> perhaps it is enough to throw off the estimates? Anyhow, the resilvering
> completed about 4386h earlier than expected, so everything is OK now,
> but I still feel that the way it figures out the number is wrong.

Yes, the algorithm is conservative and very often wrong until you get close to the end. In part this is because resilvering works in time order, not spatial distance. In ZFS, the oldest data is resilvered first. This is also why you will see a lot of "thinking" before you see a lot of I/O: ZFS is determining the order in which to resilver the data. Unfortunately, this makes completion-time prediction somewhat difficult to get right.

> Any thoughts on my other issue?

Try the zones-discuss forum.
 -- richard
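For what it's worth, the scale of Mike's 4398h43m figure is exactly what a naive linear extrapolation produces from a fraction of a percent. A toy calculation (the elapsed time below is an invented number for illustration, not anything ZFS reports):

```shell
#!/bin/sh
# If the resilver had been running ~106 minutes when it reported 0.04%
# done, extrapolating linearly gives a wildly pessimistic remaining time.
elapsed_min=106   # hypothetical minutes since the resilver started
pct_done=0.04     # percent complete, as shown by zpool status

awk -v e="$elapsed_min" -v p="$pct_done" 'BEGIN {
    r = int(e * (100 - p) / p + 0.5)      # minutes remaining, linear model
    printf "%dh%02dm to go\n", int(r / 60), r % 60
}'
# prints: 4414h54m to go
```

Same order of magnitude as the estimate in the original post, and a tiny error in the early percent-done figure swings the result by thousands of hours, which is presumably why the number only starts to mean anything once a real fraction of the pool has been walked.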
On Fri, Sep 26, 2008 at 7:03 PM, Richard Elling <Richard.Elling at sun.com> wrote:
> Yes, the algorithm is conservative and very often wrong until you get
> close to the end. In part this is because resilvering works in time
> order, not spatial distance. In ZFS, the oldest data is resilvered
> first. This is also why you will see a lot of "thinking" before you
> see a lot of I/O: ZFS is determining the order in which to resilver
> the data. Unfortunately, this makes completion-time prediction
> somewhat difficult to get right.

Hi Richard,

Would it not make more sense then for the program to say something like "No Estimate Yet" during the early part of the process, at least?

Cheers,
 _hartz
Mikael Kjerrman wrote:
> I also have a question about sharing a ZFS filesystem from the global
> zone to a local zone. Are there any issues with this? We had an
> unfortunate sysadmin who did this and our systems hung. We have no
> logs that show anything at all, but I thought I'd ask just to be sure.

How was it shared, as an fs or a dataset? I'm using both and I haven't seen any problems with either.

Ian
Johan Hartzenberg wrote:
> On Fri, Sep 26, 2008 at 7:03 PM, Richard Elling
> <Richard.Elling at sun.com> wrote:
>> Yes, the algorithm is conservative and very often wrong until you get
>> close to the end. In part this is because resilvering works in time
>> order, not spatial distance. In ZFS, the oldest data is resilvered
>> first. This is also why you will see a lot of "thinking" before you
>> see a lot of I/O: ZFS is determining the order in which to resilver
>> the data. Unfortunately, this makes completion-time prediction
>> somewhat difficult to get right.
>
> Hi Richard,
>
> Would it not make more sense then for the program to say something
> like "No Estimate Yet" during the early part of the process, at least?

Yes. That would be a good idea. Sounds like a good, quick opportunity for a community contributor :-)
 -- richard
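Until someone patches zpool itself, the suggested behavior can be approximated with a wrapper. A sketch follows; the field parsing assumes the "resilver in progress, X% done, Y to go" wording from Mike's paste and may need adjusting on other releases, and the 1% threshold is just Jonathan's rule of thumb:

```shell
#!/bin/sh
# Print zpool's progress line, but replace the ETA with "no estimate yet"
# while less than 1% of the scrub/resilver is done.
pool=${1:-zonedata}    # default pool name taken from this thread

line=$(zpool status "$pool" | grep 'in progress')
# Pull the percentage out of e.g.
#   " scrub: resilver in progress, 0.04% done, 4398h43m to go"
pct=$(echo "$line" | awk '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /%$/) { sub(/%/, "", $i); print $i; exit }
}')

if [ -n "$pct" ] && awk -v p="$pct" 'BEGIN { exit !(p < 1.0) }'; then
    echo " scrub: resilver in progress, ${pct}% done, no estimate yet"
else
    echo "$line"
fi
```

At 0.04% done this prints "no estimate yet"; past 1% it passes zpool's own line through unchanged.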
Hi,

it was actually shared both ways: we had zonedata/prodlogs set up as a dataset, and we had zonedata/tmp mounted as an NFS filesystem within the zone.

//Mike
--
This message posted from opensolaris.org
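For the archives, the two zonecfg-based mechanisms Ian asked about look roughly like this. The zone name "prodzone" and the lofs paths are invented for illustration; only the dataset names come from Mike's setup, and his second filesystem was actually NFS-mounted rather than lofs.

```shell
# Delegate a whole dataset to the zone: the zone administrator then
# manages zonedata/prodlogs and its children with zfs(1M) from inside.
zonecfg -z prodzone <<EOF
add dataset
set name=zonedata/prodlogs
end
commit
EOF

# Alternatively, loopback-mount a global-zone filesystem into the zone
# (the common non-NFS way to share a mounted ZFS filesystem read/write;
# both paths here are hypothetical).
zonecfg -z prodzone <<EOF
add fs
set dir=/export/shared
set special=/zonedata/tmp
set type=lofs
end
commit
EOF
```

The dataset route hands over full ZFS control (snapshots, child filesystems, properties); the fs route only exposes an already-mounted filesystem, which is often what you want for something like a shared tmp area.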