On b66:

# zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 \
    c0t600A0B8000299CCC000006734741CD4Ed0

<some hours later>

# zpool status tww
  pool: tww
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 62.90% done, 4h26m to go

<some hours later>

# zpool status tww
  pool: tww
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 3.85% done, 18h49m to go

# zpool history tww | tail -1
2007-11-20.02:37:13 zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 c0t600A0B8000299CCC000006734741CD4Ed0

So, why did resilvering restart when no ZFS operations occurred? I
just ran zpool status again and now I get:

# zpool status tww
  pool: tww
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.00% done, 134h45m to go

What's going on?

-- 
albert chin (china at thewrittenword.com)
Wade.Stuart at fallon.com
2007-Nov-20 16:01 UTC
[zfs-discuss] Why did resilvering restart?
Resilver and scrub are broken and restart when a snapshot is created --
the current workaround is to disable snaps while resilvering. The ZFS
team is working on the issue for a long-term fix.

-Wade

zfs-discuss-bounces at opensolaris.org wrote on 11/20/2007 09:58:19 AM:

> So, why did resilvering restart when no zfs operations occurred? I
> just ran zpool status again and now I get:
> [...]
> scrub: resilver in progress, 0.00% done, 134h45m to go
>
> What's going on?
On Tue, Nov 20, 2007 at 10:01:49AM -0600, Wade.Stuart at fallon.com wrote:
> Resilver and scrub are broken and restart when a snapshot is created
> -- the current workaround is to disable snaps while resilvering. The
> ZFS team is working on the issue for a long-term fix.

But no snapshot was taken. If one had been, zpool history would have
shown it. So, in short, _no_ ZFS operations are going on during the
resilvering. Yet, it is restarting.

-- 
albert chin (china at thewrittenword.com)
Wade.Stuart at fallon.com
2007-Nov-20 17:10 UTC
[zfs-discuss] Why did resilvering restart?
zfs-discuss-bounces at opensolaris.org wrote on 11/20/2007 10:11:50 AM:

> But no snapshot was taken. If one had been, zpool history would have
> shown it. So, in short, _no_ ZFS operations are going on during the
> resilvering. Yet, it is restarting.

Does 2007-11-20.02:37:13 actually match the expected timestamp of the
original zpool replace command before the first zpool status output
listed below? Is it possible that another zpool replace is further up
in your pool history (i.e., it was rerun by an admin or automatically
by some service)?

-Wade
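[Editor's note: Wade's suggestion above can be sketched as a quick shell
check. This is illustrative only -- the pool name `tww` and the history
lines below are taken from this thread, and the temporary file path is a
hypothetical stand-in for live output; on a real system you would run
`zpool history tww > /tmp/tww-history.txt` instead of the here-doc.]

```shell
# Capture the pool history once (here, sample lines from the thread).
cat > /tmp/tww-history.txt <<'EOF'
2007-11-20.00:57:40 zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 c0t600A0B8000299966000006584741C7C3d0
2007-11-20.02:35:22 zpool detach tww c0t600A0B8000299966000006584741C7C3d0
2007-11-20.02:37:13 zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 c0t600A0B8000299CCC000006734741CD4Ed0
EOF

# Show, with timestamps, every event that can (re)start a resilver or scrub.
grep -E 'zpool (replace|detach|attach|scrub)' /tmp/tww-history.txt

# Count the replace commands; more than one suggests the replace was rerun.
grep -c 'zpool replace' /tmp/tww-history.txt   # → 2
```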
On Tue, Nov 20, 2007 at 11:10:20AM -0600, Wade.Stuart at fallon.com wrote:
> Does 2007-11-20.02:37:13 actually match the expected timestamp of the
> original zpool replace command before the first zpool status output
> listed below?

No. We ran some 'zpool status' commands after the last 'zpool
replace'. The 'zpool status' output in the initial email is from this
morning. The only ZFS commands we've been running since the last
'zpool replace' are 'zfs list', 'zpool list tww', 'zpool status', and
'zpool status -v'. Server is on GMT time.

> Is it possible that another zpool replace is further up in your pool
> history (i.e., it was rerun by an admin or automatically by some
> service)?

Yes, but a zpool replace for the same bad disk:

2007-11-20.00:57:40 zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 c0t600A0B8000299966000006584741C7C3d0
2007-11-20.02:35:22 zpool detach tww c0t600A0B8000299966000006584741C7C3d0
2007-11-20.02:37:13 zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 c0t600A0B8000299CCC000006734741CD4Ed0

We accidentally removed c0t600A0B8000299966000006584741C7C3d0 from the
array, hence the 'zpool detach'. The last 'zpool replace' has been
running for 15h now.

-- 
albert chin (china at thewrittenword.com)
On Tue, Nov 20, 2007 at 11:39:30AM -0600, Albert Chin wrote:
> No. We ran some 'zpool status' commands after the last 'zpool
> replace'. The 'zpool status' output in the initial email is from this
> morning. The only ZFS commands we've been running since the last
> 'zpool replace' are 'zfs list', 'zpool list tww', 'zpool status', and
> 'zpool status -v'.

I think the 'zpool status' command was resetting the resilvering. We
upgraded to b77 this morning, which does not exhibit this problem.
Resilvering is now done.

-- 
albert chin (china at thewrittenword.com)
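[Editor's note: a restart like the one in this thread shows up as the
percent-done figure *dropping* between successive status captures. A
minimal sketch of that comparison, using the two 'scrub:' lines quoted
earlier in the thread as sample data; the `pct` helper is hypothetical,
not part of any ZFS tooling, and on a live pool each capture would come
from `zpool status tww` run at intervals.]

```shell
# Two successive 'scrub:' status lines, taken verbatim from the thread.
first='scrub: resilver in progress, 62.90% done, 4h26m to go'
second='scrub: resilver in progress, 3.85% done, 18h49m to go'

# Extract the percent-done value from a status line.
pct() { echo "$1" | sed -n 's/.* \([0-9.]*\)% done.*/\1/p'; }

p1=$(pct "$first")     # 62.90
p2=$(pct "$second")    # 3.85

# A later capture with a lower percentage means the resilver restarted.
if awk -v a="$p1" -v b="$p2" 'BEGIN { exit !(b < a) }'; then
  echo "resilver restarted: ${p1}% -> ${p2}%"
fi
```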