On b66:

# zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 \
    c0t600A0B8000299CCC000006734741CD4Ed0

<some hours later>

# zpool status tww
  pool: tww
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 62.90% done, 4h26m to go

<some hours later>

# zpool status tww
  pool: tww
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 3.85% done, 18h49m to go

# zpool history tww | tail -1
2007-11-20.02:37:13 zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 c0t600A0B8000299CCC000006734741CD4Ed0

So, why did resilvering restart when no ZFS operations occurred? I
just ran zpool status again and now I get:

# zpool status tww
  pool: tww
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.00% done, 134h45m to go

What's going on?

-- 
albert chin (china at thewrittenword.com)
Wade.Stuart at fallon.com
2007-Nov-20 16:01 UTC
[zfs-discuss] Why did resilvering restart?
Resilver and scrub are broken and restart when a snapshot is created --
the current workaround is to disable snaps while resilvering. The ZFS
team is working on the issue for a long-term fix.

-Wade

zfs-discuss-bounces at opensolaris.org wrote on 11/20/2007 09:58:19 AM:

> So, why did resilvering restart when no zfs operations occurred? I
> just ran zpool status again and now I get:
> [...]
> scrub: resilver in progress, 0.00% done, 134h45m to go
>
> What's going on?
On Tue, Nov 20, 2007 at 10:01:49AM -0600, Wade.Stuart at fallon.com wrote:
> Resilver and scrub are broken and restart when a snapshot is created
> -- the current workaround is to disable snaps while resilvering. The
> ZFS team is working on the issue for a long-term fix.

But no snapshot was taken. If one had been, zpool history would have
shown it. So, in short, _no_ ZFS operations are going on during the
resilvering. Yet, it is restarting.

-- 
albert chin (china at thewrittenword.com)
Wade.Stuart at fallon.com
2007-Nov-20 17:10 UTC
[zfs-discuss] Why did resilvering restart?
zfs-discuss-bounces at opensolaris.org wrote on 11/20/2007 10:11:50 AM:

> But no snapshot was taken. If one had been, zpool history would have
> shown it. So, in short, _no_ ZFS operations are going on during the
> resilvering. Yet, it is restarting.

Does 2007-11-20.02:37:13 actually match the expected timestamp of the
original zpool replace command before the first zpool status output
listed below? Is it possible that another zpool replace is further up
in your pool history (i.e., it was rerun by an admin or automatically
by some service)?

-Wade
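[Editor's note: Wade's suggestion above can be sketched as a quick shell
check. This is illustrative only -- the pool name `tww` and the history
lines below are taken from this thread, and the temporary file path is a
hypothetical stand-in for live output; on a real system you would run
`zpool history tww > /tmp/tww-history.txt` instead of the here-doc.]

```shell
# Capture the pool history once (here, sample lines from the thread).
cat > /tmp/tww-history.txt <<'EOF'
2007-11-20.00:57:40 zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 c0t600A0B8000299966000006584741C7C3d0
2007-11-20.02:35:22 zpool detach tww c0t600A0B8000299966000006584741C7C3d0
2007-11-20.02:37:13 zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 c0t600A0B8000299CCC000006734741CD4Ed0
EOF

# Show, with timestamps, every event that can (re)start a resilver or scrub.
grep -E 'zpool (replace|detach|attach|scrub)' /tmp/tww-history.txt

# Count the replace commands; more than one suggests the replace was rerun.
grep -c 'zpool replace' /tmp/tww-history.txt   # → 2
```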
On Tue, Nov 20, 2007 at 11:10:20AM -0600, Wade.Stuart at fallon.com wrote:
> Does 2007-11-20.02:37:13 actually match the expected timestamp of the
> original zpool replace command before the first zpool status output
> listed below?

No. We ran some 'zpool status' commands after the last 'zpool
replace'. The 'zpool status' output in the initial email is from this
morning. The only ZFS commands we've been running since the last
'zpool replace' are 'zfs list', 'zpool list tww', 'zpool status', and
'zpool status -v'. Server is on GMT time.

> Is it possible that another zpool replace is further up in your pool
> history (i.e., it was rerun by an admin or automatically by some
> service)?

Yes, but a zpool replace for the same bad disk:

2007-11-20.00:57:40 zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 c0t600A0B8000299966000006584741C7C3d0
2007-11-20.02:35:22 zpool detach tww c0t600A0B8000299966000006584741C7C3d0
2007-11-20.02:37:13 zpool replace tww c0t600A0B80002999660000059E4668CBD3d0 c0t600A0B8000299CCC000006734741CD4Ed0

We accidentally removed c0t600A0B8000299966000006584741C7C3d0 from the
array, hence the 'zpool detach'. The last 'zpool replace' has been
running for 15h now.

-- 
albert chin (china at thewrittenword.com)
On Tue, Nov 20, 2007 at 11:39:30AM -0600, Albert Chin wrote:
> No. We ran some 'zpool status' commands after the last 'zpool
> replace'. The 'zpool status' output in the initial email is from this
> morning. The only ZFS commands we've been running since the last
> 'zpool replace' are 'zfs list', 'zpool list tww', 'zpool status', and
> 'zpool status -v'.

I think the 'zpool status' command was resetting the resilvering. We
upgraded to b77 this morning, which does not exhibit this problem.
Resilvering is now done.

-- 
albert chin (china at thewrittenword.com)
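[Editor's note: a restart like the one in this thread shows up as the
percent-done figure *dropping* between successive status captures. A
minimal sketch of that comparison, using the two 'scrub:' lines quoted
earlier in the thread as sample data; the `pct` helper is hypothetical,
not part of any ZFS tooling, and on a live pool each capture would come
from `zpool status tww` run at intervals.]

```shell
# Two successive 'scrub:' status lines, taken verbatim from the thread.
first='scrub: resilver in progress, 62.90% done, 4h26m to go'
second='scrub: resilver in progress, 3.85% done, 18h49m to go'

# Extract the percent-done value from a status line.
pct() { echo "$1" | sed -n 's/.* \([0-9.]*\)% done.*/\1/p'; }

p1=$(pct "$first")     # 62.90
p2=$(pct "$second")    # 3.85

# A later capture with a lower percentage means the resilver restarted.
if awk -v a="$p1" -v b="$p2" 'BEGIN { exit !(b < a) }'; then
  echo "resilver restarted: ${p1}% -> ${p2}%"
fi
```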