I had a disk malfunction in a raidz pool today. I had an extra one in
the enclosure and performed a: zpool replace pool old new, and several
unexpected behaviors have transpired:
the zpool replace command "hung" for 52 minutes during which no zpool
commands could be executed (like status, iostat or list).
When it finally returned, the drive was marked as "replacing" as I
expected from reading the man page. However, its progress counter
has not been monotonically increasing. It started at 1%, then
went to 5%, then back to 2%, etc. etc.
I just logged in to see if it was "done" and ran zpool status and
received:
pool: xsr_slow_2
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress, 100.00% done, 0h0m to go
config:
        NAME                         STATE     READ WRITE CKSUM
        xsr_slow_2                   ONLINE       0     0     0
          raidz                      ONLINE       0     0     0
            c4t6000393000016A1Fd0s2  ONLINE       0     0     0
            c4t6000393000016A1Fd1s2  ONLINE       0     0     0
            c4t6000393000016A1Fd2s2  ONLINE       0     0     0
            c4t6000393000016A1Fd3s2  ONLINE       0     0     0
            replacing                ONLINE       0     0     0
              c4t6000393000016A1Fd4s2  ONLINE  2.87K   251     0
              c4t6000393000016A1Fd6    ONLINE      0     0     0
            c4t6000393000016A1Fd5s2  ONLINE       0     0     0
I thought to myself, if it is 100% done why is it still replacing? I
waited about 15 seconds and ran the command again to find something
rather disconcerting:
pool: xsr_slow_2
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress, 0.45% done, 27h27m to go
config:
        NAME                         STATE     READ WRITE CKSUM
        xsr_slow_2                   ONLINE       0     0     0
          raidz                      ONLINE       0     0     0
            c4t6000393000016A1Fd0s2  ONLINE       0     0     0
            c4t6000393000016A1Fd1s2  ONLINE       0     0     0
            c4t6000393000016A1Fd2s2  ONLINE       0     0     0
            c4t6000393000016A1Fd3s2  ONLINE       0     0     0
            replacing                ONLINE       0     0     0
              c4t6000393000016A1Fd4s2  ONLINE  2.87K   251     0
              c4t6000393000016A1Fd6    ONLINE      0     0     0
            c4t6000393000016A1Fd5s2  ONLINE       0     0     0
WTF?!
Best regards,
Theo
// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
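The jumpy progress counter described above can be watched over time by parsing the "scrub:" line of zpool status output. A minimal sketch, using a captured line in place of live command output (in practice you would pipe `zpool status <pool>` instead; the figures come from the status listing above):

```shell
# Sketch: extract resilver progress from a zpool status "scrub:" line.
# The variable below stands in for real command output.
status_line='  scrub: resilver in progress, 0.45% done, 27h27m to go'

# Split on ", "; field 2 is "<pct>% done", field 3 is "<eta> to go".
pct=$(printf '%s\n' "$status_line" | awk -F', ' '/resilver in progress/ {sub(/% done/, "", $2); print $2}')
eta=$(printf '%s\n' "$status_line" | awk -F', ' '/resilver in progress/ {sub(/ to go/, "", $3); print $3}')
echo "resilver at ${pct}% (ETA ${eta})"
```

Logging that one line every few minutes (e.g. from cron or a while/sleep loop) makes the non-monotonic behavior easy to demonstrate.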
On Sat, 2006-12-02 at 00:08 -0500, Theo Schlossnagle wrote:
> I had a disk malfunction in a raidz pool today. I had an extra one in
> the enclosure and performed a: zpool replace pool old new and several
> unexpected behaviors have transpired:
>
> the zpool replace command "hung" for 52 minutes during which no zpool
> commands could be executed (like status, iostat or list).

So, I've observed that zfs will continue to attempt to do I/O to the
outgoing drive while a replacement is in progress. (seems
counterintuitive - I'd expect that you'd want to touch the outgoing
drive as little as possible, perhaps only attempting to read from it in
the event that a block wasn't recoverable from the healthy drives).

> When it finally returned, the drive was marked as "replacing" as I
> expected from reading the man page. However, its progress counter
> has not been monotonically increasing. It started at 1% and then
> went to 5% and then back to 2%, etc. etc.

do you have any cron jobs set up to do periodic snapshots?
If so, I think you're seeing:

6343667 scrub/resilver has to start over when a snapshot is taken

I ran into this myself this week - replaced a drive, and the resilver
made it to 95% before a snapshot cron job fired and set things back to
0%.

- Bill
On Dec 2, 2006, at 1:32 PM, Bill Sommerfeld wrote:
> On Sat, 2006-12-02 at 00:08 -0500, Theo Schlossnagle wrote:
>> I had a disk malfunction in a raidz pool today. I had an extra one in
>> the enclosure and performed a: zpool replace pool old new and several
>> unexpected behaviors have transpired:
>>
>> the zpool replace command "hung" for 52 minutes during which no zpool
>> commands could be executed (like status, iostat or list).
>
> So, I've observed that zfs will continue to attempt to do I/O to the
> outgoing drive while a replacement is in progress. (seems
> counterintuitive - I'd expect that you'd want to touch the outgoing
> drive as little as possible, perhaps only attempting to read from it
> in the event that a block wasn't recoverable from the healthy drives).
>
>> When it finally returned, the drive was marked as "replacing" as I
>> expected from reading the man page. However, its progress counter
>> has not been monotonically increasing. It started at 1% and then
>> went to 5% and then back to 2%, etc. etc.
>
> do you have any cron jobs set up to do periodic snapshots?
> If so, I think you're seeing:
>
> 6343667 scrub/resilver has to start over when a snapshot is taken
>
> I ran into this myself this week - replaced a drive, and the resilver
> made it to 95% before a snapshot cron job fired and set things back to
> 0%.

Yesterday, a snapshot was taken to assist in backups -- that could be it.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
I am having no luck replacing my drive either. A few days ago I replaced
my drive and it's completely messed up now.
pool: mypool2
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress, 8.70% done, 8h19m to go
config:
        NAME            STATE     READ WRITE CKSUM
        mypool2         DEGRADED     0     0     0
          raidz         DEGRADED     0     0     0
            c3t0d0      ONLINE       0     0     0
            c3t1d0      ONLINE       0     0     0
            c3t2d0      ONLINE       0     0     0
            c3t3d0      ONLINE       0     0     0
            c3t4d0      ONLINE       0     0     0
            c3t5d0      ONLINE       0     0     0
            replacing   DEGRADED     0     0     0
              c3t6d0s0/o  UNAVAIL    0     0     0  cannot open
              c3t6d0      ONLINE     0     0     0
errors: No known data errors
this is what I get; I am running Solaris 10 U2.
Two days ago I saw it in the 2.00% range with about 10h remaining; it is
still going, and it has already been at least a few days since it started.
when I do: zpool list
NAME      SIZE   USED  AVAIL   CAP  HEALTH    ALTROOT
mypool2   952G   684G   268G   71%  DEGRADED  -
I have almost 1TB of space.
When I do df -k it shows me only 277gb; that is better than only
displaying 12gb as I saw yesterday.
mypool2/d3  277900047  12022884  265877163   5%  /d/d3
when I do zfs list I get:
NAME               USED  AVAIL  REFER  MOUNTPOINT
mypool2            684G   254G    52K  /mypool2
mypool2/d          191G   254G   189G  /mypool2/d
mypool2/d@day_01   653M      -   145G  -
mypool2/d@day_02  31.2M      -   145G  -
mypool2/d@day_03  36.8M      -   144G  -
mypool2/d@day_04  37.9M      -   144G  -
mypool2/d@day_05  31.7M      -   145G  -
mypool2/d@day_06  27.7M      -   145G  -
mypool2/d@day_07  34.0M      -   146G  -
mypool2/d@day_08  26.8M      -   149G  -
mypool2/d@day_09  34.4M      -   151G  -
mypool2/d@hour_14  141K      -   189G  -
mypool2/d3         492G   254G  11.5G  legacy
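Part of the gap between what df reports and the pool size comes from snapshots holding on to space. A rough sketch of totaling the USED column for snapshot entries (names contain '@') from captured `zfs list` output -- the two sample lines are excerpted from the listing above; in practice you would pipe the live command instead:

```shell
# Sketch: sum the USED column for snapshot rows of `zfs list` output
# to see how much space the snapshots are pinning.
zfs_list='mypool2/d@day_01   653M  -  145G  -
mypool2/d@day_02  31.2M  -  145G  -'

total_mb=$(printf '%s\n' "$zfs_list" | awk '/@/ {
  v = $2                               # e.g. "653M" or "31.2M"
  u = substr(v, length(v))             # unit suffix: K, M, or G
  n = substr(v, 1, length(v) - 1)      # numeric part
  if (u == "M") mb = n
  else if (u == "G") mb = n * 1024
  else if (u == "K") mb = n / 1024
  sum += mb
} END { printf "%.1f", sum }')
echo "snapshots hold ~${total_mb} MB"
```

Destroying old snapshots frees that space back to the pool, which is why df can jump around while snapshots are rotated.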
I am so confused with all of this... Why is it taking so long to replace
that one bad disk? Why such different results? What is going on? Is there
a problem with my zpool/zfs combination? Did I do anything wrong? Did I
actually lose data on my drive? If I knew it would be this bad I would
have just destroyed my whole zpool and zfs and started from the beginning,
but I wanted to see how it would go through replacement to see what the
process is... I am so happy I did not use zfs in my production
environment yet, to be honest with you...
Chris
On Sat, 2 Dec 2006, Theo Schlossnagle wrote:
> I had a disk malfunction in a raidz pool today. I had an extra one in
> the enclosure and performed a: zpool replace pool old new and several
> unexpected behaviors have transpired:
>
> the zpool replace command "hung" for 52 minutes during which no zpool
> commands could be executed (like status, iostat or list).
>
> When it finally returned, the drive was marked as "replacing" as I
> expected from reading the man page. However, its progress counter has
> not been monotonically increasing. It started at 1% and then went to
> 5% and then back to 2%, etc. etc.
>
> I just logged in to see if it was "done" and ran zpool status and
> received:
>
> pool: xsr_slow_2
> state: ONLINE
> status: One or more devices is currently being resilvered. The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
> scrub: resilver in progress, 100.00% done, 0h0m to go
> config:
>
>         NAME                         STATE     READ WRITE CKSUM
>         xsr_slow_2                   ONLINE       0     0     0
>           raidz                      ONLINE       0     0     0
>             c4t6000393000016A1Fd0s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd1s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd2s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd3s2  ONLINE       0     0     0
>             replacing                ONLINE       0     0     0
>               c4t6000393000016A1Fd4s2  ONLINE  2.87K   251     0
>               c4t6000393000016A1Fd6    ONLINE      0     0     0
>             c4t6000393000016A1Fd5s2  ONLINE       0     0     0
>
> I thought to myself, if it is 100% done why is it still replacing? I
> waited about 15 seconds and ran the command again to find something
> rather disconcerting:
>
> pool: xsr_slow_2
> state: ONLINE
> status: One or more devices is currently being resilvered. The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
> scrub: resilver in progress, 0.45% done, 27h27m to go
> config:
>
>         NAME                         STATE     READ WRITE CKSUM
>         xsr_slow_2                   ONLINE       0     0     0
>           raidz                      ONLINE       0     0     0
>             c4t6000393000016A1Fd0s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd1s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd2s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd3s2  ONLINE       0     0     0
>             replacing                ONLINE       0     0     0
>               c4t6000393000016A1Fd4s2  ONLINE  2.87K   251     0
>               c4t6000393000016A1Fd6    ONLINE      0     0     0
>             c4t6000393000016A1Fd5s2  ONLINE       0     0     0
>
> WTF?!
>
> Best regards,
>
> Theo
>
> // Theo Schlossnagle
> // CTO -- http://www.omniti.com/~jesus/
> // OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
> mypool2/d@day_09   34.4M      -   151G  -
> mypool2/d@hour_14   141K      -   189G  -
> mypool2/d3          492G   254G  11.5G  legacy
>
> I am so confused with all of this... Why is it taking so long to
> replace that one bad disk?

To work around a bug where a pool traverse gets "lost" when the snapshot
configuration of a pool changes, both scrubs and resilvers will start
over again any time you create or delete a snapshot.

Unfortunately, this workaround has problems of its own -- if your
inter-snapshot interval is less than the time required to complete a
scrub, the resilver will never complete.

The open bug is:

6343667 scrub/resilver has to start over when a snapshot is taken

if it's not going to be fixed any time soon, perhaps we need a better
workaround. Ideas:

- perhaps snapshots should be made to fail while a resilver (not
  scrub!) is in progress...
- or maybe snapshots should fail when a *restarted* resilver is in
  progress -- that way, if you can complete the resilver between two
  snapshot times, you don't miss any snapshots, but if it takes longer
  than that, snapshots are sacrificed in the name of pool integrity.

- Bill
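Until 6343667 is fixed, a user-side workaround in the spirit of the ideas above is to suspend any snapshot cron jobs for the duration of a resilver. A rough sketch (the crontab text and job paths here are hypothetical stand-ins; in practice you would round-trip through `crontab -l` and `crontab <file>`):

```shell
# Sketch: comment out only the snapshot-taking cron entries while a
# resilver runs, leaving other jobs untouched; restore afterwards by
# stripping the marker. The crontab content below is illustrative.
crontab_text='0 * * * * /usr/sbin/zfs snapshot mypool2/d@hour_$(date +%H)
15 3 * * * /usr/local/bin/backup.sh'

# Prepend a marker to lines that invoke `zfs snapshot`.
disabled=$(printf '%s\n' "$crontab_text" | sed '/zfs snapshot/s/^/#DISABLED-DURING-RESILVER# /')
printf '%s\n' "$disabled"
```

Once `zpool status` shows the resilver complete, removing the `#DISABLED-DURING-RESILVER# ` prefix with a matching sed re-enables the jobs.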
On 12/5/06, Bill Sommerfeld <sommerfeld@sun.com> wrote:
> On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
> > mypool2/d@day_09   34.4M      -   151G  -
> > mypool2/d@hour_14   141K      -   189G  -
> > mypool2/d3          492G   254G  11.5G  legacy
> >
> > I am so confused with all of this... Why is it taking so long to
> > replace that one bad disk?
>
> To work around a bug where a pool traverse gets "lost" when the
> snapshot configuration of a pool changes, both scrubs and resilvers
> will start over again any time you create or delete a snapshot.
>
> Unfortunately, this workaround has problems of its own -- if your
> inter-snapshot interval is less than the time required to complete a
> scrub, the resilver will never complete.
>
> The open bug is:
>
> 6343667 scrub/resilver has to start over when a snapshot is taken
>
> if it's not going to be fixed any time soon, perhaps we need a better
> workaround:

Anyone internal working on this?

--
Regards,
Jeremy
Jeremy Teo wrote:
> On 12/5/06, Bill Sommerfeld <sommerfeld@sun.com> wrote:
>> On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
>> > mypool2/d@day_09   34.4M      -   151G  -
>> > mypool2/d@hour_14   141K      -   189G  -
>> > mypool2/d3          492G   254G  11.5G  legacy
>> >
>> > I am so confused with all of this... Why is it taking so long to
>> > replace that one bad disk?
>>
>> To work around a bug where a pool traverse gets "lost" when the
>> snapshot configuration of a pool changes, both scrubs and resilvers
>> will start over again any time you create or delete a snapshot.
>>
>> Unfortunately, this workaround has problems of its own -- if your
>> inter-snapshot interval is less than the time required to complete a
>> scrub, the resilver will never complete.
>>
>> The open bug is:
>>
>> 6343667 scrub/resilver has to start over when a snapshot is taken
>>
>> if it's not going to be fixed any time soon, perhaps we need a better
>> workaround:
>
> Anyone internal working on this?

Yes. But it's going to be a few months.

-Mark