I had a disk malfunction in a raidz pool today. I had an extra one in the enclosure and performed a: zpool replace pool old new. Several unexpected behaviors have transpired:

The zpool replace command "hung" for 52 minutes, during which no zpool commands could be executed (like status, iostat or list).

When it finally returned, the drive was marked as "replacing", as I expected from reading the man page. However, its progress counter has not been monotonically increasing. It started at 1%, then went to 5%, then back to 2%, etc. etc.

I just logged in to see if it was "done", ran zpool status, and received:

  pool: xsr_slow_2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 100.00% done, 0h0m to go
config:

        NAME                         STATE     READ WRITE CKSUM
        xsr_slow_2                   ONLINE       0     0     0
          raidz                      ONLINE       0     0     0
            c4t6000393000016A1Fd0s2  ONLINE       0     0     0
            c4t6000393000016A1Fd1s2  ONLINE       0     0     0
            c4t6000393000016A1Fd2s2  ONLINE       0     0     0
            c4t6000393000016A1Fd3s2  ONLINE       0     0     0
            replacing                ONLINE       0     0     0
              c4t6000393000016A1Fd4s2  ONLINE 2.87K   251     0
              c4t6000393000016A1Fd6    ONLINE     0     0     0
            c4t6000393000016A1Fd5s2  ONLINE       0     0     0

I thought to myself: if it is 100% done, why is it still replacing? I waited about 15 seconds and ran the command again, only to find something rather disconcerting:

  pool: xsr_slow_2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.45% done, 27h27m to go
config:

        NAME                         STATE     READ WRITE CKSUM
        xsr_slow_2                   ONLINE       0     0     0
          raidz                      ONLINE       0     0     0
            c4t6000393000016A1Fd0s2  ONLINE       0     0     0
            c4t6000393000016A1Fd1s2  ONLINE       0     0     0
            c4t6000393000016A1Fd2s2  ONLINE       0     0     0
            c4t6000393000016A1Fd3s2  ONLINE       0     0     0
            replacing                ONLINE       0     0     0
              c4t6000393000016A1Fd4s2  ONLINE 2.87K   251     0
              c4t6000393000016A1Fd6    ONLINE     0     0     0
            c4t6000393000016A1Fd5s2  ONLINE       0     0     0

WTF?!

Best regards,

Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
On Sat, 2006-12-02 at 00:08 -0500, Theo Schlossnagle wrote:
> I had a disk malfunction in a raidz pool today. I had an extra one in
> the enclosure and performed a: zpool replace pool old new and several
> unexpected behaviors have transpired:
>
> the zpool replace command "hung" for 52 minutes during which no zpool
> commands could be executed (like status, iostat or list).

So, I've observed that zfs will continue to attempt to do I/O to the outgoing drive while a replacement is in progress. (This seems counterintuitive; I'd expect that you'd want to touch the outgoing drive as little as possible, perhaps only attempting to read from it in the event that a block wasn't recoverable from the healthy drives.)

> When it finally returned, the drive was marked as "replacing" as I
> expected from reading the man page. However, its progress counter
> has not been monotonically increasing. It started at 1% and then
> went to 5% and then back to 2%, etc. etc.

Do you have any cron jobs set up to take periodic snapshots? If so, I think you're seeing:

  6343667 scrub/resilver has to start over when a snapshot is taken

I ran into this myself this week: I replaced a drive, and the resilver made it to 95% before a snapshot cron job fired and set things back to 0%.

 - Bill
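[For anyone wanting to confirm Bill's theory on their own system, a quick first step is to check the crontab for periodic snapshot jobs. A minimal sketch follows; the grep pattern and the example job are assumptions, not anything from this thread -- adjust the pattern to however your snapshot jobs are actually named:]

```shell
#!/bin/sh
# Sketch: look for a periodic "zfs snapshot" cron job that could keep
# restarting a resilver (bug 6343667).  The pattern below is an
# assumption; match it to your own cron entries.

has_snapshot_job() {
    # Succeeds if the given crontab text contains a zfs snapshot command.
    printf '%s\n' "$1" | grep -q 'zfs snapshot'
}

crontab_text=$(crontab -l 2>/dev/null || true)
if has_snapshot_job "$crontab_text"; then
    echo "warning: periodic snapshot job found; it may reset resilver progress"
fi
```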
On Dec 2, 2006, at 1:32 PM, Bill Sommerfeld wrote:
> On Sat, 2006-12-02 at 00:08 -0500, Theo Schlossnagle wrote:
>> I had a disk malfunction in a raidz pool today. I had an extra one in
>> the enclosure and performed a: zpool replace pool old new and several
>> unexpected behaviors have transpired:
>>
>> the zpool replace command "hung" for 52 minutes during which no zpool
>> commands could be executed (like status, iostat or list).
>
> So, I've observed that zfs will continue to attempt to do I/O to the
> outgoing drive while a replacement is in progress. (seems
> counterintuitive - I'd expect that you'd want to touch the outgoing
> drive as little as possible, perhaps only attempting to read from it in
> the event that a block wasn't recoverable from the healthy drives).
>
>> When it finally returned, the drive was marked as "replacing" as I
>> expected from reading the man page. However, its progress counter
>> has not been monotonically increasing. It started at 1% and then
>> went to 5% and then back to 2%, etc. etc.
>
> do you have any cron jobs set up to do periodic snapshots?
> If so, I think you're seeing:
>
> 6343667 scrub/resilver has to start over when a snapshot is taken
>
> I ran into this myself this week - replaced a drive, and the resilver
> made it to 95% before a snapshot cron job fired and set things back to
> 0%.

Yesterday, a snapshot was taken to assist in backups -- that could be it.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
I am having no luck replacing my drive either. A few days ago I replaced my drive, and it's completely messed up now.

  pool: mypool2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 8.70% done, 8h19m to go
config:

        NAME            STATE     READ WRITE CKSUM
        mypool2         DEGRADED     0     0     0
          raidz         DEGRADED     0     0     0
            c3t0d0      ONLINE       0     0     0
            c3t1d0      ONLINE       0     0     0
            c3t2d0      ONLINE       0     0     0
            c3t3d0      ONLINE       0     0     0
            c3t4d0      ONLINE       0     0     0
            c3t5d0      ONLINE       0     0     0
            replacing   DEGRADED     0     0     0
              c3t6d0s0/o  UNAVAIL    0     0     0  cannot open
              c3t6d0    ONLINE       0     0     0

errors: No known data errors

This is what I get; I am running Solaris 10 U2. Two days ago it was in the 2.00% range with about 10h remaining; it is still going, and it has already been at least a few days since it started.

When I do zpool list:

NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
mypool2                 952G    684G    268G    71%  DEGRADED   -

I have almost 1TB of space. When I do df -k it only shows me 277GB, which is better than only displaying 12GB as I saw yesterday:

mypool2/d3           277900047 12022884 265877163     5%    /d/d3

When I do zfs list I get:

NAME                  USED  AVAIL  REFER  MOUNTPOINT
mypool2               684G   254G    52K  /mypool2
mypool2/d             191G   254G   189G  /mypool2/d
mypool2/d@day_01      653M      -   145G  -
mypool2/d@day_02     31.2M      -   145G  -
mypool2/d@day_03     36.8M      -   144G  -
mypool2/d@day_04     37.9M      -   144G  -
mypool2/d@day_05     31.7M      -   145G  -
mypool2/d@day_06     27.7M      -   145G  -
mypool2/d@day_07     34.0M      -   146G  -
mypool2/d@day_08     26.8M      -   149G  -
mypool2/d@day_09     34.4M      -   151G  -
mypool2/d@hour_14     141K      -   189G  -
mypool2/d3            492G   254G  11.5G  legacy

I am so confused by all of this... Why is it taking so long to replace that one bad disk? Why such different results? What is going on? Is there a problem with my zpool/zfs combination? Did I do anything wrong? Did I actually lose data on my drive?
If I had known it would be this bad, I would have just destroyed my whole zpool and zfs and started from the beginning, but I wanted to see how it would go through a replacement to see what the process is like... I am so happy I did not use zfs in my production environment yet, to be honest with you...

Chris

On Sat, 2 Dec 2006, Theo Schlossnagle wrote:
> I had a disk malfunction in a raidz pool today. I had an extra one in the
> enclosure and performed a: zpool replace pool old new and several unexpected
> behaviors have transpired:
>
> the zpool replace command "hung" for 52 minutes during which no zpool
> commands could be executed (like status, iostat or list).
>
> When it finally returned, the drive was marked as "replacing" as I expected
> from reading the man page. However, its progress counter has not been
> monotonically increasing. It started at 1% and then went to 5% and then back
> to 2%, etc. etc.
>
> I just logged in to see if it was "done" and ran zpool status and received:
>
>   pool: xsr_slow_2
>  state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress, 100.00% done, 0h0m to go
> config:
>
>         NAME                         STATE     READ WRITE CKSUM
>         xsr_slow_2                   ONLINE       0     0     0
>           raidz                      ONLINE       0     0     0
>             c4t6000393000016A1Fd0s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd1s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd2s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd3s2  ONLINE       0     0     0
>             replacing                ONLINE       0     0     0
>               c4t6000393000016A1Fd4s2  ONLINE 2.87K   251     0
>               c4t6000393000016A1Fd6    ONLINE     0     0     0
>             c4t6000393000016A1Fd5s2  ONLINE       0     0     0
>
> I thought to myself, if it is 100% done why is it still replacing? I waited
> about 15 seconds and ran the command again to find something rather
> disconcerting:
>
>   pool: xsr_slow_2
>  state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress, 0.45% done, 27h27m to go
> config:
>
>         NAME                         STATE     READ WRITE CKSUM
>         xsr_slow_2                   ONLINE       0     0     0
>           raidz                      ONLINE       0     0     0
>             c4t6000393000016A1Fd0s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd1s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd2s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd3s2  ONLINE       0     0     0
>             replacing                ONLINE       0     0     0
>               c4t6000393000016A1Fd4s2  ONLINE 2.87K   251     0
>               c4t6000393000016A1Fd6    ONLINE     0     0     0
>             c4t6000393000016A1Fd5s2  ONLINE       0     0     0
>
> WTF?!
>
> Best regards,
>
> Theo
>
> // Theo Schlossnagle
> // CTO -- http://www.omniti.com/~jesus/
> // OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
> mypool2/d@day_09     34.4M      -   151G  -
> mypool2/d@hour_14     141K      -   189G  -
> mypool2/d3            492G   254G  11.5G  legacy
>
> I am so confused by all of this... Why is it taking so long to replace
> that one bad disk?

To work around a bug where a pool traverse gets "lost" when the snapshot configuration of a pool changes, both scrubs and resilvers will start over again any time you create or delete a snapshot.

Unfortunately, this workaround has problems of its own: if your inter-snapshot interval is less than the time required to complete a scrub, the resilver will never complete.

The open bug is:

  6343667 scrub/resilver has to start over when a snapshot is taken

If it's not going to be fixed any time soon, perhaps we need a better workaround. Ideas:

 - perhaps snapshots should be made to fail while a resilver (not a
   scrub!) is in progress...
 - or maybe snapshots should fail only while a *restarted* resilver is in
   progress -- that way, if you can complete the resilver between two
   snapshot times, you don't miss any snapshots, but if it takes longer
   than that, snapshots are sacrificed in the name of pool integrity.

 - Bill
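[Until the bug is fixed, the manual workaround is simply to hold off snapshot creation while a resilver is running. A minimal sketch of that idea, wrapped around a snapshot cron job -- the pool name, snapshot name, and polling interval here are hypothetical, not taken from this thread:]

```shell
#!/bin/sh
# Sketch of a manual workaround for 6343667: delay snapshot creation
# until "zpool status" no longer reports a resilver in progress.
# Pool name, snapshot name, and poll interval are hypothetical.

resilver_in_text() {
    # Succeeds if the supplied zpool-status text shows an active resilver.
    printf '%s\n' "$1" | grep -q 'resilver in progress'
}

wait_for_resilver() {
    # Poll the named pool every five minutes until the resilver completes.
    while resilver_in_text "$(zpool status "$1" 2>/dev/null || true)"; do
        sleep 300
    done
}

# usage (run from cron instead of the bare "zfs snapshot" command):
#   wait_for_resilver mypool2 && zfs snapshot mypool2/d@hour_14
```

This only delays snapshots on the machine running it; it does nothing about snapshots created elsewhere, and it trades missed snapshot windows for resilver progress, which is the same trade-off Bill describes above.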
On 12/5/06, Bill Sommerfeld <sommerfeld@sun.com> wrote:
> On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
> > mypool2/d@day_09     34.4M      -   151G  -
> > mypool2/d@hour_14     141K      -   189G  -
> > mypool2/d3            492G   254G  11.5G  legacy
> >
> > I am so confused by all of this... Why is it taking so long to replace
> > that one bad disk?
>
> To work around a bug where a pool traverse gets "lost" when the snapshot
> configuration of a pool changes, both scrubs and resilvers will start
> over again any time you create or delete a snapshot.
>
> Unfortunately, this workaround has problems of its own -- if your
> inter-snapshot interval is less than the time required to complete a
> scrub, the resilver will never complete.
>
> The open bug is:
>
> 6343667 scrub/resilver has to start over when a snapshot is taken
>
> if it's not going to be fixed any time soon, perhaps we need a better
> workaround:

Is anyone internal working on this?

--
Regards,
Jeremy
Jeremy Teo wrote:
> On 12/5/06, Bill Sommerfeld <sommerfeld@sun.com> wrote:
>> On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
>> > mypool2/d@day_09     34.4M      -   151G  -
>> > mypool2/d@hour_14     141K      -   189G  -
>> > mypool2/d3            492G   254G  11.5G  legacy
>> >
>> > I am so confused by all of this... Why is it taking so long to
>> > replace that one bad disk?
>>
>> To work around a bug where a pool traverse gets "lost" when the snapshot
>> configuration of a pool changes, both scrubs and resilvers will start
>> over again any time you create or delete a snapshot.
>>
>> Unfortunately, this workaround has problems of its own -- if your
>> inter-snapshot interval is less than the time required to complete a
>> scrub, the resilver will never complete.
>>
>> The open bug is:
>>
>> 6343667 scrub/resilver has to start over when a snapshot is taken
>>
>> if it's not going to be fixed any time soon, perhaps we need a better
>> workaround:
>
> Is anyone internal working on this?

Yes, but it's going to be a few months.

-Mark