I had a disk malfunction in a raidz pool today. I had an extra one in the enclosure and performed a: zpool replace pool old new. Several unexpected behaviors have transpired:

The zpool replace command "hung" for 52 minutes, during which no zpool commands could be executed (like status, iostat or list).

When it finally returned, the drive was marked as "replacing", as I expected from reading the man page. However, its progress counter has not been monotonically increasing. It started at 1%, then went to 5%, then back to 2%, etc. etc.

I just logged in to see if it was "done", ran zpool status, and received:

  pool: xsr_slow_2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 100.00% done, 0h0m to go
config:

        NAME                         STATE     READ WRITE CKSUM
        xsr_slow_2                   ONLINE       0     0     0
          raidz                      ONLINE       0     0     0
            c4t6000393000016A1Fd0s2  ONLINE       0     0     0
            c4t6000393000016A1Fd1s2  ONLINE       0     0     0
            c4t6000393000016A1Fd2s2  ONLINE       0     0     0
            c4t6000393000016A1Fd3s2  ONLINE       0     0     0
            replacing                ONLINE       0     0     0
              c4t6000393000016A1Fd4s2  ONLINE 2.87K   251     0
              c4t6000393000016A1Fd6    ONLINE     0     0     0
            c4t6000393000016A1Fd5s2  ONLINE       0     0     0

I thought to myself: if it is 100% done, why is it still replacing? I waited about 15 seconds and ran the command again, only to find something rather disconcerting:

  pool: xsr_slow_2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.45% done, 27h27m to go
config:

        NAME                         STATE     READ WRITE CKSUM
        xsr_slow_2                   ONLINE       0     0     0
          raidz                      ONLINE       0     0     0
            c4t6000393000016A1Fd0s2  ONLINE       0     0     0
            c4t6000393000016A1Fd1s2  ONLINE       0     0     0
            c4t6000393000016A1Fd2s2  ONLINE       0     0     0
            c4t6000393000016A1Fd3s2  ONLINE       0     0     0
            replacing                ONLINE       0     0     0
              c4t6000393000016A1Fd4s2  ONLINE 2.87K   251     0
              c4t6000393000016A1Fd6    ONLINE     0     0     0
            c4t6000393000016A1Fd5s2  ONLINE       0     0     0

WTF?!

Best regards,

Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
On Sat, 2006-12-02 at 00:08 -0500, Theo Schlossnagle wrote:
> I had a disk malfunction in a raidz pool today. I had an extra one in
> the enclosure and performed a: zpool replace pool old new and several
> unexpected behaviors have transpired:
>
> the zpool replace command "hung" for 52 minutes during which no zpool
> commands could be executed (like status, iostat or list).

So, I've observed that zfs will continue to attempt to do I/O to the outgoing drive while a replacement is in progress. (This seems counterintuitive; I'd expect that you'd want to touch the outgoing drive as little as possible, perhaps only attempting to read from it in the event that a block wasn't recoverable from the healthy drives.)

> When it finally returned, the drive was marked as "replacing" as I
> expected from reading the man page. However, its progress counter
> has not been monotonically increasing. It started at 1% and then
> went to 5% and then back to 2%, etc. etc.

Do you have any cron jobs set up to take periodic snapshots? If so, I think you're seeing:

  6343667 scrub/resilver has to start over when a snapshot is taken

I ran into this myself this week: I replaced a drive, and the resilver made it to 95% before a snapshot cron job fired and set things back to 0%.

 - Bill
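[For anyone wanting to confirm Bill's theory on their own system, a quick first step is to check the crontab for periodic snapshot jobs. A minimal sketch follows; the grep pattern and the example job are assumptions, not anything from this thread -- adjust the pattern to however your snapshot jobs are actually named:]

```shell
#!/bin/sh
# Sketch: look for a periodic "zfs snapshot" cron job that could keep
# restarting a resilver (bug 6343667).  The pattern below is an
# assumption; match it to your own cron entries.

has_snapshot_job() {
    # Succeeds if the given crontab text contains a zfs snapshot command.
    printf '%s\n' "$1" | grep -q 'zfs snapshot'
}

crontab_text=$(crontab -l 2>/dev/null || true)
if has_snapshot_job "$crontab_text"; then
    echo "warning: periodic snapshot job found; it may reset resilver progress"
fi
```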
On Dec 2, 2006, at 1:32 PM, Bill Sommerfeld wrote:
> On Sat, 2006-12-02 at 00:08 -0500, Theo Schlossnagle wrote:
>> I had a disk malfunction in a raidz pool today. I had an extra one in
>> the enclosure and performed a: zpool replace pool old new and several
>> unexpected behaviors have transpired:
>>
>> the zpool replace command "hung" for 52 minutes during which no zpool
>> commands could be executed (like status, iostat or list).
>
> So, I've observed that zfs will continue to attempt to do I/O to the
> outgoing drive while a replacement is in progress. (seems
> counterintuitive - I'd expect that you'd want to touch the outgoing
> drive as little as possible, perhaps only attempting to read from it in
> the event that a block wasn't recoverable from the healthy drives).
>
>> When it finally returned, the drive was marked as "replacing" as I
>> expected from reading the man page. However, its progress counter
>> has not been monotonically increasing. It started at 1% and then
>> went to 5% and then back to 2%, etc. etc.
>
> do you have any cron jobs set up to do periodic snapshots?
> If so, I think you're seeing:
>
> 6343667 scrub/resilver has to start over when a snapshot is taken
>
> I ran into this myself this week - replaced a drive, and the resilver
> made it to 95% before a snapshot cron job fired and set things back to
> 0%.

Yesterday, a snapshot was taken to assist in backups -- that could be it.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
I am having no luck replacing my drive either. A few days ago I replaced my drive, and it's completely messed up now.

  pool: mypool2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 8.70% done, 8h19m to go
config:

        NAME            STATE     READ WRITE CKSUM
        mypool2         DEGRADED     0     0     0
          raidz         DEGRADED     0     0     0
            c3t0d0      ONLINE       0     0     0
            c3t1d0      ONLINE       0     0     0
            c3t2d0      ONLINE       0     0     0
            c3t3d0      ONLINE       0     0     0
            c3t4d0      ONLINE       0     0     0
            c3t5d0      ONLINE       0     0     0
            replacing   DEGRADED     0     0     0
              c3t6d0s0/o  UNAVAIL    0     0     0  cannot open
              c3t6d0    ONLINE       0     0     0

errors: No known data errors

This is what I get; I am running Solaris 10 U2. Two days ago it was in the 2.00% range with about 10h remaining; it is still going, and it has already been at least a few days since it started.

When I do zpool list:

NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
mypool2                 952G    684G    268G    71%  DEGRADED   -

I have almost 1TB of space. When I do df -k it only shows me 277GB, which is better than only displaying 12GB as I saw yesterday:

mypool2/d3           277900047 12022884 265877163     5%    /d/d3

When I do zfs list I get:

NAME                  USED  AVAIL  REFER  MOUNTPOINT
mypool2               684G   254G    52K  /mypool2
mypool2/d             191G   254G   189G  /mypool2/d
mypool2/d@day_01      653M      -   145G  -
mypool2/d@day_02     31.2M      -   145G  -
mypool2/d@day_03     36.8M      -   144G  -
mypool2/d@day_04     37.9M      -   144G  -
mypool2/d@day_05     31.7M      -   145G  -
mypool2/d@day_06     27.7M      -   145G  -
mypool2/d@day_07     34.0M      -   146G  -
mypool2/d@day_08     26.8M      -   149G  -
mypool2/d@day_09     34.4M      -   151G  -
mypool2/d@hour_14     141K      -   189G  -
mypool2/d3            492G   254G  11.5G  legacy

I am so confused by all of this... Why is it taking so long to replace that one bad disk? Why such different results? What is going on? Is there a problem with my zpool/zfs combination? Did I do anything wrong? Did I actually lose data on my drive?
If I had known it would be this bad, I would have just destroyed my whole zpool and zfs and started from the beginning, but I wanted to see how it would go through a replacement to see what the process is like... I am so happy I did not use zfs in my production environment yet, to be honest with you...

Chris

On Sat, 2 Dec 2006, Theo Schlossnagle wrote:
> I had a disk malfunction in a raidz pool today. I had an extra one in the
> enclosure and performed a: zpool replace pool old new and several unexpected
> behaviors have transpired:
>
> the zpool replace command "hung" for 52 minutes during which no zpool
> commands could be executed (like status, iostat or list).
>
> When it finally returned, the drive was marked as "replacing" as I expected
> from reading the man page. However, its progress counter has not been
> monotonically increasing. It started at 1% and then went to 5% and then back
> to 2%, etc. etc.
>
> I just logged in to see if it was "done" and ran zpool status and received:
>
>   pool: xsr_slow_2
>  state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress, 100.00% done, 0h0m to go
> config:
>
>         NAME                         STATE     READ WRITE CKSUM
>         xsr_slow_2                   ONLINE       0     0     0
>           raidz                      ONLINE       0     0     0
>             c4t6000393000016A1Fd0s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd1s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd2s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd3s2  ONLINE       0     0     0
>             replacing                ONLINE       0     0     0
>               c4t6000393000016A1Fd4s2  ONLINE 2.87K   251     0
>               c4t6000393000016A1Fd6    ONLINE     0     0     0
>             c4t6000393000016A1Fd5s2  ONLINE       0     0     0
>
> I thought to myself, if it is 100% done why is it still replacing? I waited
> about 15 seconds and ran the command again to find something rather
> disconcerting:
>
>   pool: xsr_slow_2
>  state: ONLINE
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress, 0.45% done, 27h27m to go
> config:
>
>         NAME                         STATE     READ WRITE CKSUM
>         xsr_slow_2                   ONLINE       0     0     0
>           raidz                      ONLINE       0     0     0
>             c4t6000393000016A1Fd0s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd1s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd2s2  ONLINE       0     0     0
>             c4t6000393000016A1Fd3s2  ONLINE       0     0     0
>             replacing                ONLINE       0     0     0
>               c4t6000393000016A1Fd4s2  ONLINE 2.87K   251     0
>               c4t6000393000016A1Fd6    ONLINE     0     0     0
>             c4t6000393000016A1Fd5s2  ONLINE       0     0     0
>
> WTF?!
>
> Best regards,
>
> Theo
>
> // Theo Schlossnagle
> // CTO -- http://www.omniti.com/~jesus/
> // OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
> mypool2/d@day_09     34.4M      -   151G  -
> mypool2/d@hour_14     141K      -   189G  -
> mypool2/d3            492G   254G  11.5G  legacy
>
> I am so confused by all of this... Why is it taking so long to replace
> that one bad disk?

To work around a bug where a pool traverse gets "lost" when the snapshot configuration of a pool changes, both scrubs and resilvers will start over again any time you create or delete a snapshot.

Unfortunately, this workaround has problems of its own: if your inter-snapshot interval is less than the time required to complete a scrub, the resilver will never complete.

The open bug is:

  6343667 scrub/resilver has to start over when a snapshot is taken

If it's not going to be fixed any time soon, perhaps we need a better workaround. Ideas:

 - perhaps snapshots should be made to fail while a resilver (not a
   scrub!) is in progress...
 - or maybe snapshots should fail only while a *restarted* resilver is in
   progress -- that way, if you can complete the resilver between two
   snapshot times, you don't miss any snapshots, but if it takes longer
   than that, snapshots are sacrificed in the name of pool integrity.

 - Bill
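[Until the bug is fixed, the manual workaround is simply to hold off snapshot creation while a resilver is running. A minimal sketch of that idea, wrapped around a snapshot cron job -- the pool name, snapshot name, and polling interval here are hypothetical, not taken from this thread:]

```shell
#!/bin/sh
# Sketch of a manual workaround for 6343667: delay snapshot creation
# until "zpool status" no longer reports a resilver in progress.
# Pool name, snapshot name, and poll interval are hypothetical.

resilver_in_text() {
    # Succeeds if the supplied zpool-status text shows an active resilver.
    printf '%s\n' "$1" | grep -q 'resilver in progress'
}

wait_for_resilver() {
    # Poll the named pool every five minutes until the resilver completes.
    while resilver_in_text "$(zpool status "$1" 2>/dev/null || true)"; do
        sleep 300
    done
}

# usage (run from cron instead of the bare "zfs snapshot" command):
#   wait_for_resilver mypool2 && zfs snapshot mypool2/d@hour_14
```

This only delays snapshots on the machine running it; it does nothing about snapshots created elsewhere, and it trades missed snapshot windows for resilver progress, which is the same trade-off Bill describes above.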
On 12/5/06, Bill Sommerfeld <sommerfeld@sun.com> wrote:
> On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
> > mypool2/d@day_09     34.4M      -   151G  -
> > mypool2/d@hour_14     141K      -   189G  -
> > mypool2/d3            492G   254G  11.5G  legacy
> >
> > I am so confused by all of this... Why is it taking so long to replace
> > that one bad disk?
>
> To work around a bug where a pool traverse gets "lost" when the snapshot
> configuration of a pool changes, both scrubs and resilvers will start
> over again any time you create or delete a snapshot.
>
> Unfortunately, this workaround has problems of its own -- if your
> inter-snapshot interval is less than the time required to complete a
> scrub, the resilver will never complete.
>
> The open bug is:
>
> 6343667 scrub/resilver has to start over when a snapshot is taken
>
> if it's not going to be fixed any time soon, perhaps we need a better
> workaround:

Is anyone internal working on this?

--
Regards,
Jeremy
Jeremy Teo wrote:
> On 12/5/06, Bill Sommerfeld <sommerfeld@sun.com> wrote:
>> On Mon, 2006-12-04 at 13:56 -0500, Krzys wrote:
>> > mypool2/d@day_09     34.4M      -   151G  -
>> > mypool2/d@hour_14     141K      -   189G  -
>> > mypool2/d3            492G   254G  11.5G  legacy
>> >
>> > I am so confused by all of this... Why is it taking so long to
>> > replace that one bad disk?
>>
>> To work around a bug where a pool traverse gets "lost" when the snapshot
>> configuration of a pool changes, both scrubs and resilvers will start
>> over again any time you create or delete a snapshot.
>>
>> Unfortunately, this workaround has problems of its own -- if your
>> inter-snapshot interval is less than the time required to complete a
>> scrub, the resilver will never complete.
>>
>> The open bug is:
>>
>> 6343667 scrub/resilver has to start over when a snapshot is taken
>>
>> if it's not going to be fixed any time soon, perhaps we need a better
>> workaround:
>
> Is anyone internal working on this?

Yes, but it's going to be a few months.

-Mark