Mark Sandrock
2010-Nov-01 19:33 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
Hello,

I'm working with someone who replaced a failed 1TB drive (50% utilized) on an X4540 running OS build 134, and I think something must be wrong.

Last Tuesday afternoon, zpool status reported:

  scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go

and a week being 168 hours, that put completion at sometime tomorrow night. However, he just reported that zpool status shows:

  scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go

so it's looking more like 2011 now. That can't be right.

I'm hoping for a suggestion or two on this issue. I'd search the archives, but they don't seem searchable. Or am I wrong about that?

Thanks.
Mark (subscription pending)
Ross Walker
2010-Nov-01 22:55 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
On Nov 1, 2010, at 3:33 PM, Mark Sandrock <Mark.Sandrock at oracle.com> wrote:

> Hello,
>
> I'm working with someone who replaced a failed 1TB drive (50% utilized),
> on an X4540 running OS build 134, and I think something must be wrong.
>
> Last Tuesday afternoon, zpool status reported:
>
>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>
> and a week being 168 hours, that put completion at sometime tomorrow night.
>
> However, he just reported zpool status shows:
>
>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>
> so it's looking more like 2011 now. That can't be right.
>
> I'm hoping for a suggestion or two on this issue.
>
> I'd search the archives, but they don't seem searchable. Or am I wrong about that?

Some zpool versions have an issue where snapshot creation/deletion during a resilver causes it to start over. Try suspending all snapshot activity during the resilver.

-Ross
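A minimal sketch of how that suspension might be done, assuming the snapshots come from the standard zfs-auto-snapshot SMF service instances rather than a site-specific cron job (adjust to whatever is actually taking them):

  # list the auto-snapshot service instances currently online
  svcs -a | grep auto-snapshot

  # temporarily disable them for the duration of the resilver
  for i in frequent hourly daily weekly monthly; do
      svcadm disable -t svc:/system/filesystem/zfs/auto-snapshot:$i
  done

  # re-enable once "zpool status" shows the resilver has completed
  for i in frequent hourly daily weekly monthly; do
      svcadm enable svc:/system/filesystem/zfs/auto-snapshot:$i
  done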
Edward Ned Harvey
2010-Nov-01 23:06 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Mark Sandrock
>
> I'm working with someone who replaced a failed 1TB drive (50% utilized),
> on an X4540 running OS build 134, and I think something must be wrong.
>
> Last Tuesday afternoon, zpool status reported:
>
>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>
> and a week being 168 hours, that put completion at sometime tomorrow night.
>
> However, he just reported zpool status shows:
>
>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>
> so it's looking more like 2011 now. That can't be right.
>
> I'm hoping for a suggestion or two on this issue.

For a typical live system that has been in production for a long time, with files being created, snapshotted, partially overwritten, snapshots destroyed, and so on, the blocks on disk end up written in largely random order. And, at least for now, resilvering walks blocks in creation-time order, not disk order. So resilver time is typically limited by random-IO IOPS and by the number of records in the affected vdev.

To reduce the number of records in an affected vdev, it is effective to build the pool from mirrors instead of raidz, or to use smaller raidz1 vdevs instead of one large raidz3. Unfortunately, you're not going to be able to change that on an existing system. Roughly speaking, a 23-disk raidz3 with the capacity of 20 disks would take about 40x longer to resilver than one of the mirrors in a 40-disk stripe of mirrors with the same capacity. In rough numbers, that might be 20 days instead of 12 hours.

To reduce the cost per write: under normal circumstances you should disable the HBA WriteBack cache if a dedicated log device is present (on the X4275 that is done via the realtek HBA utility; I don't know about the X4540). But during a resilver, you might enable WriteBack for the drive being resilvered. I don't know for sure that it will help, but I think it should make some difference, because the logic that led to disabling WB does not apply to resilver writes.

To reduce the number of records to resilver:
* If possible, disable the creation of new snapshots while the resilver is running.
* If possible, delete files and destroy old snapshots that are no longer needed.
* If possible, limit new writes to the system.

By the way, I'm sorry to say: don't trust the progress indicator. You're likely to reach 100% complete and stay there for a long time, even "2T resilvered" on a 1TB disk. It looks ugly, but it's actually physically correct, because the filesystem is in use and performing new writes during the resilver.

To reduce contention for IOPS:
* If possible, limit the "live" IO to the system. Resilver runs at lower priority and therefore gets delayed a lot on busy production systems.
* Definitely DON'T scrub the pool while it's resilvering.
* Maybe you can offload some of the IO by adding cache devices, a dedicated log, or RAM? It's sound in principle, but YMMV immensely, depending on your workload.

All of the above is likely to be only modestly effective. There's not much you can do if you started with a huge raidz3, for example. The most important thing you can do to affect resilver time is to choose mirrors instead of raidz at pool-creation time (a sketch of the two layouts follows).
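For illustration only (hypothetical device names, scaled down from the 40-disk example above, not the poster's actual configuration), the two layouts differ only in how the vdevs are declared when the pool is created:

  # Stripe of mirrors: a resilver rebuilds only one small two-disk vdev.
  zpool create tank \
      mirror c1t0d0 c1t1d0 \
      mirror c1t2d0 c1t3d0 \
      mirror c1t4d0 c1t5d0 \
      mirror c1t6d0 c1t7d0

  # One wide raidz3 vdev from the same disks: a resilver must walk every
  # record in the vdev, reading from all surviving disks.
  zpool create tank raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                           c1t4d0 c1t5d0 c1t6d0 c1t7d0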
If you "zfs send" the pool to some other storage, and then recreate your local pool, which will be empty and therefore resilver completed, because zfs only resilvers used blocks... and then "zfs send" the data back to restore the pool... Then besides the fact that your resilver has been forcibly completed, your received data will also be ordered on disk optimally, which will greatly help in case another resilver is needed in the near future... and you create an opportunity to revisit the pool architecture, possibly in favor of mirrors instead of raidz.
Ian Collins
2010-Nov-02 05:10 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
On 11/ 2/10 08:33 AM, Mark Sandrock wrote:

> Hello,
>
> I'm working with someone who replaced a failed 1TB drive (50% utilized),
> on an X4540 running OS build 134, and I think something must be wrong.
>
> Last Tuesday afternoon, zpool status reported:
>
>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>
> and a week being 168 hours, that put completion at sometime tomorrow night.
>
> However, he just reported zpool status shows:
>
>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>
> so it's looking more like 2011 now. That can't be right.
>
> I'm hoping for a suggestion or two on this issue.
>
> I'd search the archives, but they don't seem searchable. Or am I wrong about that?

How is the pool configured?

I look after a very busy x5400 with 500G drives configured as 8-drive raidz2, and these take about 100 hours to resilver. The workload on this box is probably worst case for resilvering: it receives a steady stream of snapshots.

-- 
Ian.
Ian Collins
2010-Nov-02 05:11 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
On 11/ 2/10 11:55 AM, Ross Walker wrote:

> On Nov 1, 2010, at 3:33 PM, Mark Sandrock <Mark.Sandrock at oracle.com> wrote:
>
>> Hello,
>>
>> I'm working with someone who replaced a failed 1TB drive (50% utilized),
>> on an X4540 running OS build 134, and I think something must be wrong.
>>
>> Last Tuesday afternoon, zpool status reported:
>>
>>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>>
>> and a week being 168 hours, that put completion at sometime tomorrow night.
>>
>> However, he just reported zpool status shows:
>>
>>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>>
>> so it's looking more like 2011 now. That can't be right.
>>
>> I'm hoping for a suggestion or two on this issue.
>>
>> I'd search the archives, but they don't seem searchable. Or am I wrong about that?
>
> Some zpool versions have an issue where snapshot creation/deletion
> during a resilver causes it to start over.

That was fixed long before build 134.

> Try suspending all snapshot activity during the resilver.

This always helps!

-- 
Ian.
Mark Sandrock
2010-Nov-02 10:41 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
On Nov 2, 2010, at 12:10 AM, Ian Collins wrote:

> On 11/ 2/10 08:33 AM, Mark Sandrock wrote:
>> I'm working with someone who replaced a failed 1TB drive (50% utilized),
>> on an X4540 running OS build 134, and I think something must be wrong.
>>
>> Last Tuesday afternoon, zpool status reported:
>>
>>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>>
>> and a week being 168 hours, that put completion at sometime tomorrow night.
>>
>> However, he just reported zpool status shows:
>>
>>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>>
>> so it's looking more like 2011 now. That can't be right.
>
> How is the pool configured?

Both 10- and 12-disk RAIDZ-2. That, plus too much other I/O, must be the problem.

I'm thinking 5 x (7-2) would be better, assuming he doesn't want to go RAID-10 (see the sketch below).

Thanks much for all the helpful replies.
Mark

> I look after a very busy x5400 with 500G drives configured as 8 drive raidz2
> and these take about 100 hours to resilver. The workload on this box is
> probably worst case for resilvering, it receives a steady stream of snapshots.
>
> --
> Ian.
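For illustration only (hypothetical X4540 device names; the real controller/target numbering will differ), the proposed layout would be roughly five 7-disk raidz2 vdevs, leaving some disks for spares and the root pool:

  zpool create tank \
      raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
      raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
      raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 \
      raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 \
      spare  c5t0d0 c5t1d0

Each vdev then holds roughly a fifth of the data, so a single-disk resilver reads from only six surviving disks instead of nine or eleven.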
Richard Elling
2010-Nov-16 09:02 UTC
[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)
Measure the I/O performance with iostat. You should see something that looks sorta like this (iostat -zxCn 10):

                       extended device statistics
     r/s    w/s     kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
  5948.9  349.3  40322.3  5238.1   0.1  16.7     0.0     2.7   0  330  c9
     3.7    0.0    230.7     0.0   0.0   0.1     0.0    13.5   0    2  c9t1d0
   845.0    0.0   5497.4     0.0   0.0   0.9     0.0     1.1   1   32  c9t2d0
     3.8    0.0    230.7     0.0   0.0   0.0     0.0    10.6   0    1  c9t3d0
   845.2    0.0   5495.4     0.0   0.0   0.9     0.0     1.1   1   32  c9t4d0
     3.8    0.0    237.1     0.0   0.0   0.0     0.0    10.4   0    1  c9t5d0
   841.4    0.0   5519.7     0.0   0.0   0.9     0.0     1.1   1   32  c9t6d0
     3.8    0.0    237.3     0.0   0.0   0.0     0.0     9.2   0    1  c9t7d0
   843.5    0.0   5485.2     0.0   0.0   0.9     0.0     1.1   1   31  c9t8d0
     3.7    0.0    230.8     0.0   0.0   0.1     0.0    15.2   0    2  c9t9d0
   850.2    0.0   5488.6     0.0   0.0   0.9     0.0     1.1   1   31  c9t10d0
     3.1    0.0    211.2     0.0   0.0   0.0     0.0    13.2   0    1  c9t11d0
   847.9    0.0   5523.4     0.0   0.0   0.9     0.0     1.1   1   31  c9t12d0
     3.1    0.0    204.9     0.0   0.0   0.0     0.0     9.6   0    1  c9t13d0
   847.2    0.0   5506.0     0.0   0.0   0.9     0.0     1.1   1   31  c9t14d0
     3.4    0.0    224.1     0.0   0.0   0.0     0.0    12.3   0    1  c9t15d0
     0.0  349.3      0.0  5238.1   0.0   9.9     0.0    28.4   1  100  c9t16d0

Here you can clearly see a raidz2 resilver in progress. c9t16d0 is the disk being resilvered (write workload), and half of the others are being read to generate the resilvering data. Note the relative performance and the ~30% busy on the surviving disks.

If you see iostat output that looks significantly different from this, then you might be seeing one of two common causes:

1. Your version of ZFS has the new resilver throttle *and* the pool is otherwise servicing I/O.
2. Disks are throwing errors or responding very slowly. Use fmdump -eV to observe error reports.

 -- richard

On Nov 1, 2010, at 12:33 PM, Mark Sandrock wrote:

> Hello,
>
> I'm working with someone who replaced a failed 1TB drive (50% utilized),
> on an X4540 running OS build 134, and I think something must be wrong.
>
> Last Tuesday afternoon, zpool status reported:
>
>   scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
>
> and a week being 168 hours, that put completion at sometime tomorrow night.
>
> However, he just reported zpool status shows:
>
>   scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
>
> so it's looking more like 2011 now. That can't be right.
>
> I'm hoping for a suggestion or two on this issue.
>
> I'd search the archives, but they don't seem searchable. Or am I wrong about that?
>
> Thanks.
> Mark (subscription pending)

-- 
ZFS and performance consulting
http://www.RichardElling.com
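A hedged sketch of chasing the second cause above on a Solaris/OpenSolaris host, using a hypothetical suspect disk; the exact error classes and device names in the output will vary:

  # per-device soft/hard/transport error counters since boot
  iostat -En | grep Errors

  # fault management error reports; look for repeated ereports against one disk,
  # e.g. ereport.io.scsi.cmd.disk.* or ereport.fs.zfs.io entries
  fmdump -eV | more

  # watch service times during the resilver; a disk whose asvc_t sits far above
  # its peers (hundreds of ms) is likely the one slowing everything down
  iostat -zxCn 10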