I just witnessed a resilver that took 4h for 27GB of data. The setup is 3x raid-z2 stripes with 6 disks per raid-z2. Disks are 500GB in size. No checksum errors.

It seems like an exorbitantly long time. The other 5 disks in the stripe with the replaced disk were at 90% busy and ~150 IO/s each during the resilver. Does this seem unusual to anyone else? Could it be due to heavy fragmentation, or do I have a disk in the stripe going bad? Post-resilver, no disk is above 30% util or noticeably higher than any other disk.

Thank you in advance. (kernel is snv123)

-J

Sent via iPhone
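P.S. In case anyone wants to watch the same thing, the per-disk busy/IOPS numbers above are the kind of thing you get from watching the pool during the resilver, along these lines ("tank" is a stand-in for the real pool name):

  # zpool status tank    # resilver progress and scan status
  # iostat -xn 5         # per-device %b (busy) and r/s + w/s (IOPS), 5s intervals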
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jason J. W. Williams
>
> I just witnessed a resilver that took 4h for 27GB of data. The setup
> is 3x raid-z2 stripes with 6 disks per raid-z2. Disks are 500GB in
> size. No checksum errors.

27G on a 6-disk raidz2 means approx 6.75G per disk. Ideally, the disk could write 7G = 56 Gbit in a couple of minutes if it were all sequential and there were no other activity in the system. So you're right to suspect something is suboptimal, but the root cause is inefficient resilvering code in ZFS, specifically for raidzN. The resilver code spends a *lot* of time seeking, because it's not optimized by disk layout. This may change some day, but not in the near future. Mirrors don't suffer the same effect; at least, if they do, it's far less dramatic.

For now, all you can do is: (a) factor this into your decision to use mirror versus raidz, (b) ensure no snapshots and minimal IO during the resilver, and (c) if you opt for raidz, keep the number of disks in a raidz to a minimum. It is preferable to use 3 vdevs each of 7-disk raidz instead of a 21-disk raidz3.

Your setup of 3x raidz2 is pretty reasonable, and a 4h resilver, although slow, is successful, which is more than you could say if you had a 21-disk raidz3.
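To make that layout advice concrete, here is a sketch of the preferred pool (device names are hypothetical):

  # preferred: three 7-disk raidz vdevs striped in one pool;
  # a resilver only has to read from the 6 surviving disks of one vdev
  zpool create tank \
    raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 \
    raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
    raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0

  # not preferred: a single 21-disk raidz3 vdev (all 21 disks after
  # "raidz3"), where every resilver must read from all 20 survivors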
----- Original Message -----

> I just witnessed a resilver that took 4h for 27GB of data. The setup
> is 3x raid-z2 stripes with 6 disks per raid-z2. Disks are 500GB in
> size. No checksum errors.
>
> It seems like an exorbitantly long time. The other 5 disks in the
> stripe with the replaced disk were at 90% busy and ~150 IO/s each
> during the resilver. Does this seem unusual to anyone else? Could it
> be due to heavy fragmentation, or do I have a disk in the stripe going
> bad? Post-resilver, no disk is above 30% util or noticeably higher
> than any other disk.
>
> Thank you in advance. (kernel is snv123)

It surely seems a long time for 27 gigs. Scrub takes its time, but on this 50TB setup with ~29TB currently used, on WD Green drives (yeah, I know they're bad, but I didn't know that when I installed the box, and they have worked flawlessly for a year or so), it's nothing comparable to what you're reporting:

  scrub: scrub completed after 47h57m with 0 errors on Fri Sep 3 16:57:26 2010

Also, snv123 is quite old; is upgrading to 134 an option?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
On Sun, 26 Sep 2010, Edward Ned Harvey wrote:

> 27G on a 6-disk raidz2 means approx 6.75G per disk. Ideally, the disk
> could write 7G = 56 Gbit in a couple of minutes if it were all
> sequential and there were no other activity in the system. So you're
> right to suspect something is suboptimal, but the root cause is
> inefficient resilvering code in ZFS, specifically for raidzN. The
> resilver code spends a *lot* of time seeking, because it's not
> optimized by disk layout. This may change some day, but not in the
> near future.

Part of the problem is that the ZFS designers decided that the filesystems should remain up and usable during a resilver. Without this requirement things would be a lot easier. For example, we could just run some utility and wait many hours (perhaps fewer hours than a ZFS resilver) before the filesystems are allowed to be usable. Few of us want to return to that scenario.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Upgrading is definitely an option. What is the current snv favorite for ZFS stability? I apologize; with all the Oracle/Sun changes I haven't been paying as close attention to bug reports on zfs-discuss as I used to.

-J

Sent via iPhone

On Sep 26, 2010, at 10:22, Roy Sigurd Karlsbakk <roy at karlsbakk.net> wrote:

> It surely seems a long time for 27 gigs. Scrub takes its time, but on
> this 50TB setup with ~29TB currently used, on WD Green drives, it's
> nothing comparable to what you're reporting:
>
>   scrub: scrub completed after 47h57m with 0 errors on Fri Sep 3 16:57:26 2010
>
> Also, snv123 is quite old; is upgrading to 134 an option?
On Sep 26, 2010, at 11:03 AM, Jason J. W. Williams wrote:

> Upgrading is definitely an option. What is the current snv favorite
> for ZFS stability? I apologize; with all the Oracle/Sun changes I
> haven't been paying as close attention to bug reports on zfs-discuss
> as I used to.

OpenIndiana b147 is the latest binary release, and it also includes the fix for CR6494473, "ZFS needs a way to slow down resilvering":

  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473
  http://www.openindiana.org

-- richard

--
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com

Richard Elling
richard at nexenta.com +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
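P.S. On builds new enough to carry that fix, the throttle is exposed as kernel tunables rather than a pool property. A sketch, assuming a post-fix build (tunable name from the updated scan code; values are just examples):

  # show the current resilver inter-I/O delay, in clock ticks
  echo zfs_resilver_delay/D | mdb -k

  # set the delay to 0 so resilver I/O is not throttled
  echo zfs_resilver_delay/W0t0 | mdb -kw

  # persistent equivalent, in /etc/system:
  #   set zfs:zfs_resilver_delay = 0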
> > Upgrading is definitely an option. What is the current snv favorite
> > for ZFS stability? I apologize; with all the Oracle/Sun changes I
> > haven't been paying as close attention to bug reports on zfs-discuss
> > as I used to.
>
> OpenIndiana b147 is the latest binary release, and it also includes
> the fix for CR6494473, "ZFS needs a way to slow down resilvering":
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473
> http://www.openindiana.org

Are you sure upgrading to OI is safe at this point? 134 is stable unless you start fiddling with dedup, and OI is hardly tested. For a production setup, I'd recommend 134.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
On Sep 26, 2010, at 1:16 PM, Roy Sigurd Karlsbakk wrote:

> Are you sure upgrading to OI is safe at this point? 134 is stable
> unless you start fiddling with dedup, and OI is hardly tested. For a
> production setup, I'd recommend 134.

For a production setup? For production I'd recommend something that is supported, preferably NexentaStor 3 (which is b134 + important ZFS fixes :-)

-- richard
134 it is. This is an OpenSolaris rig that's going to be replaced within the next 60 days, so I just need to get it to something that won't throw false checksum errors like the 120-123 builds do, and has decent rebuild times.

Future boxes will be NexentaStor.

Thank you guys. :)

-J

On Sun, Sep 26, 2010 at 2:21 PM, Richard Elling <Richard at nexenta.com> wrote:

> For a production setup? For production I'd recommend something that
> is supported, preferably NexentaStor 3 (which is b134 + important ZFS
> fixes :-)
Err... I meant Nexenta Core.

-J

On Mon, Sep 27, 2010 at 12:02 PM, Jason J. W. Williams <jasonjwwilliams at gmail.com> wrote:

> Future boxes will be NexentaStor.
Dear Richard,
I am a Nexenta user, and now I am hitting the same problem of a resilver taking too long. I tried to find a solution from the link you posted, which suggests "zfs set resilver_speed=10% pool_name", but Nexenta does not have a resilver_speed property. How can I solve this on Nexenta? Please advise. Thanks!
Dear Richard,
How can I get the important ZFS fixes onto NexentaStor? My current version of NexentaStor is v3.0.4 Enterprise.
On Dec 21, 2010, at 8:18 AM, Jackson Wang wrote:

> Dear Richard,
> I am a Nexenta user, and now I am hitting the same problem of a
> resilver taking too long. How can I solve this on Nexenta?

In general, resilver will take as long as needed. If your resilver is going very, very slow, then there could be other issues causing the slowness. Has the system been logging error messages related to the I/O subsystem during the resilver?

-- richard
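P.S. For example, with stock Solaris tooling (nothing Nexenta-specific assumed here):

  # per-device soft/hard/transport error counters since boot
  iostat -En

  # FMA error telemetry (ereports) logged by the system
  fmdump -e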
Dear Richard,
Thanks for your reply.

Actually there is no other disk/controller fault in this system. A NexentaStor engineer, Andrew, added a line to /kernel/drv/sd.conf on the NexentaStor system, "allow-bus-device-reset=0", and then the resilver speed went way up. Before the parameter was added, the system had been resilvering for more than 2 days without completing. After the engineer added that line and rebooted the system, the resilver took only about 10 hours to complete. Do you know what happened there? Thanks!!

On Sun, Dec 26, 2010 at 1:24 PM, Richard Elling <richard.elling at gmail.com> wrote:

> In general, resilver will take as long as needed. If your resilver is
> going very, very slow, then there could be other issues causing the
> slowness. Has the system been logging error messages related to the
> I/O subsystem during the resilver?

--
InfoTech Technology Corp.
http://www.infowize.com.tw

Jackson Wang
M: 0916163480
T: 02-26791430 / 03-5834432 / 070-1020-9886
F: 0940-472248

Tech Supp: support at infowize.com.tw
Sales Supp: sales at infowize.com.tw
Do you have SSDs in the pool? Which ones, and any errors on those?

On 26 Dec 2010 13:35, "Jackson Wang" <jcwang at infowize.com.tw> wrote:

> Actually there is no other disk/controller fault in this system. A
> NexentaStor engineer, Andrew, added "allow-bus-device-reset=0" to
> /kernel/drv/sd.conf, and then the resilver speed went way up.
On Dec 26, 2010, at 5:33 AM, Jackson Wang wrote:

> Actually there is no other disk/controller fault in this system. A
> NexentaStor engineer, Andrew, added "allow-bus-device-reset=0" to
> /kernel/drv/sd.conf, and then the resilver speed went way up. Before
> the parameter was added, the system had been resilvering for more
> than 2 days without completing. After the engineer added that line
> and rebooted the system, the resilver took only about 10 hours to
> complete. Do you know what happened there? Thanks!!

This occurs when a device is misbehaving and not responding to commands. When a device does not respond to commands for more than 60 seconds, the sd driver will issue a bus reset, which affects other devices on the "bus." This can happen regardless of the I/O workload. The workaround disables the bus resets, as described in the sd man page.

-- richard
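P.S. For reference, the workaround in driver.conf(4) syntax: a property line appended to /kernel/drv/sd.conf (note the trailing semicolon; a reboot, as done above, makes it take effect):

  # disable sd's bus device reset on unresponsive devices
  allow-bus-device-reset=0;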