Running a zpool scrub on our production pool is showing a scrub rate of about 400K/s. (When this pool was first set up we saw rates in the MB/s range during a scrub.)

Both zpool iostat and iostat -Xn show lots of idle disk time, no above-average service times, and no abnormally high busy percentages. Load on the box is 0.59.

Hardware: 8 x 3GHz cores, 32GB RAM, 96 spindles arranged into raidz vdevs, running OpenIndiana 147.

Known hardware errors:
- 1 of 8 SAS lanes is down, though we've seen the same poor performance when using the backup path where all 8 lanes work.
- Target 44 occasionally throws an error (less than once a week). When this happens the pool becomes unresponsive for a second, then continues working normally.

Read performance when we read off the file system (including cache, using dd with a 1MB block size) is 1.6GB/s. zpool iostat shows numerous reads of 500 MB/s while doing this test.

I'm willing to consider that hardware could be the culprit here, but I would expect to see signs if that were the case. The lack of any slow service times and the lack of any effort at disk I/O both seem to point elsewhere.

I will provide any additional information people might find helpful and will, if possible, test any suggestions.

Thanks in advance,
-Don
Edward Ned Harvey
2011-May-14 12:08 UTC
[zfs-discuss] Extremely slow zpool scrub performance
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org]
> On Behalf Of Donald Stahl
>
> Running a zpool scrub on our production pool is showing a scrub rate
> of about 400K/s. (When this pool was first set up we saw rates in the
> MB/s range during a scrub).

Wait longer, and keep watching it. Or just wait till it's done and look at the total time required. It is normal to have periods of high and low throughput during a scrub. I don't know why.
On 05/14/11 01:08 PM, Edward Ned Harvey wrote:
>> Running a zpool scrub on our production pool is showing a scrub rate
>> of about 400K/s. (When this pool was first set up we saw rates in the
>> MB/s range during a scrub).
> Wait longer, and keep watching it. Or just wait till it's done and look at
> the total time required. It is normal to have periods of high and low
> during scrub. I don't know why.

Check the IOPS per drive - you may be maxing out on one of them if it's in an area where there are lots of small blocks.

-- Andrew
On May 13, 2011, at 11:25 AM, Donald Stahl <don at blacksun.org> wrote:
> Running a zpool scrub on our production pool is showing a scrub rate
> of about 400K/s. (When this pool was first set up we saw rates in the
> MB/s range during a scrub).

The scrub I/O has lower priority than other I/O.

In later ZFS releases, scrub I/O is also throttled. When the throttle kicks in, the scrub can drop to 5-10 IOPS. This shouldn't be much of an issue: scrubs do not need to be, and are not intended to be, run very often -- perhaps once a quarter or so.
 -- richard
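If you want scrubs on roughly that schedule without having to remember to start them, a root crontab entry along these lines does it; the pool name and the 2am-on-the-first-of-the-quarter timing are just placeholder choices, not anything Richard specified:

0 2 1 1,4,7,10 * /usr/sbin/zpool scrub pool0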
> The scrub I/O has lower priority than other I/O.
>
> In later ZFS releases, scrub I/O is also throttled. When the throttle
> kicks in, the scrub can drop to 5-10 IOPS. This shouldn't be much of
> an issue, scrubs do not need to be, and are not intended to be, run
> very often -- perhaps once a quarter or so.

I understand the lower-priority I/O and such, but what confuses me is this:

On my primary head:
  scan: scrub in progress since Fri May 13 14:04:46 2011
    24.5G scanned out of 14.2T at 340K/s, (scan is slow, no estimated time)
    0 repaired, 0.17% done

I have a second NAS head, also running OI 147 on the same type of server, with the same SAS card, connected to the same type of disk shelf- and a zpool scrub over there shows:
  scan: scrub in progress since Sat May 14 11:10:51 2011
    29.0G scanned out of 670G at 162M/s, 1h7m to go
    0 repaired, 4.33% done

Obviously there is less data on the second server- but the first server has 88 x SAS drives and the second one has 10 x 7200 rpm SATA drives. I would expect those 88 SAS drives to be able to outperform 10 SATA drives- but they aren't.

On the first server iostat -Xn shows 30-40 IOPS max per drive, while on the second server iostat -Xn shows 400 IOPS per drive. On the first server the disk busy numbers never climb higher than 30%, while on the second they spike to 96%.

This performance problem isn't just related to scrubbing either. I see mediocre performance when trying to write to the array as well. If I were seeing hardware errors, high service times, high load, or other errors, then that might make sense. Unfortunately I seem to have mostly idle disks that don't get used. It's almost as if ZFS is just sitting around twiddling its thumbs instead of writing data.

I'm happy to provide real numbers; suffice it to say none of these numbers make any sense to me.

The array actually has 88 disks + 4 hot spares (1 each of two sizes per controller channel) + 4 Intel X25-E 32GB SSDs (2 x 2-way mirror split across controller channels).

Any ideas or things I should test and I will gladly look into them.

-Don
Can you share your 'zpool status' output for both pools?

Also, you may want to run the following a few times in a loop and provide the output:

# echo "::walk spa | ::print spa_t spa_name spa_last_io spa_scrub_inflight" | mdb -k

Thanks,
George
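A simple way to collect those samples is a shell loop like the following; the five iterations and the five-second spacing are arbitrary choices, not something George specified:

# for i in 1 2 3 4 5; do echo "::walk spa | ::print spa_t spa_name spa_last_io spa_scrub_inflight" | mdb -k; sleep 5; done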
> Can you share your 'zpool status' output for both pools?

Faster, smaller server:
~# zpool status pool0
  pool: pool0
 state: ONLINE
  scan: scrub repaired 0 in 2h18m with 0 errors on Sat May 14 13:28:58 2011

Much larger, more capable server:
~# zpool status pool0 | head
  pool: pool0
 state: ONLINE
  scan: scrub in progress since Fri May 13 14:04:46 2011
    173G scanned out of 14.2T at 737K/s, (scan is slow, no estimated time)
    43K repaired, 1.19% done

The only other relevant line is:
            c5t9d0          ONLINE       0     0     0  (repairing)

(That's new as of this morning- though it was still very slow before that.)

> Also you may want to run the following a few times in a loop and
> provide the output:
>
> # echo "::walk spa | ::print spa_t spa_name spa_last_io
> spa_scrub_inflight" | mdb -k

~# echo "::walk spa | ::print spa_t spa_name spa_last_io
> spa_scrub_inflight" | mdb -k
spa_name = [ "pool0" ]
spa_last_io = 0x159b275a
spa_name = [ "rpool" ]
spa_last_io = 0x159b210a
mdb: failed to dereference symbol: unknown symbol name

I'm pretty sure that's not the output you were looking for :)

On the same theme- is there a good reference for all of the various ZFS debugging commands and mdb options? I'd love to spend a lot of time just looking at the data available to me, but every time I turn around someone suggests a new and interesting mdb query I've never seen before.

Thanks,
-Don
Don,

Can you send the entire 'zpool status' output? I wanted to see your pool configuration. Also run the mdb command in a loop (at least 5 times) so we can see if spa_last_io is changing. I'm surprised you're not finding the symbol for 'spa_scrub_inflight' too. Can you check that you didn't mistype it?

Thanks,
George
> Can you send the entire 'zpool status' output? I wanted to see your
> pool configuration. Also run the mdb command in a loop (at least 5
> times) so we can see if spa_last_io is changing. I'm surprised you're
> not finding the symbol for 'spa_scrub_inflight' too. Can you check
> that you didn't mistype this?

I copy and pasted to make sure that wasn't the issue :) I will run it in a loop this time. I didn't do it last time because of the error.

This box was running only raidz sets originally. After running into performance problems we added a bunch of mirrors to try to improve the IOPS. The logs are not mirrored right now, as we were testing adding the other two as cache disks to see if that helped. We've also tested using a ramdisk ZIL to see if that made any difference- it did not.

The performance on this box was excellent until it started to fill up (somewhere around 70%)- then performance degraded significantly. We added more disks and copied the data around to rebalance things. It seems to have helped somewhat- but it is nothing like when we first created the array.

config:

        NAME            STATE     READ WRITE CKSUM
        pool0           ONLINE       0     0     0
          raidz1-0      ONLINE       0     0     0
            c5t5d0      ONLINE       0     0     0
            c5t6d0      ONLINE       0     0     0
            c5t7d0      ONLINE       0     0     0
            c5t8d0      ONLINE       0     0     0
          raidz1-1      ONLINE       0     0     0
            c5t9d0      ONLINE       0     0     0  (repairing)
            c5t10d0     ONLINE       0     0     0
            c5t11d0     ONLINE       0     0     0
            c5t12d0     ONLINE       0     0     0
          raidz1-2      ONLINE       0     0     0
            c5t13d0     ONLINE       0     0     0
            c5t14d0     ONLINE       0     0     0
            c5t15d0     ONLINE       0     0     0
            c5t16d0     ONLINE       0     0     0
          raidz1-3      ONLINE       0     0     0
            c5t21d0     ONLINE       0     0     0
            c5t22d0     ONLINE       0     0     0
            c5t23d0     ONLINE       0     0     0
            c5t24d0     ONLINE       0     0     0
          raidz1-4      ONLINE       0     0     0
            c5t25d0     ONLINE       0     0     0
            c5t26d0     ONLINE       0     0     0
            c5t27d0     ONLINE       0     0     0
            c5t28d0     ONLINE       0     0     0
          raidz1-5      ONLINE       0     0     0
            c5t29d0     ONLINE       0     0     0
            c5t30d0     ONLINE       0     0     0
            c5t31d0     ONLINE       0     0     0
            c5t32d0     ONLINE       0     0     0
          raidz1-6      ONLINE       0     0     0
            c5t33d0     ONLINE       0     0     0
            c5t34d0     ONLINE       0     0     0
            c5t35d0     ONLINE       0     0     0
            c5t36d0     ONLINE       0     0     0
          raidz1-7      ONLINE       0     0     0
            c5t37d0     ONLINE       0     0     0
            c5t38d0     ONLINE       0     0     0
            c5t39d0     ONLINE       0     0     0
            c5t40d0     ONLINE       0     0     0
          raidz1-8      ONLINE       0     0     0
            c5t41d0     ONLINE       0     0     0
            c5t42d0     ONLINE       0     0     0
            c5t43d0     ONLINE       0     0     0
            c5t44d0     ONLINE       0     0     0
          raidz1-10     ONLINE       0     0     0
            c5t45d0     ONLINE       0     0     0
            c5t46d0     ONLINE       0     0     0
            c5t47d0     ONLINE       0     0     0
            c5t48d0     ONLINE       0     0     0
          raidz1-11     ONLINE       0     0     0
            c5t49d0     ONLINE       0     0     0
            c5t50d0     ONLINE       0     0     0
            c5t51d0     ONLINE       0     0     0
            c5t52d0     ONLINE       0     0     0
          raidz1-12     ONLINE       0     0     0
            c5t53d0     ONLINE       0     0     0
            c5t54d0     ONLINE       0     0     0
            c5t55d0     ONLINE       0     0     0
            c5t56d0     ONLINE       0     0     0
          raidz1-13     ONLINE       0     0     0
            c5t57d0     ONLINE       0     0     0
            c5t58d0     ONLINE       0     0     0
            c5t59d0     ONLINE       0     0     0
            c5t60d0     ONLINE       0     0     0
          raidz1-14     ONLINE       0     0     0
            c5t61d0     ONLINE       0     0     0
            c5t62d0     ONLINE       0     0     0
            c5t63d0     ONLINE       0     0     0
            c5t64d0     ONLINE       0     0     0
          raidz1-15     ONLINE       0     0     0
            c5t65d0     ONLINE       0     0     0
            c5t66d0     ONLINE       0     0     0
            c5t67d0     ONLINE       0     0     0
            c5t68d0     ONLINE       0     0     0
          raidz1-16     ONLINE       0     0     0
            c5t69d0     ONLINE       0     0     0
            c5t70d0     ONLINE       0     0     0
            c5t71d0     ONLINE       0     0     0
            c5t72d0     ONLINE       0     0     0
          raidz1-17     ONLINE       0     0     0
            c5t73d0     ONLINE       0     0     0
            c5t74d0     ONLINE       0     0     0
            c5t75d0     ONLINE       0     0     0
            c5t76d0     ONLINE       0     0     0
          raidz1-18     ONLINE       0     0     0
            c5t77d0     ONLINE       0     0     0
            c5t78d0     ONLINE       0     0     0
            c5t79d0     ONLINE       0     0     0
            c5t80d0     ONLINE       0     0     0
          mirror-20     ONLINE       0     0     0
            c5t81d0     ONLINE       0     0     0
            c5t82d0     ONLINE       0     0     0
          mirror-21     ONLINE       0     0     0
            c5t83d0     ONLINE       0     0     0
            c5t84d0     ONLINE       0     0     0
          mirror-22     ONLINE       0     0     0
            c5t85d0     ONLINE       0     0     0
            c5t86d0     ONLINE       0     0     0
          mirror-23     ONLINE       0     0     0
            c5t87d0     ONLINE       0     0     0
            c5t97d0     ONLINE       0     0     0
          mirror-24     ONLINE       0     0     0
            c5t89d0     ONLINE       0     0     0
            c5t90d0     ONLINE       0     0     0
          mirror-25     ONLINE       0     0     0
            c5t91d0     ONLINE       0     0     0
            c5t92d0     ONLINE       0     0     0
          mirror-26     ONLINE       0     0     0
            c5t93d0     ONLINE       0     0     0
            c5t94d0     ONLINE       0     0     0
          mirror-27     ONLINE       0     0     0
            c5t95d0     ONLINE       0     0     0
            c5t96d0     ONLINE       0     0     0
        logs
          c5t2d0        ONLINE       0     0     0
          c5t18d0       ONLINE       0     0     0
        spares
          c5t3d0        AVAIL
          c5t4d0        AVAIL
          c5t19d0       AVAIL
          c5t20d0       AVAIL

errors: No known data errors
> I copy and pasted to make sure that wasn't the issue :)

Which, ironically, turned out to be the problem- there was an extra carriage return in there that mdb did not like.

Here is the output:
spa_name = [ "pool0" ]
spa_last_io = 0x82721a4
spa_scrub_inflight = 0x1

spa_name = [ "pool0" ]
spa_last_io = 0x8272240
spa_scrub_inflight = 0x1

spa_name = [ "pool0" ]
spa_last_io = 0x82722f0
spa_scrub_inflight = 0x1

spa_name = [ "pool0" ]
spa_last_io = 0x827239e
spa_scrub_inflight = 0

spa_name = [ "pool0" ]
spa_last_io = 0x8272441
spa_scrub_inflight = 0x1
Here is another example of the performance problems I am seeing:

~# dd if=/dev/zero of=/pool0/ds.test bs=1024k count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 56.2184 s, 37.3 MB/s

37MB/s seems like some sort of bad joke for all these disks. I can write the same amount of data to a set of 6 SAS disks on a Dell PERC 6/i at a rate of 160MB/s, and those disks are hosting 25 VMs and a lot more IOPS than this box.

zpool iostat during the same time shows:
pool0       14.2T  25.3T    124  1.30K   981K  4.02M
pool0       14.2T  25.3T    277    914  2.16M  23.2M
pool0       14.2T  25.3T     65  4.03K   526K  90.2M
pool0       14.2T  25.3T     18  1.76K   136K  6.81M
pool0       14.2T  25.3T    460  5.55K  3.60M   111M
pool0       14.2T  25.3T    160      0  1.24M      0
pool0       14.2T  25.3T    182  2.34K  1.41M  33.3M

The zeros and other low numbers don't make any sense. And as I mentioned- the busy percentages and service times of these disks are never abnormally high- especially when compared to the much smaller, better performing pool I have.
You mentioned that the pool was somewhat full; can you send the output of 'zpool iostat -v pool0'?

You can also try doing the following to reduce 'metaslab_min_alloc_size' to 4K:

echo "metaslab_min_alloc_size/Z 1000" | mdb -kw

NOTE: This will change the running system, so you may want to make this change during off-peak hours. Then check your performance and see if it makes a difference.

- George
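For reference, the 1000 above is hex, i.e. 4096 bytes. A sketch of checking the value before and after the change, plus the usual /etc/system mechanism if you later decide to keep it across reboots (the /etc/system line is an assumption about the standard tunable syntax, not something tested on this pool):

# echo "metaslab_min_alloc_size/J" | mdb -k       (print the current 64-bit value, in hex)
# echo "metaslab_min_alloc_size/Z 1000" | mdb -kw (set it to 0x1000 = 4K on the live kernel)

To persist it, add to /etc/system and reboot:
set zfs:metaslab_min_alloc_size = 0x1000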
> You mentioned that the pool was somewhat full, can you send the output
> of 'zpool iostat -v pool0'?

~# zpool iostat -v pool0
                      capacity     operations    bandwidth
pool               alloc   free   read  write   read  write
------------------ -----  -----  -----  -----  -----  -----
pool0              14.1T  25.4T    926  2.35K  7.20M  15.7M
  raidz1            673G   439G     42    117   335K   790K
    c5t5d0             -      -     20     20   167K   273K
    c5t6d0             -      -     20     20   167K   272K
    c5t7d0             -      -     20     20   167K   273K
    c5t8d0             -      -     20     20   167K   272K
  raidz1            710G   402G     38     84   309K   546K
    c5t9d0             -      -     18     16   158K   189K
    c5t10d0            -      -     18     16   157K   187K
    c5t11d0            -      -     18     16   158K   189K
    c5t12d0            -      -     18     16   157K   187K
  raidz1            719G   393G     43     95   348K   648K
    c5t13d0            -      -     20     17   172K   224K
    c5t14d0            -      -     20     17   171K   223K
    c5t15d0            -      -     20     17   172K   224K
    c5t16d0            -      -     20     17   172K   223K
  raidz1            721G   391G     42     96   341K   653K
    c5t21d0            -      -     20     16   170K   226K
    c5t22d0            -      -     20     16   169K   224K
    c5t23d0            -      -     20     16   170K   226K
    c5t24d0            -      -     20     16   170K   224K
  raidz1            721G   391G     43    100   342K   667K
    c5t25d0            -      -     20     17   172K   231K
    c5t26d0            -      -     20     17   172K   229K
    c5t27d0            -      -     20     17   172K   231K
    c5t28d0            -      -     20     17   172K   229K
  raidz1            721G   391G     43    101   341K   672K
    c5t29d0            -      -     20     18   173K   233K
    c5t30d0            -      -     20     18   173K   231K
    c5t31d0            -      -     20     18   173K   233K
    c5t32d0            -      -     20     18   173K   231K
  raidz1            722G   390G     42    100   339K   667K
    c5t33d0            -      -     20     19   171K   231K
    c5t34d0            -      -     20     19   172K   229K
    c5t35d0            -      -     20     19   171K   231K
    c5t36d0            -      -     20     19   171K   229K
  raidz1            709G   403G     42    107   341K   714K
    c5t37d0            -      -     20     20   171K   247K
    c5t38d0            -      -     20     19   170K   245K
    c5t39d0            -      -     20     20   171K   247K
    c5t40d0            -      -     20     19   170K   245K
  raidz1            744G   368G     39     79   316K   530K
    c5t41d0            -      -     18     16   163K   183K
    c5t42d0            -      -     18     15   163K   182K
    c5t43d0            -      -     18     16   163K   183K
    c5t44d0            -      -     18     15   163K   182K
  raidz1            737G   375G     44     98   355K   668K
    c5t45d0            -      -     21     18   178K   231K
    c5t46d0            -      -     21     18   178K   229K
    c5t47d0            -      -     21     18   178K   231K
    c5t48d0            -      -     21     18   178K   229K
  raidz1            733G   379G     43    103   344K   683K
    c5t49d0            -      -     20     19   175K   237K
    c5t50d0            -      -     20     19   175K   235K
    c5t51d0            -      -     20     19   175K   237K
    c5t52d0            -      -     20     19   175K   235K
  raidz1            732G   380G     43    104   344K   685K
    c5t53d0            -      -     20     19   176K   237K
    c5t54d0            -      -     20     19   175K   235K
    c5t55d0            -      -     20     19   175K   237K
    c5t56d0            -      -     20     19   175K   235K
  raidz1            733G   379G     43    101   344K   672K
    c5t57d0            -      -     20     17   175K   233K
    c5t58d0            -      -     20     17   174K   231K
    c5t59d0            -      -     20     17   175K   233K
    c5t60d0            -      -     20     17   174K   231K
  raidz1            806G  1.38T     50    123   401K   817K
    c5t61d0            -      -     24     22   201K   283K
    c5t62d0            -      -     24     22   201K   281K
    c5t63d0            -      -     24     22   201K   283K
    c5t64d0            -      -     24     22   201K   281K
  raidz1            794G  1.40T     47    120   377K   786K
    c5t65d0            -      -     22     23   194K   272K
    c5t66d0            -      -     22     23   194K   270K
    c5t67d0            -      -     22     23   194K   272K
    c5t68d0            -      -     22     23   194K   270K
  raidz1            788G  1.40T     47    115   376K   763K
    c5t69d0            -      -     22     22   191K   264K
    c5t70d0            -      -     22     22   191K   262K
    c5t71d0            -      -     22     22   191K   264K
    c5t72d0            -      -     22     22   191K   262K
  raidz1            786G  1.40T     46    106   373K   723K
    c5t73d0            -      -     22     18   185K   250K
    c5t74d0            -      -     22     19   185K   248K
    c5t75d0            -      -     22     18   185K   250K
    c5t76d0            -      -     22     19   185K   248K
  raidz1            767G  1.42T     40     79   323K   534K
    c5t77d0            -      -     19     16   165K   185K
    c5t78d0            -      -     19     16   165K   183K
    c5t79d0            -      -     19     16   165K   185K
    c5t80d0            -      -     19     16   165K   183K
  c5t2d0            3.40M  46.5G      0      4      0  90.0K
  mirror            61.7G  1.75T      4     25  33.2K   149K
    c5t81d0            -      -      1      9  24.3K   149K
    c5t82d0            -      -      1      9  24.3K   149K
  mirror             140G  1.68T     19     71   158K   504K
    c5t83d0            -      -      6     13  95.2K   504K
    c5t84d0            -      -      6     14  96.6K   504K
  mirror             141G  1.67T     18     79   148K   535K
    c5t85d0            -      -      6     16  93.0K   535K
    c5t86d0            -      -      6     16  93.5K   535K
  mirror             131G  1.68T     20     65   166K   419K
    c5t87d0            -      -      6     14   156K   419K
    c5t97d0            -      -      4     20  66.7K   683K
  mirror             145G  1.67T     19     77   157K   525K
    c5t89d0            -      -      6     15  97.4K   525K
    c5t90d0            -      -      6     15  97.7K   525K
  mirror             147G  1.67T     18     80   152K   539K
    c5t91d0            -      -      6     15  96.2K   539K
    c5t92d0            -      -      6     15  95.3K   539K
  mirror             150G  1.67T     19     81   156K   547K
    c5t93d0            -      -      6     15  98.1K   547K
    c5t94d0            -      -      6     16  97.7K   547K
  mirror             155G  1.66T     19     80   154K   538K
    c5t95d0            -      -      6     16  97.1K   538K
    c5t96d0            -      -      6     17  97.3K   538K
  c5t18d0           3.11M  46.5G      0      4      0  91.2K
------------------ -----  -----  -----  -----  -----  -----

> You can also try doing the following to
> reduce 'metaslab_min_alloc_size' to 4K:
>
> echo "metaslab_min_alloc_size/Z 1000" | mdb -kw
>
> NOTE: This will change the running system so you may want to make this
> change during off-peak hours.
> Then check your performance and see if it makes a difference.

I'll make this change tonight and see if it helps.

Thanks,
-Don
> You mentioned that the pool was somewhat full, can you send the output
> of 'zpool iostat -v pool0'? You can also try doing the following to
> reduce 'metaslab_min_alloc_size' to 4K:
>
> echo "metaslab_min_alloc_size/Z 1000" | mdb -kw

So just changing that setting moved my write rate from 40MB/s to 175MB/s. That's a huge improvement. It's still not as high as I used to see on this box- but at least now the array is usable again.

Thanks for the suggestion! Any other tunables I should be taking a look at?

-Don
Roy Sigurd Karlsbakk
2011-May-17 00:33 UTC
[zfs-discuss] Extremely slow zpool scrub performance
> Running a zpool scrub on our production pool is showing a scrub rate
> of about 400K/s. (When this pool was first set up we saw rates in the
> MB/s range during a scrub).

Usually, something like this is caused by a bad drive. Can you post iostat -en output?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
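For anyone unfamiliar with those flags: -e adds per-device error counters (soft, hard, and transport errors plus a total) and -n reports descriptive device names, so a quick health pass is simply:

# iostat -en

A drive whose hard or transport error counts keep climbing is the usual suspect for this kind of stall.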
2011-05-16 22:21, George Wilson wrote:
> echo "metaslab_min_alloc_size/Z 1000" | mdb -kw

Thanks, this also boosted my home box from hundreds of KB/s into the several MB/s range, which is much better (I'm evacuating data from a pool hosted in a volume inside my main pool, and the bottleneck is quite substantial) - now I'd get rid of this experiment much faster ;)

--
Jim Klimov, CTO, JSC COS&HT
jimklimov at cos.ru
As a followup:

I ran the same dd test as earlier- but this time I stopped the scrub:

pool0       14.1T  25.4T     88  4.81K   709K   262M
pool0       14.1T  25.4T    104  3.99K   836K   248M
pool0       14.1T  25.4T    360  5.01K  2.81M   230M
pool0       14.1T  25.4T    305  5.69K  2.38M   231M
pool0       14.1T  25.4T    389  5.85K  3.05M   293M
pool0       14.1T  25.4T    376  5.38K  2.94M   328M
pool0       14.1T  25.4T    295  3.29K  2.31M   286M

~# dd if=/dev/zero of=/pool0/ds.test bs=1024k count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 6.50394 s, 322 MB/s

Stopping the scrub seemed to increase my performance by another 60% over the highest numbers I saw just from the metaslab change earlier (that peak was 201 MB/s). This is the performance I was seeing out of this array when newly built.

I have two follow-up questions:

1. We changed the metaslab size from 10M to 4k- that's a pretty drastic change. Is there some median value that should be used instead, and/or is there a downside to using such a small metaslab size?

2. I'm still confused by the poor scrub performance and its impact on the write performance. I'm not seeing a lot of I/Os or processor load- so I'm wondering what else I might be missing.

-Don
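Not shown in the post, but stopping a scrub is presumably the standard -s flag, and it can be started again later (from the beginning) once the testing window closes:

# zpool scrub -s pool0     (cancel the running scrub)
# zpool scrub pool0        (kick it off again later)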
2011-05-17 6:32, Donald Stahl wrote:
> I have two follow up questions:
>
> 1. We changed the metaslab size from 10M to 4k- that's a pretty
> drastic change. Is there some median value that should be used instead
> and/or is there a downside to using such a small metaslab size?
>
> 2. I'm still confused by the poor scrub performance and its impact on
> the write performance. I'm not seeing a lot of I/Os or processor load-
> so I'm wondering what else I might be missing.

I have a third question, following up on the first one above ;)

3) Is the "4k" size in any way theoretically based? Namely, is it a "reasonably large" amount of eight or so metadata blocks of 512-byte size, or is something else in play - like a 4Kb I/O?

In particular, since my system uses 4Kb blocks (ashift=12), for a similar benefit should I set the value to 32k (4K * 8 blocks) - yes/no?

Am I also correct to assume that if I have a large streaming write and ZFS can see or predict that it would soon have to reference many blocks, it can allocate a metaslab larger than this specified minimum and thus keep fragmentation from getting extremely high?

Actually, am I understanding correctly that metaslabs are large contiguous ranges reserved for metadata blocks? If so, and if they are indeed treated specially anyway, is it possible to use 512-byte records for metadata even on vdevs with 4kb block size configured by ashift=12? Perhaps not today, but as an RFE for ZFS development (I posted the idea here: https://www.illumos.org/issues/954 )

Rationale: very much space is wasted on my box just to reference data blocks and keep 3.5kb of trailing garbage ;)

4) In one internet post I've seen a suggestion that this value be set as well:
set zfs:metaslab_smo_bonus_pct = 0xc8
http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg40765.html

Can anybody comment on what it is and whether it would be useful? The original post passed the knowledge along as-is...

Thanks
--
Jim Klimov, CTO, JSC COS&HT
jimklimov at cos.ru
On May 16, 2011, at 7:32 PM, Donald Stahl wrote:
> 1. We changed the metaslab size from 10M to 4k- that's a pretty
> drastic change. Is there some median value that should be used instead
> and/or is there a downside to using such a small metaslab size?

metaslab_min_alloc_size is not the metaslab size. From the source:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/metaslab.c#57

/*
 * A metaslab is considered "free" if it contains a contiguous
 * segment which is greater than metaslab_min_alloc_size.
 */

By reducing this value, it is easier for the allocator to identify a metaslab for allocation as the file system becomes full.

> 2. I'm still confused by the poor scrub performance and it's impact on
> the write performance. I'm not seeing a lot of IO's or processor load-
> so I'm wondering what else I might be missing.

For slow disks with the default zfs_vdev_max_pending, the I/O scheduler becomes ineffective. Consider reducing zfs_vdev_max_pending to see if performance improves. Based on recent testing I've done on a variety of disks, a value of 1 or 2 can be better for 7,200 rpm disks or slower. The tradeoff is a few IOPS for much better average latency.
 -- richard
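Richard doesn't give the exact command here; following the mdb -kw pattern used elsewhere in this thread, and assuming zfs_vdev_max_pending is the usual 32-bit tunable, it would look something like this (0t2 is decimal 2):

# echo "zfs_vdev_max_pending/D" | mdb -k       (print the current value, decimal)
# echo "zfs_vdev_max_pending/W0t2" | mdb -kw   (lower it to 2 on the running kernel)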
> metaslab_min_alloc_size is not the metaslab size. From the source

Sorry- that was simply a slip of the mind- it was a long day.

> By reducing this value, it is easier for the allocator to identify a
> metaslab for allocation as the file system becomes full.

Thank you for clarifying. Is there a danger to reducing this value to 4k? Also, 4k and 10M are pretty far apart- is there an intermediate value we should be using that would be a better compromise?

> For slow disks with the default zfs_vdev_max_pending, the IO scheduler
> becomes ineffective. Consider reducing zfs_vdev_max_pending to see if
> performance improves. Based on recent testing I've done on a variety of
> disks, a value of 1 or 2 can be better for 7,200 rpm disks or slower.
> The tradeoff is a few IOPS for much better average latency.

I was having this scrub performance problem when my pool was nothing but 15k SAS drives, so I'm not sure if it will help, but I'll certainly give it a try. Thanks for the suggestion.

-Don
On Mon, May 16, 2011 at 7:32 PM, Donald Stahl <don at blacksun.org> wrote:
> 1. We changed the metaslab size from 10M to 4k- that's a pretty
> drastic change. Is there some median value that should be used instead
> and/or is there a downside to using such a small metaslab size?

Unfortunately the default value for metaslab_min_alloc_size is too high. I've been meaning to rework much of this code to make the change more dynamic rather than just a hard-coded value. What this is trying to do is make sure that zfs switches to a different metaslab once it finds that it can't allocate its desired chunk. With the default value the desired chunk is 160MB. By taking the value to 4K it now is looking for 64K chunks, which is more reasonable for fuller pools.

My plan is to make these values change dynamically as we start to fill up the metaslabs. This is a substantial rewhack of the code and not something that will be available anytime soon.

> 2. I'm still confused by the poor scrub performance and it's impact on
> the write performance. I'm not seeing a lot of IO's or processor load-
> so I'm wondering what else I might be missing.

Scrub will impact performance, although I wouldn't expect a 60% drop. Do you mind sharing more data on this? I would like to see the spa_scrub_* values I sent you earlier while you're running your test (in a loop so we can see the changes). What I'm looking for is to see how many inflight scrubs you have at the time of your run.

Thanks,
George
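The arithmetic behind those figures, assuming the shift-by-4 multiplier George describes in his reply to Jim below: the old 10MB value x 16 = 160MB desired contiguous chunk, while 4KB x 16 = 64KB.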
On Tue, May 17, 2011 at 6:49 AM, Jim Klimov <jimklimov at cos.ru> wrote:
> 3) Is the "4k" size in any way theoretically based?
> Namely, is it a "reasonably large" amount of eight or so
> metadata blocks of 512-byte size, or is something else in
> play - like a 4Kb I/O?

The 4k value was based on some analysis I had done on some systems at Oracle. The code uses this shifted by another tunable (defaults to 4) to determine the "fragmented" minimum size. So if you bump this to 32k then the fragmented size is 512k, which tells ZFS to switch to a different metaslab once it drops below this threshold.

> In particular, since my system uses 4Kb blocks (ashift=12),
> for similar benefit I should set the value to 32k (4K * 8
> blocks) - yes/no?
>
> Am I also correct to assume that if I have a large streaming
> write and ZFS can see or predict that it would soon have to
> reference many blocks, it can allocate a metaslab larger
> than this specified minimum and thus keep fragmentation
> somewhat not extremely high?

The metaslabs are predetermined at config time and their sizes are fixed. A good way to think about them is as slices of your disk. If you take your disk size and divide it up into 200 equally sized sections, then you end up with your metaslab size.

> Actually, am I understanding correctly that metaslabs are
> large contiguous ranges reserved for metadata blocks?
> If so, and if they are indeed treated specially anyway,
> is it possible to use 512-byte records for metadata even
> on vdevs with 4kb block size configured by ashift=12?
> Perhaps not today, but as an RFE for ZFS development
> (I posted the idea here: https://www.illumos.org/issues/954 )

No, metaslabs are for all allocations and not specific to metadata. There's more work to do to efficiently deal with 4k block sizes.

> 4) In one internet post I've seen a suggestion that this
> value be set as well:
> set zfs:metaslab_smo_bonus_pct = 0xc8
> http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg40765.html

This is used to add more weight (i.e. preference) to specific metaslabs. A metaslab receives this bonus if it has an offset which is lower than a previously used metaslab. Sorry, this is somewhat complicated and hard to explain without a whiteboard. :-)

Thanks,
George
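If you want to look at those fixed slices on a real pool, zdb can dump them; it is read-only but can take a while on a large pool, and the pool name here is just an example:

# zdb -m pool0     (lists each top-level vdev's metaslabs with their offsets, sizes, and free space)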
> So if you bump this to 32k then the fragmented size
> is 512k which tells ZFS to switch to a different metaslab
> once it drops below this threshold.

Makes sense after some more reading today ;)

What happens if no metaslab has a block this large (or small) on a sufficiently full and fragmented system? Will the new writes fail altogether, or would a sufficiently large free block still be used?

> This is used to add more weight (i.e. preference) to specific
> metaslabs. A metaslab receives this bonus if it has an offset
> which is lower than a previously used metaslab. Sorry this is
> somewhat complicated and hard to explain without a whiteboard. :-)

From recent reading on Jeff's blog and links leading from it, I might guess this relates to different disk offsets having different write speeds? A yes/no would suffice, so as to spare the absent whiteboard ;)

Thanks,
//Jim
On Tue, May 17, 2011 at 11:48 AM, Jim Klimov <jim at cos.ru> wrote:
> What happens if no metaslab has a block this large (or small)
> on a sufficiently full and fragmented system? Will the new writes
> fail altogether, or would a sufficiently large free block still be used?

If all the metaslabs on all of your devices won't accommodate the specified block, then you will start to create gang blocks (i.e. smaller fragments which make up the specified block size).

> From recent reading on Jeff's blog and links leading from it,
> I might guess this relates to different disk offsets having different
> write speeds? A yes/no would suffice, so as to spare the absent
> whiteboard ;)

No. Imagine if you started allocations on a disk and used the metaslabs that are at the edge of the disk and some a 1/3 of the way in. Then you want all the metaslabs which are a 1/3 of the way in and lower to get the bonus. This keeps the allocations towards the outer edges.

- George
Wow- so a bit of an update:

With the default scrub delay:
~# echo "zfs_scrub_delay/K" | mdb -kw
zfs_scrub_delay:        200000004

pool0       14.1T  25.3T    165    499  1.28M  2.88M
pool0       14.1T  25.3T    146      0  1.13M      0
pool0       14.1T  25.3T    147      0  1.14M      0
pool0       14.1T  25.3T    145      3  1.14M  31.9K
pool0       14.1T  25.3T    314      0  2.43M      0
pool0       14.1T  25.3T    177      0  1.37M  3.99K

The scrub continues on at about 250K/s - 500K/s.

With the delay set to 1:
~# echo "zfs_scrub_delay/W1" | mdb -kw

pool0       14.1T  25.3T    272      3  2.11M  31.9K
pool0       14.1T  25.3T    180      0  1.39M      0
pool0       14.1T  25.3T    150      0  1.16M      0
pool0       14.1T  25.3T    248      3  1.93M  31.9K
pool0       14.1T  25.3T    223      0  1.73M      0

The pool scrub rate climbs to about 800K/s - 1000K/s.

If I set the delay to 0:
~# echo "zfs_scrub_delay/W0" | mdb -kw

pool0       14.1T  25.3T  50.1K    116   392M   434K
pool0       14.1T  25.3T  49.6K      0   389M      0
pool0       14.1T  25.3T  50.8K     61   399M   633K
pool0       14.1T  25.3T  51.2K      3   402M  31.8K
pool0       14.1T  25.3T  51.6K      0   405M  3.98K
pool0       14.1T  25.3T  52.0K      0   408M      0

Now the pool scrub rate climbs to 100MB/s (in the brief time I looked at it).

Is there a setting somewhere between slow and ludicrous speed?

-Don
Don,

Try setting zfs_scrub_delay to 1 but increase zfs_top_maxinflight to something like 64.

Thanks,
George
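George leaves out the exact commands; using the same mdb -kw pattern as before (0t64 is decimal 64), they would presumably be:

# echo "zfs_scrub_delay/W1" | mdb -kw
# echo "zfs_top_maxinflight/W0t64" | mdb -kw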
> Try setting zfs_scrub_delay to 1 but increase
> zfs_top_maxinflight to something like 64.

The array is running some regression tests right now, but when it quiets down I'll try that change.

-Don
>> Try setting zfs_scrub_delay to 1 but increase
>> zfs_top_maxinflight to something like 64.

With the delay set to 1 or higher, it doesn't matter what I set the maxinflight value to- when I check with:
echo "::walk spa | ::print spa_t spa_name spa_last_io spa_scrub_inflight" | mdb -k
the value returned is only ever 0, 1, or 2.

If I set the delay to zero but drop maxinflight to 8, then the read rate drops from 400MB/s to 125MB/s. If I drop it again to 4, then the read rate drops to a much more manageable 75MB/s.

The delay seems to be useless on this array- but maxinflight makes a big difference. At 16 my read rate is 300MB/s. At 32 it goes up to 380MB/s. Beyond 32 it doesn't seem to change much- it levels out at about 400MB/s and 50K reads/s:

pool0       14.1T  25.3T  51.2K      4   402M  35.8K
pool0       14.1T  25.3T  51.9K      3   407M  31.8K
pool0       14.1T  25.3T  52.1K      0   409M      0
pool0       14.1T  25.3T  51.9K      2   407M   103K
pool0       14.1T  25.3T  51.7K      3   406M  31.9K

I'm going to leave it at 32 for the night- as that is a quiet time for us. In fact I will probably leave it at 32 all the time. Since our array is very quiet on the weekends, I can start a scan on Friday night and be done long before Monday morning rolls around. For us that's actually much more useful than having the scrub throttled at all times but taking a month to finish.

Thanks for the suggestions.
-Don