I have set up a small box to work with ZFS (2x 2.4GHz Xeons, 4GB memory, 6x SCSI disks).  I made one drive the boot drive and put the other five into a pool with the "zpool create tank" command right out of the admin manual.  The administration experience has been very nice and most everything has worked as expected (setting up new filesystems, swapping out failed drives, etc.).  What isn't as I expected is the slow speed.

When using a raw device, a SCSI disk on the system reads at 34MB/s, about what I would expect for these disks:

| # time dd if=/dev/rdsk/c0t1d0 of=/dev/null bs=8k count=102400
| 102400+0 records in
| 102400+0 records out
|
| real    0m23.182s
| user    0m0.135s
| sys     0m1.979s

However, when reading from a 10GB file of zeros, made with mkfile, the read performance is much lower, 11MB/s:

| # time dd if=zeros-10g of=/dev/null bs=8k count=102400
| 102400+0 records in
| 102400+0 records out
|
| real    1m8.763s
| user    0m0.104s
| sys     0m1.759s

After reading the list archives, I saw "ztune.sh".  Using it I tried a couple of different settings and didn't see any changes.  After that I toggled the compression, atime, recordsize and checksum options on and off to no avail.

Am I expecting too much from this setup?  What might be changed to speed things up?  Wait until snv_45?

The version of OpenSolaris is:

| # uname -a
| SunOS donatella 5.11 snv_44 i86pc i386 i86pc

The options on the filesystem are:

| # zfs get all tank/home
| NAME       PROPERTY       VALUE                  SOURCE
| tank/home  type           filesystem             -
| tank/home  creation       Fri Sep 22 10:47 2006  -
| tank/home  used           39.1K                  -
| tank/home  available      112G                   -
| tank/home  referenced     39.1K                  -
| tank/home  compressratio  1.00x                  -
| tank/home  mounted        yes                    -
| tank/home  quota          none                   default
| tank/home  reservation    none                   default
| tank/home  recordsize     128K                   default
| tank/home  mountpoint     /export/zfs            local
| tank/home  sharenfs       on                     local
| tank/home  checksum       on                     default
| tank/home  compression    off                    default
| tank/home  atime          on                     default
| tank/home  devices        on                     default
| tank/home  exec           on                     default
| tank/home  setuid         on                     default
| tank/home  readonly       off                    default
| tank/home  zoned          off                    default
| tank/home  snapdir        hidden                 default
| tank/home  aclmode        groupmask              default
| tank/home  aclinherit     secure                 default

thanks,
harley.
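(For reference, the pool in question is a five-disk raidz.  This is a guess at the create command, based on the raidz1 vdev and the c0t1d0-c0t5d0 devices that show up in the zpool iostat output later in the thread, so the exact device names here are inferred rather than quoted:

    # zpool create tank raidz c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0

With that layout, reads and writes are spread across all five spindles.)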
ZFS uses a 128k block size.  If you change dd to use bs=128k, do you observe any performance improvement?

> | # time dd if=zeros-10g of=/dev/null bs=8k count=102400
> | 102400+0 records in
> | 102400+0 records out
> |
> | real    1m8.763s
> | user    0m0.104s
> | sys     0m1.759s

It's also worth noting that this dd used less system and user time than the read from the raw device, yet took longer in "real" time.
On Fri, 22 Sep 2006, johansen wrote:

> ZFS uses a 128k block size.  If you change dd to use
> bs=128k, do you observe any performance improvement?

I had tried other sizes with much the same results, but hadn't gone as large as 128K.  With bs=128K, it gets worse:

| # time dd if=zeros-10g of=/dev/null bs=128k count=102400
| 81920+0 records in
| 81920+0 records out
|
| real    2m19.023s
| user    0m0.105s
| sys     0m8.514s

> It's also worth noting that this dd used less system and
> user time than the read from the raw device, yet took a
> longer time in "real" time.

I think some of the blocks might be cached, as I have run this a number of times.  I really don't know how the time might be accounted for -- however, the real time is correct, as that is what I see while waiting for the command to complete.

Is there any other info I can provide which would help?

harley.
Harley:

> I had tried other sizes with much the same results, but
> hadn't gone as large as 128K.  With bs=128K, it gets worse:
>
> | # time dd if=zeros-10g of=/dev/null bs=128k count=102400
> | 81920+0 records in
> | 81920+0 records out
> |
> | real    2m19.023s
> | user    0m0.105s
> | sys     0m8.514s

I may have done my math wrong, but if we assume that the real time is the actual amount of time we spent performing the I/O (which may be incorrect), haven't you done better here?

In this case you pushed 81920 128k records in ~139 seconds -- approx 75437 k/sec.

Using ZFS with 8k bs, you pushed 102400 8k records in ~68 seconds -- approx 12047 k/sec.

Using the raw device you pushed 102400 8k records in ~23 seconds -- approx 35617 k/sec.

I may have missed something here, but isn't this newest number the highest performance so far?

What does iostat(1M) say about your disk read performance?

> Is there any other info I can provide which would help?

Are you just trying to measure ZFS's read performance here?  It might be interesting to change your outfile (of) argument and see if we're actually running into some other performance problem.  If you change of=/tmp/zeros, does performance improve or degrade?  Likewise, if you write the file out to another disk (UFS, ZFS, whatever), does this improve performance?

-j
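(A quick sketch of the arithmetic above, using nothing but the record counts, block sizes, and real times already quoted; bc's default integer division truncates, which matches the figures given:

    # 128k blocks through ZFS: 81920 records in ~139 s
    echo "81920 * 128 / 139" | bc     # ~75437 KB/s
    # 8k blocks through ZFS: 102400 records in ~68 s
    echo "102400 * 8 / 68" | bc       # ~12047 KB/s
    # 8k blocks from the raw device: 102400 records in ~23 s
    echo "102400 * 8 / 23" | bc       # ~35617 KB/s

For watching the disks while the dd runs, something like "iostat -xnz 1" in another window shows per-device reads, throughput, and service times.)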
On Fri, 22 Sep 2006, johansen-osdev at sun.com wrote:

> Are you just trying to measure ZFS's read performance here?

That is what I started looking at.  We scrounged around and found a set of 300GB drives to replace the old ones we started with.  Comparing these new drives to the old ones:

Old 36GB drives:

| # time mkfile -v 1g zeros-1g
| zeros-1g 1073741824 bytes
|
| real    2m31.991s
| user    0m0.007s
| sys     0m0.923s

Newer 300GB drives:

| # time mkfile -v 1g zeros-1g
| zeros-1g 1073741824 bytes
|
| real    0m8.425s
| user    0m0.010s
| sys     0m1.809s

At this point I am pretty happy.

I am wondering if there is something other than capacity and seek time which has changed between the drives.  Would a different SCSI command set or features have this dramatic a difference?

thanks!,
harley.
Harley:

> Old 36GB drives:
>
> | # time mkfile -v 1g zeros-1g
> | zeros-1g 1073741824 bytes
> |
> | real    2m31.991s
> | user    0m0.007s
> | sys     0m0.923s
>
> Newer 300GB drives:
>
> | # time mkfile -v 1g zeros-1g
> | zeros-1g 1073741824 bytes
> |
> | real    0m8.425s
> | user    0m0.010s
> | sys     0m1.809s

This is a pretty dramatic difference.  What type of drives were your old 36GB drives?

> I am wondering if there is something other than capacity
> and seek time which has changed between the drives.  Would a
> different SCSI command set or features have this dramatic a
> difference?

I'm hardly the authority on hardware, but there are a couple of possibilities.  Your newer drives may have a write cache.  It's also quite likely that the newer drives have a faster rotational speed and better seek times.

If you subtract the usr + sys time from the real time in these measurements, I suspect the result is the amount of time you were actually waiting for the I/O to finish.  In the first case, you spent 99% of your total time waiting for stuff to happen, whereas in the second case it was only ~78% of your overall time.

-j
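(One way to check whether a write cache is actually enabled on the new drives -- this is from memory, and the cache menu only shows up for some disk/driver combinations, so treat it as a hint rather than a recipe -- is format in expert mode:

    # format -e
    ... select the disk ...
    format> cache
    cache> write_cache
    write_cache> display

which should report whether the drive's volatile write cache is currently enabled.)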
Harley Gorrell writes:

> On Fri, 22 Sep 2006, johansen-osdev at sun.com wrote:
> > Are you just trying to measure ZFS's read performance here?
>
> That is what I started looking at.  We scrounged around
> and found a set of 300GB drives to replace the old ones we
> started with.  Comparing these new drives to the old ones:
>
> Old 36GB drives:
>
> | # time mkfile -v 1g zeros-1g
> | zeros-1g 1073741824 bytes
> |
> | real    2m31.991s
> | user    0m0.007s
> | sys     0m0.923s
>
> Newer 300GB drives:
>
> | # time mkfile -v 1g zeros-1g
> | zeros-1g 1073741824 bytes
> |
> | real    0m8.425s
> | user    0m0.010s
> | sys     0m1.809s
>
> At this point I am pretty happy.

This looks like, on the second run, you had lots more free memory and mkfile completed near memcpy speed.  Something is awry on the first pass, though.  Then,

    zpool iostat 1

can put some light on this.  I/O will keep on going after the mkfile completes in the second case.  For the first one, there may have been an interaction with not-yet-finished I/O loads?

-r

> I am wondering if there is something other than capacity
> and seek time which has changed between the drives.  Would a
> different SCSI command set or features have this dramatic a
> difference?
>
> thanks!,
> harley.
On Mon, 25 Sep 2006, Roch wrote:

> This looks like on the second run, you had lots more free
> memory and mkfile completed near memcpy speed.

Both times the system was near idle.

> Something is awry on the first pass though.  Then,
>
>     zpool iostat 1
>
> can put some light on this.

The old drives aren't in the system, but I did try this with the new drives.  I ran "mkfile -v 1g zeros-1g" a couple of times while "zpool iostat -v 1" was running in another window.  There were seven stats like this first one, where it is writing to disk.  The next to last is where the bandwidth drops, as there isn't enough I/O to fill out that second, followed by zeros of no I/O.  I didn't see any "write behind" -- once the I/O was done I didn't see more until I started something else.

|                capacity     operations    bandwidth
| pool         used  avail   read  write   read  write
| ----------  -----  -----  -----  -----  -----  -----
| tank        26.1G  1.34T      0  1.13K      0   134M
|   raidz1    26.1G  1.34T      0  1.13K      0   134M
|     c0t1d0      -      -      0    367      0  33.6M
|     c0t2d0      -      -      0    377      0  35.5M
|     c0t3d0      -      -      0    401      0  35.0M
|     c0t4d0      -      -      0    411      0  36.0M
|     c0t5d0      -      -      0    424      0  34.9M
| ----------  -----  -----  -----  -----  -----  -----
|
|                capacity     operations    bandwidth
| pool         used  avail   read  write   read  write
| ----------  -----  -----  -----  -----  -----  -----
| tank        26.4G  1.34T      0  1.01K    560   118M
|   raidz1    26.4G  1.34T      0  1.01K    560   118M
|     c0t1d0      -      -      0    307      0  29.6M
|     c0t2d0      -      -      0    309      0  27.6M
|     c0t3d0      -      -      0    331      0  28.1M
|     c0t4d0      -      -      0    338  35.0K  27.0M
|     c0t5d0      -      -      0    338  35.0K  28.3M
| ----------  -----  -----  -----  -----  -----  -----
|
|                capacity     operations    bandwidth
| pool         used  avail   read  write   read  write
| ----------  -----  -----  -----  -----  -----  -----
| tank        26.4G  1.34T      0      0      0      0
|   raidz1    26.4G  1.34T      0      0      0      0
|     c0t1d0      -      -      0      0      0      0
|     c0t2d0      -      -      0      0      0      0
|     c0t3d0      -      -      0      0      0      0
|     c0t4d0      -      -      0      0      0      0
|     c0t5d0      -      -      0      0      0      0
| ----------  -----  -----  -----  -----  -----  -----

As things stand now, I am happy.  I do wonder what accounts for the improvement -- seek time, transfer rate, disk cache, or something else?  Does anyone have a dtrace script to measure this which they would share?

harley.
Harley Gorrell wrote:

> I do wonder what accounts for the improvement -- seek
> time, transfer rate, disk cache, or something else?  Does
> anyone have a dtrace script to measure this which they
> would share?

You might also be seeing the effects of defect management.  As drives get older, they tend to find and repair more defects.  This will slow the performance of the drive, though I've not seen this sort of extreme.  You might infer this from a dtrace script which would record the service time per iop -- in which case you may see some iops with much larger service times than normal.  I would expect this to be a second-order effect.

Meanwhile, you should check to make sure you're transferring data at the rate you think (SCSI autonegotiates data transfer rates).  If you know the model number, you can get the rotational speed and average seek times to see if that is radically different for the two disk types.

-- richard
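(A minimal sketch of such a dtrace script, assuming the stock io provider; untested here, so a starting point rather than a finished tool.  It timestamps each I/O at io:::start and prints a per-device distribution of service times in microseconds:

    # dtrace -n '
        io:::start
        {
            /* remember when each buf was issued */
            start_ts[args[0]->b_edev, args[0]->b_blkno] = timestamp;
        }
        io:::done
        /start_ts[args[0]->b_edev, args[0]->b_blkno]/
        {
            /* service time in microseconds, bucketed per device */
            @svc_us[args[1]->dev_statname] =
                quantize((timestamp - start_ts[args[0]->b_edev, args[0]->b_blkno]) / 1000);
            start_ts[args[0]->b_edev, args[0]->b_blkno] = 0;
        }'

Run it while repeating the mkfile test on each set of drives; a disk doing lots of defect remapping, or running at a lower negotiated transfer rate, should show a noticeably fatter tail in its distribution.)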