This may not be a ZFS issue, so please bear with me!

I have 4 internal drives that I have striped/mirrored with ZFS and have an application server which is reading/writing to hundreds of thousands of files on it, thousands of files @ a time.

If 1 client uses the app server, the transaction (reading/writing to ~80 files) takes about 200 ms. If I have about 80 clients attempting it @ once, it can sometimes take a minute or more. I'm pretty sure it's a file I/O bottleneck, so I want to make sure ZFS is tuned properly for this kind of usage.

The only thing I could think of, so far, is to turn off ZFS compression. Is there anything else I can do? Here is my "zpool iostat" output:

# zpool iostat 5
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       5.69G   266G     23     76  1.44M  2.24M
pool1       5.69G   266G     96    259  5.70M  7.25M
pool1       5.69G   266G     98    267  5.73M  7.32M
pool1       5.69G   266G     92    253  5.76M  7.31M
pool1       5.69G   266G     90    254  5.67M  7.43M

and here is regular iostat:

# iostat -xnz 5
                    extended device statistics
    r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0    0.2     0.0     0.1   0.0   0.0     0.0     0.3   0   0  c0t0d0
    0.0    0.2     0.0     0.1   0.0   0.0     0.0     0.3   0   0  c0t1d0
   20.4  145.0  1315.8  3714.5   0.0   2.8     0.0    16.8   0  21  c0t2d0
   21.4  143.2  1380.2  3711.3   0.0   4.1     0.0    25.1   0  27  c0t3d0
   23.4  138.4  1509.3  3693.0   0.0   1.6     0.0     9.8   0  17  c0t4d0
   20.8  137.8  1341.6  3693.0   0.0   2.3     0.0    14.7   0  21  c0t5d0

This message posted from opensolaris.org
Some more information about the system. NOTE: CPU utilization never goes above 10%.

Sun Fire v40z
4 x 2.4 GHz proc
8 GB memory
3 x 146 GB Seagate drives (10k RPM)
1 x 146 GB Fujitsu drive (10k RPM)
William Fretts-Saxton <william.fretts.saxton <at> sun.com> writes:
> Some more information about the system. NOTE: CPU utilization never
> goes above 10%.
>
> Sun Fire v40z
> 4 x 2.4 GHz proc
> 8 GB memory
> 3 x 146 GB Seagate drives (10k RPM)
> 1 x 146 GB Fujitsu drive (10k RPM)

And what version of Solaris or what build of OpenSolaris are you using? Do you know if your application uses synchronous I/O transactions? Have you tried disabling ZFS file-level prefetching (just as an experiment)? See:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#File-Level_Prefetching

-marc
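For readers following the thread: on Solaris 10 of this vintage, file-level prefetching is controlled by the zfs_prefetch_disable tunable described in the Evil Tuning Guide linked above. A hedged sketch of both the live and persistent forms (configuration fragments to verify against the guide for your exact release, not a blessed recipe):

```shell
# Live system: flip the tunable immediately with mdb (root required):
echo "zfs_prefetch_disable/W0t1" | mdb -kw

# Persistent form: add the following line to /etc/system and reboot:
#   set zfs:zfs_prefetch_disable = 1
```

Setting the value back to 0 (W0t0) re-enables prefetch without a reboot.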
Hi Marc,

# cat /etc/release
       Solaris 10 8/07 s10x_u4wos_12b X86

I don't know if my application uses synchronous I/O transactions... I'm using Sun's Glassfish v2u1.

I've deleted the ZFS partition and have set up an SVM stripe/mirror just to see if "ZFS" is getting in the way. I'll try out the prefetching idea when I'm done with the SVM testing. Thanks.
I disabled file prefetch and there was no effect.

Here are some performance numbers. Note that, when the application server used a ZFS file system to save its data, the transaction took TWICE as long. For some reason, though, iostat is showing 5x as much disk writing (to the physical disks) on the ZFS partition. Can anyone see a problem here?

Average application server client response time (1st run/2nd run):

SVM - 12/18 seconds
ZFS - 35/38 seconds

SVM Performance
---------------
# iostat -xnz 5
                    extended device statistics
    r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
  195.1  414.3  1465.9  1657.3   0.0   1.7     0.0     2.7   0  98  md/d100
   97.5  414.3   730.2  1657.3   0.0   1.0     0.0     1.9   0  74  md/d101
   97.7  414.1   735.8  1656.5   0.0   0.8     0.0     1.5   0  59  md/d102
   54.4  203.6   370.7   814.2   0.0   0.5     0.0     2.1   0  42  c0t2d0
   52.8  210.6   359.5   842.2   0.0   0.5     0.0     1.9   0  40  c0t3d0
   54.0  203.6   374.7   814.2   0.0   0.3     0.0     1.2   0  26  c0t4d0
   52.2  210.6   361.1   842.2   0.0   0.5     0.0     1.8   0  38  c0t5d0

ZFS Performance
---------------
# iostat -xnz 5
                    extended device statistics
    r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
   23.2  148.8  1496.7  3806.8   0.0   2.5     0.0    14.7   0  21  c0t2d0
   22.8  148.8  1470.9  3806.8   0.0   2.4     0.0    13.9   0  22  c0t3d0
   24.2  149.0  1561.1  3805.0   0.0   1.5     0.0     8.6   0  18  c0t4d0
   23.4  149.4  1509.6  3805.0   0.0   2.5     0.0    14.7   0  25  c0t5d0

# zpool iostat 5
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       5.69G   266G     12    243   775K  7.20M
pool1       5.69G   266G     88    232  5.53M  7.12M
pool1       5.69G   266G     78    216  4.87M  6.81M
On Feb 6, 2008 6:36 PM, William Fretts-Saxton <william.fretts.saxton at sun.com> wrote:
> Here are some performance numbers. Note that, when the
> application server used a ZFS file system to save its data, the
> transaction took TWICE as long. For some reason, though, iostat is
> showing 5x as much disk writing (to the physical disks) on the ZFS
> partition. Can anyone see a problem here?

What is the disk layout of the zpool in question? Striped? Mirrored? Raidz? I would suggest either a simple stripe or striping+mirroring as the best-performing layout.
It is a striped/mirror:

# zpool status
        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
Solaris 10u4, eh? Sounds a lot like fsync issues we ran into trying to run Cyrus mail-server spools on ZFS. This was highlighted for us by the filebench varmail test. OpenSolaris nv78, however, worked very well.
William Fretts-Saxton <william.fretts.saxton <at> sun.com> writes:
> I disabled file prefetch and there was no effect.
>
> Here are some performance numbers. Note that, when the application server
> used a ZFS file system to save its data, the transaction took TWICE as long.
> For some reason, though, iostat is showing 5x as much disk
> writing (to the physical disks) on the ZFS partition. Can anyone see a
> problem here?

Possible explanation: the Glassfish applications are using synchronous writes, causing the ZIL (ZFS Intent Log) to be intensively used, which leads to a lot of extra I/O. Try to disable it:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29

Since disabling it is not recommended, if you find out it is the cause of your perf problems, you should instead try to use a SLOG (separate intent log, see above link). Unfortunately your OS version (Solaris 10 8/07) doesn't support SLOGs; they have only been added to OpenSolaris build snv_68:

http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

-marc
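As the linked "Evil Tuning Guide" section stresses, disabling the ZIL is for diagnosis only (a crash can then lose data that applications believe was committed). A sketch of the period-appropriate forms, which should be checked against the guide before use:

```shell
# Live system (affects filesystems mounted after the change):
echo "zil_disable/W0t1" | mdb -kw

# Persistent form: add the following line to /etc/system and reboot:
#   set zfs:zil_disable = 1
```

Remount the filesystem under test after flipping the tunable, and set it back to 0 once the experiment is over.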
Marc Bevand wrote:
> William Fretts-Saxton <william.fretts.saxton <at> sun.com> writes:
>
>> I disabled file prefetch and there was no effect.
>>
>> Here are some performance numbers. Note that, when the application server
>> used a ZFS file system to save its data, the transaction took TWICE as long.
>> For some reason, though, iostat is showing 5x as much disk
>> writing (to the physical disks) on the ZFS partition. Can anyone see a
>> problem here?
>
> Possible explanation: the Glassfish applications are using synchronous
> writes, causing the ZIL (ZFS Intent Log) to be intensively used, which
> leads to a lot of extra I/O.

The ZIL doesn't do a lot of extra IO. It usually just does one write per synchronous request and will batch up multiple writes into the same log block if possible. However, it does need to wait for the writes to be on stable storage before returning to the application, which is what the application has requested. It does this by waiting for the write to complete and then flushing the disk write cache. If the write cache is battery backed for all zpool devices, then the global zfs_nocacheflush can be set to give dramatically better performance.

> Try to disable it:
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29
>
> Since disabling it is not recommended, if you find out it is the cause of your
> perf problems, you should instead try to use a SLOG (separate intent log, see
> above link). Unfortunately your OS version (Solaris 10 8/07) doesn't support
> SLOGs; they have only been added to OpenSolaris build snv_68:
>
> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
>
> -marc
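For reference, zfs_nocacheflush is a system-wide setting; a minimal /etc/system fragment, assuming (as Neil says) every pool device sits behind a battery-backed cache:

```shell
# /etc/system fragment -- only safe when ALL zpool devices have
# non-volatile (battery-backed) write caches; takes effect at boot.
set zfs:zfs_nocacheflush = 1
```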
william.fretts.saxton at sun.com said:
> Here are some performance numbers. Note that, when the application server
> used a ZFS file system to save its data, the transaction took TWICE as long.
> For some reason, though, iostat is showing 5x as much disk writing (to the
> physical disks) on the ZFS partition. Can anyone see a problem here?

I'm not familiar with the application in use here, but your iostat numbers remind me of something I saw during "small overwrite" tests on ZFS. Even though the test was doing only writing, because it was writing over only a small part of existing blocks, ZFS had to read (the unchanged part of) each old block in before writing out the changed block to a new location (COW).

This is a case where you want to set the ZFS recordsize to match your application's typical write size, in order to avoid the read overhead inherent in partial-block updates. UFS by default has a smaller max blocksize than ZFS' default 128k, so in addition to the ZIL/fsync issue, UFS will also suffer less overhead from such partial-block updates.

Again, this may not be what's going on, but it's worth checking if you haven't already done so.

Regards,
Marion
Neil Perrin <Neil.Perrin <at> Sun.COM> writes:
> The ZIL doesn't do a lot of extra IO. It usually just does one write per
> synchronous request and will batch up multiple writes into the same log
> block if possible.

Ok, I was wrong then. Well, William, I think Marion Hakanson has the most plausible explanation. As he suggests, experiment with "zfs set recordsize=XXX" to force the filesystem to use small records. See the zfs(1M) manpage.

-marc
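A sketch of the experiment Marc suggests ("pool1/data" is a placeholder filesystem name here; note that recordsize only affects blocks written after the change):

```shell
# Set an 8K recordsize on the filesystem holding the RRD files,
# then confirm the property took effect:
zfs set recordsize=8K pool1/data
zfs get recordsize pool1/data
```

Re-copy existing files afterward so they are rewritten with the new record size.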
Unfortunately, I don't know the record size of the writes. Is it as simple as looking @ the size of a file, before and after a client request, and noting the difference in size? This is binary data, so I don't know if that makes a difference, but the average write size is a lot smaller than the file size.

Should the recordsize be in place BEFORE data is written to the file system, or can it be changed after the fact? I might try a bunch of different settings for trial and error.

The I/O is actually done by RRD4J, which is a round-robin database library. It is a Java version of 'rrdtool' which saves data into a binary format, but also "cleans up" the data according to its age, saving less of the older data as time goes on.
I just installed nv82, so we'll see how that goes. I'm going to try the recordsize idea above as well.

A note about UFS: I was told by our local admin guru that ZFS turns on write caching for disks, which is something that a UFS file system should not have turned on, so if I convert the ZFS f/s to a UFS one, I could be giving UFS an unrealistic "boost" in performance because it would still have the caching on.
One thing I just observed is that the initial file size is 65796 bytes. When it gets an update, the file size remains @ 65796. Is there a minimum file size?
William,

It should be fairly easy to find the record size using DTrace. Take an aggregation of the writes happening (aggregate on size for all the write(2) system calls). This would give a fair idea of the IO size pattern.

Does RRD4J have a record size mentioned? Usually if it is a database application, they have a record-size option when the DB is created (based on my limited knowledge about DBs).

Thanks and regards,
Sanjeev.

PS: Here is a simple script which just aggregates on the write size and executable name:

-- snip --
#!/usr/sbin/dtrace -s

syscall::write:entry
{
        wsize = (size_t) arg2;
        @write[wsize, execname] = count();
}
-- snip --

William Fretts-Saxton wrote:
> Unfortunately, I don't know the record size of the writes. Is it as simple as
> looking @ the size of a file, before and after a client request, and noting the
> difference in size? This is binary data, so I don't know if that makes a
> difference, but the average write size is a lot smaller than the file size.
>
> Should the recordsize be in place BEFORE data is written to the file system, or
> can it be changed after the fact? I might try a bunch of different settings for
> trial and error.
>
> The I/O is actually done by RRD4J, which is a round-robin database library. It
> is a Java version of 'rrdtool' which saves data into a binary format, but also
> "cleans up" the data according to its age, saving less of the older data as
> time goes on.

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
RRD4J isn't a DB, per se, so it doesn't really have a "record" size. In fact, I don't even know whether, when data is written to the binary, it is contiguous or not, so the amount written may not directly correlate to a proper record size.

I did run your command and found the size patterns you were talking about:

      462  java        409
     3320  java        409
     6819  java        409
        5  java       1227
        1  java       1692
       16  java       3243

"409" is the number of clients I tested, so I assume it means the largest write it makes is "6819". Is that bits or bytes? Does that mean I should try setting my recordsize equal to the lowest multiple of 512 GREATER than 6819? (14 x 512 = 7168)
Slight correction. 'recsize' must be a power of 2, so it would be 8192.
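The rounding in the two posts above can be sketched mechanically: start at the 512-byte minimum and keep doubling until the largest observed write (6819 bytes) fits.

```shell
# Smallest power-of-two recordsize >= the largest observed write (6819 bytes):
size=6819
rec=512
while [ "$rec" -lt "$size" ]; do
  rec=$((rec * 2))
done
echo "$rec"    # prints 8192
```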
To avoid making multiple posts, I'll just write everything here:

- Moving to nv_82 did not seem to do anything, so it doesn't look like fsync was the issue.

- Disabling the ZIL didn't do anything either.

- Still playing with 'recsize' values, but it doesn't seem to be doing much... I don't think I have a good understanding of what exactly is being written... I think the whole file might be overwritten each time because it's in binary format.

- Setting zfs_nocacheflush, though, got me drastically increased throughput--client requests took, on average, less than 2 seconds each!

So, in order to use this, I should have a storage array, w/ battery backup, instead of using the internal drives, correct? I have the option of using a 6120 or 6140 array on this system, so I might just try that out.
> -Still playing with 'recsize' values but it doesn't seem to be doing
> much... I don't think I have a good understanding of what exactly is being
> written... I think the whole file might be overwritten each time
> because it's in binary format.

The other thing to keep in mind is that tunables like compression and recsize only affect newly written blocks. If you have a bunch of data that was already laid down on disk and then you change the tunable, this will only cause new blocks to have the new size. If you experiment with this, make sure all of your data has the same blocksize by copying it over to the new pool once you've changed the properties.

> -Setting zfs_nocacheflush, though, got me drastically increased
> throughput--client requests took, on average, less than 2 seconds
> each!
>
> So, in order to use this, I should have a storage array, w/ battery
> backup, instead of using the internal drives, correct?

zfs_nocacheflush should only be used on arrays with a battery-backed cache. If you use this option on a disk, and you lose power, there's no guarantee that your write successfully made it out of the cache.

A performance problem when flushing the cache of an individual disk implies that there's something wrong with the disk or its firmware. You can disable the write cache of an individual disk using format(1M). When you do this, ZFS won't lose any data, whereas enabling zfs_nocacheflush can lead to problems.

I'm attaching a DTrace script that will show the cache-flush times per vdev. Remove the zfs_nocacheflush tuneable and re-run your test while using this DTrace script. If one particular disk takes longer than the rest to flush, this should show us. In that case, we can disable the write cache on that particular disk. Otherwise, we'll need to disable the write cache on all of the disks. The script is attached as zfs_flushtime.d

Use format(1M) with the -e option to adjust the write_cache settings for SCSI disks.
-j

[attachment: zfs_flushtime.d]

#!/usr/sbin/dtrace -Cs
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#define DKIOC                   (0x04 << 8)
#define DKIOCFLUSHWRITECACHE    (DKIOC|34)

fbt:zfs:vdev_disk_io_start:entry
/(args[0]->io_cmd == DKIOCFLUSHWRITECACHE) && (self->traced == 0)/
{
        self->traced = args[0];
        self->start = timestamp;
}

fbt:zfs:vdev_disk_ioctl_done:entry
/args[0] == self->traced/
{
        @a[stringof(self->traced->io_vd->vdev_path)] =
            quantize(timestamp - self->start);
        self->start = 0;
        self->traced = 0;
}
> -Setting zfs_nocacheflush, though, got me drastically
> increased throughput--client requests took, on
> average, less than 2 seconds each!
>
> So, in order to use this, I should have a storage
> array, w/ battery backup, instead of using the
> internal drives, correct? I have the option of using
> a 6120 or 6140 array on this system so I might just
> try that out.

We use 3510 and 2540 arrays for Cyrus mail-stores which hold about 10K accounts each. Recommend going with dual controllers, though, for safety.

Our setups are really simple. Put 2 array units on the SAN, make a pair of RAID-5 LUNs. Then RAID-10 these LUNs together in ZFS.
William Fretts-Saxton wrote:
> Unfortunately, I don't know the record size of the writes. Is it as simple as
> looking @ the size of a file, before and after a client request, and noting the
> difference in size? This is binary data, so I don't know if that makes a
> difference, but the average write size is a lot smaller than the file size.
>
> Should the recordsize be in place BEFORE data is written to the file system, or
> can it be changed after the fact? I might try a bunch of different settings for
> trial and error.
>
> The I/O is actually done by RRD4J, which is a round-robin database library. It
> is a Java version of 'rrdtool' which saves data into a binary format, but also
> "cleans up" the data according to its age, saving less of the older data as
> time goes on.

You should tune that at the application level, see https://rrd4j.dev.java.net/ down in the "performance issue" section. Try the "NIO" backend and use a smaller (2048?) record size...

-- 
This space was intended to be left blank.
> The other thing to keep in mind is that the tunables like compression
> and recsize only affect newly written blocks. If you have a bunch of
> data that was already laid down on disk and then you change the tunable,
> this will only cause new blocks to have the new size. If you experiment
> with this, make sure all of your data has the same blocksize by copying
> it over to the new pool once you've changed the properties.

Is deleting the old files/directories in the ZFS file system sufficient, or do I need to destroy/recreate the pool and/or file system itself? I've been doing the former.

I will use your dtrace script today and get back to you. Thanks for that.
We are going to get a 6120 for this temporarily. If all goes well, we are going to move to a 6140 SAN solution.
Hi Daniel.

I take it you are an RRD4J user? I didn't see anything in the "performance issues" area that would help. Please let me know if I'm missing something:

- The default of RRD4J is to use the NIO backend, so that is already in place.
- Pooling won't help because there is almost never a time when an RRD file will be accessed simultaneously.
- I'm using trial and error when it comes to the recsize right now, so I'll post back with my results. Right now, it looks like a higher recsize is better (16k better performance than 8k, etc.), which is strange, but I'm not done yet.
William Fretts-Saxton wrote:
> Unfortunately, I don't know the record size of the writes. Is it as
> simple as looking @ the size of a file, before and after a client
> request, and noting the difference in size?

and

> The I/O is actually done by RRD4J, [...] a Java version of 'rrdtool'

If it behaves like rrdtool, it will limit the size of the file by consolidating older data. After every n samples, older data will be replaced by an aggregate, freeing space for new samples. To me that implies random I/O. You really need a tool like dtrace (or old-fashioned truss) to see the sample rate and size.

Cheers,
Henk
Hello William,

Thursday, February 7, 2008, 7:46:51 PM, you wrote:

WFS> -Setting zfs_nocacheflush, though, got me drastically increased
WFS> throughput--client requests took, on average, less than 2 seconds each!

That's interesting - a bug in the scsi driver for the v40z?

-- 
Best regards,
Robert                       mailto:milek at task.gda.pl
                             http://milek.blogspot.com
On Feb 5, 2008 9:52 PM, William Fretts-Saxton <william.fretts.saxton at sun.com> wrote:
> This may not be a ZFS issue, so please bear with me!
>
> I have 4 internal drives that I have striped/mirrored with ZFS and have an
> application server which is reading/writing to hundreds of thousands of
> files on it, thousands of files @ a time.
>
> If 1 client uses the app server, the transaction (reading/writing to ~80
> files) takes about 200 ms. If I have about 80 clients attempting it @ once,
> it can sometimes take a minute or more. I'm pretty sure it's a file I/O
> bottleneck so I want to make sure ZFS is tuned properly for this kind of
> usage.
>
> The only thing I could think of, so far, is to turn off ZFS compression.
> Is there anything else I can do?

Hi William

To improve performance, consider turning off atime, assuming you don't need it...

# zfs set atime=off POOL/filesystem

_J
It does. The file size is limited to the original creation size, which is 65k for files with 1 data sample.

Unfortunately, I have zero experience with dtrace and only a little with truss. I'm relying on the dtrace scripts from people on this thread to get by for now!
I ran this dtrace script and got no output. Any ideas?
> Is deleting the old files/directories in the ZFS file system
> sufficient or do I need to destroy/recreate the pool and/or file
> system itself? I've been doing the former.

The former should be sufficient; it's not necessary to destroy the pool.

-j
After working with Sanjeev, and putting in a bunch of timing statements throughout the code, it turns out that file writes ARE NOT the bottleneck, as would be assumed. It is actually reading the file into a byte buffer that is the culprit. Specifically, this Java call:

byteBuffer = file.getChannel().map(mapMode, 0, length);

I'm going to try to apply some of the same things I tried here with troubleshooting the writes to the reads now. If anyone has any different advice, please let me know. Thanks for all the help so far.