Hi all,

First, kudos to all the ZFS folks for a killer technology. We use several Sun 7000 series
boxes at work and love the features. I recently decided to build an OpenSolaris server for
home and put the box together over the weekend. It uses an LSI 1068E-based HBA (Supermicro,
FWIW) and eight 2TB WD drives in a single raidz2 pool. It is a clean install of snv_128a;
the only changes from vanilla were installing the CIFS server packages and creating and
sharing a CIFS share.

I started copying over all the data from my existing workstation. When copying files
(mostly multi-gigabyte DV video files), network throughput drops to zero for ~1/2 second
every 8-15 seconds. This throughput drop corresponds to drive activity on the OpenSolaris
box; the ZFS pool drives show no activity except every 8-15 seconds. My best guess is that
the OpenSolaris box is caching traffic and batching it to disk every so often. I just
didn't expect disk writes to interrupt network traffic. Is this correct?

One other item to note: the pool is currently degraded, as one of the drives was apparently
damaged during shipping and died almost immediately after I created the pool. I completely
removed this drive to RMA it.

I'd be happy to provide any info needed. Thanks in advance.

Richard Bruce
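For anyone wanting to watch the bursts from the server side, something like the following
should make the pattern visible (the pool name "tank" is just a placeholder; substitute
your own):

  # Per-second pool throughput -- expect near-zero writes between bursts
  zpool iostat tank 1

  # Per-disk view of the same bursts (only non-idle devices are shown)
  iostat -xnz 1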
On Mon, 7 Dec 2009, Richard Bruce wrote:

> I started copying over all the data from my existing workstation.
> When copying files (mostly multi-gigabyte DV video files), network
> throughput drops to zero for ~1/2 second every 8-15 seconds. This
> throughput drop corresponds to drive activity on the OpenSolaris
> box. The ZFS pool drives show no activity except every 8-15
> seconds. As best as I can guess, the OpenSolaris box is caching
> traffic and batching it to disk every so often. I just didn't
> expect disk writes to interrupt network traffic. Is this correct?

This is expected behavior. From what has been posted here, these are the current
buffering rules:

  up to 7/8ths of available memory
  up to 5 seconds worth of 100% write I/O time
  up to 30 seconds without a write

and if you don't like it, you can use the zfs:zfs_arc_max tunable in /etc/system to set a
maximum amount of memory to be used prior to a write. This may be useful on systems with
a large amount of memory that want to limit the maximum delay due to committing the zfs
transaction group. There will still be interruptions, but the interruptions can be made
briefer (and more frequent).

Bob
--
Bob Friesenhahn, bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
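For reference, the two time-based rules above correspond (if memory serves -- the exact
variable names should be verified against your build before relying on them) to the txg
timeout and synctime tunables, which can also be set in /etc/system:

  * Assumed tunable names from the OpenSolaris write throttle of this era
  * Force a txg commit at least every 30 seconds
  set zfs:zfs_txg_timeout = 30
  * Aim for no more than ~5 seconds of full-rate write I/O per commit
  set zfs:zfs_txg_synctime = 5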
On Mon, 7 Dec 2009, Bob Friesenhahn wrote:

> and if you don't like it, you can use the zfs:zfs_arc_max tunable in
> /etc/system to set a maximum amount of memory to be used prior to a write.

Oops. Bad cut-n-paste. That should have been

  zfs:zfs_write_limit_override

So I am currently using

  * Set ZFS maximum TXG group size to 3932160000
  set zfs:zfs_write_limit_override = 0xea600000

Bob
--
Bob Friesenhahn, bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
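As an aside, the same override can apparently be applied to a running kernel with mdb -kw
instead of editing /etc/system and rebooting; the syntax below follows the Evil Tuning
Guide's style, so treat it as a sketch and verify it on your own build:

  # Set zfs_write_limit_override (a 64-bit value) on the live system
  echo zfs_write_limit_override/Z 0xea600000 | mdb -kw

  # Read back the current value
  echo zfs_write_limit_override/J | mdb -k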
>> and if you don't like it, you can use the zfs:zfs_arc_max tunable in
>> /etc/system to set a maximum amount of memory to be used prior to a write.
>
> Oops. Bad cut-n-paste. That should have been
>
>   zfs:zfs_write_limit_override
>
> So I am currently using
>
>   * Set ZFS maximum TXG group size to 3932160000
>   set zfs:zfs_write_limit_override = 0xea600000

I have a DAS array with NVRAM, so I enabled zfs_nocacheflush = 1 and it made a world of
difference in performance. Does the LSI HBA have any NVRAM to make this tuning acceptable?
Is this setting acceptable, as I understood it from the Evil Tuning Guide?
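For reference, that tuning is just the single /etc/system line below. The usual caveat
from the Evil Tuning Guide is that it is only safe when every device in the pool sits
behind non-volatile (battery- or NVRAM-backed) write cache; a plain cacheless HBA with
bare SATA disks normally does not qualify:

  * Only safe if ALL pool devices have non-volatile write caches --
  * otherwise a power loss can silently lose or corrupt recent writes
  set zfs:zfs_nocacheflush = 1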
Bob,

Thanks for your help. I thought I might have seen something about this in the past but
couldn't remember for sure. Thanks for pointing me in the right direction.

From the URL below, it states that each TXG will be limited to 1/8th of physical memory
(this differs from the 7/8ths of available memory you referenced).

http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle

In my current config that would yield a TXG size of 512MB (4GB of system RAM / 8), which
makes sense for ~1 second write times. At the moment this is not causing any real issues;
I just hadn't expected the system to pause network traffic while committing data to disk.

Is there a tunable to bump up the 1-tick delay that is enforced on writing threads when
the first threshold is triggered (7/8ths of the TXG commit threshold, as per the above
URL), or to change the threshold at which the tick delays start being enforced? It seems
to me this would help smooth out I/O better than just limiting the TXG size.

Richard
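For anyone who wants to sanity-check the same arithmetic on their own box, a rough
back-of-the-envelope (the ~100 MB/s figure is just an assumed gigabit wire speed):

  # Physical memory as the kernel sees it
  prtconf | grep Memory              # e.g. "Memory size: 4096 Megabytes"

  # 1/8th of physmem is the default per-TXG write limit described above
  echo $((4096 / 8))                 # => 512 (MB)

  # At ~100 MB/s of incoming CIFS traffic a 512 MB TXG fills in roughly
  # 5 seconds, the same ballpark as the observed 8-15 second gaps
  # between bursts of disk activity.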
On 7 dec 2009, at 18.40, Bob Friesenhahn wrote:

> On Mon, 7 Dec 2009, Richard Bruce wrote:
>
>> I started copying over all the data from my existing workstation. When copying files
>> (mostly multi-gigabyte DV video files), network throughput drops to zero for ~1/2
>> second every 8-15 seconds. [...]
>
> This is expected behavior. From what has been posted here, these are the current
> buffering rules:

Is it really?

Shouldn't it start on the next txg while the previous txg commits, and just continue
writing?

/ragge
On Wed, 9 Dec 2009, Ragnar Sundblad wrote:

>> This is expected behavior. From what has been posted here, these
>> are the current buffering rules:
>
> Is it really?
>
> Shouldn't it start on the next txg while the previous txg commits,
> and just continue writing?

The pause is clearly not during the entire TXG commit; the TXG commit could take up to
five seconds to complete. Perhaps the pause occurs only during the start of the commit,
or perhaps it is at the end, or perhaps it is because the next TXG has already become
100% full while waiting for the current TXG to commit, and zfs is not willing to endanger
more than one TXG worth of data, so it pauses?

To my recollection, none of the zfs developers have been interested in discussing the
cause of the pause, although they are clearly interested in maximizing performance.

Bob
--
Bob Friesenhahn, bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/