thr3ads.net - zfs discuss - [zfs-discuss] Heavy write IO for no apparent reason [Jan 2013]

If this information is useful, please help other people find it:
Share via:

Peter Wood

2013-Jan-17 03:25 UTC

[zfs-discuss] Heavy write IO for no apparent reason

I''m running OpenIndiana 151a5 on two identical Supermicro boxes
(hardware
details at the bottom of the message). Pretty clean installation. No extra
services running. Main purpose is to be storage servers and nothing else.
No dedup.

Today I started migrating file systems from some old Open Solaris servers
to these Supermicro boxes and noticed the transfer to one of them was going
10x slower then to the other one (like 10GB/hour).
I was using zfs send/receive over ssh.

This server is doing pretty much nothing. Exporting 3 file systems via NFS
and only one of them is somewhat in use.

Running zpool iostat -v (attachment zpool-IOStat.png) shows 1,22K write
operations on the drives and 661 on the ZIL. Compare to the other server
(who is in way heavier use then this one) these numbers are extremely high.

Any idea how to debug any further?

Supermicro SC847
MB X9DRH-iF
Xeon E5-2620 2GHz 6-Core
LSI SAS9211-8i HBA
32GB RAM
Storage pool drives: 3TB Hitachi SAS 7.2K
ZIL mirror and L2ARC: 80GB Intel SSD SATA 3Gb/s

Thank you,

-- Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130116/c8d533d9/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zpool-IOStat.png
Type: image/png
Size: 72714 bytes
Desc: not available
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130116/c8d533d9/attachment-0001.png>

Bob Friesenhahn

2013-Jan-17 03:49 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

On Wed, 16 Jan 2013, Peter Wood wrote:> 
> Running zpool iostat -v (attachment zpool-IOStat.png) shows 1,22K write
operations on the drives and 661 on the
> ZIL. Compare to the other server (who is in way heavier use then this one)
these numbers are extremely high.
> 
> Any idea how to debug any further?
Do some filesystems contain many snapshots?  Do some filesystems use 
small zfs block sizes.  Have the servers been used the same?

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Ray Arachelian

2013-Jan-17 16:55 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

On 01/16/2013 10:25 PM, Peter Wood wrote:>
> Today I started migrating file systems from some old Open Solaris
> servers to these Supermicro boxes and noticed the transfer to one of
> them was going 10x slower then to the other one (like 10GB/hour).
What does "dladm show-link" show?  I''m guessing one of your
links is at
100mbps or at half duplex.

Peter Wood

2013-Jan-17 23:33 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

I have a script that rotates hourly, daily and monthly snapshots. Each
filesystem has about 40 snapshots (zfsList.png - output of ''zfs list |
grep
-v home/'' -  the home directories datasets are snipped from the output.
4
users in total.)

I noticed that the hourly snapshots on the heaviest filesystem in use are
about 1.2GB in size where on the other system the regular NFS exported
filesystem has about 60MB snapshots (gallerySnapshots.png - output of
command ''zfs list -t snapshot -r pool01/utils/gallery'')

I know that the gallery FS is in heavier use then normal but I was told it
will be mostly reading and based on the iostat seems that there is heavy
writing too.

I guess I''ll schedule some downtime and disable gallery export and see
if
that will effect the number of write operations and performance in general.

Unless there is some other way to test what/where these write operations
are applied.

The ''zpool iostat -v'' output is uncomfortably static. The
values of
read/write operations and bandwidth are the same for hours and even days.
I''d expect at least some variations between morning and night. The load
on
the servers is different for sure. Any input?

Thanks,

-- Peter

On Wed, Jan 16, 2013 at 7:49 PM, Bob Friesenhahn <
bfriesen at simple.dallas.tx.us> wrote:
> On Wed, 16 Jan 2013, Peter Wood wrote:
>
>>
>> Running zpool iostat -v (attachment zpool-IOStat.png) shows 1,22K write
>> operations on the drives and 661 on the
>> ZIL. Compare to the other server (who is in way heavier use then this
>> one) these numbers are extremely high.
>>
>> Any idea how to debug any further?
>>
>
> Do some filesystems contain many snapshots?  Do some filesystems use small
> zfs block sizes.  Have the servers been used the same?
>
> Bob
> --
> Bob Friesenhahn
> bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/**
> users/bfriesen/ <http://www.simplesystems.org/users/bfriesen/>
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130117/544d25db/attachment.html>

Bob Friesenhahn

2013-Jan-17 23:42 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

On Thu, 17 Jan 2013, Peter Wood wrote:> 
> Unless there is some other way to test what/where these write operations
are applied.
You can install Brendan Gregg''s DTraceToolkit and use it to find out 
who and what is doing all the writing.  1.2GB in an hour is quite a 
lot of writing.  If this is going continuously, then it may be causing 
more fragmentation in conjunction with your snapshots.

See "http://www.brendangregg.com/dtrace.html".

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Peter Wood

2013-Jan-17 23:51 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

Thanks Bob. Will do.

I forgot the attachments. Here they are with hope to bring more clarity to
the problem.


On Thu, Jan 17, 2013 at 3:42 PM, Bob Friesenhahn <
bfriesen at simple.dallas.tx.us> wrote:
> On Thu, 17 Jan 2013, Peter Wood wrote:
>
>>
>> Unless there is some other way to test what/where these write
operations
>> are applied.
>>
>
> You can install Brendan Gregg''s DTraceToolkit and use it to find
out who
> and what is doing all the writing.  1.2GB in an hour is quite a lot of
> writing.  If this is going continuously, then it may be causing more
> fragmentation in conjunction with your snapshots.
>
> See
"http://www.brendangregg.com/**dtrace.html<http://www.brendangregg.com/dtrace.html>
> ".
>
>
> Bob
> --
> Bob Friesenhahn
> bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/**
> users/bfriesen/ <http://www.simplesystems.org/users/bfriesen/>
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130117/ca2873f9/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zfsList.png
Type: image/png
Size: 49894 bytes
Desc: not available
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130117/ca2873f9/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gallerySnapshots.png
Type: image/png
Size: 63559 bytes
Desc: not available
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130117/ca2873f9/attachment-0003.png>

Timothy Coalson

2013-Jan-18 00:18 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

On Thu, Jan 17, 2013 at 5:33 PM, Peter Wood <peterwood.sd at gmail.com>
wrote:
>
> The ''zpool iostat -v'' output is uncomfortably static. The
values of
> read/write operations and bandwidth are the same for hours and even days.
> I''d expect at least some variations between morning and night. The
load on
> the servers is different for sure. Any input?
>
>Without a repetition time parameter, zpool iostat will print exactly once
and exit, and the output is an average from kernel boot to "now", just
like
iostat, this is why it seems so static.  If you want to know the activity
over 5 second intervals, use something like "zpool iostat -v 5"
(repeat
every 5 seconds) and wait for the second and later blocks.  The second and
later blocks are average from previous output until "now".  I
generally use
5 second intervals to match the 5 second commit interval on my pools.

Tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130117/4fd8fdad/attachment.html>

Jim Klimov

2013-Jan-18 00:34 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

On 2013-01-18 00:42, Bob Friesenhahn wrote:> You can install Brendan Gregg''s DTraceToolkit and use it to find
out who
> and what is doing all the writing.  1.2GB in an hour is quite a lot of
> writing.  If this is going continuously, then it may be causing more
> fragmentation in conjunction with your snapshots.
As a moderately wild guess, since you''re speaking of galleries,
are these problematic filesystems often-read? By default ZFS
updates the last access-time of files it reads, as do many other
filesystems, and this causes avalanches of metadata updates -
sync writes (likely) as well as fragmentation. This may also
be a poorly traceable but considerable "used" space in frequent
snapshots. You can verify (and unset) this behaviour with the
ZFS FS dataset property "atime", i.e.:

# zfs get atime pond/export/home
NAME              PROPERTY  VALUE  SOURCE
pond/export/home  atime     off    inherited from pond

On another hand, verify where your software keeps the temporary
files (i.e. during uploads as may be with galleries). Again, if
this is a frequently snapshotted dataset (though 1 hour is not
really that frequent) then needless temp files can be held by
those older snapshots. Moving such temporary works to a different
dataset with a different snapshot schedule and/or to a different
pool (to keep related fragmentation constrained) may prove useful.

HTH,
//Jim Klimov

Peter Blajev

2013-Jan-18 00:48 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

Right on Tim. Thanks. I didn''t know that. I''m sure
it''s documented
somewhere and I should have read it so double thanks for explaining it.


--
Peter Blajev
IT Manager, TAAZ Inc.
Office: 858-597-0512 x125


On Thu, Jan 17, 2013 at 4:18 PM, Timothy Coalson <tsc5yc at mst.edu>
wrote:
> On Thu, Jan 17, 2013 at 5:33 PM, Peter Wood <peterwood.sd at
gmail.com>wrote:
>
>>
>> The ''zpool iostat -v'' output is uncomfortably static.
The values of
>> read/write operations and bandwidth are the same for hours and even
days.
>> I''d expect at least some variations between morning and night.
The load on
>> the servers is different for sure. Any input?
>>
>>
> Without a repetition time parameter, zpool iostat will print exactly once
> and exit, and the output is an average from kernel boot to "now",
just like
> iostat, this is why it seems so static.  If you want to know the activity
> over 5 second intervals, use something like "zpool iostat -v 5"
(repeat
> every 5 seconds) and wait for the second and later blocks.  The second and
> later blocks are average from previous output until "now".  I
generally use
> 5 second intervals to match the 5 second commit interval on my pools.
>
> Tim
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130117/1f165130/attachment.html>

Peter Wood

2013-Jan-18 00:53 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

Right on Tim. Thanks. I didn''t know that. I''m sure
it''s documented
somewhere and I should have read it so double thanks for explaining it.


On Thu, Jan 17, 2013 at 4:18 PM, Timothy Coalson <tsc5yc at mst.edu>
wrote:
> On Thu, Jan 17, 2013 at 5:33 PM, Peter Wood <peterwood.sd at
gmail.com>wrote:
>
>>
>> The ''zpool iostat -v'' output is uncomfortably static.
The values of
>> read/write operations and bandwidth are the same for hours and even
days.
>> I''d expect at least some variations between morning and night.
The load on
>> the servers is different for sure. Any input?
>>
>>
> Without a repetition time parameter, zpool iostat will print exactly once
> and exit, and the output is an average from kernel boot to "now",
just like
> iostat, this is why it seems so static.  If you want to know the activity
> over 5 second intervals, use something like "zpool iostat -v 5"
(repeat
> every 5 seconds) and wait for the second and later blocks.  The second and
> later blocks are average from previous output until "now".  I
generally use
> 5 second intervals to match the 5 second commit interval on my pools.
>
> Tim
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130117/ea02fbb4/attachment.html>

Peter Wood

2013-Jan-18 01:05 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

Great points Jim. I have requested more information how the gallery share
is being used and any temporary data will be moved out of there.

About atime, it is set to "on" right now and I''ve considered
to turn it off
but I wasn''t sure if this will effect incremental zfs send/receive.

''zfs send -i snapshot0 snapshot1'' doesn''t rely on the
atime, right?


On Thu, Jan 17, 2013 at 4:34 PM, Jim Klimov <jimklimov at cos.ru> wrote:
> On 2013-01-18 00:42, Bob Friesenhahn wrote:
>
>> You can install Brendan Gregg''s DTraceToolkit and use it to
find out who
>> and what is doing all the writing.  1.2GB in an hour is quite a lot of
>> writing.  If this is going continuously, then it may be causing more
>> fragmentation in conjunction with your snapshots.
>>
>
> As a moderately wild guess, since you''re speaking of galleries,
> are these problematic filesystems often-read? By default ZFS
> updates the last access-time of files it reads, as do many other
> filesystems, and this causes avalanches of metadata updates -
> sync writes (likely) as well as fragmentation. This may also
> be a poorly traceable but considerable "used" space in frequent
> snapshots. You can verify (and unset) this behaviour with the
> ZFS FS dataset property "atime", i.e.:
>
> # zfs get atime pond/export/home
> NAME              PROPERTY  VALUE  SOURCE
> pond/export/home  atime     off    inherited from pond
>
> On another hand, verify where your software keeps the temporary
> files (i.e. during uploads as may be with galleries). Again, if
> this is a frequently snapshotted dataset (though 1 hour is not
> really that frequent) then needless temp files can be held by
> those older snapshots. Moving such temporary works to a different
> dataset with a different snapshot schedule and/or to a different
> pool (to keep related fragmentation constrained) may prove useful.
>
> HTH,
> //Jim Klimov
>
>
> ______________________________**_________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
>
http://mail.opensolaris.org/**mailman/listinfo/zfs-discuss<http://mail.opensolaris.org/mailman/listinfo/zfs-discuss>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130117/d9f20ac4/attachment.html>

Bob Friesenhahn

2013-Jan-18 02:24 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

On Thu, 17 Jan 2013, Peter Wood wrote:
> Great points Jim. I have requested more information how the gallery share
is being used and any temporary data will
> be moved out of there.
> About atime, it is set to "on" right now and I''ve
considered to turn it off but I wasn''t sure if this will effect
> incremental zfs send/receive.
> 
> ''zfs send -i snapshot0 snapshot1'' doesn''t rely
on the atime, right?
Zfs send does not care about atime.  The access time is useless other 
than as a way to see how long it has been since a file was accessed.

For local access (not true for NFS), Zfs is lazy about updating atime 
on disk and so it may not be updated on disk until the next 
transaction group is written (e.g. up to 5 seconds) and so it does not 
represent much actual load.  Without this behavior, the system could 
become unusable.

For NFS you should disable atime on the NFS client mounts.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Bob Friesenhahn

2013-Jan-18 02:39 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

On Thu, 17 Jan 2013, Bob Friesenhahn wrote:>
> For NFS you should disable atime on the NFS client mounts.
This advice was wrong.  It needs to be done on the server side.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

Freddie Cash

2013-Jan-18 22:55 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

On Thu, Jan 17, 2013 at 4:48 PM, Peter Blajev <pblajev at taaz.com> wrote:
> Right on Tim. Thanks. I didn''t know that. I''m sure
it''s documented
> somewhere and I should have read it so double thanks for explaining it.
>
When in doubt, always check the man page first:
    man zpool

It''s listed in the section on the "iostat" sub-command:
     zpool iostat [-T d|u] [-v] [pool] ... [interval [count]]

         Displays I/O statistics for the given pools. When given an
interval,
         the statistics are printed every interval seconds until Ctrl-C is
         pressed. If no pools are specified, statistics for every pool in
the
         system is shown. If count is specified, the command exits after
count
         reports are printed.

:D

-- 
Freddie Cash
fjwcash at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130118/f0c868c2/attachment.html>

Timothy Coalson

2013-Jan-18 23:05 UTC

head link

[zfs-discuss] Heavy write IO for no apparent reason

On Fri, Jan 18, 2013 at 4:55 PM, Freddie Cash <fjwcash at gmail.com>
wrote:
> On Thu, Jan 17, 2013 at 4:48 PM, Peter Blajev <pblajev at taaz.com>
wrote:
>
>> Right on Tim. Thanks. I didn''t know that. I''m sure
it''s documented
>> somewhere and I should have read it so double thanks for explaining it.
>>
>
> When in doubt, always check the man page first:
>     man zpool
>
> It''s listed in the section on the "iostat" sub-command:
>      zpool iostat [-T d|u] [-v] [pool] ... [interval [count]]
>
>          Displays I/O statistics for the given pools. When given an
> interval,
>          the statistics are printed every interval seconds until Ctrl-C is
>          pressed. If no pools are specified, statistics for every pool in
> the
>          system is shown. If count is specified, the command exits after
> count
>          reports are printed.
>
To my eye, that doesn''t actually explain what the output is, only how
to
get it to repeat.  It seems to assume that one is familiar with iostat, and
expects this to work the same way.  So, can''t really fault someone for
being confused by the output in this case (perhaps the manpage could use
some clarification).

Tim
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20130118/ce0c62ae/attachment.html>

zfs discuss - Jan 2013 - Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason

[zfs-discuss] Heavy write IO for no apparent reason