Brent Jones
2009-Dec-12 08:59 UTC
[zfs-discuss] ZFS send/recv extreme performance penalty in snv_128
I've noticed some extreme performance penalties simply by using snv_128.

I take snapshots and send them over SSH to another server over Gigabit
Ethernet. Prior to snv_128 (snv_127 and nearly all previous builds) I
would get 20-30MB/sec. However, simply image-updating to snv_128 has
caused a majority of my snapshots to do this:

receiving incremental stream of pdxfilu01/vault/01@20091212-01:15:00 into pdxfilu02/vault/01@20091212-01:15:00
received 13.8KB stream in 491 seconds (28B/sec)

De-dupe is NOT enabled on any pool, but I have upgraded to the newest
ZFS pool version, which prevents me from rolling back to snv_127, which
would send at many tens of megabytes a second.

This is on an X4540, dual quad cores, and 64GB RAM.

Anyone else seeing similar issues?

--
Brent Jones
brent at servuhome.net
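[For context, the replication workflow described above is typically driven by a pipeline along these lines; the older snapshot name and the recv flags are illustrative placeholders, not taken from the poster's actual scripts:]

    # Take the new snapshot on the sending host
    zfs snapshot pdxfilu01/vault/01@20091212-01:15:00

    # Send only the delta since the previous snapshot, piped through SSH
    # into a receive on the remote pool
    zfs send -i pdxfilu01/vault/01@20091212-00:15:00 \
        pdxfilu01/vault/01@20091212-01:15:00 | \
        ssh pdxfilu02 zfs recv -F pdxfilu02/vault/01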
Bob Friesenhahn
2009-Dec-12 15:55 UTC
[zfs-discuss] ZFS send/recv extreme performance penalty in snv_128
On Sat, 12 Dec 2009, Brent Jones wrote:

> I've noticed some extreme performance penalties simply by using snv_128

Does the 'zpool scrub' rate seem similar to before? Do you notice any
read performance problems? What happens if you send to /dev/null rather
than via ssh?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
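[For anyone following along, the tests Bob suggests can be run with something like the following; the pool and snapshot names are placeholders:]

    # Kick off a scrub and watch the reported progress
    zpool scrub tank
    zpool status tank        # shows scrub progress for the pool

    # Time a full send with ssh and the network out of the picture
    time zfs send tank/fs@snap > /dev/null

    # Same idea for an incremental send
    time zfs send -i tank/fs@snap1 tank/fs@snap2 > /dev/null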
Brent Jones
2009-Dec-12 19:39 UTC
[zfs-discuss] ZFS send/recv extreme performance penalty in snv_128
On Sat, Dec 12, 2009 at 7:55 AM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> On Sat, 12 Dec 2009, Brent Jones wrote:
>
>> I've noticed some extreme performance penalties simply by using snv_128
>
> Does the 'zpool scrub' rate seem similar to before? Do you notice any
> read performance problems? What happens if you send to /dev/null rather
> than via ssh?
>
> Bob

Scrubs on both systems seem to take about the same amount of time (16
hours, on a 48TB pool, with about 20TB used).

I'll test to /dev/null tonight.

--
Brent Jones
brent at servuhome.net
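[As a rough sanity check, treating 20TB as 20 x 10^12 bytes, that scrub time works out to several hundred MB/sec of sustained reads, so raw pool read throughput does not look like the bottleneck; ksh/bash arithmetic:]

    # 20 TB scrubbed in 16 hours, expressed as an average rate in MB/sec
    echo $(( (20 * 1000 * 1000) / (16 * 3600) ))    # prints 347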
Brent Jones
2009-Dec-13 04:14 UTC
[zfs-discuss] ZFS send/recv extreme performance penalty in snv_128
On Sat, Dec 12, 2009 at 11:39 AM, Brent Jones <brent at servuhome.net> wrote:
> On Sat, Dec 12, 2009 at 7:55 AM, Bob Friesenhahn
> <bfriesen at simple.dallas.tx.us> wrote:
>> On Sat, 12 Dec 2009, Brent Jones wrote:
>>
>>> I've noticed some extreme performance penalties simply by using snv_128
>>
>> Does the 'zpool scrub' rate seem similar to before? Do you notice any
>> read performance problems? What happens if you send to /dev/null rather
>> than via ssh?
>>
>> Bob
>
> Scrubs on both systems seem to take about the same amount of time (16
> hours, on a 48TB pool, with about 20TB used).
>
> I'll test to /dev/null tonight.

I tested send performance to /dev/null, and I sent a 500GB filesystem
in just a few minutes.

The two servers are linked over GigE fiber (between two cities).

Iperf output:

[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  2.06 GBytes    295 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-60.0 sec  2.38 GBytes    341 Mbits/sec

It's usually a bit faster, but some other traffic shares that pipe.

Looking at network traffic between these two hosts during the send, I
do see a lot of traffic (about 100-150Mbit usually). So there is
traffic, but a 100MB send has taken over 10 minutes and still hasn't
completed. At 100Mbit/sec it should take roughly 10 seconds, not 10
minutes.

There is a little bit of disk activity, maybe a MB/sec on average, and
about 30 IOPS. So it seems the hosts are exchanging a lot of data about
the snapshot, but not actually replicating any data for a very long
time. SSH CPU usage is minimal, just a few percent (arcfour, but I
tried other ciphers with no difference).

Odd behavior to be sure, and it looks very similar to what snapshot
replication did back in build 101, before significant speed
improvements were made to snapshot replication. I wonder if this is a
major regression due to changes in newer ZFS versions, maybe to
accommodate de-dupe?

Sadly, I can't roll back, since I already upgraded my pool. I may try
upgrading to 129, but my IPS doesn't seem to recognize the newer
version yet.

--
Brent Jones
brent at servuhome.net
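[For reference, raw link throughput numbers like the iperf output above are usually gathered with a pair of commands along these lines; the hostname and test length are placeholders:]

    # On the receiving host: run iperf as a server
    iperf -s

    # On the sending host: a 60-second TCP throughput test
    iperf -c pdxfilu02 -t 60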
Brent Jones
2009-Dec-13 08:19 UTC
[zfs-discuss] ZFS send/recv extreme performance penalty in snv_128
On Sat, Dec 12, 2009 at 8:14 PM, Brent Jones <brent at servuhome.net> wrote:
> [...]
> Sadly, I can't roll back, since I already upgraded my pool. I may try
> upgrading to 129, but my IPS doesn't seem to recognize the newer
> version yet.

I found some time to dig into my troubles updating to 129 (my dev
repository can no longer be called "Dev"; it must use the
opensolaris.org name, bleh).

At least build 129 seems to fix this. I'm not sure what the issue is,
but bouncing between 128 and 129 I can reproduce the terrible ZFS
send/recv times 100% of the time. 129 still isn't as fast as 127 with
the same datasets and configuration, but it's good enough for now.

--
Brent Jones
brent at servuhome.net
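[For anyone hitting the same repository-name issue, repointing the publisher at the dev repository and updating generally looks something like this on OpenSolaris of that era; treat the exact URL and commands as a sketch rather than the poster's actual steps:]

    # Point the opensolaris.org publisher at the /dev repository
    pkg set-publisher -O http://pkg.opensolaris.org/dev/ opensolaris.org

    # Pull in the newer build, then reboot into the new boot environment
    pkg image-update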
Bob Friesenhahn
2009-Dec-13 14:54 UTC
[zfs-discuss] ZFS send/recv extreme performance penalty in snv_128
On Sat, 12 Dec 2009, Brent Jones wrote:

> There is a little bit of disk activity, maybe a MB/sec on average, and
> about 30 IOPS. So it seems the hosts are exchanging a lot of data about
> the snapshot, but not actually replicating any data for a very long
> time.

Note that 'zfs send' is a one-way stream. There is no 2-way exchange of
data. It seems like the performance problem you are seeing is related
to either the network protocol stack or SSH.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
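[One way to test that hypothesis is to push bulk data through the same ssh path with ZFS taken out of the picture, and separately to push a send stream through ssh while discarding it on the far end, isolating ssh/network from 'zfs recv'; the hosts, sizes, and snapshot names here are placeholders:]

    # Raw ssh + network throughput, no ZFS involved (about 1 GB of zeros)
    dd if=/dev/zero bs=1024k count=1000 | ssh pdxfilu02 'cat > /dev/null'

    # Incremental send stream through ssh, but discarded instead of received
    zfs send -i tank/fs@snap1 tank/fs@snap2 | ssh pdxfilu02 'cat > /dev/null'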