Good day,

The speed of send/recv is around 30-60 MBytes/s for the initial send and 17-25 MBytes/s for incrementals. I have seen lots of setups, from 1 disk to 100+ disks in the pool, but the speed doesn't vary to any noticeable degree. As I understand it, 'zfs send' is the limiting factor. I did tests by sending to /dev/null. It worked out too slow and absolutely not scalable. None of the cpu/memory/disk activity was at peak load, so there is room for improvement.

Is there any bug report or article that addresses this problem? Any workaround or solution?

I found these guys have the same result - around 7 MBytes/s for 'send' and 70 MBytes/s for 'recv'.
http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html

Thank you in advance,
Anatoly Legkodymov.
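For anyone reproducing this, a minimal way to time 'zfs send' in isolation is to pipe it through pv to /dev/null, as the poster describes. The pool, filesystem, and snapshot names below are placeholders, and pv is assumed to be installed:

# full send, rate shown by pv
zfs send tank/fs@now | pv > /dev/null

# incremental send between two existing snapshots
zfs send -i tank/fs@earlier tank/fs@now | pv > /dev/null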
On 11/15/11 23:05, Anatoly wrote:
> Good day,
>
> The speed of send/recv is around 30-60 MBytes/s for initial send and
> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
> to 100+ disks in pool. But the speed doesn't vary in any degree. As I
> understand 'zfs send' is a limiting factor. I did tests by sending to
> /dev/null. It worked out too slow and absolutely not scalable.
> None of cpu/memory/disk activity were in peak load, so there is
> room for improvement.
>
> Is there any bug report or article that addresses this problem? Any
> workaround or solution?
>
> I found these guys have the same result - around 7 Mbytes/s for 'send'
> and 70 Mbytes for 'recv'.
> http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html

Well, if I do a zfs send/recv over 1Gbit ethernet from a 2 disk mirror, the send runs at almost 100Mbytes/sec, so it's pretty much limited by the ethernet.

Since you have provided none of the diagnostic data you collected, it's difficult to guess what the limiting factor is for you.

--
Andrew Gabriel
On Tue, Nov 15, 2011 at 5:17 PM, Andrew Gabriel <andrew.gabriel at oracle.com> wrote:
> On 11/15/11 23:05, Anatoly wrote:
>> Good day,
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk to
>> 100+ disks in pool. But the speed doesn't vary in any degree. As I
>> understand 'zfs send' is a limiting factor. I did tests by sending to
>> /dev/null. It worked out too slow and absolutely not scalable.
>> None of cpu/memory/disk activity were in peak load, so there is room
>> for improvement.
>>
>> Is there any bug report or article that addresses this problem? Any
>> workaround or solution?
>>
>> I found these guys have the same result - around 7 Mbytes/s for 'send'
>> and 70 Mbytes for 'recv'.
>> http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html
>
> Well, if I do a zfs send/recv over 1Gbit ethernet from a 2 disk mirror,
> the send runs at almost 100Mbytes/sec, so it's pretty much limited by the
> ethernet.
>
> Since you have provided none of the diagnostic data you collected, it's
> difficult to guess what the limiting factor is for you.
>
> --
> Andrew Gabriel

So all the bugs have been fixed? I seem to recall people on this mailing list using mbuffer to speed it up because it was so bursty and slow at one point. IE:
http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/

--Tim
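For context, the approach in that blog post replaces the ssh tunnel with mbuffer's built-in network mode, so a memory buffer absorbs the bursts on both ends of the transfer. A minimal sketch of that setup; the host and dataset names are placeholders and the buffer sizes are only illustrative (see the post and mbuffer(1) for tuned values):

# on the receiving host: listen on a port, buffer, then receive
mbuffer -s 128k -m 1G -I 9090 | zfs recv tank/backup

# on the sending host: send the snapshot into the remote buffer
zfs send pool/fs@snap | mbuffer -s 128k -m 1G -O recvhost:9090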
On Wed, Nov 16 at 3:05, Anatoly wrote:
> Good day,
>
> The speed of send/recv is around 30-60 MBytes/s for initial send and
> 17-25 MBytes/s for incremental. I have seen lots of setups with 1
> disk to 100+ disks in pool. But the speed doesn't vary in any degree.
> As I understand 'zfs send' is a limiting factor. I did tests by
> sending to /dev/null. It worked out too slow and absolutely not
> scalable.
> None of cpu/memory/disk activity were in peak load, so there is
> room for improvement.

My belief is that initial/incremental may be affecting it because of initial versus incremental efficiency of the data layout in the pools, not because of something inherent in the send/recv process itself.

There are various send/recv improvements (e.g. don't use SSH as a tunnel) but even that shouldn't be capping you at 17MBytes/sec.

My incrementals get me ~35MB/s consistently. Each incremental is 10-50GB worth of transfer.

cheap gig switch, no jumbo frames
Source = 2 mirrored vdevs + l2arc ssd, CPU = xeon E5520, 6GB RAM
Destination = 4-drive raidz1, CPU = c2d E4500 @2.2GHz, 2GB RAM
tunnel is un-tuned SSH

> I found these guys have the same result - around 7 Mbytes/s for
> 'send' and 70 Mbytes for 'recv'.
> http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html

Their data doesn't match mine.

--
Eric D. Mudama
edmudama at bounceswoosh.org
On 11/15/11 23:40, Tim Cook wrote:
> On Tue, Nov 15, 2011 at 5:17 PM, Andrew Gabriel
> <andrew.gabriel at oracle.com> wrote:
>
>> On 11/15/11 23:05, Anatoly wrote:
>>
>>> Good day,
>>>
>>> The speed of send/recv is around 30-60 MBytes/s for initial
>>> send and 17-25 MBytes/s for incremental. I have seen lots of
>>> setups with 1 disk to 100+ disks in pool. But the speed
>>> doesn't vary in any degree. As I understand 'zfs send' is a
>>> limiting factor. I did tests by sending to /dev/null. It
>>> worked out too slow and absolutely not scalable.
>>> None of cpu/memory/disk activity were in peak load, so there
>>> is room for improvement.
>>>
>>> Is there any bug report or article that addresses this
>>> problem? Any workaround or solution?
>>>
>>> I found these guys have the same result - around 7 Mbytes/s
>>> for 'send' and 70 Mbytes for 'recv'.
>>> http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html
>>
>> Well, if I do a zfs send/recv over 1Gbit ethernet from a 2 disk
>> mirror, the send runs at almost 100Mbytes/sec, so it's pretty much
>> limited by the ethernet.
>>
>> Since you have provided none of the diagnostic data you collected,
>> it's difficult to guess what the limiting factor is for you.
>>
>> --
>> Andrew Gabriel
>
> So all the bugs have been fixed?

Probably not, but the OP's implication that zfs send has a specific rate limit in the range suggested is demonstrably untrue. So I don't know what's limiting the OP's send rate. (I could guess a few possibilities, but that's pointless without the data.)

> I seem to recall people on this mailing list using mbuffer to speed it
> up because it was so bursty and slow at one point. IE:
> http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/

Yes, this idea originally came from me, having analyzed the send/receive traffic behavior in combination with network connection behavior. However, it's the receive side that's bursty around the TXG commits, not the send side, so that doesn't match the issue the OP is seeing. (The buffer sizes in that blog are not optimal, although any buffer at the receive side will make a significant improvement if the network bandwidth is the same order of magnitude as what the send/recv are capable of.)

--
Andrew Gabriel
On 11/16/11 01:01 PM, Eric D. Mudama wrote:
> On Wed, Nov 16 at 3:05, Anatoly wrote:
>> Good day,
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1
>> disk to 100+ disks in pool. But the speed doesn't vary in any degree.
>> As I understand 'zfs send' is a limiting factor. I did tests by
>> sending to /dev/null. It worked out too slow and absolutely not
>> scalable.
>> None of cpu/memory/disk activity were in peak load, so there is
>> room for improvement.
>
> My belief is that initial/incremental may be affecting it because of
> initial versus incremental efficiency of the data layout in the pools,
> not because of something inherent in the send/recv process itself.
>
> There are various send/recv improvements (e.g. don't use SSH as a
> tunnel) but even that shouldn't be capping you at 17MBytes/sec.
>
> My incrementals get me ~35MB/s consistently. Each incremental is
> 10-50GB worth of transfer.

While my incremental sizes are much smaller, the rate I see for dense incrementals (large blocks of changes, such as media files) is about the same. I do see much lower rates for more scattered changes (such as filesystems with documents).

--
Ian.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Anatoly
>
> The speed of send/recv is around 30-60 MBytes/s for initial send and
> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk

I suggest watching zpool iostat before, during, and after the send to /dev/null. Actually, I take that back - zpool iostat seems to measure virtual IOPS. I just did this on my laptop a minute ago and saw 1.2k ops, which is at least 5-6x higher than my hard drive can handle, which can only mean it's reading a lot of previously aggregated small blocks from disk, which are now sequentially organized on disk. How do you measure physical iops? Is it just regular iostat? I have seriously put zero effort into answering this question (sorry.)

I have certainly noticed a delay in the beginning, while the system thinks about stuff for a little while to kick off an incremental... And it's acknowledged and normal that incrementals are likely fragmented all over the place, so you could be IOPS limited (hence watching the iostat).

Also, whenever I sit and watch it for long times, I see that it varies enormously. For 5 minutes it will be (some speed), and for 5 minutes it will be 5x higher...

Whatever it is, it's something we likely are all seeing, but probably just ignoring. If you can find it in your heart to just ignore it too, then great, no problem. ;-) Otherwise, it's a matter of digging in and characterizing it to learn more about it.
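On the physical-versus-virtual IOPS question: on a Solaris-derived system the usual sketch is to watch plain iostat for per-device (physical) activity alongside zpool iostat for the pool-level (virtual) view while the send runs. The pool name and interval below are placeholders:

# per-device extended statistics, 5-second intervals
iostat -xn 5

# pool and per-vdev statistics over the same window
zpool iostat -v tank 5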
On Tue, November 15, 2011 20:08, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Anatoly
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
>
> I suggest watching zpool iostat before, during, and after the send to
> /dev/null. Actually, I take that back - zpool iostat seems to measure
> virtual IOPS. I just did this on my laptop a minute ago and saw 1.2k
> ops, which is at least 5-6x higher than my hard drive can handle, which
> can only mean it's reading a lot of previously aggregated small blocks
> from disk, which are now sequentially organized on disk. How do you
> measure physical iops? Is it just regular iostat? I have seriously put
> zero effort into answering this question (sorry.)
>
> I have certainly noticed a delay in the beginning, while the system thinks
> about stuff for a little while to kick off an incremental... And it's
> acknowledged and normal that incrementals are likely fragmented all over
> the place, so you could be IOPS limited (hence watching the iostat).
>
> Also, whenever I sit and watch it for long times, I see that it varies
> enormously. For 5 minutes it will be (some speed), and for 5 minutes it
> will be 5x higher...
>
> Whatever it is, it's something we likely are all seeing, but probably just
> ignoring. If you can find it in your heart to just ignore it too, then
> great, no problem. ;-) Otherwise, it's a matter of digging in and
> characterizing it to learn more about it.

I see rather variable io stats while sending incremental backups. The receiver is a USB disk, so fairly slow, but I get 30MB/s in a good stretch.

I'm compressing the ZFS filesystem on the receiving end, but much of my content is already-compressed photo files, so it doesn't make a huge difference. Helps some, though, and at 30MB/s there's no shortage of CPU horsepower to handle the compression.

The raw files are around 12MB each, probably not fragmented much (they're just copied over from memory cards). For a small number of the files, there's a Photoshop file that's much bigger (sometimes more than 1GB, if it's a stitched panorama with layers of changes). And then there are sidecar XMP files, mostly two per image, and for most of them web-resolution images, around 100kB.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
On Tue, November 15, 2011 17:05, Anatoly wrote:
> Good day,
>
> The speed of send/recv is around 30-60 MBytes/s for initial send and
> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
> to 100+ disks in pool. But the speed doesn't vary in any degree. As I
> understand 'zfs send' is a limiting factor. I did tests by sending to
> /dev/null. It worked out too slow and absolutely not scalable.
> None of cpu/memory/disk activity were in peak load, so there is room
> for improvement.

What you're probably seeing with incremental sends is that the disks being read are hitting their IOPS limits. Zfs send does random reads all over the place -- every block that's changed since the last incremental send is read, in TXG order. So that's essentially random reads all over the disk.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
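A rough back-of-envelope shows why an IOPS-bound incremental send can look so slow. The figures below are typical assumptions (a 7200rpm disk doing on the order of 100 random reads per second, the default 128 KB ZFS recordsize), not measurements from the poster's system:

  ~100 random reads/s x 128 KB per block ~= 12-13 MB/s per disk

So when the changed blocks are scattered, each disk contributes little more than its random-read rate times the block size, regardless of how much sequential bandwidth the pool has in aggregate.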
Good day,

I've just made a clean test for sequential data read. The system has 45 mirror vdevs.

1. Create a 160GB random file.
2. Read it to /dev/null.
3. Do a snapshot and send it to /dev/null.
4. Compare results.

1. Write speed is slow due to 'urandom':
# dd if=/dev/urandom bs=128k | pv > big_file
161118683136 bytes (161 GB) copied, 3962.15 seconds, 40.7 MB/s

2. Read file normally:
# time dd if=./big_file bs=128k of=/dev/null
161118683136 bytes (161 GB) copied, 103.455 seconds, 1.6 GB/s
real    1m43.459s
user    0m0.899s
sys     1m25.078s

3. Snapshot & send:
# zfs snapshot volume/test@A
# time zfs send volume/test@A > /dev/null
real    7m20.635s
user    0m0.004s
sys     0m52.760s

4. As you see, there is a 4x difference on a pure sequential read, under greenhouse conditions. I repeated the tests a couple of times to check ARC influence - not much difference.

Real send speed on this system is around 60 MBytes/s with peaks around 100. The file read operation scales well with a large number of disks, but 'zfs send' is lame. In normal conditions, moving large portions of data may take days to weeks. It can't fill a 10G Ethernet connection, sometimes not even 1G.

Best regards,
Anatoly Legkodymov.

On 16.11.2011 06:08, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces@opensolaris.org [mailto:zfs-discuss-
>> bounces@opensolaris.org] On Behalf Of Anatoly
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
>
> I suggest watching zpool iostat before, during, and after the send to
> /dev/null. Actually, I take that back - zpool iostat seems to measure
> virtual IOPS. I just did this on my laptop a minute ago and saw 1.2k
> ops, which is at least 5-6x higher than my hard drive can handle, which
> can only mean it's reading a lot of previously aggregated small blocks
> from disk, which are now sequentially organized on disk. How do you
> measure physical iops? Is it just regular iostat? I have seriously put
> zero effort into answering this question (sorry.)
>
> I have certainly noticed a delay in the beginning, while the system thinks
> about stuff for a little while to kick off an incremental... And it's
> acknowledged and normal that incrementals are likely fragmented all over
> the place, so you could be IOPS limited (hence watching the iostat).
>
> Also, whenever I sit and watch it for long times, I see that it varies
> enormously. For 5 minutes it will be (some speed), and for 5 minutes it
> will be 5x higher...
>
> Whatever it is, it's something we likely are all seeing, but probably just
> ignoring. If you can find it in your heart to just ignore it too, then
> great, no problem. ;-) Otherwise, it's a matter of digging in and
> characterizing it to learn more about it.
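One way to sanity-check the ARC-influence question in a test like this is to compare the ARC hit/miss counters before and after each read. A minimal sketch, assuming a Solaris-derived system where the counters are exposed under the standard arcstats kstat:

# record the counters, run the dd or zfs send, then record them again and diff
kstat -m zfs -n arcstats -s hits
kstat -m zfs -n arcstats -s misses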
On Wed, Nov 16, 2011 at 11:07 AM, Anatoly <legko777 at fastmail.fm> wrote:
> I've just made a clean test for sequential data read. The system has 45
> mirror vdevs.
>
> 1. Create a 160GB random file.
> 2. Read it to /dev/null.
> 3. Do a snapshot and send it to /dev/null.
> 4. Compare results.

What OS? The following is under Solaris 10U9 with CPU_2010-10 + an IDR for a SAS/SATA drive bug.

I just had to replicate over 20TB of small files, `zfs send -R <zfs@snap> | zfs recv -e <zfs>`, and I got an AVERAGE throughput of over 77MB/sec (over 6TB/day). The entire replication took just over 3 days.

The source zpool is on J4400 750GB SATA drives, 110 of them in a RAIDz2 configuration (22 vdevs of 5 disks each). The target was a pair of old h/w raid boxes (one without any NVRAM cache) and a zpool configuration of 6 striped vdevs (a total of 72 drives behind the h/w raid controller doing raid5; this is temporary and only for moving data physically around, so the lack of ZFS redundancy is not an issue). There are over 2300 snapshots on the source side and we were replicating close to 2000 of them.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
On Wed, Nov 16 at 9:35, David Dyer-Bennet wrote:
> On Tue, November 15, 2011 17:05, Anatoly wrote:
>> Good day,
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
>> to 100+ disks in pool. But the speed doesn't vary in any degree. As I
>> understand 'zfs send' is a limiting factor. I did tests by sending to
>> /dev/null. It worked out too slow and absolutely not scalable.
>> None of cpu/memory/disk activity were in peak load, so there is room
>> for improvement.
>
> What you're probably seeing with incremental sends is that the disks being
> read are hitting their IOPS limits. Zfs send does random reads all over
> the place -- every block that's changed since the last incremental send is
> read, in TXG order. So that's essentially random reads all over the disk.

Anatoly didn't state whether his 160GB file test was done on a virgin pool, or whether it was allocated out of an existing pool. If the latter, your comment is the likely explanation. If the former, your comment wouldn't explain the slow performance.

--eric

--
Eric D. Mudama
edmudama at bounceswoosh.org
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Anatoly
>
> I've just made a clean test for sequential data read. The system has 45
> mirror vdevs.

90 disks in the system... I bet you have a lot of ram?

> 2. Read file normally:
> # time dd if=./big_file bs=128k of=/dev/null
> 161118683136 bytes (161 GB) copied, 103.455 seconds, 1.6 GB/s

I wonder how much of that is being read back from cache. Would it be impossible to reboot, or otherwise invalidate the cache, before reading the file back?

With 90 disks, in theory, you should be able to read something like 90 Gbit (roughly 11 GB) / sec. But of course various bus speed bottlenecks come into play, so I don't think the 1.6GB/s is unrealistically high in any way.

> 3. Snapshot & send:
> # zfs snapshot volume/test@A
> # time zfs send volume/test@A > /dev/null
> real    7m20.635s
> user    0m0.004s
> sys     0m52.760s

This doesn't surprise me; based on gut feel, I don't think zfs send performs optimally in general. I think your results are probably correct, and even if you revisit all this, doing the reboots (or cache invalidation) and/or using a newly created pool, as anyone here might suggest... I think you'll still see the same results. Somewhat unpredictably.

Even so, I always find zfs send performance still beats the pants off any alternative... rsync and whatnot.
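One way to do the cache invalidation suggested above without a full reboot is to export and re-import the pool, which drops its cached data from the ARC before the file is re-read. A minimal sketch, assuming the data pool from the earlier test is named 'volume', nothing else is using it, and the file path is a placeholder:

# zpool export volume
# zpool import volume
# time dd if=/volume/test/big_file bs=128k of=/dev/null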
On Nov 16, 2011, at 7:35 AM, David Dyer-Bennet wrote:
> On Tue, November 15, 2011 17:05, Anatoly wrote:
>> Good day,
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
>> to 100+ disks in pool. But the speed doesn't vary in any degree. As I
>> understand 'zfs send' is a limiting factor. I did tests by sending to
>> /dev/null. It worked out too slow and absolutely not scalable.
>> None of cpu/memory/disk activity were in peak load, so there is room
>> for improvement.
>
> What you're probably seeing with incremental sends is that the disks being
> read are hitting their IOPS limits. Zfs send does random reads all over
> the place -- every block that's changed since the last incremental send is
> read, in TXG order. So that's essentially random reads all over the disk.

Not necessarily. I've seen sustained zfs sends in the 600+ MB/sec range for modest servers. It does depend on how the data is used more than the hardware it is stored upon.
 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com
LISA '11, Boston, MA, December 4-9