hi,

i have two systems, A (Solaris 10 update 5) and B (Solaris 10 update 6). i'm
using 'zfs send -i' to replicate changes on A to B. however, the 'zfs recv' on
B is running extremely slowly. if i run the zfs send on A and redirect output
to a file, it sends at 2MB/sec. but when i use 'zfs send ... | ssh B zfs
recv', the speed drops to 200KB/sec. according to iostat, B (which is
otherwise idle) is doing ~20MB/sec of disk reads, and very little writing.

i don't believe the problem is ssh, as the systems are on the same LAN, and
running 'tar' over ssh runs much faster (20MB/sec or more).

is this slowness normal? is there any way to improve it? (the idea here is to
use B as a backup of A, but if i can only replicate at 200KB/s, it's not going
to be able to keep up with the load...)

both systems are X4500s with 16GB ram, 48 SATA disks and 4 2.8GHz cores.

thanks, river.
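For reference, a minimal sketch of the kind of replication pipeline described above; the pool, dataset and snapshot names are placeholders, not river's actual configuration:

    # on A: take a new snapshot and send the changes since the previous one to B
    zfs snapshot tank/data@backup2
    zfs send -i tank/data@backup1 tank/data@backup2 | ssh B zfs recv -F tank/data

    # baseline comparison: write the same stream to a local file instead
    zfs send -i tank/data@backup1 tank/data@backup2 > /var/tmp/incr.zfs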
On Fri 07/11/08 12:09, River Tarnell <river at loreley.flyingparchment.org.uk> sent:

> i have two systems, A (Solaris 10 update 5) and B (Solaris 10 update 6). i'm
> using 'zfs send -i' to replicate changes on A to B. however, the 'zfs recv'
> on B is running extremely slowly. if i run the zfs send on A and redirect
> output to a file, it sends at 2MB/sec. but when i use 'zfs send ... | ssh B
> zfs recv', the speed drops to 200KB/sec. according to iostat, B (which is
> otherwise idle) is doing ~20MB/sec of disk reads, and very little writing.
>
> i don't believe the problem is ssh, as the systems are on the same LAN, and
> running 'tar' over ssh runs much faster (20MB/sec or more).
>
> is this slowness normal? is there any way to improve it? (the idea here is
> to use B as a backup of A, but if i can only replicate at 200KB/s, it's not
> going to be able to keep up with the load...)

That's very slow. What's the nature of your data? I'm currently replicating
data between an x4500 and an x4540 and I see about 50% of ftp transfer speed
for zfs send/receive (about 60GB/hour).

Time each phase (send to a file, copy the file to B and receive from the
file). When I tried this on a filesystem with a range of file sizes, I had
about 30% of the total transfer time in send, 50% in copy and 20% in receive.

--
Ian.
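A minimal sketch of timing the three phases Ian describes; the dataset, snapshot and host names are placeholders:

    # phase 1: send the incremental stream to a local file
    time zfs send -i tank/data@backup1 tank/data@backup2 > /var/tmp/incr.zfs

    # phase 2: copy the file to B
    time scp /var/tmp/incr.zfs B:/var/tmp/incr.zfs

    # phase 3: receive from the file on B
    time ssh B 'zfs recv -F tank/data < /var/tmp/incr.zfs'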
Ian Collins:
> That's very slow. What's the nature of your data?

mainly two sets of mid-sized files; one of 200KB-2MB in size and the other
under 50KB. they are organised into subdirectories, A/B/C/<file>. each
directory has 18,000-25,000 files. total data size is around 2.5TB.

hm, something changed while i was writing this mail: now the transfer is
running at 2MB/sec, and the read i/o has disappeared. that's still slower than
i'd expect, but an improvement.

> Time each phase (send to a file, copy the file to B and receive from the
> file). When I tried this on a filesystem with a range of file sizes, I had
> about 30% of the total transfer time in send, 50% in copy and 20% in
> receive.

i'd rather not interrupt the current send, as it's quite large. once it's
finished, i'll test with smaller changes...

- river.
On Thu, Nov 6, 2008 at 4:19 PM, River Tarnell
<river at loreley.flyingparchment.org.uk> wrote:
> mainly two sets of mid-sized files; one of 200KB-2MB in size and the other
> under 50KB. they are organised into subdirectories, A/B/C/<file>. each
> directory has 18,000-25,000 files. total data size is around 2.5TB.
>
> hm, something changed while i was writing this mail: now the transfer is
> running at 2MB/sec, and the read i/o has disappeared. that's still slower
> than i'd expect, but an improvement.

There's been a couple of threads about this now, tracked under some bug
IDs/tickets, if you want to see the status:

6333409
6418042
66104157

--
Brent Jones
brent at servuhome.net
River Tarnell wrote:
> hm, something changed while i was writing this mail: now the transfer is
> running at 2MB/sec, and the read i/o has disappeared. that's still slower
> than i'd expect, but an improvement.

The transfer I mentioned just completed: 1.45TB sent in 84832 seconds
(17.9MB/sec). This was during a working day when the server and network were
busy. The best ftp speed I managed was 59MB/sec over the same network.

--
Ian.
River Tarnell wrote:
> i have two systems, A (Solaris 10 update 5) and B (Solaris 10 update 6). i'm
> using 'zfs send -i' to replicate changes on A to B. however, the 'zfs recv'
> on B is running extremely slowly.

I'm sorry, I didn't notice the "-i" in your original message. I get the same
problem sending incremental streams between Thumpers.

--
Ian.
Brent Jones wrote:
> There's been a couple of threads about this now, tracked under some bug
> IDs/tickets:
>
> 6333409
> 6418042

I see these are fixed in build 102.

Are they targeted to get back to Solaris 10 via a patch?

If not, is it worth escalating the issue with support to get a patch?

--
Ian.
Ian Collins wrote:
> Brent Jones wrote:
>> There's been a couple of threads about this now, tracked under some bug
>> IDs/tickets:
>>
>> 6333409
>> 6418042
> I see these are fixed in build 102.
>
> Are they targeted to get back to Solaris 10 via a patch?
>
> If not, is it worth escalating the issue with support to get a patch?

Given the issue described is slow zfs recv over network, I suspect this is:

6729347 Poor zfs receive performance across networks

This is quite easily worked around by putting a buffering program between the
network and the zfs receive. There is a public domain "mbuffer" which should
work, although I haven't tried it as I wrote my own. The buffer size you need
is about 5 seconds worth of data. In my case of 7200RPM disks (in a mirror and
not striped) and a gigabit ethernet link, the disks are the limiting factor at
around 57MB/sec sustained i/o, so I used a 250MB buffer to best effect. If I
recall correctly, that speeded up the zfs send/recv across the network by
about 3 times, and it then ran at the disk platter speed.

--
Andrew
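As an illustration of this workaround (not Andrew's own program), mbuffer can be placed in the pipeline roughly as follows; the port, buffer size and dataset names are assumptions, and the exact options may vary between mbuffer versions:

    # on the receiving host: listen on a TCP port, buffering ahead of zfs recv
    mbuffer -I 9090 -m 250M | zfs recv -F tank/data

    # on the sending host: stream the snapshot through mbuffer to the receiver
    zfs send -i tank/data@backup1 tank/data@backup2 | mbuffer -m 250M -O recvhost:9090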
Andrew Gabriel wrote:
> This is quite easily worked around by putting a buffering program between
> the network and the zfs receive. There is a public domain "mbuffer" which
> should work, although I haven't tried it as I wrote my own. The buffer size
> you need is about 5 seconds worth of data. In my case of 7200RPM disks (in a
> mirror and not striped) and a gigabit ethernet link, the disks are the
> limiting factor at around 57MB/sec sustained i/o, so I used a 250MB buffer
> to best effect. If I recall correctly, that speeded up the zfs send/recv
> across the network by about 3 times, and it then ran at the disk platter
> speed.

Did this apply to incremental sends as well? I can live with ~20MB/sec for
full sends, but ~1MB/sec for incremental sends is a killer.

--
Ian.
Ian Collins wrote:
> Andrew Gabriel wrote:
>> This is quite easily worked around by putting a buffering program between
>> the network and the zfs receive. [...] If I recall correctly, that speeded
>> up the zfs send/recv across the network by about 3 times, and it then ran
>> at the disk platter speed.
>
> Did this apply to incremental sends as well? I can live with ~20MB/sec for
> full sends, but ~1MB/sec for incremental sends is a killer.

It doesn't help the ~1MB/sec periods in incrementals, but it does help the
fast periods in incrementals.

--
Andrew
Andrew Gabriel wrote:
> Ian Collins wrote:
>> Did this apply to incremental sends as well? I can live with ~20MB/sec for
>> full sends, but ~1MB/sec for incremental sends is a killer.
>
> It doesn't help the ~1MB/sec periods in incrementals, but it does help the
> fast periods in incrementals.

:)

I don't see the 5 second bursty behaviour described in the bug report. It's
more like 5 second interval gaps in the network traffic while the data is
written to disk.

--
Ian.
Ian Collins wrote:
> I don't see the 5 second bursty behaviour described in the bug report. It's
> more like 5 second interval gaps in the network traffic while the data is
> written to disk.

That is exactly the issue. When the zfs recv data has been written, zfs recv
starts reading the network again, but there's only a tiny amount of data
buffered in the TCP/IP stack, so it has to wait for the network to heave more
data across. In effect, it's a single buffered copy. The addition of a buffer
program turns it into a double-buffered (or cyclic buffered) copy, with the
disks running flat out continuously, and the network streaming data across
continuously at the disk platter speed.

What are your theoretical max speeds for network and disk i/o? Taking the
smaller of these two, are you seeing the sustained send/recv performance match
that (excluding the ~1MB/sec periods, which is some other problem)?

The effect described in that bug is most obvious when the disk and network
speeds are the same order of magnitude (as in the example I gave above). Given
my disk i/o rate above, if the network is much faster (say, 10Gb), then it's
going to cope with the bursty nature of the traffic better. If the network is
much slower (say, 100Mb), then it's going to be running flat out anyway and
again you won't notice the bursty reads (a colleague measured only 20% gain in
that case, rather than my 200% gain).

--
Andrew
Andrew Gabriel:
> This is quite easily worked around by putting a buffering program between
> the network and the zfs receive.

i tested inserting mbuffer with a 250MB buffer between the zfs send and zfs
recv. unfortunately, it seems to make very little difference to my incremental
send speed. mbuffer reported the average speed after the transfer as:

summary: 81.3 GByte in 30 h 28 min 32.4 sec - average of 777 kB/s

i suppose this is only a benefit when the send is running at a reasonable
speed, i.e. for full sends, not incrementals.

- river.
I have an open ticket to have these putback into Solaris 10.

On Fri, Nov 7, 2008 at 3:24 PM, Ian Collins <ian at ianshome.com> wrote:
> I see these are fixed in build 102.
>
> Are they targeted to get back to Solaris 10 via a patch?
>
> If not, is it worth escalating the issue with support to get a patch?
River Tarnell wrote:
> Andrew Gabriel:
>> This is quite easily worked around by putting a buffering program between
>> the network and the zfs receive.
>
> i tested inserting mbuffer with a 250MB buffer between the zfs send and zfs
> recv. unfortunately, it seems to make very little difference to my
> incremental send speed. mbuffer reported the average speed after the
> transfer as:
>
> summary: 81.3 GByte in 30 h 28 min 32.4 sec - average of 777 kB/s
>
> i suppose this is only a benefit when the send is running at a reasonable
> speed, i.e. for full sends, not incrementals.

A similar test (which yields similar results) is to send an incremental to a
file over NFS and then receive from the file.

--
Ian.
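A minimal sketch of that test, assuming the receiver exports a filesystem that is NFS-mounted on the sender; the paths and dataset names are placeholders:

    # on the sending host: write the incremental stream to an NFS share on the receiver
    zfs send -i tank/data@backup1 tank/data@backup2 > /net/recvhost/backup/incr.zfs

    # on the receiving host: receive from the local copy of that file
    zfs recv -F tank/data < /backup/incr.zfs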
If anyone out there has a support contract with Sun that covers Solaris 10,
feel free to email me and/or Sun and have them add you to my support case. The
Sun case is 66104157 and I am seeking to have 6333409 and 6418042 putback into
Solaris 10.

CR 6712788 was closed as a duplicate of CR 6421958, the fix for which is
scheduled to be included in Update 6.

On Mon, Nov 10, 2008 at 12:24 PM, Scott Williamson
<scott.williamson at gmail.com> wrote:
> I have an open ticket to have these putback into Solaris 10.
Andrew Gabriel <Andrew.Gabriel at Sun.COM> wrote:
> That is exactly the issue. When the zfs recv data has been written, zfs recv
> starts reading the network again, but there's only a tiny amount of data
> buffered in the TCP/IP stack, so it has to wait for the network to heave
> more data across. In effect, it's a single buffered copy. The addition of a
> buffer program turns it into a double-buffered (or cyclic buffered) copy,
> with the disks running flat out continuously, and the network streaming
> data across continuously at the disk platter speed.

rmt and star increase the socket read/write buffer size via

    setsockopt(STDOUT_FILENO, SOL_SOCKET, SO_SNDBUF, ...);
    setsockopt(STDIN_FILENO, SOL_SOCKET, SO_RCVBUF, ...);

when doing "remote tape access". This has a notable effect on throughput.

Jörg

--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
Joerg Schilling schrieb:
> rmt and star increase the socket read/write buffer size via
>
>     setsockopt(STDOUT_FILENO, SOL_SOCKET, SO_SNDBUF, ...);
>     setsockopt(STDIN_FILENO, SOL_SOCKET, SO_RCVBUF, ...);
>
> when doing "remote tape access". This has a notable effect on throughput.

yesterday, I released a new version of mbuffer, which also enlarges the
default TCP buffer size. So everybody using mbuffer for network data transfer
might want to update.

For everybody unfamiliar with mbuffer, it might be worth noting that it has a
bunch of additional features, e.g. sending to multiple clients at once and
high/low watermark flushing to prevent tape drives from stop/rewind/restart
cycles.

- Thomas
Hello Thomas,

What is mbuffer? Where might I go to read more about it?

Thanks,

Jerry

> yesterday, I released a new version of mbuffer, which also enlarges the
> default TCP buffer size. So everybody using mbuffer for network data
> transfer might want to update.
>
> For everybody unfamiliar with mbuffer, it might be worth noting that it has
> a bunch of additional features, e.g. sending to multiple clients at once and
> high/low watermark flushing to prevent tape drives from stop/rewind/restart
> cycles.
>
> - Thomas
Thomas Maier-Komor, 2008-Nov-14 16:35 UTC, [zfs-discuss] mbuffer WAS 'zfs recv' is very slow
Jerry K schrieb:
> Hello Thomas,
>
> What is mbuffer? Where might I go to read more about it?
>
> Thanks,
>
> Jerry

The man page is included in the source, which you can get over here:

http://www.maier-komor.de/mbuffer.html

New releases are announced on freshmeat.org. Maybe I should add an HTML
version of the man page to the homepage of mbuffer...

- Thomas
Joerg Schilling wrote:
> rmt and star increase the socket read/write buffer size via
>
>     setsockopt(STDOUT_FILENO, SOL_SOCKET, SO_SNDBUF, ...);
>     setsockopt(STDIN_FILENO, SOL_SOCKET, SO_RCVBUF, ...);
>
> when doing "remote tape access". This has a notable effect on throughput.

Interesting idea, but for 7200 RPM disks (and a 1Gb ethernet link), I need a
250GB buffer (enough to buffer 4-5 seconds worth of data). That's many orders
of magnitude bigger than SO_RCVBUF can go.

--
Andrew
Andrew Gabriel wrote:
> Interesting idea, but for 7200 RPM disks (and a 1Gb ethernet link), I need a
> 250GB buffer (enough to buffer 4-5 seconds worth of data). That's many
> orders of magnitude bigger than SO_RCVBUF can go.

No -- that's wrong -- should read 250MB buffer! Still some orders of magnitude
bigger than SO_RCVBUF can go.

--
Andrew
Andrew Gabriel <Andrew.Gabriel at Sun.COM> wrote:
> Andrew Gabriel wrote:
>> Interesting idea, but for 7200 RPM disks (and a 1Gb ethernet link), I need
>> a 250GB buffer (enough to buffer 4-5 seconds worth of data). That's many
>> orders of magnitude bigger than SO_RCVBUF can go.
>
> No -- that's wrong -- should read 250MB buffer!
> Still some orders of magnitude bigger than SO_RCVBUF can go.

It's affordable e.g. on an X4540 with 64 GB of RAM.

ZFS started with constraints that could not be made true in 2001.

On my first Sun at home (a Sun 2/50 with 1 MB of RAM) in 1986, I could set the
socket buffer size to 63 kB. 63kB : 1 MB is the same ratio as 256 MB : 4 GB.

BTW: a lot of numbers in Solaris have not grown for a long time and thus
create problems now. Just think about the maxphys values.... 63 kB on x86 does
not even allow writing a single BluRay disk sector with a single transfer.

Jörg
On Fri, Nov 14, 2008 at 10:04 AM, Joerg Schilling
<Joerg.Schilling at fokus.fraunhofer.de> wrote:
> BTW: a lot of numbers in Solaris have not grown for a long time and thus
> create problems now. Just think about the maxphys values.... 63 kB on x86
> does not even allow writing a single BluRay disk sector with a single
> transfer.

I'd like to see Sun's position on the speed at which large file systems
perform ZFS send/receive. I expect my X4540's to nearly fill 48TB (or more
considering compression), and taking 24 hours to transfer 100GB is, well, I
could do better on an ISDN line from 1995.

--
Brent Jones
brent at servuhome.net
On Fri, 14 Nov 2008, Joerg Schilling wrote:
> On my first Sun at home (a Sun 2/50 with 1 MB of RAM) in 1986, I could set
> the socket buffer size to 63 kB. 63kB : 1 MB is the same ratio as
> 256 MB : 4 GB.
>
> BTW: a lot of numbers in Solaris have not grown for a long time and thus
> create problems now. Just think about the maxphys values.... 63 kB on x86
> does not even allow writing a single BluRay disk sector with a single
> transfer.

Bloating kernel memory is not the right answer. Solaris comes with a quite
effective POSIX threads library (standard since 1996) which makes it easy to
quickly shuttle the data into a buffer in your own application. One thread
deals with the network while the other thread deals with the device. I imagine
that this is what the supreme mbuffer program is doing.

Bob

=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Joerg Schilling wrote:
> It's affordable e.g. on an X4540 with 64 GB of RAM.

I guess the architectures with limited 256MB and 512MB kernel address space
are mostly retired now.

> ZFS started with constraints that could not be made true in 2001.
>
> On my first Sun at home (a Sun 2/50 with 1 MB of RAM) in 1986, I could set
> the socket buffer size to 63 kB. 63kB : 1 MB is the same ratio as
> 256 MB : 4 GB.
>
> BTW: a lot of numbers in Solaris have not grown for a long time and thus
> create problems now. Just think about the maxphys values.... 63 kB on x86
> does not even allow writing a single BluRay disk sector with a single
> transfer.

I have put together a simple set of figures I use to compare how disks and
systems have changed over the 25 year life of ufs/ffs, which I sometimes use
when I give ZFS presentations...

                      25 years ago        Now       factor
                      ------------        ---       ------
    Disk RPM                 3,600     10,000           x3
    Disk IOPS                   30        300          x10
    Disk data rate        0.96MB/s     75MB/s          x80
    Capacity                 100MB        1TB      x10,000
    System MIPS                  4    400,000     x100,000

--
Andrew
----- original message --------
Subject: Re: [zfs-discuss] 'zfs recv' is very slow
Sent: Fri, 14 Nov 2008
From: Bob Friesenhahn <bfriesen at simple.dallas.tx.us>

> Bloating kernel memory is not the right answer. Solaris comes with a quite
> effective POSIX threads library (standard since 1996) which makes it easy
> to quickly shuttle the data into a buffer in your own application. One
> thread deals with the network while the other thread deals with the device.
> I imagine that this is what the supreme mbuffer program is doing.
>
> Bob

Basically, mbuffer just does this - but it additionally has a whole bunch of
extra functionality. At least there are people who use it to lengthen the life
of their tape drives with the high/low watermark feature...

Thomas
> BTW: a lot of numbers in Solaris have not grown for a long time and thus
> create problems now. Just think about the maxphys values.... 63 kB on x86
> does not even allow writing a single BluRay disk sector with a single
> transfer.

Any "fixed value" will soon be too small (think about ufs_throttles, socket
buffers, etc.)

I'm not sure, however, that making a bigger socket buffer will help all that
much; it's somewhat wrong to give that much kernel memory to the data even
though we know that it won't all be in flight.

But zfs could certainly use bigger buffers; just like mbuffer, I also wrote my
own "pipebuffer" which does pretty much the same.

Casper
Andrew Gabriel <Andrew.Gabriel at Sun.COM> wrote:
> I have put together a simple set of figures I use to compare how disks and
> systems have changed over the 25 year life of ufs/ffs, which I sometimes use
> when I give ZFS presentations...
>
>                       25 years ago        Now       factor
>                       ------------        ---       ------
>     Disk RPM                 3,600     10,000           x3
>     Disk IOPS                   30        300          x10
>     Disk data rate        0.96MB/s     75MB/s          x80
>     Capacity                 100MB        1TB      x10,000
>     System MIPS                  4    400,000     x100,000

The best rate I saw in 1985 was 800 kB/s (with linear reads); now I see
120 MB/s, which is more than x100 ;-)

Jörg
On Fri, 14 Nov 2008, Joerg Schilling wrote:
>>     Disk RPM                 3,600     10,000           x3
>
> The best rate I saw in 1985 was 800 kB/s (with linear reads); now I see
> 120 MB/s, which is more than x100 ;-)

Yes. And now that SSDs are entering the market, the disk RPM has dropped down
to zero. 10,000 --> 0. I am not sure how to interpret that.

Bob
Casper.Dik at Sun.COM wrote:
> But zfs could certainly use bigger buffers; just like mbuffer, I also wrote
> my own "pipebuffer" which does pretty much the same.

You too? (My "buffer" program which I used to diagnose the problem is attached
to the bugid ;-) I know Chris Gerhard wrote one too.

Seems like there's a strong case to have such a program bundled in Solaris.

--
Andrew
Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> Yes. And now that SSDs are entering the market, the disk RPM has dropped
> down to zero. 10,000 --> 0. I am not sure how to interpret that.

My tests on an OCZ SSD show a "transfer latency" of ~ 0.1 ms; even SSDs have
something similar to "seek times".

Jörg
> Yes. And now that SSDs are entering the market, the disk RPM has dropped
> down to zero. 10,000 --> 0. I am not sure how to interpret that.

Not zero, infinite RPMs. (Latency is 1/RPM and when RPM becomes infinite, then
latency is 0.)

Casper
Bob Friesenhahn wrote:
> Yes. And now that SSDs are entering the market, the disk RPM has dropped
> down to zero. 10,000 --> 0. I am not sure how to interpret that.

I don't have a data rate for SSDs, but a hard limit is going to be the 3Gb/s
SATA/SAS bus, which is going to be around 300MB/s. I've no idea how close they
come to this in practice.

For IOPS (Input/Output operations per second), the figures are mind-blowing...

    15K SAS drive          Enterprise SSD
    -------------          --------------
    180 Write IOPS          7,000 Write IOPS
    320 Read IOPS          35,000 Read IOPS

I don't have figures for a SATA drive, but they're lower than SAS. The SSD
figures exceed the capabilities of some disk controllers, which can make them
difficult to measure. The read IOPS figure is pretty close to being limited by
the SAS bus.

--
Andrew
>> Seems like there's a strong case to have such a program bundled in Solaris.

I think the idea of having a separate configurable buffer program with a rich
feature set fits the UNIX philosophy of having small programs that can be used
as building blocks to solve larger problems.

mbuffer is already bundled with several Linux distros, and that is also the
reason its feature set expanded over time. In the beginning there wasn't even
support for network transfers. Today mbuffer supports direct transfer to
multiple receivers, data transfer rate limiting, a high/low water mark
algorithm, on-the-fly md5 calculation, multi-volume tape access, use of
sendfile, and a configurable buffer size/layout. So ZFS send/receive is just
another use case for this tool.

- Thomas
Casper.Dik at Sun.COM wrote:
>> BTW: a lot of numbers in Solaris have not grown for a long time and thus
>> create problems now. Just think about the maxphys values.... 63 kB on x86
>> does not even allow writing a single BluRay disk sector with a single
>> transfer.
>
> Any "fixed value" will soon be too small (think about ufs_throttles, socket
> buffers, etc.)

The maxphys limit of 56kB or 63kB in the early 1980s was a result of the fact
that many DMA controllers could only handle 16-bit counters, and because (in a
multi-tasking environment) a typical DMA speed of 600 kB/s would result in
~ 0.1 seconds of wait time for other users.

In 1980, disk sector sizes were 512 bytes. In 1995, the DVD was introduced
with a 32 kB sector size. Now we have BluRay disks with a 64 kB sector size.
On many systems, cdrecord cannot write a single BluRay sector in a single SCSI
transfer. This is bad.

With today's constraints, I would expect to see typical maxphys values of
~ 2 MB. Linux typically allows this but Solaris does not. In addition, the
ioctl DKIOCINFO in many cases returns wrong (too big) numbers for maxphys,
which causes cdrecord to fail.

Solaris needs to approach today's reality with some parameters.

Jörg
Andrew Gabriel wrote:
> Ian Collins wrote:
>> I don't see the 5 second bursty behaviour described in the bug report.
>> It's more like 5 second interval gaps in the network traffic while the
>> data is written to disk.
>
> That is exactly the issue. When the zfs recv data has been written, zfs recv
> starts reading the network again, but there's only a tiny amount of data
> buffered in the TCP/IP stack, so it has to wait for the network to heave
> more data across. In effect, it's a single buffered copy. The addition of a
> buffer program turns it into a double-buffered (or cyclic buffered) copy,
> with the disks running flat out continuously, and the network streaming
> data across continuously at the disk platter speed.
>
> What are your theoretical max speeds for network and disk i/o? Taking the
> smaller of these two, are you seeing the sustained send/recv performance
> match that (excluding the ~1MB/sec periods, which is some other problem)?

I've just finished a small application to couple zfs_send and zfs_receive
through a socket to remove ssh from the equation, and the speed up is better
than 2x. I have a small (140K) buffer on the sending side to ensure the
minimum number of sent packets.

The times I get for 3.1GB of data (b101 ISO and some smaller files) to a
modest mirror at the receive end are:

1m36s for cp over NFS,
2m48s for zfs send through ssh and
1m14s through a socket.

--
Ian.
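Ian's application isn't shown in the thread, but a similar socket-coupled transfer can be sketched with netcat; the port and dataset names are placeholders, and nc option syntax varies between implementations:

    # on the receiving box: listen on a TCP port and feed the stream to zfs recv
    nc -l 9090 | zfs recv -F tank/data

    # on the sending box: pipe zfs send straight into the socket, bypassing ssh
    zfs send tank/data@backup2 | nc recvhost 9090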
Ian Collins wrote:
> I've just finished a small application to couple zfs_send and zfs_receive
> through a socket to remove ssh from the equation, and the speed up is
> better than 2x. I have a small (140K) buffer on the sending side to ensure
> the minimum number of sent packets.
>
> The times I get for 3.1GB of data (b101 ISO and some smaller files) to a
> modest mirror at the receive end are:
>
> 1m36s for cp over NFS,
> 2m48s for zfs send through ssh and
> 1m14s through a socket.

So the best speed is equivalent to 42MB/s.

Can't tell from this what the limiting factor is (might be the disks).

It would be interesting to try putting a buffer (5 x 42MB = 210MB initial
stab) at the recv side and see if you get any improvement.

--
Andrew
Andrew Gabriel wrote:
> Ian Collins wrote:
>> The times I get for 3.1GB of data (b101 ISO and some smaller files) to a
>> modest mirror at the receive end are:
>>
>> 1m36s for cp over NFS,
>> 2m48s for zfs send through ssh and
>> 1m14s through a socket.
>
> So the best speed is equivalent to 42MB/s.
>
> Can't tell from this what the limiting factor is (might be the disks).

It probably is.

> It would be interesting to try putting a buffer (5 x 42MB = 210MB initial
> stab) at the recv side and see if you get any improvement.

That's my next test....

--
Ian.
Ian Collins wrote:
> Andrew Gabriel wrote:
>> It would be interesting to try putting a buffer (5 x 42MB = 210MB initial
>> stab) at the recv side and see if you get any improvement.

It took a while...

I was able to get about 47MB/s with a 256MB circular input buffer. I think
that's about as fast as it can go; the buffer fills, so receive processing is
the bottleneck. Bonnie++ shows the pool (a mirror) block write speed is
58MB/s.

When I reverse the transfer to the faster box, the rate drops to 35MB/s with
neither the send nor receive buffer filling. So send processing appears to be
the limit in this case.

--
Ian.
Richard Elling wrote:
> Ian Collins wrote:
>> I was able to get about 47MB/s with a 256MB circular input buffer. I think
>> that's about as fast as it can go; the buffer fills, so receive processing
>> is the bottleneck. Bonnie++ shows the pool (a mirror) block write speed is
>> 58MB/s.
>>
>> When I reverse the transfer to the faster box, the rate drops to 35MB/s
>> with neither the send nor receive buffer filling. So send processing
>> appears to be the limit in this case.
>
> Those rates are what I would expect writing to a single disk.
> How is the pool configured?

The "slow" system has a single mirror pool of two SATA drives, the faster one
a stripe of 4 mirrors and an IDE SD boot drive.

ZFS send through ssh from the slow to the fast box takes 189 seconds, the
direct socket connection send takes 82 seconds.

--
Ian.
On Sat, Dec 6, 2008 at 11:40 AM, Ian Collins <ian at ianshome.com> wrote:
> The "slow" system has a single mirror pool of two SATA drives, the faster
> one a stripe of 4 mirrors and an IDE SD boot drive.
>
> ZFS send through ssh from the slow to the fast box takes 189 seconds, the
> direct socket connection send takes 82 seconds.

Reviving an old discussion, but has the core issue been addressed with regard
to zfs send/recv performance? I'm not able to find any new bug reports on
bugs.opensolaris.org related to this, but my search kung-fu may be weak.

Using mbuffer can speed it up dramatically, but this seems like a hack that
doesn't address the real problem with zfs send/recv. Trying to send any
meaningful-sized snapshots from, say, an X4540 takes up to 24 hours, for as
little as a 300GB change rate.

--
Brent Jones
brent at servuhome.net
Hi,

Brent Jones wrote:
> Using mbuffer can speed it up dramatically, but this seems like a hack that
> doesn't address the real problem with zfs send/recv. Trying to send any
> meaningful-sized snapshots from, say, an X4540 takes up to 24 hours, for as
> little as a 300GB change rate.

I have not found a solution yet either. But it seems to depend highly on the
distribution of file sizes, the number of files per directory or whatever. The
last tests I made still showed more than 50 hours for 700 GB and ~45 hours for
5 TB (both tests were null tests where zfs send wrote to /dev/null).

Cheers from a still puzzled

Carsten
Brent Jones wrote:
> Reviving an old discussion, but has the core issue been addressed with
> regard to zfs send/recv performance? I'm not able to find any new bug
> reports on bugs.opensolaris.org related to this, but my search kung-fu may
> be weak.

I raised:

CR 6729347 Poor zfs receive performance across networks

(Seems to still be in the Dispatched state nearly half a year later.)

This relates mainly to full sends, and is most obvious when the disk
throughput is the same order of magnitude as the network throughput. (It
becomes less obvious if one is significantly different from the other, either
way around.)

There appears to be an additional problem for incrementals, which spend long
periods sending almost no data at all (I presume this is when zfs send is
searching for changed blocks to send). I don't know off-hand of a bugid for
this.

> Using mbuffer can speed it up dramatically, but this seems like a hack that
> doesn't address the real problem with zfs send/recv.

I don't think it's a hack, but something along these lines should be more
properly integrated into the zfs receive command or documented.

> Trying to send any meaningful-sized snapshots from, say, an X4540 takes up
> to 24 hours, for as little as a 300GB change rate.

Are those incrementals from a much larger filesystem? If so, that's probably
mainly the other problem.

--
Andrew
On Wed, Jan 7, 2009 at 12:36 AM, Andrew Gabriel <Andrew.Gabriel at sun.com> wrote:
> There appears to be an additional problem for incrementals, which spend
> long periods sending almost no data at all (I presume this is when zfs send
> is searching for changed blocks to send). I don't know off-hand of a bugid
> for this.
>
>> Trying to send any meaningful-sized snapshots from, say, an X4540 takes up
>> to 24 hours, for as little as a 300GB change rate.
>
> Are those incrementals from a much larger filesystem? If so, that's
> probably mainly the other problem.

Yah, the incrementals are from a 30TB volume, with about 1TB used.

Watching iostat on each side during the incremental sends, the sender side is
hardly doing anything, maybe 50 iops of reads, and that could be from other
machines accessing it; really light load.

The receiving side, however, peaks at around 1500 iops of reads, and no
writes. It will do that for 3-5 minutes, then it will calm down and only read
sporadically, and write about 1MB/sec.

Using mbuffer can get the writes to spike to 20-30MB/sec, but the initial
massive reads still remain.

I have yet to devise a script that starts an mbuffer zfs recv on the receiving
side with the proper parameters, then starts an mbuffer zfs send on the
sending side, but I may work on one later this week. I'd like the snapshots to
be sent every 15 minutes, just to keep the amount of change that needs to be
sent as low as possible.

Not sure if it's worth opening a case with Sun since we have a support
contract...

--
Brent Jones
brent at servuhome.net
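A rough sketch of the kind of script Brent describes, run from the sending host; the hostnames, dataset, port, buffer size and snapshot naming are all assumptions, and error handling and snapshot cleanup are omitted:

    #!/bin/sh
    # periodic incremental replication through mbuffer (placeholder names throughout)
    SRC=tank/data
    DST_HOST=recvhost
    PORT=9090

    # most recent existing snapshot of $SRC becomes the incremental source
    PREV=`zfs list -H -t snapshot -o name -s creation | grep "^$SRC@" | tail -1`
    NOW=$SRC@repl-`date +%Y%m%d%H%M`
    zfs snapshot $NOW

    # start the receiver first: mbuffer listens on the port and feeds zfs recv
    ssh $DST_HOST "mbuffer -q -I $PORT -m 250M | zfs recv -F $SRC" &
    sleep 5

    # then stream the incremental through mbuffer to the receiver
    zfs send -i $PREV $NOW | mbuffer -q -m 250M -O $DST_HOST:$PORT
    wait

Run from cron every 15 minutes, this would keep each incremental small, as Brent suggests.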
Hello,

> Yah, the incrementals are from a 30TB volume, with about 1TB used.
>
> Watching iostat on each side during the incremental sends, the sender side
> is hardly doing anything, maybe 50 iops of reads, and that could be from
> other machines accessing it; really light load.
>
> The receiving side, however, peaks at around 1500 iops of reads, and no
> writes.

Have you tried truss on both sides? From my experiments I found that at the
beginning of the transfer the sending side mostly sleeps while the receiving
side lists all available snapshots on the file system being synced. So if you
have a lot of snapshots on the receiving side (as in my case) the process will
take a long time sending no data, just listing the snapshots. The worst case
is if you use a recursive sync of hundreds of file systems with hundreds of
snapshots on each.

I'm sure this must be optimized somehow, otherwise it's almost useless in
practice.

Regards
Mike