Hi all,

although I'm running all this on Sol10u5 X4500s, I hope I may ask this question here. If not, please let me know where to head to.

We are running several X4500s with only 3 raidz2 vdevs per pool since we want quite a bit of storage space[*], but the performance we get when using zfs send is sometimes really lousy. Of course this depends on what's in the file system, but when doing a few backups today I saw the following:

receiving full stream of atlashome/XXX@2008-10-13T115649 into atlashome/BACKUP/XXX@2008-10-13T115649
in @ 11.1 MB/s, out @ 11.1 MB/s, 14.9 GB total, buffer 0% full
summary: 14.9 GByte in 45 min 42.8 sec - average of 5708 kB/s

So, a mere 15 GB were transferred in 45 minutes; another user's home, which is quite large (7 TB), took more than 42 hours to be transferred. Since all this is going over a 10 Gb/s network and the CPUs are all idle, I would really like to know why

* zfs send is so slow, and
* how I can improve the speed.

Thanks a lot for any hint.

Cheers

Carsten

[*] We have done quite a few tests with more vdevs but were not able to improve the speeds substantially. For this particular bad file system I still need to histogram the file sizes.

--
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31
Carsten Aulbert wrote:
> Since all this is going over a 10 Gb/s network and the CPUs are all idle,
> I would really like to know why

What are you using to transfer the data over the network ?

--
Darren J Moffat
Carsten Aulbert schrieb:
> So, a mere 15 GB were transferred in 45 minutes; another user's home,
> which is quite large (7 TB), took more than 42 hours to be transferred.

Carsten,

the summary looks like you are using mbuffer. Can you elaborate on what
options you are passing to mbuffer? Maybe changing the blocksize to be
consistent with the recordsize of the zpool could improve performance.
Is the buffer running full or is it empty most of the time? Are you sure
that the network connection is 10Gb/s all the way through from machine
to machine?

- Thomas
Hi,

Darren J Moffat wrote:
> What are you using to transfer the data over the network ?

Initially just plain ssh, which was way too slow; now we use mbuffer on
both ends and transfer the data over a socket via socat - I know that
mbuffer already allows this, but in a few tests socat seemed to be
faster. Sorry for not writing this into the first email.

Cheers

Carsten
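For readers who want to reproduce a pipeline like the one just described, a minimal sketch might look as follows; the port number, buffer sizes and the exact socat options are placeholders and assumptions, not necessarily what was used here (the dataset names are the ones from the first post):

receiver$ socat -u TCP-LISTEN:9090,reuseaddr - | mbuffer -s 128k -m 2048M | zfs receive atlashome/BACKUP/XXX
sender$   zfs send atlashome/XXX@2008-10-13T115649 | mbuffer -s 128k -m 2048M | socat -u - TCP:receiver:9090

The mbuffer instances decouple the bursty zfs send/receive from the network, while socat only moves bytes between stdin/stdout and the TCP socket.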
Hi Thomas,

Thomas Maier-Komor wrote:
> Can you elaborate on what options you are passing to mbuffer? [...]
> Are you sure that the network connection is 10Gb/s all the way through
> from machine to machine?

Well spotted :)

Right now plain mbuffer with plenty of buffer (-m 2048M) on both ends,
and I have not seen any buffer exceeding the 10% watermark level. The
network connections are via Neterion XFrame II Sun Fire NICs, then via
CX4 cables to our core switch where both boxes are directly connected
(Woven Systems EFX 1000). netperf tells me that the TCP performance is
close to 7.5 Gbit/s duplex, and if I use

cat /dev/zero | mbuffer | socat ---> socat | mbuffer > /dev/null

I easily see speeds of about 350-400 MB/s, so I think the network is fine.

Cheers

Carsten
Carsten Aulbert schrieb:
> Right now plain mbuffer with plenty of buffer (-m 2048M) on both ends,
> and I have not seen any buffer exceeding the 10% watermark level.
> [...]
> I easily see speeds of about 350-400 MB/s, so I think the network is fine.

I don't know socat or what benefit it gives you, but have you tried
using mbuffer to send and receive directly (options -I and -O)?

Additionally, try to set the block size of mbuffer to the recordsize of
zfs (usually 128k):

receiver$ mbuffer -I sender:10000 -s 128k -m 2048M | zfs receive
sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:10000

As transmitting from /dev/zero to /dev/null is at a rate of 350 MB/s, I
guess you are really hitting the maximum speed of your zpool. From my
understanding, I'd guess sending is always slower than receiving,
because reads are random and writes are sequential. So it should be
quite normal that mbuffer's buffer doesn't really see a lot of usage.

Cheers,
Thomas
Hi again,

Thomas Maier-Komor wrote:
> I don't know socat or what benefit it gives you, but have you tried
> using mbuffer to send and receive directly (options -I and -O)?

I thought we tried that in the past and with socat it seemed faster, but
I just made a brief test and I got (/dev/zero -> remote /dev/null) 330
MB/s with mbuffer+socat and 430 MB/s with mbuffer alone.

> Additionally, try to set the block size of mbuffer to the recordsize of
> zfs (usually 128k):
> receiver$ mbuffer -I sender:10000 -s 128k -m 2048M | zfs receive
> sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:10000

We are using 32k since many of our users have tiny files (and then I need
to reduce the buffer size because of this 'funny' error):

mbuffer: fatal: Cannot address so much memory
(32768*65536=2147483648>1544040742911).

Does this qualify for a bug report?

Thanks for the hint of looking into this again!

Cheers

Carsten
Carsten Aulbert schrieb:
> We are using 32k since many of our users have tiny files (and then I need
> to reduce the buffer size because of this 'funny' error):
>
> mbuffer: fatal: Cannot address so much memory
> (32768*65536=2147483648>1544040742911).
>
> Does this qualify for a bug report?

Yes, this qualifies for a bug report. As a workaround for now, you can
compile in 64-bit mode, i.e.:

$ ./configure CFLAGS="-g -O -m64"
$ make && make install

This works for Sun Studio 12 and gcc. For older versions of Sun Studio,
you need to pass -xarch=v9 instead of -m64.

I am planning to release an updated version of mbuffer this week. I'll
include a patch for this issue.

Cheers,
Thomas
Hi,

I'm just doing my first proper send/receive over the network and I'm
getting just 9.4 MB/s over a gigabit link. Would you be able to provide
an example of how to use mbuffer / socat with ZFS for a Solaris beginner?

thanks,

Ross
Ross schrieb:
> I'm just doing my first proper send/receive over the network and I'm
> getting just 9.4 MB/s over a gigabit link. Would you be able to provide
> an example of how to use mbuffer / socat with ZFS for a Solaris beginner?

receiver> mbuffer -I sender:10000 -s 128k -m 512M | zfs receive

sender> zfs send mypool/myfilesystem@mysnapshot | mbuffer -s 128k -m 512M -O receiver:10000

BTW: I release a new version of mbuffer today.

HTH,
Thomas
Thanks, that got it working. I'm still only getting 10 MB/s, so it
hasn't solved my problem - I've still got a bottleneck somewhere, but
mbuffer is a huge improvement over standard zfs send / receive. It makes
such a difference when you can actually see what's going on.

> receiver> mbuffer -I sender:10000 -s 128k -m 512M | zfs receive
>
> sender> zfs send mypool/myfilesystem@mysnapshot | mbuffer -s 128k -m
> 512M -O receiver:10000

Ross
Hi Ross,

Ross Smith wrote:
> Thanks, that got it working. I'm still only getting 10 MB/s, so it
> hasn't solved my problem - I've still got a bottleneck somewhere, but
> mbuffer is a huge improvement over standard zfs send / receive.

I'm currently trying to investigate this a bit. One of our users' home
directories is extremely slow to 'zfs send'. It started yesterday
afternoon at about 1600+0200, is still running and has only copied less
than 50% of the whole tree. On the receiving side zfs get tells me:

atlashome/BACKUP/XXX  used           193G   -
atlashome/BACKUP/XXX  available      17.2T  -
atlashome/BACKUP/XXX  referenced     193G   -
atlashome/BACKUP/XXX  compressratio  1.81x  -

So close to 350 GB have been transferred and about 500 GB are still to go.

More later.

Carsten
Thomas Maier-Komor schrieb:
> BTW: I release a new version of mbuffer today.

WARNING!!!

Sorry people!!!

The latest version of mbuffer has a regression that can CORRUPT output
if stdout is used. Please fall back to the last version. A fix is on the
way...

- Thomas
I'm using 2008-05-07 (latest stable), am I right in assuming that one is ok?

> The latest version of mbuffer has a regression that can CORRUPT output
> if stdout is used. Please fall back to the last version. A fix is on the
> way...
Hi all, Carsten Aulbert wrote:> More later.OK, I''m completely puzzled right now (and sorry for this lengthy email). My first (and currently only idea) was that the size of the files is related to this effect, but that does not seem to be the case: (1) A 185 GB zfs file system was transferred yesterday with a speed of about 60 MB/s to two different servers. The histogram of files looks like: 2822 files were investigated, total size is: 185.82 Gbyte Summary of file sizes [bytes]: zero: 2 1 -> 2 0 2 -> 4 1 4 -> 8 3 8 -> 16 26 16 -> 32 8 32 -> 64 6 64 -> 128 29 128 -> 256 11 256 -> 512 13 512 -> 1024 17 1024 -> 2k 33 2k -> 4k 45 4k -> 8k 9044 ************ 8k -> 16k 60 16k -> 32k 41 32k -> 64k 19 64k -> 128k 22 128k -> 256k 12 256k -> 512k 5 512k -> 1024k 1218 ** 1024k -> 2M 16004 ********************* 2M -> 4M 46202 ************************************************************ 4M -> 8M 0 8M -> 16M 0 16M -> 32M 0 32M -> 64M 0 64M -> 128M 0 128M -> 256M 0 256M -> 512M 0 512M -> 1024M 0 1024M -> 2G 0 2G -> 4G 0 4G -> 8G 0 8G -> 16G 1 (2) Currently a much larger file system is being transferred, the same script (even the same incarnation, i.e. process) is now running close to 22 hours: 28549 files were investigated, total size is: 646.67 Gbyte Summary of file sizes [bytes]: zero: 4954 ************************** 1 -> 2 0 2 -> 4 0 4 -> 8 1 8 -> 16 1 16 -> 32 0 32 -> 64 0 64 -> 128 1 128 -> 256 0 256 -> 512 9 512 -> 1024 71 1024 -> 2k 1 2k -> 4k 1095 ****** 4k -> 8k 8449 ********************************************* 8k -> 16k 2217 ************ 16k -> 32k 503 *** 32k -> 64k 1 64k -> 128k 1 128k -> 256k 1 256k -> 512k 0 512k -> 1024k 0 1024k -> 2M 0 2M -> 4M 0 4M -> 8M 16 8M -> 16M 0 16M -> 32M 0 32M -> 64M 11218 ************************************************************ 64M -> 128M 0 128M -> 256M 0 256M -> 512M 0 512M -> 1024M 0 1024M -> 2G 0 2G -> 4G 5 4G -> 8G 1 8G -> 16G 3 16G -> 32G 1 When watching zpool iostat I get this (30 second average, NOT the first output): capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- atlashome 3.54T 17.3T 137 0 4.28M 0 raidz2 833G 6.00T 1 0 30.8K 0 c0t0d0 - - 1 0 2.38K 0 c1t0d0 - - 1 0 2.18K 0 c4t0d0 - - 0 0 1.91K 0 c6t0d0 - - 0 0 1.76K 0 c7t0d0 - - 0 0 1.77K 0 c0t1d0 - - 0 0 1.79K 0 c1t1d0 - - 0 0 1.86K 0 c4t1d0 - - 0 0 1.97K 0 c5t1d0 - - 0 0 2.04K 0 c6t1d0 - - 1 0 2.25K 0 c7t1d0 - - 1 0 2.31K 0 c0t2d0 - - 1 0 2.21K 0 c1t2d0 - - 0 0 1.99K 0 c4t2d0 - - 0 0 1.99K 0 c5t2d0 - - 1 0 2.38K 0 raidz2 1.29T 5.52T 67 0 2.09M 0 c6t2d0 - - 58 0 143K 0 c7t2d0 - - 58 0 141K 0 c0t3d0 - - 53 0 131K 0 c1t3d0 - - 53 0 130K 0 c4t3d0 - - 58 0 143K 0 c5t3d0 - - 58 0 145K 0 c6t3d0 - - 59 0 147K 0 c7t3d0 - - 59 0 146K 0 c0t4d0 - - 59 0 145K 0 c1t4d0 - - 58 0 145K 0 c4t4d0 - - 58 0 145K 0 c6t4d0 - - 58 0 143K 0 c7t4d0 - - 58 0 143K 0 c0t5d0 - - 58 0 145K 0 c1t5d0 - - 58 0 144K 0 raidz2 1.43T 5.82T 69 0 2.16M 0 c4t5d0 - - 62 0 141K 0 c5t5d0 - - 60 0 138K 0 c6t5d0 - - 59 0 135K 0 c7t5d0 - - 60 0 138K 0 c0t6d0 - - 62 0 142K 0 c1t6d0 - - 61 0 138K 0 c4t6d0 - - 59 0 135K 0 c5t6d0 - - 60 0 138K 0 c6t6d0 - - 62 0 142K 0 c7t6d0 - - 61 0 138K 0 c0t7d0 - - 58 0 134K 0 c1t7d0 - - 60 0 137K 0 c4t7d0 - - 62 0 142K 0 c5t7d0 - - 61 0 139K 0 c6t7d0 - - 58 0 134K 0 c7t7d0 - - 60 0 138K 0 ---------- ----- ----- ----- ----- ----- ----- Odd things: (1) The zpool is not equally striped across the raidz2-pools (2) The disks should be able to perform much much faster than they currently output data at, I believe it;s 2008 and not 1995. 
(3) The four cores of the X4500 are dying of boredom, i.e. idle >95% all
the time.

Does anyone have a good idea where the bottleneck could be? I'm running
out of ideas.

Cheers

Carsten
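As an aside, for anyone who wants to produce a similar power-of-two file-size histogram, a rough sketch along these lines should work; the path is a placeholder, nawk with its log() function is assumed to be available, and the bucket boundaries are only approximate:

find /atlashome/XXX -type f -exec ls -ld {} \; | nawk '
    { size = $5
      if (size == 0) { zero++; next }
      b = int(log(size) / log(2) + 1e-9)      # power-of-two bucket index
      count[b]++ }
    END {
      printf("zero-length files: %d\n", zero)
      for (b = 0; b <= 40; b++)
          if (count[b] > 0)
              printf("%14.0f -> %14.0f bytes: %d\n", 2^b, 2^(b+1), count[b]) }'

This walks the file system once, reads the size column from ls -l, and counts files per size bucket, which is enough to reproduce the tables shown above.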
Ross Smith schrieb:
> I'm using 2008-05-07 (latest stable), am I right in assuming that one is ok?

Yes, this one is OK. The regression appeared in 20081014.

- Thomas
Hello all,

I think in SS 11 it should be -xarch=amd64.

Leal.
comments below...

Carsten Aulbert wrote:
> [file size histograms and 30-second zpool iostat output snipped]
>
> Odd things:
>
> (1) The zpool is not equally striped across the raidz2-pools

Since you are reading, it depends on where the data was written.
Remember, ZFS dynamic striping != RAID-0. I would expect something like
this if the pool was expanded at some point in time.

> (2) The disks should be able to perform much much faster than they
> currently output data at, I believe it's 2008 and not 1995.

X4500? Those disks are good for about 75-80 random iops, which seems to
be about what they are delivering. The dtrace tool, iopattern, will show
the random/sequential nature of the workload.

> (3) The four cores of the X4500 are dying of boredom, i.e. idle >95% all
> the time.
>
> Does anyone have a good idea where the bottleneck could be? I'm running
> out of ideas.

I would suspect the disks. 30 second samples are not very useful to try
and debug such things -- even 1 second samples can be too coarse. But
you should take a look at 1 second samples to see if there is a
consistent I/O workload.
 -- richard
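For reference, a hypothetical session following this advice might look like the commands below; the pool name is the one from this thread, while the DTraceToolkit install path is an assumption (iopattern ships with the DTraceToolkit, not with stock Solaris):

# 1-second samples of the pool while the send is running
zpool iostat -v atlashome 1

# random vs. sequential breakdown of the disk I/O, 1-second samples, 10 rows
/opt/DTT/iopattern 1 10

iopattern prints the percentage of random versus sequential I/O per interval, which is exactly the distinction Richard is pointing at for a seek-bound zfs send.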
Hi Richard, Richard Elling wrote:> Since you are reading, it depends on where the data was written. > Remember, ZFS dynamic striping != RAID-0. > I would expect something like this if the pool was expanded at some > point in time.No, the RAID was set-up in one go right after jumpstarting the box.>> (2) The disks should be able to perform much much faster than they >> currently output data at, I believe it;s 2008 and not 1995. >> > > X4500? Those disks are good for about 75-80 random iops, > which seems to be about what they are delivering. The dtrace > tool, iopattern, will show the random/sequential nature of the > workload. >I need to read about his a bit and will try to analyze it.>> (3) The four cores of the X4500 are dying of boredom, i.e. idle >95% all >> the time. >> >> Has anyone a good idea, where the bottleneck could be? I''m running out >> of ideas. >> > > I would suspect the disks. 30 second samples are not very useful > to try and debug such things -- even 1 second samples can be > too coarse. But you should take a look at 1 second samples > to see if there is a consistent I/O workload. > -- richard >Without doing too much statistics (yet, if needed I can easily do that) it looks like these: capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- atlashome 3.54T 17.3T 256 0 7.97M 0 raidz2 833G 6.00T 0 0 0 0 c0t0d0 - - 0 0 0 0 c1t0d0 - - 0 0 0 0 c4t0d0 - - 0 0 0 0 c6t0d0 - - 0 0 0 0 c7t0d0 - - 0 0 0 0 c0t1d0 - - 0 0 0 0 c1t1d0 - - 0 0 0 0 c4t1d0 - - 0 0 0 0 c5t1d0 - - 0 0 0 0 c6t1d0 - - 0 0 0 0 c7t1d0 - - 0 0 0 0 c0t2d0 - - 0 0 0 0 c1t2d0 - - 0 0 0 0 c4t2d0 - - 0 0 0 0 c5t2d0 - - 0 0 0 0 raidz2 1.29T 5.52T 133 0 4.14M 0 c6t2d0 - - 117 0 285K 0 c7t2d0 - - 114 0 279K 0 c0t3d0 - - 106 0 261K 0 c1t3d0 - - 114 0 282K 0 c4t3d0 - - 118 0 294K 0 c5t3d0 - - 125 0 308K 0 c6t3d0 - - 126 0 311K 0 c7t3d0 - - 118 0 293K 0 c0t4d0 - - 119 0 295K 0 c1t4d0 - - 120 0 298K 0 c4t4d0 - - 120 0 291K 0 c6t4d0 - - 106 0 257K 0 c7t4d0 - - 96 0 236K 0 c0t5d0 - - 109 0 267K 0 c1t5d0 - - 114 0 282K 0 raidz2 1.43T 5.82T 123 0 3.83M 0 c4t5d0 - - 108 0 242K 0 c5t5d0 - - 104 0 236K 0 c6t5d0 - - 104 0 239K 0 c7t5d0 - - 107 0 245K 0 c0t6d0 - - 108 0 248K 0 c1t6d0 - - 106 0 245K 0 c4t6d0 - - 108 0 250K 0 c5t6d0 - - 112 0 258K 0 c6t6d0 - - 114 0 261K 0 c7t6d0 - - 110 0 253K 0 c0t7d0 - - 109 0 248K 0 c1t7d0 - - 109 0 246K 0 c4t7d0 - - 108 0 243K 0 c5t7d0 - - 108 0 244K 0 c6t7d0 - - 106 0 240K 0 c7t7d0 - - 109 0 244K 0 ---------- ----- ----- ----- ----- ----- ----- the iops vary between about 70 - 140, interesting bit is that the first raidz2 does not get any hits at all :( Cheers Carsten
Hi All,

Just want to note that I had the same issue with zfs send + vdevs that
had 11 drives in them on a X4500. Reducing the count of drives per vdev
cleared this up.

One vdev is IOPS limited to the speed of one drive in that vdev,
according to this post (see comment from ptribble):
http://opensolaris.org/jive/thread.jspa?threadID=74033
On Wed, Oct 15, 2008 at 2:17 PM, Scott Williamson <scott.williamson at gmail.com> wrote:
> Just want to note that I had the same issue with zfs send + vdevs that
> had 11 drives in them on a X4500. Reducing the count of drives per vdev
> cleared this up.

Scott,

Can you tell us the configuration that you're using that is working for
you? Were you using RaidZ, or RaidZ2? I'm wondering what the "sweet spot"
is to get a good compromise in vdevs and usable space/performance.

Thanks!

--
Brent Jones
brent at servuhome.net
Hi again,

Brent Jones wrote:
> Can you tell us the configuration that you're using that is working for
> you? Were you using RaidZ, or RaidZ2? I'm wondering what the "sweet spot"
> is to get a good compromise in vdevs and usable space/performance.

Some time ago I made some tests to find this:

(1) create a new zpool
(2) copy a user's home to it (always the same ~25 GB IIRC)
(3) zfs send to /dev/null
(4) evaluate && continue loop

I did this for fully mirrored setups, raidz as well as raidz2; the
results were mixed:

https://n0.aei.uni-hannover.de/cgi-bin/twiki/view/ATLAS/ZFSBenchmarkTest#ZFS_send_performance_relevant_fo

The caveat here might be that in retrospect this seemed like a "good"
home filesystem, i.e. one which was quite fast.

If you don't want to bother with the table:

Mirrored setups never exceeded 58 MB/s and got faster the more small
mirrors we used.

RaidZ had its sweet spot with a configuration of '6 6 6 6 6 6 5 5', i.e.
6 or 5 disks per RaidZ and 8 vdevs.

RaidZ2 finally was best at '10 9 9 9 9', i.e. 5 vdevs, but not much worse
with only 3, i.e. what we are currently using to get more storage space
(it gains us about 2 TB/box).

Cheers

Carsten
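One iteration of such a test might look roughly like the sketch below; the device list, the layout shown (two of eight 6-disk raidz vdevs) and the reference data path are all placeholders, not the exact setup used for the numbers above:

# recreate the test pool with the layout under test
zpool destroy testpool
zpool create -f testpool \
    raidz c0t0d0 c1t0d0 c4t0d0 c5t0d0 c6t0d0 c7t0d0 \
    raidz c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0
# (remaining vdevs of the layout listed the same way)

# copy the reference home directory and snapshot it
zfs create testpool/home
cp -pr /atlashome/reference/. /testpool/home/
zfs snapshot testpool/home@bench

# time a send to /dev/null to measure raw 'zfs send' throughput
ptime zfs send testpool/home@bench > /dev/null

Sending to /dev/null keeps the network and the receiving pool out of the measurement, so the numbers only reflect how fast the source pool layout can feed zfs send.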
On Wed, Oct 15, 2008 at 9:37 PM, Brent Jones <brent at servuhome.net> wrote:
> Can you tell us the configuration that you're using that is working for
> you? Were you using RaidZ, or RaidZ2?

I used RaidZ with 4x5 disk and 4x6 disk vdevs in one pool with two hot
spares. This is very similar to how the pre-installed OS shipped from
Sun. Also note that I am using ssh as the transfer method. I have not
tried mbuffer with this configuration, as in testing with initial home
directories of ~14 GB in size it was not needed.

This configuration seems to be similar to Carsten Aulbert's evaluation,
without mbuffer in the pipe.
Hi Carsten,

You seem to be using dd for write testing. In my testing I noted that
there was a large difference in write speed between using dd to write
from /dev/zero and copying other files. Writing from /dev/zero always
seemed to be fast, reaching the maximum of ~200 MB/s, while cp would
perform more poorly the fewer vdevs there were.

This also impacted the zfs send speed, as with fewer vdevs in RaidZ2 the
disks seemed to spend most of their time seeking during the send.

On Thu, Oct 16, 2008 at 1:27 AM, Carsten Aulbert <carsten.aulbert at aei.mpg.de> wrote:
> RaidZ2 finally was best at '10 9 9 9 9', i.e. 5 vdevs, but not much worse
> with only 3, i.e. what we are currently using to get more storage space
> (it gains us about 2 TB/box).
Ok, I'm not entirely sure this is the same problem, but it does sound
fairly similar. Apologies for hijacking the thread if this does turn out
to be something else.

After following the advice here to get mbuffer working with zfs send /
receive, I found I was only getting around 10 MB/s throughput. Thinking
it was a network problem I started the thread below in the OpenSolaris
help forum:

http://www.opensolaris.org/jive/thread.jspa?messageID=294846

Now though, I don't think it's the network at all. The end result from
that thread is that we can't see any errors in the network setup, and
using nicstat and NFS I can show that the server is capable of 50-60
MB/s over the gigabit link. Nicstat also shows clearly that both zfs
send / receive and mbuffer are only sending 1/5 of that amount of data
over the network.

I've completely run out of ideas of my own (but I do half expect there's
a simple explanation I haven't thought of). Can anybody think of a
reason why both zfs send / receive and mbuffer would be so slow?
Hi Scott,

Scott Williamson wrote:
> You seem to be using dd for write testing. In my testing I noted that
> there was a large difference in write speed between using dd to write
> from /dev/zero and copying other files.

You are right, the write benchmarks were done with dd just to have some
bulk figures, since usually zeros can be generated fast enough.

> This also impacted the zfs send speed, as with fewer vdevs in RaidZ2 the
> disks seemed to spend most of their time seeking during the send.

That seems a bit too simplistic to me. If you compare raidz with raidz2,
it seems that raidz2 is not too bad with fewer vdevs. I wish there was a
way for zfs send to avoid so many seeks. The << 1 TB file system is
still being sent, now close to 48 hours in.

Cheers

Carsten

PS: We still have a spare thumper sitting around, maybe I'll give it a
try with 5 vdevs.
Hi Ross,

Ross wrote:
> Now though, I don't think it's the network at all. [...] Can anybody
> think of a reason why both zfs send / receive and mbuffer would be so
> slow?

Try to separate the two things:

(1) Try /dev/zero -> mbuffer --- network ---> mbuffer > /dev/null

That should give you wire speed.

(2) Try zfs send | mbuffer > /dev/null

That should give you an idea how fast zfs send really is locally.

Carsten
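Spelled out with the same mbuffer conventions used earlier in the thread (host names, the port and the snapshot name are placeholders), the two tests might look like this:

(1) wire speed only:

receiver$ mbuffer -I sender:10000 -s 128k -m 512M > /dev/null
sender$   dd if=/dev/zero bs=1024k | mbuffer -s 128k -m 512M -O receiver:10000

(2) local zfs send speed only, no network involved:

sender$   zfs send atlashome/XXX@snap | mbuffer -s 128k -m 512M > /dev/null

If (1) is slow the problem is in the network path; if (2) is slow the source pool itself cannot feed the send any faster.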
Oh dear god. Sorry folks, it looks like the new hotmail really doesn''t play well with the list. Trying again in plain text:> Try to separate the two things: > > (1) Try /dev/zero -> mbuffer --- network ---> mbuffer> /dev/null > That should give you wirespeedI tried that already. It still gets just 10-11MB/s from this server. I can get zfs send / receive and mbuffer working at 30MB/s though from a couple of test servers (with much lower specs).> (2) Try zfs send | mbuffer> /dev/null > That should give you an idea how fast zfs send really is locally.Hmm, that''s better than 10MB/s, but the average is still only around 20MB/s: summary: 942 MByte in 47.4 sec - average of 19.9 MB/s I think that points to another problem though as the send mbuffer is 100% full. Certainly the pool itself doesn''t appear under any strain at all while this is going on: capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- rc-pool 732G 1.55T 171 85 21.3M 1.01M mirror 144G 320G 38 0 4.78M 0 c1t1d0 - - 6 0 779K 0 c1t2d0 - - 17 0 2.17M 0 c2t1d0 - - 14 0 1.85M 0 mirror 146G 318G 39 0 4.89M 0 c1t3d0 - - 20 0 2.50M 0 c2t2d0 - - 13 0 1.63M 0 c2t0d0 - - 6 0 779K 0 mirror 146G 318G 34 0 4.35M 0 c2t3d0 - - 19 0 2.39M 0 c1t5d0 - - 7 0 1002K 0 c1t4d0 - - 7 0 1002K 0 mirror 148G 316G 23 0 2.93M 0 c2t4d0 - - 8 0 1.09M 0 c2t5d0 - - 6 0 890K 0 c1t6d0 - - 7 0 1002K 0 mirror 148G 316G 35 0 4.35M 0 c1t7d0 - - 6 0 779K 0 c2t6d0 - - 12 0 1.52M 0 c2t7d0 - - 17 0 2.07M 0 c3d1p0 12K 504M 0 85 0 1.01M ---------- ----- ----- ----- ----- ----- ----- Especially when compared to the zfs send stats on my backup server which managed 30MB/s via mbuffer (Being received on a single virtual SATA disk): capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- rpool 5.12G 42.6G 0 5 0 27.1K c4t0d0s0 5.12G 42.6G 0 5 0 27.1K ---------- ----- ----- ----- ----- ----- ----- zfspool 431G 4.11T 261 0 31.4M 0 raidz2 431G 4.11T 261 0 31.4M 0 c4t1d0 - - 155 0 6.28M 0 c4t2d0 - - 155 0 6.27M 0 c4t3d0 - - 155 0 6.27M 0 c4t4d0 - - 155 0 6.27M 0 c4t5d0 - - 155 0 6.27M 0 ---------- ----- ----- ----- ----- ----- ----- The really ironic thing is that the 30MB/s send / receive was sending to a virtual SATA disk which is stored (via sync NFS) on the server I''m having problems with... Ross> Date: Thu, 16 Oct 2008 14:27:49 +0200 > From: carsten.aulbert at aei.mpg.de > To: myxiplx at hotmail.com > CC: zfs-discuss at opensolaris.org > Subject: Re: [zfs-discuss] Improving zfs send performance > > Hi Ross > > Ross wrote: >> Now though I don''t think it''s network at all. The end result from that thread is that we can''t see any errors in the network setup, and using nicstat and NFS I can show that the server is capable of 50-60MB/s over the gigabit link. Nicstat also shows clearly that both zfs send / receive and mbuffer are only sending 1/5 of that amount of data over the network. >> >> I''ve completely run out of ideas of my own (but I do half expect there''s a simple explanation I haven''t thought of). Can anybody think of a reason why both zfs send / receive and mbuffer would be so slow? > > Try to separate the two things: > > (1) Try /dev/zero -> mbuffer --- network ---> mbuffer> /dev/null > > That should give you wirespeed > > (2) Try zfs send | mbuffer> /dev/null > > That should give you an idea how fast zfs send really is locally. 
> > Carsten
So I am zfs sending ~450 datasets between thumpers running SOL10U5 via
ssh; most are empty except maybe 10 that have a few GB of files. I see
the following output on one that contained ~1 GB of files in my send
report.

Output from zfs receive -v:
"received 1.07Gb stream in 30 seconds (36.4Mb/sec)"

I have a few problems with this:

1. Should it not read 1.07GB for bytes?
2. Should it not read that this was done at a rate of 36.4MB/s?

The output seems to be incorrect, but makes sense if you uppercase the
b. This is an underwhelming ~292Mb/s!
Ok, just did some more testing on this machine to try to find where my bottlenecks are. Something very odd is going on here. As best I can tell there are two separate problems now: - something is throttling network output to 10MB/s - something is throttling zfs send to around 20MB/s The network throughput I''ve verified with mbuffer: 1. A quick mbuffer test from /dev/zero to /dev/null gave me 565MB/s. 2. On a test server, mbuffer sending from /dev/zero on one machine to /dev/null on another gave me 37MB/s 3. On the live server, mbuffer sending from /dev/zero to the same receiving machine gave me just under 10MB/s. This looks very much like mbuffer is throttled on this machine, but I know NFS can give me 60-80MB/s. Can anybody give me a clue as to what could be causing this? And the disk performance is just as confusing. Again I used a test server to provide a comparison, and this time used a zfs scrub with iostat to check the performance possible on the disks. Live server: 5 sets of 3 way mirrors Test server: 5 disk raid-z2 1. On the Live server, zfs send to /dev/null via mbuffer reports a speed of 21MB/s # zfs send pool at snapshot | mbuffer -s 128k -m 512M > /dev/null 2. On the Test server, zfs send to /dev/null via mbuffer reports a speed of 35MB/s 3. On the Live server, zpool scrub and iostat report a peak of 3k iops, and 283MB/s throughput. 4. On the Test server, zpool scrub and iostat report a peak of 472 iops, and 53MB/s throughput. Surely the send and scrub operations should give similar results? Why is zpool scrub running 10-15x faster than zfs send on the live server? The iostat figures on the live server are particularly telling. During a scrub (30s intervals): capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- rc-pool 734G 1.55T 2.94K 41 189M 788K mirror 144G 320G 578 6 39.2M 166K c1t1d0 - - 379 5 39.9M 166K c1t2d0 - - 379 5 39.9M 166K c2t1d0 - - 385 5 40.1M 166K mirror 147G 317G 633 2 37.8M 170K c1t3d0 - - 389 2 38.7M 171K c2t2d0 - - 393 2 38.9M 171K c2t0d0 - - 384 2 38.9M 171K mirror 147G 317G 619 6 37.3M 57.5K c2t3d0 - - 377 2 38.3M 57.9K c1t5d0 - - 377 2 38.3M 57.9K c1t4d0 - - 373 3 38.2M 57.9K mirror 148G 316G 638 10 37.6M 64.0K c2t4d0 - - 375 4 38.5M 64.4K c2t5d0 - - 386 6 38.2M 64.4K c1t6d0 - - 384 6 38.2M 64.4K mirror 149G 315G 540 6 37.4M 164K c1t7d0 - - 356 4 38.1M 164K c2t6d0 - - 362 5 38.2M 164K c2t7d0 - - 361 5 38.2M 164K c3d1p0 12K 504M 0 8 0 166K ---------- ----- ----- ----- ----- ----- ----- During a send (30s intervals): capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- rc-pool 734G 1.55T 148 55 18.6M 1.71M mirror 144G 320G 25 6 3.15M 235K c1t1d0 - - 8 3 1.02M 235K c1t2d0 - - 7 3 954K 235K c2t1d0 - - 9 3 1.19M 235K mirror 147G 317G 27 3 3.40M 203K c1t3d0 - - 8 2 1.03M 203K c2t2d0 - - 9 3 1.25M 203K c2t0d0 - - 8 2 1.11M 203K mirror 147G 317G 32 2 4.12M 205K c2t3d0 - - 11 1 1.45M 205K c1t5d0 - - 10 1 1.34M 205K c1t4d0 - - 10 1 1.34M 205K mirror 148G 316G 32 2 4.02M 201K c2t4d0 - - 10 1 1.37M 201K c2t5d0 - - 9 1 1.23M 201K c1t6d0 - - 11 1 1.43M 201K mirror 149G 315G 31 6 3.89M 180K c1t7d0 - - 11 2 1.45M 180K c2t6d0 - - 8 2 1.10M 180K c2t7d0 - - 10 2 1.35M 180K c3d1p0 12K 504M 0 34 0 727K ---------- ----- ----- ----- ----- ----- ----- Can anybody explain why zfs send could be so slow on one server? 
Is anybody else able to compare their iostat results for a zfs send and
a zpool scrub, to see if they also have such a huge difference between
the figures?

thanks,

Ross
Hi Ross,

On Fri, Oct 17, 2008 at 1:35 PM, Ross <myxiplx at googlemail.com> wrote:
> Ok, just did some more testing on this machine to try to find where my
> bottlenecks are. Something very odd is going on here. As best I can
> tell there are two separate problems now:
>
> - something is throttling network output to 10MB/s

I'll try to help you with this problem.

> 1. A quick mbuffer test from /dev/zero to /dev/null gave me 565MB/s.
> 2. On a test server, mbuffer sending from /dev/zero on one machine to
>    /dev/null on another gave me 37MB/s
> 3. On the live server, mbuffer sending from /dev/zero to the same
>    receiving machine gave me just under 10MB/s.
>
> This looks very much like mbuffer is throttled on this machine, but I
> know NFS can give me 60-80MB/s. Can anybody give me a clue as to what
> could be causing this?

Does your NFS mount go over a separate network? If not, just ignore this
advice. :)

When first testing out ZFS over NFS performance, I ran into a similar
problem. I had very nice graphs, all plateauing at 10MB/s, and was
getting frustrated at performance being so slow. It turned out that one
of my links was 100Mbit. I took a moment to breathe, learn from my
mistake (check the network links BEFORE running performance tests), and
ran my tests again.

Check your network links, make sure that it's Gigabit all the way
through, and that you're negotiating full duplex. A 100Mbit link will
give you just about 10MB/s throughput on network transfers.

- Dimitri
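A quick way to check this on the Solaris side is sketched below; interface names are whatever your driver uses, and on newer OpenSolaris builds 'dladm show-phys' replaces 'dladm show-dev':

# negotiated speed and duplex per NIC
dladm show-dev

# link state
dladm show-link

# raw interface speed as reported by the driver (kstat names vary by driver)
kstat -p | grep -i ifspeed

The switch port counters are worth checking too, since an auto-negotiation mismatch can show full gigabit on one end and 100 Mbit on the other.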
Yup, that's one of the first things I checked when it came out with
figures so close to 10MB/s. All three servers are running full duplex
gigabit though, as reported by both Solaris and the switch. And both the
NFS at 60+MB/s and the zfs send / receive are all going over the same
network link, in some cases to the same servers.
Hi All,

I have opened a ticket with Sun support #66104157 regarding zfs send /
receive and will let you know what I find out. Keep in mind that this is
for Solaris 10, not OpenSolaris.
>>>>> "r" == Ross <myxiplx at googlemail.com> writes:r> figures so close to 10MB/s. All three servers are running r> full duplex gigabit though there is one tricky way 100Mbit/s could still bite you, but it''s probably not happening to you. It mostly affects home users with unmanaged switches: http://www.smallnetbuilder.com/content/view/30212/54/ http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html because the big switch vendors all use pause frames safely: http://www.networkworld.com/netresources/0913flow2.html -- pause frames as interpreted by netgear are harmful -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20081017/526f78f3/attachment.bin>
Scott Williamson wrote:
> I have opened a ticket with Sun support #66104157 regarding zfs send /
> receive and will let you know what I find out.

Thanks.

> Keep in mind that this is for Solaris 10, not OpenSolaris.

Keep in mind that any changes required for Solaris 10 will first be
available in OpenSolaris, including any changes which may have already
been implemented.
 -- richard
On Fri, Oct 17, 2008 at 2:48 PM, Richard Elling <Richard.Elling at sun.com> wrote:
> Keep in mind that any changes required for Solaris 10 will first
> be available in OpenSolaris, including any changes which may
> have already been implemented.

For me (who uses SOL10) it is the only way I can get information about
what bugs and changes have been identified, and it helps me get stuff
from OpenSolaris into Sol10. The last support ticket resulted in a
Solaris iSCSI target to Windows initiator patch to Solaris 10 that made
iSCSI targets on ZFS actually work for us.
Hi,

Miles Nordin wrote:
> there is one tricky way 100Mbit/s could still bite you, but it's
> probably not happening to you. It mostly affects home users with
> unmanaged switches:
>
> http://www.smallnetbuilder.com/content/view/30212/54/
> http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html

That rings a bell. Ross, are you using NFS via UDP or TCP? May it be
that your network has different performance levels for different
transport types? For our network we have disabled pause frames
completely and rely only on TCP's internal mechanisms to prevent
flooding/blocking.

Carsten

PS: the job with ~25k files adding up to 800 GB is now done - zfs send
took only 52 hrs and the speed was ~4.5 MB/s :(
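For reference, the UDP-versus-TCP question can be answered on the NFS client with the stock Solaris tool; nothing is assumed here beyond an existing NFS mount:

# per mount: NFS version, proto=tcp or proto=udp, rsize/wsize and timeouts
nfsstat -m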
Richard Elling wrote:
>> Keep in mind that this is for Solaris 10, not OpenSolaris.
>
> Keep in mind that any changes required for Solaris 10 will first
> be available in OpenSolaris, including any changes which may
> have already been implemented.

Indeed. For example, less than a week ago a fix for the following two
CRs (along with some others) was put back into Solaris Nevada:

6333409 traversal code should be able to issue multiple reads in parallel
6418042 want traversal in depth-first pre-order for quicker 'zfs send'

This should have a positive impact on 'zfs send' performance.

Wbr,
victor
On Mon, Oct 20, 2008 at 1:52 AM, Victor Latushkin <Victor.Latushkin at sun.com> wrote:
> Indeed. For example, less than a week ago a fix for the following two CRs
> (along with some others) was put back into Solaris Nevada:
>
> 6333409 traversal code should be able to issue multiple reads in parallel
> 6418042 want traversal in depth-first pre-order for quicker 'zfs send'

That is helpful, Victor. Does anyone have a full list of CRs that I can
provide to Sun support? I have tried searching the bugs database, but I
didn't even find those two on my own.
Thomas, for long-latency fat links, it should be quite beneficial to set
the socket buffer on the receive side (instead of having users tune
tcp_recv_hiwat).

Throughput of a TCP connection is gated by
"receive socket buffer / round trip time".

Could that be Ross's problem?

-r

Ross Smith writes:
> Thanks, that got it working. I'm still only getting 10 MB/s, so it
> hasn't solved my problem - I've still got a bottleneck somewhere, but
> mbuffer is a huge improvement over standard zfs send / receive.
Roch schrieb:
> Thomas, for long-latency fat links, it should be quite beneficial to set
> the socket buffer on the receive side (instead of having users tune
> tcp_recv_hiwat).
>
> Throughput of a TCP connection is gated by
> "receive socket buffer / round trip time".
>
> Could that be Ross's problem?

Hmm, I'm not a TCP expert, but that sounds absolutely possible, if
Solaris 10 isn't tuning the TCP buffer automatically. The default
receive buffer seems to be 48k (at least on a V240 running 118833-33).
So if the block size is something like 128k, it would absolutely make
sense to tune the receive buffer to compensate for the round trip time...

Ross: Would you like a patch to test if this is the case? Which version
of mbuffer are you currently using?

- Thomas
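To put rough numbers on the relation Roch describes, and to show the system-wide Solaris TCP knobs that can be experimented with independently of any mbuffer patch; the values below are illustrative examples, not recommendations:

# ceiling ~= receive socket buffer / round trip time:
#   48 KB buffer, 1 ms RTT  ->  ~48 MB/s
#   48 KB buffer, 4 ms RTT  ->  ~12 MB/s

# inspect the current defaults
ndd -get /dev/tcp tcp_recv_hiwat
ndd -get /dev/tcp tcp_xmit_hiwat
ndd -get /dev/tcp tcp_max_buf

# example: raise the receive buffer to 1 MB (applies to new connections
# system-wide, and must stay <= tcp_max_buf)
ndd -set /dev/tcp tcp_recv_hiwat 1048576

Setting the buffer per socket from within the application, as Thomas proposes, is the cleaner approach since it avoids changing the default for every TCP connection on the box.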