Posted from the wrong address the first time, sorry.

Is the speed of a 'zfs send' dependent on file size / number of files?

        We have a system with some large datasets (3.3 TB and about 35
million files) and conventional backups take a long time: using
NetBackup 6.5, a FULL takes between two and three days, and differential
incrementals, even with very few files changing, take between 15 and
20 hours. We already use snapshots for day-to-day restores, but we
need the 'real' backups for DR.

        I have been testing zfs send throughput and have not been
getting promising results. Note that this is NOT OpenSolaris, but
Solaris 10U6 (10/08) with the IDR for the "snapshot interrupts
resilver" bug.

Server: V480, 4 CPU, 16 GB RAM (test server; production is an M4000)
Storage: two SE-3511, each with one 512 GB LUN presented
Simple mirror layout:

pkraus@nyc-sted1:/IDR-test/ppk> zpool status
  pool: IDR-test
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Wed Jul  1 16:54:58 2009
config:

        NAME                                       STATE     READ WRITE CKSUM
        IDR-test                                   ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c6t600C0FF0000000000927852FB91AD308d0  ONLINE       0     0     0
            c6t600C0FF0000000000922614781B19008d0  ONLINE       0     0     0

errors: No known data errors

pkraus@nyc-sted1:/IDR-test/ppk> zfs list
NAME                          USED  AVAIL  REFER  MOUNTPOINT
IDR-test                      101G   399G  24.3M  /IDR-test
IDR-test@1250597527          96.8M      -   101M  -
IDR-test@1250604834          20.1M      -  24.3M  -
IDR-test@1250605236            16K      -  24.3M  -
IDR-test@1250605400            20K      -  24.3M  -
IDR-test@1250606582            20K      -  24.3M  -
IDR-test@1250612553            20K      -  24.3M  -
IDR-test@1250616026            20K      -  24.3M  -
IDR-test/dataset              101G   399G   100G  /IDR-test/dataset
IDR-test/dataset@1250597527   313K      -  87.1G  -
IDR-test/dataset@1250604834   266K      -  87.1G  -
IDR-test/dataset@1250605236   187M      -  88.2G  -
IDR-test/dataset@1250605400   192M      -  89.3G  -
IDR-test/dataset@1250606582   246K      -  95.4G  -
IDR-test/dataset@1250612553   233K      -  95.4G  -
IDR-test/dataset@1250616026   230K      -   100G  -

There are about 3.3 million files / directories in the 'dataset';
files range in size from 1 KB to 100 KB.

pkraus@nyc-sted1:/IDR-test/ppk> time sudo zfs send IDR-test/dataset@1250616026 >/dev/null

real    91m19.024s
user    0m0.022s
sys     11m51.422s

That works out to a little over 18 MB/sec (100 GB in 5,479 seconds)
and about 600 files/sec, which would mean almost 16 hours per TB --
better than NBU, but not by much.

I do not think the SE-3511 is limiting us, as I have seen much higher
throughput on them when resilvering one or more mirrors.

Any thoughts as to why I am not getting better throughput? Thanks.

-- 
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Designer, "The Pajama Game" @ Schenectady Light Opera Company
   ( http://www.sloctheater.org/ )
-> Technical Advisor, Lunacon 2010 ( http://www.lunacon.org/ )
-> Technical Advisor, RPI Players
> Is the speed of a 'zfs send' dependent on file size / number of files?

I am going to say no. I am running a backup rig on *far* inferior
iron, doing a send/recv over ssh through gigE, and last night's
replication reported:

"received 40.2GB stream in 3498 seconds (11.8MB/sec)"

I have seen it go as high as your figures, but it is usually somewhere
between that number and yours. I assumed this was a result of the ssh
overhead (arcfour yielded the best results of the ciphers I tried).

> There are about 3.3 million files / directories in the 'dataset';
> files range in size from 1 KB to 100 KB.

The number of files I am replicating would be ~100, so at least at
this scale the file count does not seem to matter.
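For reference, the basic pipeline I use looks like this (host and
dataset names here are stand-ins for my real ones):

    # incremental replication over ssh; arcfour is the cheapest cipher
    zfs send -i tank/data@yesterday tank/data@today | \
        ssh -c arcfour backuphost zfs receive -F backup/data

jlc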
On Tue, Aug 18, 2009 at 04:22:19PM -0400, Paul Kraus wrote:
> We have a system with some large datasets (3.3 TB and about 35
> million files) and conventional backups take a long time: using
> NetBackup 6.5, a FULL takes between two and three days, and differential
> incrementals, even with very few files changing, take between 15 and
> 20 hours. We already use snapshots for day-to-day restores, but we
> need the 'real' backups for DR.

zfs send will be very fast for "differential incrementals ... with very
few files changing", since zfs send is a block-level diff based on the
differences between the selected snapshots. Where a traditional backup
tool would have to traverse the entire filesystem (modulo pruning based
on ctime/mtime), zfs send simply traverses a list of changed blocks
that ZFS keeps up to date as you make the changes in the first place.
A minimal example of the two forms is sketched below.

For a *full* backup, zfs send and traditional backup tools will have
similar results, as both will be I/O bound and both will have more or
less the same number of I/Os to do.

Caveat: zfs send stream formats are not guaranteed to be backwards
compatible, so zfs send is not suitable for long-term backups.
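Concretely, assuming a hypothetical pool/fs with daily snapshots, the
two forms look like this:

    # full send: walks every block referenced by the snapshot
    zfs send pool/fs@monday > /backup/fs-full-monday.zfs

    # incremental send: walks only the blocks changed between snapshots
    zfs send -i pool/fs@monday pool/fs@tuesday > /backup/fs-incr-tuesday.zfs

Nico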
On Tue, Aug 18, 2009 at 22:22, Paul Kraus<pk1048@gmail.com> wrote:
> Posted from the wrong address the first time, sorry.
>
> Is the speed of a 'zfs send' dependent on file size / number of files?
>
> We have a system with some large datasets (3.3 TB and about 35
> million files) and conventional backups take a long time: using
> NetBackup 6.5, a FULL takes between two and three days, and differential
> incrementals, even with very few files changing, take between 15 and
> 20 hours. We already use snapshots for day-to-day restores, but we
> need the 'real' backups for DR.

Conventional backups can be faster than that! I have not used
NetBackup, but you should be able to configure it to run several
backup streams in parallel. You may have to point NetBackup at subdirs
instead of the file system root; a guess at what that would look like
follows.
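I have not verified this, but from what I have read the selection list
can be split into parallel streams with the NEW_STREAM directive once
"Allow multiple data streams" is enabled on the policy (the paths
below are only placeholders; check the NetBackup documentation):

    /pool/dataset/projects-a
    NEW_STREAM
    /pool/dataset/projects-b
    NEW_STREAM
    /pool/dataset/projects-c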
On Tue, Aug 18, 2009 at 7:54 PM, Mattias Pantzare<pantzer@ludd.ltu.se> wrote:
> On Tue, Aug 18, 2009 at 22:22, Paul Kraus<pk1048@gmail.com> wrote:
>> Posted from the wrong address the first time, sorry.
>>
>> Is the speed of a 'zfs send' dependent on file size / number of files?
>>
>> We have a system with some large datasets (3.3 TB and about 35
>> million files) and conventional backups take a long time: using
>> NetBackup 6.5, a FULL takes between two and three days, and differential
>> incrementals, even with very few files changing, take between 15 and
>> 20 hours. We already use snapshots for day-to-day restores, but we
>> need the 'real' backups for DR.
>
> Conventional backups can be faster than that! I have not used
> NetBackup, but you should be able to configure it to run several
> backup streams in parallel. You may have to point NetBackup at subdirs
> instead of the file system root.

This was discussed in another thread as well.

http://opensolaris.org/jive/thread.jspa?threadID=109751&tstart=0

In particular...

http://opensolaris.org/jive/thread.jspa?threadID=109751&tstart=0#405121
http://opensolaris.org/jive/thread.jspa?threadID=109751&tstart=0#404589
http://opensolaris.org/jive/thread.jspa?threadID=109751&tstart=0#405835
http://opensolaris.org/jive/thread.jspa?threadID=109751&tstart=0#405308

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
Thank you for all your replies; I'm collecting my responses in one
message below.

On Tue, Aug 18, 2009 at 7:43 PM, Nicolas Williams
<Nicolas.Williams@sun.com> wrote:
> zfs send will be very fast for "differential incrementals ... with very
> few files changing", since zfs send is a block-level diff based on the
> differences between the selected snapshots. Where a traditional backup
> tool would have to traverse the entire filesystem (modulo pruning based
> on ctime/mtime), zfs send simply traverses a list of changed blocks
> that ZFS keeps up to date as you make the changes in the first place.

Our testing indicates that for an incremental zfs send the speed is
very good, and it seems to be bandwidth limited rather than limited by
file count. For example, while testing incremental sends I got the
following results:

~450,000 files sent, ~8.3 GB sent @ 690 files/sec. and 13 MB/sec.
~900,000 files sent,  ~13 GB sent @ 890 files/sec. and 13 MB/sec.
~450,000 files sent, ~4.6 GB sent @ 1,800 files/sec. and 19 MB/sec.

Full zfs sends produced:

~2.5 million files,  ~87 GB @ 500 files/sec. and 18 MB/sec.
~3.4 million files, ~100 GB @ 600 files/sec. and 19 MB/sec.

> For a *full* backup, zfs send and traditional backup tools will have
> similar results, as both will be I/O bound and both will have more or
> less the same number of I/Os to do.

The zfs send FULLs are in close agreement with what we are seeing with
a FULL NBU backup.

> Caveat: zfs send stream formats are not guaranteed to be backwards
> compatible, so zfs send is not suitable for long-term backups.

Yup, we only need them for 5 weeks, and when we upgrade the server
(and the ZFS version) we would need to do a new set of fulls.

On Tue, Aug 18, 2009 at 8:54 PM, Mattias Pantzare
<pantzer@ludd.ltu.se> wrote:
> Conventional backups can be faster than that! I have not used
> NetBackup, but you should be able to configure it to run several
> backup streams in parallel. You may have to point NetBackup at subdirs
> instead of the file system root.

We have over 180 filesystems on the production server right now, and
we are really trying to avoid any manual customization of the backup
policy. In a previous incarnation this data lived on a Mac OS X server
in one FS (only about 4 TB total at that point); full backups took so
long that we manually configured three NBU policies with many
individual directories ... it was a nightmare as new data (and
directories) were added.

On Tue, Aug 18, 2009 at 10:33 PM, Mike Gerdts <mgerdts@gmail.com> wrote:
> This was discussed in another thread as well.
>
> http://opensolaris.org/jive/thread.jspa?threadID=109751&tstart=0

Thanks for that pointer. I had missed that thread in my search; I just
hadn't hit the right keywords.

This thread got me thinking about our data layout. Currently the data
is broken up by both department and project: each department gets a
zpool, and each project within the department gets a dataset/zfs.
Departments range in size from one mirrored pair of LUNs (512 GB) to
11 mirrored pairs of LUNs (5.5 TB). Projects range from a few KB to
3.3 TB (and 33 million files).
The data is all relatively small (images of documents), but there are
many, many files.

Is there any throughput penalty for a dataset being part of a bigger
zpool? In other words, am I more likely to get better FULL throughput
if I move the data to a dedicated zpool instead of a child dataset? We
*can* change our model to assign each project a separate zpool, but
that would be wasteful of space. Perhaps we could move a given project
to its own zpool when it grows to a certain size (>1 TB maybe). But if
there would not be any performance advantage, it's not worth the
effort.

I had assumed that a full zfs send would just stream the underlying
zfs structure and not really deal with individual files, but if the
dataset is part of a shared zpool then I guess it has to look at the
files' metadata to determine whether a given file is part of that
dataset.

P.S. We are planning to move the back-end storage to JBODs (probably
J4400), but that is not where we are today, and we can't count on that
happening soon.

-- 
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Designer, "The Pajama Game" @ Schenectady Light Opera Company
   ( http://www.sloctheater.org/ )
-> Technical Advisor, Lunacon 2010 ( http://www.lunacon.org/ )
-> Technical Advisor, RPI Players
On Aug 18, 2009, at 1:16 PM, Paul Kraus wrote:

> Is the speed of a 'zfs send' dependent on file size / number of files?

Not directly. It is dependent on the amount of change per unit time.

> We have a system with some large datasets (3.3 TB and about 35
> million files) and conventional backups take a long time: using
> NetBackup 6.5, a FULL takes between two and three days, and differential
> incrementals, even with very few files changing, take between 15 and
> 20 hours. We already use snapshots for day-to-day restores, but we
> need the 'real' backups for DR.

This is quite common.

> I have been testing zfs send throughput and have not been
> getting promising results. Note that this is NOT OpenSolaris, but
> Solaris 10U6 (10/08) with the IDR for the "snapshot interrupts
> resilver" bug.

You will need to do this in parallel; the general idea is sketched
below. We have had some discussions about a possible white paper on
this topic, but as yet there is no funding, so it will remain in the
world of professional services for the time being.
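The skeleton is simple: one full send per dataset, several running
concurrently. Illustration only; the dataset names, snapshot date, and
backup paths are made up:

    #!/bin/ksh
    # one zfs send per dataset, all running in the background
    for ds in dept-a/proj1 dept-a/proj2 dept-b/proj3; do
        zfs send "$ds@20090818" > "/backup/$(echo $ds | tr / _).zfs" &
    done
    wait    # block until every send completes

-- richard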
Paul Kraus wrote:
> There are about 3.3 million files / directories in the 'dataset';
> files range in size from 1 KB to 100 KB.
>
> pkraus@nyc-sted1:/IDR-test/ppk> time sudo zfs send IDR-test/dataset@1250616026 >/dev/null
>
> real    91m19.024s
> user    0m0.022s
> sys     11m51.422s
>
> That works out to a little over 18 MB/sec (100 GB in 5,479 seconds)
> and about 600 files/sec, which would mean almost 16 hours per TB --
> better than NBU, but not by much.
>
> I do not think the SE-3511 is limiting us, as I have seen much higher
> throughput on them when resilvering one or more mirrors.
>
> Any thoughts as to why I am not getting better throughput?

With Solaris 10U7 I see about 35 MB/sec between Thumpers using a direct
socket connection rather than ssh for full sends, and 7-12 MB/sec for
incrementals, depending on the data set.

-- 
Ian.
> With Solaris 10U7 I see about 35 MB/sec between Thumpers using a direct
> socket connection rather than ssh for full sends, and 7-12 MB/sec for
> incrementals, depending on the data set.

Ian,
What's the syntax you use for this procedure?
Joseph L. Casale wrote:
>> With Solaris 10U7 I see about 35 MB/sec between Thumpers using a direct
>> socket connection rather than ssh for full sends, and 7-12 MB/sec for
>> incrementals, depending on the data set.
>
> Ian,
> What's the syntax you use for this procedure?

I have my own application that uses large circular buffers and a
socket connection between hosts. The buffers keep data flowing during
ZFS writes, and the direct connection cuts out ssh. A rough
approximation with off-the-shelf tools is sketched below.
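This is not my application, just the general shape of the plumbing,
using mbuffer (host, port, and dataset names are invented):

    # receiving host: listen on a TCP port, buffer up to 1 GB in RAM
    mbuffer -I 9090 -s 128k -m 1G | zfs receive -F backup/data

    # sending host: stream the snapshot into the buffered socket
    zfs send tank/data@today | mbuffer -O recvhost:9090 -s 128k -m 1G

-- 
Ian.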
> I have my own application that uses large circular buffers and a
> socket connection between hosts. The buffers keep data flowing during
> ZFS writes, and the direct connection cuts out ssh.

Application, as in not script (something you can share)? :)

jlc
Joseph L. Casale wrote:
>> I have my own application that uses large circular buffers and a
>> socket connection between hosts. The buffers keep data flowing during
>> ZFS writes, and the direct connection cuts out ssh.
>
> Application, as in not script (something you can share)?

Not yet!

-- 
Ian.