For quite some time I have been using zfs send -R fsname@snapname | dd of=/dev/rmt/1ln to make a tape backup of my zfs file system. A few weeks back the file system grew larger than would fit on a single DAT72 tape, and I once again searched for a simple solution for dumping a zfs file system to multiple tapes. Once again I was disappointed...

I expect there are plenty of other ways this could have been handled, but none leapt out at me. I didn't want to pay large sums of cash for a commercial backup product, and I didn't see that Amanda would be an easy thing to fit into my existing scripts. In particular (and I could well be reading this incorrectly), it seems that the commercial products, Amanda, and star all dump the zfs file system file by file (with or without ACLs). I found none which would allow me to dump the file system and its snapshots, unless I used zfs send to a scratch disk and dumped to tape from there. But, of course, that assumes I have a scratch disk large enough.

So, I have implemented zfsdump as a ksh script. The method is as follows:
1. Make a bunch of fifos.
2. Pipe the stream from zfs send to split, with split writing to the fifos (in sequence).
3. Use dd to copy from the fifos to tape(s).

When the first tape is complete, zfsdump returns. One then calls it again, specifying that the second tape is to be used, and so on.

From the man page:

     Example 1.  Dump the @Tues snapshot of the  tank  filesystem
     to  the  non-rewinding,  non-compressing  tape,  with a 36GB
     capacity:

          zfsdump -z tank@Tues -a "-R" -f /dev/rmt/1ln -s 36864 -t 0

     For the second tape:

          zfsdump -z tank@Tues -a "-R" -f /dev/rmt/1ln -s 36864 -t 1

If you would like to try it out, download the package from:
http://www.quantmodels.co.uk/zfsdump/

I have packaged it up, so do the usual pkgadd stuff to install.

Please, though, [b]try this out with caution[/b]. Build a few test file systems, and see that it works for you.
[b]It comes without warranty of any kind.[/b]

Tristram
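For readers curious how the fifo/split/dd plumbing fits together, a stripped-down sketch of the idea follows. This is not the packaged script itself; the fifo names, chunk size, and tape device are illustrative only.

    # make fifos whose names match split's default output names (prefix + aa, ab, ...)
    mkfifo /tmp/zdump.aa /tmp/zdump.ab /tmp/zdump.ac

    # split the send stream into 36GB chunks, written to the fifos in sequence
    zfs send -R tank@Tues | split -b 36864m - /tmp/zdump. &

    # copy the first chunk to tape; after a tape change, repeat with /tmp/zdump.ab, etc.
    dd if=/tmp/zdump.aa of=/dev/rmt/1ln bs=1024k

Because the fifos carry the data, no scratch disk the size of the file system is needed; split simply blocks until a reader opens the next fifo.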
I use Bacula, which works very well (much better than Amanda did). You may be able to customize it to do direct zfs send/receive; however, I find that although these are great for copying file systems to other machines, they are inadequate for backups unless you always intend to restore the whole file system. Most people want to restore a file or a directory tree of files, not a whole file system. In the past 25 years of backups and restores, I've never had to restore a whole file system. I get requests for a few files, or somebody's mailbox, or somebody's http document root.

You can install it directly from CSW (or blastwave).

On 6/28/2010 11:26 AM, Tristram Scott wrote:
> For quite some time I have been using zfs send -R fsname@snapname | dd of=/dev/rmt/1ln to make a tape backup of my zfs file system. A few weeks back the file system grew larger than would fit on a single DAT72 tape, and I once again searched for a simple solution for dumping a zfs file system to multiple tapes. Once again I was disappointed...
[snip]
> I use Bacula, which works very well (much better than Amanda did).
> You may be able to customize it to do direct zfs send/receive; however,
> I find that although these are great for copying file systems to other
> machines, they are inadequate for backups unless you always intend to
> restore the whole file system.
[snip]
> You can install it directly from CSW (or blastwave).

Thanks for your comments, Brian. I should look at Bacula in more detail.

As for full restores versus ad hoc requests for files just deleted, my experience is mostly similar to yours, although I have needed a full system restore more than once.

For the restore of a few files here and there, I believe this is now well handled with zfs snapshots. I have always found these requests to be down to human actions. The need for a full system restore has (almost) always been hardware failure.

If the file was there an hour ago, or yesterday, or last week, or last month, then we have it in a snapshot.

If the disk died horribly during a power outage (grrr!) then it would be very nice to be able to restore not only the full file system, but also the snapshots too. The only way I know of achieving that is by using zfs send etc.

> On 6/28/2010 11:26 AM, Tristram Scott wrote:
[snip]
On Jun 28, 2010, at 12:26 PM, Tristram Scott wrote:
>> I use Bacula, which works very well (much better than Amanda did).
[snip]
>
> Thanks for your comments, Brian. I should look at Bacula in more detail.
>
> As for full restores versus ad hoc requests for files just deleted, my experience is mostly similar to yours, although I have needed a full system restore more than once.
>
> For the restore of a few files here and there, I believe this is now well handled with zfs snapshots. I have always found these requests to be down to human actions. The need for a full system restore has (almost) always been hardware failure.
>
> If the file was there an hour ago, or yesterday, or last week, or last month, then we have it in a snapshot.
>
> If the disk died horribly during a power outage (grrr!) then it would be very nice to be able to restore not only the full file system, but also the snapshots too. The only way I know of achieving that is by using zfs send etc.

I like snapshots when I'm making a major change to the system, or for cloning. So to me, snapshots are good for transaction-based operations, such as stopping and flushing a database, taking a snapshot, then resuming the database. Then you can back up the snapshot with Bacula and destroy the snapshot when the backup is complete. I have Bacula configured with pre-backup and post-backup scripts to do just that.

When you do the restore, it will create something that "looks" like a snapshot from the file system perspective, but isn't really one. But if you're looking for a copy of a file from a specific date, Bacula retains that. In fact, you specify the retention period you want, and you'll have access to any or all individual files on a per-date basis. You can retain the files for months or years if you like; you specify in the Bacula config file how long you want to keep the tapes around.

So it really comes down to your use-case.
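To make the pre/post hook idea concrete, a minimal sketch of what such a wrapper script might look like is below. The service FMRI, dataset, and snapshot names are placeholders, not taken from the poster's actual configuration.

    #!/usr/bin/ksh
    # called by the backup job with "pre" before the backup and "post" after it
    case "$1" in
    pre)
            # quiesce the database, snapshot the dataset it lives on, then resume
            svcadm disable -st svc:/application/mydb:default
            zfs snapshot tank/db@backup
            svcadm enable svc:/application/mydb:default
            ;;
    post)
            # the backup job has read /tank/db/.zfs/snapshot/backup; clean up
            zfs destroy tank/db@backup
            ;;
    esac

The backup tool then archives the files visible under the snapshot's .zfs/snapshot directory, so it sees a consistent image of the database.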
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Tristram Scott
>
> If you would like to try it out, download the package from:
> http://www.quantmodels.co.uk/zfsdump/

I haven't tried this yet, but thank you very much!

Other people have pointed out that Bacula is able to handle multiple tapes and individual file restores. However, the disadvantage of bacula/tar/cpio/rsync etc. is that they all have to walk the entire filesystem searching for things that have changed.

The advantage of "zfs send" (assuming incremental backups) is that it already knows what has changed, and it can generate a continuous datastream almost instantly. Something like 1-2 orders of magnitude faster per incremental backup.
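As a concrete illustration of that point (snapshot names and the tape device here are illustrative), an incremental dump between two snapshots is just:

    # full dump of the older snapshot first
    zfs send -R tank@Mon | dd of=/dev/rmt/1ln bs=1024k

    # later: only blocks changed between @Mon and @Tues are read and sent
    zfs send -R -i tank@Mon tank@Tues | dd of=/dev/rmt/1ln bs=1024k

The incremental stream starts flowing immediately, because ZFS already knows which blocks changed between the two snapshots; no filesystem walk is needed.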
On Mon, Jun 28, 2010 at 11:26 AM, Tristram Scott <tristram.scott at quantmodels.co.uk> wrote:
> For quite some time I have been using zfs send -R fsname@snapname | dd of=/dev/rmt/1ln to make a tape backup of my zfs file system. A few weeks back the file system grew larger than would fit on a single DAT72 tape, and I once again searched for a simple solution for dumping a zfs file system to multiple tapes.
[snip]
> So, I have implemented zfsdump as a ksh script. The method is as follows:
> 1. Make a bunch of fifos.
> 2. Pipe the stream from zfs send to split, with split writing to the fifos (in sequence).

It would be nice if I could pipe the zfs send stream to a split and then send those split streams over the network to a remote system. It would help send it over to the remote system quicker. Can your tool do that?

Something like this:

                     s | -----> | j
    zfs send         p | -----> | o    zfs recv
    (local)          l | -----> | i    (remote)
                     t | -----> | n

> 3. Use dd to copy from the fifos to tape(s).
>
> When the first tape is complete, zfsdump returns. One then calls it again, specifying that the second tape is to be used, and so on.
[snip]

--
Asif Iqbal
> It would be nice if I could pipe the zfs send stream to a split and then
> send those split streams over the network to a remote system. It would
> help send it over to the remote system quicker. Can your tool do that?
[snip]
> Asif Iqbal

I did look at doing this, with the intention of allowing simultaneous streams to multiple tape drives, but put the idea to one side.

I thought of providing interleaved streams, but wasn't happy with the idea that the whole process would block when one of the pipes stalled.

I also contemplated dividing the stream into several large chunks, but for them to run simultaneously that seemed to require several reads of the original dump stream. Besides the expense of this approach, I am not certain that repeated zfs send streams have exactly the same byte content.

I think that probably the best approach would be the interleaved streams.

That said, I am not sure how this would necessarily help with the situation you describe. Isn't the limiting factor going to be the network bandwidth between the remote machines? Won't you end up with four streams running at quarter speed?
On Tue, Jun 29, 2010 at 8:17 AM, Tristram Scott <tristram.scott at quantmodels.co.uk> wrote:
> I did look at doing this, with the intention of allowing simultaneous streams to multiple tape drives, but put the idea to one side.
>
> I thought of providing interleaved streams, but wasn't happy with the idea that the whole process would block when one of the pipes stalled.
>
> I also contemplated dividing the stream into several large chunks, but for them to run simultaneously that seemed to require several reads of the original dump stream. Besides the expense of this approach, I am not certain that repeated zfs send streams have exactly the same byte content.
>
> I think that probably the best approach would be the interleaved streams.
>
> That said, I am not sure how this would necessarily help with the situation you describe. Isn't the limiting factor going to be the network bandwidth between the remote machines? Won't you end up with four streams running at quarter speed?

If, for example, the network pipe is bigger than one unsplit stream of zfs send | zfs recv, then splitting it into multiple streams should optimize the network bandwidth, shouldn't it?
On 6/28/2010 10:30 PM, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Tristram Scott
>>
>> If you would like to try it out, download the package from:
>> http://www.quantmodels.co.uk/zfsdump/
>
> I haven't tried this yet, but thank you very much!
>
> Other people have pointed out that Bacula is able to handle multiple tapes and individual file restores. However, the disadvantage of bacula/tar/cpio/rsync etc. is that they all have to walk the entire filesystem searching for things that have changed.

A compromise here might be to feed those tools the output from the new ZFS diff command (which 'diffs' two snapshots) when it arrives. That might get something close to "the best of both worlds".

-Kyle

> The advantage of "zfs send" (assuming incremental backups) is that it already knows what has changed, and it can generate a continuous datastream almost instantly. Something like 1-2 orders of magnitude faster per incremental backup.
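If and when zfs diff lands, the glue might look roughly like the following. This is hypothetical: it assumes the diff output lists one change per line as a change-type character, a tab, and a path, it skips deleted entries (whose paths no longer exist), and it ignores the extra handling renames would need.

    # list files changed between two snapshots and feed the surviving paths to cpio
    zfs diff tank@Mon tank@Tues | awk -F'\t' '$1 != "-" {print $2}' | cpio -oc > /dev/rmt/1ln

The file-by-file tool then archives only the changed files, without walking the whole filesystem.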
> If, for example, the network pipe is bigger than one unsplit stream
> of zfs send | zfs recv, then splitting it into multiple streams should
> optimize the network bandwidth, shouldn't it?

Well, I guess so. But I wonder what the bottleneck is here. If it is the rate at which zfs send can stream data, there is a good chance that is limited by disk read. If we split it into four pipes, I still think you are going to see four quarter-rate reads.
evik wrote:
> Reading this list for a while has made it clear that zfs send is not a
> backup solution. It can be used for cloning the filesystem to a backup
> array if you are consuming the stream with zfs receive, so you get
> notified immediately about errors. Even one bitflip will render the
> stream unusable and you will lose all data, not just part of your
> backup, because zfs receive will restore the whole filesystem or nothing
> at all, depending on the correctness of the stream.
>
> You can use par2 or something similar to try to protect the stream
> against bit flips, but that would require a lot of free storage space
> to recover from errors.
>
> e

The all-or-nothing aspect does make me nervous, but there are things which can be done about it. The first step, I think, is to calculate a checksum of the data stream(s):

     -k chkfile.  Calculates MD5 checksums for each tape and  for
     the  stream as a whole.  These are written to chkfile, or if
     specified as -, then to stdout.

Run the dump stream back through digest -a md5 and verify that it is intact.

Certainly, using an error correcting code could help us out, but at additional expense, both computational and storage. Personally, for disaster recovery purposes, I think that verifying the data after writing to tape is good enough. What I am looking to guard against is the unlikely event of a hardware (or software) failure, or serious human error. The zfs send stream handles that, unless, of course, we get data corruption on the tape. I think the correlation between a hardware failure today and tape corruption since yesterday / last week when I last backed up must be pretty small. In the event that I reach for the tape and find it corrupted, I go back a week to the previous full dump stream.

Clearly the strength of the backup solution needs to match the value of the data, and especially the cost of not having the data. For our large database applications we mirror to a remote location, and use tape backup. But still, I find the ability to restore the zfs filesystem with all its snapshots very useful, which is why I choose to work with zfs send.

Tristram
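A rough sketch of that verify step, assuming a single tape; the check file path is made up, and the dd block size is a guess that must be at least as large as the record size the dump was written with:

    # dump as usual, recording per-tape checksums with -k
    zfsdump -z tank@Tues -a "-R" -f /dev/rmt/1ln -s 36864 -t 0 -k /var/tmp/tank.chk

    # rewind, read the tape back, and compare against the recorded MD5
    mt -f /dev/rmt/1 rewind
    dd if=/dev/rmt/1ln bs=1024k | digest -a md5
    cat /var/tmp/tank.chk

If the two sums match, the tape copy of the stream is intact as of the time of the check.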
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Asif Iqbal
>
> It would be nice if I could pipe the zfs send stream to a split and then
> send those split streams over the network to a remote system. It would
> help send it over to the remote system quicker. Can your tool do that?

Does that make sense? I assume the network is the bottleneck; the only way multiple streams would go any faster than a single stream is if you're multithreading and hogging all the bandwidth for yourself, instead of sharing fairly with the httpd or whatever other server is trying to use the bandwidth.

If you're talking about streaming to a bunch of separate tape drives (or whatever) on a bunch of separate systems because the recipient storage is the bottleneck instead of the network ... then "split" probably isn't the most useful way to distribute those streams, because "split" is serial. You would really want to "stripe" your data to all those various destinations, so they could all be writing simultaneously. But this seems like a very specialized scenario, and I think it is probably very unusual.
On Wed, Jun 30, 2010 at 12:54 PM, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:
>> It would be nice if I could pipe the zfs send stream to a split and then
>> send those split streams over the network to a remote system. It would
>> help send it over to the remote system quicker. Can your tool do that?
>
> Does that make sense? I assume the network is the bottleneck; the only way
> multiple streams would go any faster than a single stream is if you're
> multithreading and hogging all the bandwidth for yourself, instead of
> sharing fairly with the httpd or whatever other server is trying to use
> the bandwidth.

Currently, to speed up zfs send | zfs recv, I am using mbuffer. It moves the data a lot faster than using netcat (or ssh) as the transport method. That is why I thought transporting it the way axel does might be better than wget: axel lets you create multiple pipes, so you get the data several times faster than with wget.

> If you're talking about streaming to a bunch of separate tape drives (or
> whatever) on a bunch of separate systems because the recipient storage is
> the bottleneck instead of the network ... then "split" probably isn't the
> most useful way to distribute those streams, because "split" is serial.
[snip]
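For reference, a typical mbuffer pairing for this looks roughly like the following; the host name, port, buffer sizes, and dataset names are illustrative only.

    # on the receiving host: listen on a TCP port, buffer, and feed zfs recv
    mbuffer -s 128k -m 1G -I 9090 | zfs recv -d tank/backup

    # on the sending host: stream through mbuffer to the receiver
    zfs send -R tank@Tues | mbuffer -s 128k -m 1G -O recvhost:9090

The large in-memory buffer at each end is what decouples transient disk latency from the network, as discussed in the next message.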
> From: Asif Iqbal [mailto:vadud3 at gmail.com]
>
> Currently, to speed up zfs send | zfs recv, I am using mbuffer. It moves
> the data a lot faster than using netcat (or ssh) as the transport method.

Yup, this works because network and disk latency can both be variable. Without buffering, your data stream must instantaneously go at the speed of whichever is slower: the disk or the network. But when you use buffering, you're able to go as fast as the network at all times. You remove the effect of transient disk latency.

> That is why I thought transporting it the way axel does might be better
> than wget: axel lets you create multiple pipes, so you get the data
> several times faster than with wget.

If you're using axel to download something from the internet, the reason it's faster than wget is that your data stream is competing against all the other users of the internet to get something from that server across some WAN. Inherently, all the routers and servers on the internet treat each data stream fairly (except when explicitly configured to be unfair). So when you axel some file from the internet using multiple threads, instead of wget'ing with a single thread, you're unfairly hogging the server and WAN bandwidth between your site and the remote site, slowing down everyone else on the internet who is running with only one thread each.

Assuming your zfs send backup is going local, on a LAN, you almost certainly do not want to do that. If your zfs send is going across the WAN ... maybe you do want to multithread the datastream. But you had better ensure it's encrypted.
On Wed, Jun 30, 2010 at 12:54:19PM -0400, Edward Ned Harvey wrote:
> If you're talking about streaming to a bunch of separate tape drives (or
> whatever) on a bunch of separate systems because the recipient storage is
> the bottleneck instead of the network ... then "split" probably isn't the
> most useful way to distribute those streams, because "split" is serial.
> You would really want to "stripe" your data to all those various
> destinations, so they could all be writing simultaneously. But this seems
> like a very specialized scenario, and I think it is probably very unusual.

At this point, I will repeat my recommendation about using zpool-in-files as a backup (staging) target. Depending where you host the files, and how you combine them, you can achieve these scenarios without clunkery, and with all the benefits a zpool provides. A rough sketch of the commands follows the list below.

1. Create a bunch of files, sized appropriately for your eventual backup media unit (e.g. tape).
2. Make a zpool out of them, in whatever vdev arrangement suits your space and error tolerance needs (plain stripe or raidz or both). Set compression, dedup etc. (encryption, one day) as suits you, too.
3. zfs send | zfs recv into this pool-of-files. rsync from non-zfs hosts, too, if you like.
4. Scrub, if you like.
5. Write the files to tape, or into whatever file-oriented backup solution you prefer (perhaps on a less frequent schedule than the sends).
6. Go to 3 (incremental sends for later updates).

I came up with this scheme when zpool was the only forwards-compatible format, before the send stream format was a committed interface too. However, there are still several other reasons why this is preferable to backing up send streams directly.

-- Dan.
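A minimal sketch of the staging scheme above, assuming DAT72-sized media; the file sizes, paths, and pool name are illustrative.

    # 1. create backing files sized to the backup media
    mkfile 36g /backup/vol0 /backup/vol1 /backup/vol2

    # 2. build a pool (plain stripe here) out of them and set the options you want
    zpool create stagepool /backup/vol0 /backup/vol1 /backup/vol2
    zfs set compression=on stagepool

    # 3. receive the send stream into the pool-of-files
    zfs send -R tank@Tues | zfs recv -d stagepool

    # 4-5. scrub it, then export and write the backing files to tape
    zpool scrub stagepool
    zpool export stagepool

Because the staging target is itself a zpool, later incremental sends update it in place, and the individual backing files remain tape-sized units that any file-oriented backup tool can handle.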
> At this point, I will repeat my recommendation about using
> zpool-in-files as a backup (staging) target. Depending where you
> host the files, and how you combine them, you can achieve these scenarios
> without clunkery, and with all the benefits a zpool provides.

This is another good scheme.

I see a number of points to consider when choosing amongst the various suggestions for backing up zfs file systems. In no particular order, I have these:

1. Does it work in place, or does it need an intermediate copy on disk?
2. Does it respect ACLs?
3. Does it respect zfs snapshots?
4. Does it allow random access to files, or only full file system restore?
5. Can it (mostly) survive partial data corruption?
6. Can it handle file systems larger than a single tape?
7. Can it stream to multiple tapes in parallel?
8. Does it understand the concept of incremental backups?

I still see this as a serious gap in the offering of zfs. Clearly so do many other people, as there are a lot of methods offered to handle at least some of the above.
Tristram Scott <tristram.scott at quantmodels.co.uk> wrote:

> I see a number of points to consider when choosing amongst the various suggestions for backing up zfs file systems. In no particular order, I have these:

Let me fill this out for star ;-)

> 1. Does it work in place, or does it need an intermediate copy on disk?

Yes

> 2. Does it respect ACLs?

Not yet (because of missing interest from Sun). If people show interest, a ZFS ACL implementation would not take much time, as there is already UFS ACL support in star.

> 3. Does it respect zfs snapshots?

Yes. Star recommends running incrementals on snapshots. Star incrementals will work correctly if the snapshot just creates a new filesystem ID but leaves inode numbers identical (this is how it works with UFS snapshots).

> 4. Does it allow random access to files, or only full file system restore?

Yes

> 5. Can it (mostly) survive partial data corruption?

Yes for data corruption in the archive; for data corruption in ZFS, see ZFS.

> 6. Can it handle file systems larger than a single tape?

Yes

> 7. Can it stream to multiple tapes in parallel?

There is hardware for this task (check for "TAPE RAID").

> 8. Does it understand the concept of incremental backups?

Yes

And regarding the speed of incrementals: a scan on a Sun Fire X4540 with a typical mix of small and large files (1.5 TB of filesystem data in 7.7 million files) takes 20 minutes. There seems to be a performance problem in the ZFS implementation: the data is made from 4 copies of identical file sets, each 370 GB in size, and the performance degrades after some time. While parsing the first set of files, the performance is 4x higher, so this 1.5 TB test could have been finished in 5 minutes. This test was done with an empty cache. With a populated cache, the incremental scan is much faster and takes only 4 minutes.

It seems that incrementals at user space level are still feasible.

Jörg

--
EMail: joerg at schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling  D-13353 Berlin
       js at cs.tu-berlin.de                 (uni)
       joerg.schilling at fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily