If there were a "zfs send" datastream saved someplace, is there a way to verify the integrity of that datastream without doing a "zfs receive" and occupying all that disk space?

I am aware that "zfs send" is not a backup solution, due to vulnerability to even a single bit error, lack of granularity, and other reasons. However ... there is an attraction to "zfs send" as an augmentation to the commercial backup tools we use, because "zfs receive" doesn't require any special software packages or license keys to do a restore in the event of a complete filesystem restore. Hate that catch-22 when you can't restore because the backup tool is inside the backup file.

If we ever need to restore the complete dataset ... most likely there will be no error on the tapes, so if we have an error-free saved "zfs send" stream available, then "zfs receive" would be the best possible tool to recover the whole filesystem.

So the question is: I've read the zfs manual and I don't see any "zfs verify" command. The closest I see is "zfs receive -n", but I am not sure this command would actually checksum and verify the datastream. Is there some way for me to verify a datastream without actually doing the "zfs receive"?

Thanks...
> If there were a "zfs send" datastream saved someplace, is there a way to
> verify the integrity of that datastream without doing a "zfs receive" and
> occupying all that disk space?

Depending on your version of OS, I think the following post from Richard Elling will be of great interest to you:

- http://richardelling.blogspot.com/2009/10/check-integrity-of-zfs-send-streams.html

--
julien.
http://blog.thilelli.net/
> Depending on your version of OS, I think the following post from Richard
> Elling will be of great interest to you:
> - http://richardelling.blogspot.com/2009/10/check-integrity-of-zfs-send-streams.html

Thanks! :-)
No, wait! ....

According to that page, if you "zfs receive -n" then you should get a 0 exit status for success, and 1 for error.

Unfortunately, I've been sitting here and testing just now ... I created a "zfs send" datastream, then I made a copy of it and toggled a bit in the middle to make it corrupt ...

I found that "zfs receive -n" always returns a 0 exit status, even if the data stream is corrupt. In order to get the "1" exit status, you have to get rid of the "-n", which unfortunately means writing the completely restored filesystem to disk.

I've sent a message to Richard to notify him of the error on his page. But it would seem zstreamdump must be the only way to verify the integrity of a stored data stream. I haven't tried it yet, and I'm out of time for today...
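For anyone who wants to repeat the experiment, the bit-toggling step can be sketched with dd. In this sketch an ordinary file stands in for the saved stream, the file names are made up, and the final "zfs receive -n" check is only indicated in a comment:

```shell
# Stand-in for a saved "zfs send" stream; any file works for the demo.
printf 'pretend this is a zfs send stream' > /tmp/snap.stream
cp /tmp/snap.stream /tmp/snap.corrupt
# Overwrite one byte in the middle of the copy without truncating it.
printf 'X' | dd of=/tmp/snap.corrupt bs=1 seek=10 conv=notrunc 2>/dev/null
cmp -s /tmp/snap.stream /tmp/snap.corrupt || echo "copy is corrupt"
# The real check would then be:  zfs receive -n pool/test < /tmp/snap.corrupt ; echo $?
```

The conv=notrunc flag is what keeps the corrupted copy the same length as the original, so only the content differs.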
If feasible, you may want to generate MD5 sums on the streamed output and then use these for verification.

-- Sriram

On 12/5/09, Edward Ned Harvey <solaris at nedharvey.com> wrote:
> I found that "zfs receive -n" always returns a 0 exit status, even if the
> data stream is corrupt. In order to get the "1" exit status, you have to
> get rid of the "-n", which unfortunately means writing the completely
> restored filesystem to disk.
> [...]
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Sent from my mobile device
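A minimal sketch of that workflow (GNU md5sum assumed; the stream file name is made up, and an ordinary file stands in for the real stream):

```shell
# At backup time: save a checksum file next to the stored stream.
printf 'stand-in for a zfs send stream' > /tmp/mysnap.stream  # really: zfs send ... > /tmp/mysnap.stream
md5sum /tmp/mysnap.stream > /tmp/mysnap.stream.md5
# Later, before restoring: verify the stored stream is unchanged.
md5sum -c /tmp/mysnap.stream.md5
```

The verification step exits nonzero if the stream no longer matches its recorded checksum, so it can gate an automated restore.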
Well, what does _that_ verify? It will verify that no bits provably broke during transport. It will still leave the chance of getting an incompatible stream, an incomplete stream (kill the dump), or plain corrupted data. Of course, the chance of the latter should be extremely small on server-grade hardware.

$0.02

Sriram Narayanan wrote:
> If feasible, you may want to generate MD5 sums on the streamed output
> and then use these for verification.
On Sat, 5 Dec 2009, Sriram Narayanan wrote:
> If feasible, you may want to generate MD5 sums on the streamed output
> and then use these for verification.

You can also stream into a gzip or lzop wrapper in order to obtain the benefit of incremental CRCs and some compression as well. As long as the wrapper is generated on the sending side (and not subject to problems like truncation), it should be quite useful for verifying that the stream has not been corrupted.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Dec 4, 2009, at 4:11 PM, Edward Ned Harvey wrote:
> I found that "zfs receive -n" always returns a 0 exit status, even if the
> data stream is corrupt. In order to get the "1" exit status, you have to
> get rid of the "-n", which unfortunately means writing the completely
> restored filesystem to disk.

I believe it will depend on the nature of the corruption. Regardless, the answer is to use zstreamdump.
 -- richard
On Sat, 2009-12-05 at 09:22 -0600, Bob Friesenhahn wrote:
> You can also stream into a gzip or lzop wrapper in order to obtain the
> benefit of incremental CRCs and some compression as well.

Can you give an example command line for this option, please?
Bob Friesenhahn wrote:
> You can also stream into a gzip or lzop wrapper in order to obtain the
> benefit of incremental CRCs and some compression as well. As long as
> the wrapper is generated on the sending side (and not subject to
> problems like truncation) it should be quite useful for verifying that
> the stream has not been corrupted.

Same deal as with MD5 sums. It doesn't guarantee that the stream is 'receivable' on the receiver. Now, unless your wrapper is able to retransmit on CRC error, an MD5 would be vastly superior due to quality of error detection. Both techniques would be optimal (although I'd suspect the compression doesn't help. I should think the send/recv streams will be compressed as it is).
On Sat, 5 Dec 2009, dick hoogendijk wrote:
> Can you give an example command line for this option please?

Something like

    zfs send mysnapshot | gzip -c -3 > /somestorage/mysnap.gz

should work nicely. Zfs send sends to its standard output, so it is just a matter of adding another filter program on its output. This could be streamed over ssh or some other streaming network transfer protocol.

Later, you can do 'gzip -t mysnap.gz' on the machine where the snapshot file is stored to verify that it has not been corrupted in storage or transfer.

lzop (not part of Solaris) is much faster than gzip but can be used in a similar way, since it is patterned after gzip.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Sat, Dec 5, 2009 at 11:32 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> Later, you can do 'gzip -t mysnap.gz' on the machine where the snapshot
> file is stored to verify that it has not been corrupted in storage or
> transfer.

It seems as though a similar filter could be created to create and inject an error-correcting code into the stream. That is:

    zfs send $snap | ecc -i > /somestorage/mysnap.ecc
    ecc -o < /somestorage/mysnap.ecc | zfs receive ...

I'm not aware of an existing ecc program, but I can't imagine it would be hard to create one. There seems to already be an implementation of Reed-Solomon encoding in ON that could likely be used as a starting point.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_raidz.c

--
Mike Gerdts
http://mgerdts.blogspot.com/
> If feasible, you may want to generate MD5 sums on the streamed output
> and then use these for verification.

That's actually not a bad idea. It should be kinda obvious, but I hadn't thought of it because it's sort-of duplicating existing functionality.

I do have a "multipipe" script that behaves similar to "tee", but "tee" can only output to stdout and a file. "multipipe" launches any number of processes, and pipes stdin to all of the child processes. I normally use this when creating a large datastream ... I generate the datastream, and I want to md5 the uncompressed datastream, and I also want to gzip the uncompressed datastream. I don't want to generate the filestream twice. Then I will gunzip | md5 to check the sum.

I also have a "threadzip" script, because gzip is invariably the bottleneck in the data stream. Utilize those extra cores!!! ;-)

I plan to release these things open source soon, so if anyone has interest, please let me know.
Where exactly do you get zstreamdump?

I found a link to zstreamdump.c ... but is that it? Shouldn't it be part of a source tarball or something?

Does it matter what OS? Every reference I see for zstreamdump is about OpenSolaris. But I'm running Solaris.
On Sat, Dec 5, 2009 at 17:17, Richard Elling <richard.elling at gmail.com> wrote:
> I believe it will depend on the nature of the corruption. Regardless,
> the answer is to use zstreamdump.

Richard, do you know of any usage examples of zstreamdump? I've been searching for examples since you posted this, and don't see anything that shows how to use it in practice. argh.

-C
On Sun, 6 Dec 2009, Edward Ned Harvey wrote:
> I also have a "threadzip" script, because gzip is invariably the
> bottleneck in the data stream. Utilize those extra cores!!! ;-)

Gzip can be a bit slow. Luckily there is 'lzop', which is quite a lot more CPU efficient on i386 and AMD64, and even on SPARC. If the compressor is able to keep up with the network and disk, then it is fast enough. See "http://www.lzop.org/".

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Edward Ned Harvey wrote:
> I do have a "multipipe" script that behaves similar to "tee", but "tee"
> can only output to stdout and a file.

In my POSIX universe I can just do

    zfs send ... | pv | tee >(md5sum) >(sha256sum) | gzip | tee >(md5sum > .md5.zipped) | ssh remote

etc. etc.
Bob Friesenhahn wrote:
> Gzip can be a bit slow. Luckily there is 'lzop' which is quite a lot
> more CPU efficient on i386 and AMD64, and even on SPARC.

I use the excellent pbzip2:

    zfs send ... | tee >(md5sum) | pbzip2 | ssh remote ...

Utilizes those 8 cores quite well :)
> Gzip can be a bit slow. Luckily there is 'lzop' which is quite a lot
> more CPU efficient on i386 and AMD64, and even on SPARC. If the
> compressor is able to keep up with the network and disk, then it is
> fast enough. See "http://www.lzop.org/".

In my development/testing this week, I did "time zfs send | gzip --fast > somefile.gz" and also "time zfs send | threadzip --threads=8 > somefile.tz" ... Threadzip performed 10x faster (hardly a performance I expect from lzop) and compressed about 2-3% smaller than gzip. Also hardly a performance I could expect from lzop.

The key is multiple cores. I'm on an 8-core Xeon.

As for "fast enough," the metric I'm using is: can the compressor keep up with I/O? I do this: "time zfs send > /dev/null" and "time zfs send | [compressor] > /dev/null" to see if the compressor has an impact on performance.

I'm only at rev 1.0 of threadzip, and it is *far* from optimized. But it's still an order of magnitude better than the alternatives. So it'll only get better from here.
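The chunk-and-compress-in-parallel idea is easy to sketch with standard tools. This is not Ed's threadzip, just an illustration of the principle, with arbitrary chunk size and process count, and a generated file standing in for the "zfs send" output:

```shell
# Chunk the stream, gzip the chunks in parallel, concatenate the results.
# Concatenated gzip members decompress back into one continuous stream.
work=$(mktemp -d)
seq 1 5000 | sed 's/^/repetitive data line /' > "$work/stream.in"  # stand-in for "zfs send" output
split -b 16384 "$work/stream.in" "$work/chunk."
ls "$work"/chunk.* | xargs -n 1 -P 8 gzip       # up to 8 gzip processes at once
cat "$work"/chunk.*.gz > "$work/stream.gz"      # lexical order of split names preserves chunk order
gunzip -c "$work/stream.gz" | cmp -s - "$work/stream.in" && echo "round trip OK"
```

Because gzip files concatenate cleanly, the per-chunk compression is invisible on decompression, which is what makes the parallel approach a drop-in replacement in a pipeline.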
> I use the excellent pbzip2:
>
>     zfs send ... | tee >(md5sum) | pbzip2 | ssh remote ...
>
> Utilizes those 8 cores quite well :)

This (pbzip2) sounds promising, and it must be better than what I wrote. ;-) But I don't understand the syntax you've got above, using tee and redirecting to something in parens. I haven't been able to do this yet on my own system. Can you please give me an example to simultaneously generate md5sum and gzip?

This is how I currently do it:

    cat somefile | multipipe "md5sum > somefile.md5sum" "gzip > somefile.gz"

End result is:

    somefile
    somefile.md5sum
    somefile.gz
On Sun, 6 Dec 2009, Edward Ned Harvey wrote:
> Threadzip performed 10x faster (hardly a performance I expect from lzop)
> and compressed about 2-3% smaller than gzip.
>
> The key is multiple cores. I'm on an 8-core Xeon.

I am glad to see that you found a use for all those cores. As a simple test here, on AMD64 and Solaris 10, I see 3.6X less CPU consumption from 'lzop -3' than from 'gzip -3'. With lots of background activity (zfs scrub of the pool), this increases to a 4X advantage.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Edward Ned Harvey wrote:
> But I don't understand the syntax you've got above, using tee and
> redirecting to something in parens. I haven't been able to do this yet
> on my own system.

Well, the theory is simple. "tee" is quite sufficient, because it will not just operate on files; it will operate on _file descriptors_. Big difference. A file descriptor can point to a whole slew of things, among which are files, pipes, socket files, FIFOs, or whatever the heck your brand of UNIX wants to call those.

Now, the shell usually gives you a lot of usual syntax for that:

    ls > /dev/stderr

is usually a synonym for

    ls > /proc/self/fd/2

On to the topic of pipes... You could make the 'anonymous' file descriptors, which your shell opens up internally to link the pipe processes together, explicit like so:

    mkfifo /tmp/myzippipe
    mkfifo /tmp/myhashpipe
    (zfs send ... | tee /tmp/myzippipe /tmp/myhashpipe)&
    (cat /tmp/myzippipe | gzip > zipped_stream)&
    (cat /tmp/myhashpipe | md5sum > MD5SUMs)&
    wait
    rm /tmp/my*pipe

All that is painfully verbose, leaves dangling FIFOs on errors, has security issues (FIFOs on /tmp?) and looks like a crutch. It appears that a number of shells (I think I remember using this on bash, sh, ksh) support the nifty and obvious shorthand

    cat >(subshell command line)

which will be replaced (like in command line, environment, glob and other expansion) by the proper file descriptor, like

    cat /dev/fd/23

Of course the actual number would be 'random', depending on shell, processes running, etc.
This makes your needed multi-tee a snap:

    cat my_log_file | tee >(gzip > my_log_file.gz) >(wc -l) >(md5sum) | sort | uniq -c

This will do all your heart's desires at once :)

Note how the >(subshell) notation allows you to do most anything your shell supports, including using aliases, functions, and redirection, exactly like you would in $(subshell) [1].

Well, I'll stop here, because I'm sure 'man $0' in your favourite shell will tell you more info more pertinent without requiring quite so many keystrokes on my part.

Cheers,
Seth

[1] Beware that it _is_ a subshell, so you cannot update shell variables, and certain things will not be inherited from the parent shell (especially in security-restricted environments).
Edward Ned Harvey wrote:
> Can you please give me an example to simultaneously generate md5sum and
> gzip?
>
> This is how I currently do it:
>     cat somefile | multipipe "md5sum > somefile.md5sum" "gzip > somefile.gz"

So that would be:

    cat somefile | tee >(md5sum > somefile.md5sum) | gzip > somefile.gz
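To convince yourself it works end to end, here is a self-contained check with a throwaway file (bash is required for the >() syntax; all names are made up):

```shell
#!/bin/bash
# One pass over the data: checksum via process substitution, compress via the pipe.
work=$(mktemp -d)
seq 1 1000 > "$work/somefile"
cat "$work/somefile" | tee >(md5sum | awk '{print $1}' > "$work/somefile.md5sum") \
    | gzip > "$work/somefile.gz"
sleep 1  # bash does not wait for process substitutions to finish; give the checksum a moment
gunzip -c "$work/somefile.gz" | md5sum | awk '{print $1}' \
    | cmp -s - "$work/somefile.md5sum" && echo "checksum matches"
```

The final line decompresses the archive, recomputes the checksum, and compares it against the one captured during the single pass, which is exactly the verification loop being discussed in the thread.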
>> Depending on your version of OS, I think the following post from Richard
>> Elling will be of great interest to you:

> Where exactly do you get zstreamdump?
> Does it matter what OS? Every reference I see for zstreamdump is about
> OpenSolaris. But I'm running Solaris.

OS means Operating System, or OpenSolaris. It is in the second sense that I wrote "OS" in my answer. It was not obvious you were using Solaris 10, though. Sorry about that.

(FYI, zstreamdump seems to be an addition to build 125.)

--
julien.
http://blog.thilelli.net/
> OS means Operating System, or OpenSolaris. It is in the second sense
> that I wrote "OS" in my answer. It was not obvious you were using
> Solaris 10, though. Sorry about that.
>
> (FYI, zstreamdump seems to be an addition to build 125.)

Oh - I never connected OS to OpenSolaris. ;-)

So I gather it's not a downloadable item. If zstreamdump is in your operating system then great, and if not, it's not available until you upgrade your operating system. Right?
> I see 3.6X less CPU consumption from 'lzop -3' than from 'gzip -3'.

Where do you get lzop from? I don't see any binaries on their site, nor Blastwave, nor OpenCSW. And I am having difficulty building it from source.
On Sun, 6 Dec 2009, Edward Ned Harvey wrote:
> Where do you get lzop from? I don't see any binaries on their site, nor
> Blastwave, nor OpenCSW. And I am having difficulty building it from
> source.

I just built it from source. :-)

First one has to build and install the lzo 2.03 library (from http://www.oberhumer.com/opensource/lzo/) and then build lzop. I used GCC, but not the archaic version that Sun provides with Solaris 10.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Dec 5, 2009, at 11:03 AM, Mike Gerdts wrote:
> It seems as though a similar filter could be created to create and
> inject an error-correcting code into the stream. That is:
>
>     zfs send $snap | ecc -i > /somestorage/mysnap.ecc
>     ecc -o < /somestorage/mysnap.ecc | zfs receive ...
>
> I'm not aware of an existing ecc program, but I can't imagine it would be
> hard to create one. There seems to already be an implementation of
> Reed-Solomon encoding in ON that could likely be used as a starting point.
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_raidz.c

It all depends on the failure you want to protect against. If you don't know the failure mode, you won't be very effective. For example, to protect against an unrecoverable read on a single disk sector, you need an ECC that can recover 512 bytes. It is this thought process that led to the original RAID work (and is one reason why nobody does RAID-2).
By contrast, if you are working at the media level, then it is not uncommon to have errors that affect a few contiguous bytes, and an ECC code can be effective (AIUI, 40% of the bits on a modern HDD are not data).
 -- richard
On Dec 6, 2009, at 3:35 PM, Edward Ned Harvey wrote:
> So I gather it's not a downloadable item. If zstreamdump is in your
> operating system then great, and if not, it's not available until you
> upgrade your operating system. Right?

... or use a virtual machine.
 -- richard
Oh well. I built LZO, and can't seem to link it in the lzop build, despite correctly setting the FLAGS variables they say in the INSTALL file. I'd love to provide an lzop comparison, but can't get it. I give up ... Also, can't build python-lzo. Also would be sweet, but hey.

For whoever cares, here is the comparison that I do have. I'm doing a "zfs send" of my rpool, piping through the named compressor, and dumping to /dev/null.

- rpool is on a 2-disk mirror, SATA 7200
- 2 sockets of 4-core Xeons (total 8 cores, capable of 16 threads)
- System idle in all respects, except this activity
- Threadzip is using zlib (similar or same as gzip), breaking the stream into 5M chunks and compressing those chunks in parallel threads

------------------------------------------- pass 1
9.52GB  2m14.578s   no compression
5.69GB  2m15.963s   threadzip 32 threads --fast
5.69GB  2m13.609s   threadzip 16 threads --fast
5.69GB  2m21.968s   threadzip 8 threads --fast
(Above, "zfs send" is the bottleneck. Don't know if the compressor could go faster.)
(Below, the compressor is the bottleneck.)
5.69GB  3m17.789s   threadzip 4 threads --fast
5.56GB  3m29.619s   threadzip 16 threads --best
5.56GB  4m24.761s   threadzip 8 threads --best
5.44GB  5m13.139s   pbzip2 auto
5.44GB  5m21.030s   pbzip2 16 processes
5.44GB  6m4.915s    pbzip2 8 processes
5.70GB  7m41.209s   gzip --fast

------------------------------------------- pass 2
9.52GB  2m17.858s   no compression
5.69GB  2m13.446s   threadzip 32 threads --fast
5.69GB  2m9.842s    threadzip 16 threads --fast
5.69GB  2m22.388s   threadzip 8 threads --fast
(Above, "zfs send" is the bottleneck. Don't know if the compressor could go faster.)
(Below, the compressor is the bottleneck.)
5.69GB  3m10.701s   threadzip 4 threads --fast
5.56GB  3m27.772s   threadzip 16 threads --best
5.56GB  4m22.409s   threadzip 8 threads --best
5.44GB  5m15.247s   pbzip2 auto
5.44GB  5m21.089s   pbzip2 16 processes
5.44GB  6m5.412s    pbzip2 8 processes
5.70GB  7m22.505s   gzip --fast