If there were a "zfs send" datastream saved someplace, is there a way to verify the integrity of that datastream without doing a "zfs receive" and occupying all that disk space?

I am aware that "zfs send" is not a backup solution, due to vulnerability to even a single bit error, lack of granularity, and other reasons. However ... there is an attraction to "zfs send" as an augmentation to the commercial backup tools we use, because "zfs receive" doesn't require any special software packages or license keys to do a restore in the event of a complete filesystem restore. Hate that catch-22 when you can't restore because the backup tool is inside the backup file.

If we ever need to restore the complete dataset ... most likely there will be no error on the tapes, so if we have an error-free saved "zfs send" stream available, then "zfs receive" would be the best possible tool to recover the whole filesystem.

So the question is: I've read the zfs manual and I don't see any "zfs verify" command. The closest I see is "zfs receive -n", but I am not sure this command would actually checksum and verify the datastream. Is there some way for me to verify a datastream without actually doing the "zfs receive"?

Thanks...
> If there were a "zfs send" datastream saved someplace, is there a way to
> verify the integrity of that datastream without doing a "zfs receive" and
> occupying all that disk space?

Depending on your version of OS, I think the following post from Richard Elling will be of great interest to you:

- http://richardelling.blogspot.com/2009/10/check-integrity-of-zfs-send-streams.html

--
julien.
http://blog.thilelli.net/
> Depending on your version of OS, I think the following post from Richard
> Elling will be of great interest to you:
> - http://richardelling.blogspot.com/2009/10/check-integrity-of-zfs-send-streams.html

Thanks! :-)
No, wait! ....

According to that page, if you "zfs receive -n" then you should get a 0 exit status for success, and 1 for error.

Unfortunately, I've been sitting here and testing just now ... I created a "zfs send" datastream, then I made a copy of it and toggled a bit in the middle to make it corrupt ...

I found that "zfs receive -n" always returns a 0 exit status, even if the data stream is corrupt. In order to get the "1" exit status, you have to get rid of the "-n", which unfortunately means writing the completely restored filesystem to disk.

I've sent a message to Richard to notify him of the error on his page. But it would seem zstreamdump must be the only way to verify the integrity of a stored data stream. I haven't tried it yet, and I'm out of time for today...
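For anyone who wants to repeat the experiment, the bit-toggling step can be sketched with dd. In this sketch an ordinary file stands in for the saved stream, the file names are made up, and the final "zfs receive -n" check is only indicated in a comment:

```shell
# Stand-in for a saved "zfs send" stream; any file works for the demo.
printf 'pretend this is a zfs send stream' > /tmp/snap.stream
cp /tmp/snap.stream /tmp/snap.corrupt
# Overwrite one byte in the middle of the copy without truncating it.
printf 'X' | dd of=/tmp/snap.corrupt bs=1 seek=10 conv=notrunc 2>/dev/null
cmp -s /tmp/snap.stream /tmp/snap.corrupt || echo "copy is corrupt"
# The real check would then be:  zfs receive -n pool/test < /tmp/snap.corrupt ; echo $?
```

The conv=notrunc flag is what keeps the corrupted copy the same length as the original, so only the content differs.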
If feasible, you may want to generate MD5 sums on the streamed output and then use these for verification.

-- Sriram

On 12/5/09, Edward Ned Harvey <solaris at nedharvey.com> wrote:
> I found that "zfs receive -n" always returns a 0 exit status, even if the
> data stream is corrupt. In order to get the "1" exit status, you have to
> get rid of the "-n", which unfortunately means writing the completely
> restored filesystem to disk.
> [...]
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Sent from my mobile device
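A minimal sketch of that workflow (GNU md5sum assumed; the stream file name is made up, and an ordinary file stands in for the real stream):

```shell
# At backup time: save a checksum file next to the stored stream.
printf 'stand-in for a zfs send stream' > /tmp/mysnap.stream  # really: zfs send ... > /tmp/mysnap.stream
md5sum /tmp/mysnap.stream > /tmp/mysnap.stream.md5
# Later, before restoring: verify the stored stream is unchanged.
md5sum -c /tmp/mysnap.stream.md5
```

The verification step exits nonzero if the stream no longer matches its recorded checksum, so it can gate an automated restore.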
Well, what does _that_ verify? It will verify that no bits provably broke during transport. It will still leave the chance of getting an incompatible stream, an incomplete stream (kill the dump), or plain corrupted data. Of course, the chance of the latter should be extremely small on server-grade hardware.

$0.02

Sriram Narayanan wrote:
> If feasible, you may want to generate MD5 sums on the streamed output
> and then use these for verification.
On Sat, 5 Dec 2009, Sriram Narayanan wrote:
> If feasible, you may want to generate MD5 sums on the streamed output
> and then use these for verification.

You can also stream into a gzip or lzop wrapper in order to obtain the benefit of incremental CRCs and some compression as well. As long as the wrapper is generated on the sending side (and not subject to problems like truncation), it should be quite useful for verifying that the stream has not been corrupted.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Dec 4, 2009, at 4:11 PM, Edward Ned Harvey wrote:
> I found that "zfs receive -n" always returns a 0 exit status, even if the
> data stream is corrupt. In order to get the "1" exit status, you have to
> get rid of the "-n", which unfortunately means writing the completely
> restored filesystem to disk.

I believe it will depend on the nature of the corruption. Regardless, the answer is to use zstreamdump.
 -- richard
On Sat, 2009-12-05 at 09:22 -0600, Bob Friesenhahn wrote:
> You can also stream into a gzip or lzop wrapper in order to obtain the
> benefit of incremental CRCs and some compression as well.

Can you give an example command line for this option, please?
Bob Friesenhahn wrote:
> You can also stream into a gzip or lzop wrapper in order to obtain the
> benefit of incremental CRCs and some compression as well. As long as
> the wrapper is generated on the sending side (and not subject to
> problems like truncation) it should be quite useful for verifying that
> the stream has not been corrupted.

Same deal as with MD5 sums. It doesn't guarantee that the stream is 'receivable' on the receiver. Now, unless your wrapper is able to retransmit on CRC error, an MD5 would be vastly superior due to quality of error detection. Both techniques would be optimal (although I'd suspect the compression doesn't help. I should think the send/recv streams will be compressed as it is).
On Sat, 5 Dec 2009, dick hoogendijk wrote:
> Can you give an example command line for this option please?

Something like

    zfs send mysnapshot | gzip -c -3 > /somestorage/mysnap.gz

should work nicely. Zfs send sends to its standard output, so it is just a matter of adding another filter program on its output. This could be streamed over ssh or some other streaming network transfer protocol.

Later, you can do 'gzip -t mysnap.gz' on the machine where the snapshot file is stored to verify that it has not been corrupted in storage or transfer.

lzop (not part of Solaris) is much faster than gzip but can be used in a similar way, since it is patterned after gzip.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Sat, Dec 5, 2009 at 11:32 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> Later, you can do 'gzip -t mysnap.gz' on the machine where the snapshot
> file is stored to verify that it has not been corrupted in storage or
> transfer.

It seems as though a similar filter could be created to create and inject an error-correcting code into the stream. That is:

    zfs send $snap | ecc -i > /somestorage/mysnap.ecc
    ecc -o < /somestorage/mysnap.ecc | zfs receive ...

I'm not aware of an existing ecc program, but I can't imagine it would be hard to create one. There seems to already be an implementation of Reed-Solomon encoding in ON that could likely be used as a starting point.

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_raidz.c

--
Mike Gerdts
http://mgerdts.blogspot.com/
> If feasible, you may want to generate MD5 sums on the streamed output
> and then use these for verification.

That's actually not a bad idea. It should be kinda obvious, but I hadn't thought of it because it's sort-of duplicating existing functionality.

I do have a "multipipe" script that behaves similar to "tee", but "tee" can only output to stdout and a file. "multipipe" launches any number of processes, and pipes stdin to all of the child processes. I normally use this when creating a large datastream ... I generate the datastream, and I want to md5 the uncompressed datastream, and I also want to gzip the uncompressed datastream. I don't want to generate the filestream twice. Then I will gunzip | md5 to check the sum.

I also have a "threadzip" script, because gzip is invariably the bottleneck in the data stream. Utilize those extra cores!!! ;-)

I plan to release these things open source soon, so if anyone has interest, please let me know.
Where exactly do you get zstreamdump?

I found a link to zstreamdump.c ... but is that it? Shouldn't it be part of a source tarball or something?

Does it matter what OS? Every reference I see for zstreamdump is about OpenSolaris. But I'm running Solaris.
On Sat, Dec 5, 2009 at 17:17, Richard Elling <richard.elling at gmail.com> wrote:
> I believe it will depend on the nature of the corruption. Regardless,
> the answer is to use zstreamdump.

Richard, do you know of any usage examples of zstreamdump? I've been searching for examples since you posted this, and don't see anything that shows how to use it in practice. argh.

-C
On Sun, 6 Dec 2009, Edward Ned Harvey wrote:
> I also have a "threadzip" script, because gzip is invariably the
> bottleneck in the data stream. Utilize those extra cores!!! ;-)

Gzip can be a bit slow. Luckily there is 'lzop', which is quite a lot more CPU efficient on i386 and AMD64, and even on SPARC. If the compressor is able to keep up with the network and disk, then it is fast enough. See "http://www.lzop.org/".

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Edward Ned Harvey wrote:
> I do have a "multipipe" script that behaves similar to "tee", but "tee"
> can only output to stdout and a file.

In my POSIX universe I can just do

    zfs send ... | pv | tee >(md5sum) >(sha256sum) | gzip | tee >(md5sum > .md5.zipped) | ssh remote

etc. etc.
Bob Friesenhahn wrote:
> Gzip can be a bit slow. Luckily there is 'lzop' which is quite a lot
> more CPU efficient on i386 and AMD64, and even on SPARC.

I use the excellent pbzip2:

    zfs send ... | tee >(md5sum) | pbzip2 | ssh remote ...

Utilizes those 8 cores quite well :)
> Gzip can be a bit slow. Luckily there is 'lzop' which is quite a lot
> more CPU efficient on i386 and AMD64, and even on SPARC. If the
> compressor is able to keep up with the network and disk, then it is
> fast enough. See "http://www.lzop.org/".

In my development/testing this week, I did "time zfs send | gzip --fast > somefile.gz" and also "time zfs send | threadzip --threads=8 > somefile.tz" ... Threadzip performed 10x faster (hardly a performance I expect from lzop) and compressed about 2-3% smaller than gzip. Also hardly a performance I could expect from lzop.

The key is multiple cores. I'm on an 8-core Xeon.

As for "fast enough," the metric I'm using is: can the compressor keep up with I/O? I do this: "time zfs send > /dev/null" and "time zfs send | [compressor] > /dev/null" to see if the compressor has an impact on performance.

I'm only at rev 1.0 of threadzip, and it is *far* from optimized. But it's still an order of magnitude better than the alternatives. So it'll only get better from here.
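The chunk-and-compress-in-parallel idea is easy to sketch with standard tools. This is not Ed's threadzip, just an illustration of the principle, with arbitrary chunk size and process count, and a generated file standing in for the "zfs send" output:

```shell
# Chunk the stream, gzip the chunks in parallel, concatenate the results.
# Concatenated gzip members decompress back into one continuous stream.
work=$(mktemp -d)
seq 1 5000 | sed 's/^/repetitive data line /' > "$work/stream.in"  # stand-in for "zfs send" output
split -b 16384 "$work/stream.in" "$work/chunk."
ls "$work"/chunk.* | xargs -n 1 -P 8 gzip       # up to 8 gzip processes at once
cat "$work"/chunk.*.gz > "$work/stream.gz"      # lexical order of split names preserves chunk order
gunzip -c "$work/stream.gz" | cmp -s - "$work/stream.in" && echo "round trip OK"
```

Because gzip files concatenate cleanly, the per-chunk compression is invisible on decompression, which is what makes the parallel approach a drop-in replacement in a pipeline.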
> I use the excellent pbzip2:
>
>     zfs send ... | tee >(md5sum) | pbzip2 | ssh remote ...
>
> Utilizes those 8 cores quite well :)

This (pbzip2) sounds promising, and it must be better than what I wrote. ;-) But I don't understand the syntax you've got above, using tee and redirecting to something in parens. I haven't been able to do this yet on my own system. Can you please give me an example to simultaneously generate md5sum and gzip?

This is how I currently do it:

    cat somefile | multipipe "md5sum > somefile.md5sum" "gzip > somefile.gz"

End result is:

    somefile
    somefile.md5sum
    somefile.gz
On Sun, 6 Dec 2009, Edward Ned Harvey wrote:
> Threadzip performed 10x faster (hardly a performance I expect from lzop)
> and compressed about 2-3% smaller than gzip.
>
> The key is multiple cores. I'm on an 8-core Xeon.

I am glad to see that you found a use for all those cores. As a simple test here, on AMD64 and Solaris 10, I see 3.6X less CPU consumption from 'lzop -3' than from 'gzip -3'. With lots of background activity (zfs scrub of the pool), this increases to a 4X advantage.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Edward Ned Harvey wrote:
> But I don't understand the syntax you've got above, using tee and
> redirecting to something in parens. I haven't been able to do this yet
> on my own system.

Well, the theory is simple. "tee" is quite sufficient, because it will not just operate on files; it will operate on _file descriptors_. Big difference. A file descriptor can point to a whole slew of things, among which are files, pipes, socket files, FIFOs, or whatever the heck your brand of UNIX wants to call those.

Now, the shell usually gives you a lot of usual syntax for that:

    ls > /dev/stderr

is usually a synonym for

    ls > /proc/self/fd/2

On to the topic of pipes... You could make the 'anonymous' file descriptors, which your shell opens up internally to link the pipe processes together, explicit like so:

    mkfifo /tmp/myzippipe
    mkfifo /tmp/myhashpipe
    (zfs send ... | tee /tmp/myzippipe /tmp/myhashpipe)&
    (cat /tmp/myzippipe | gzip > zipped_stream)&
    (cat /tmp/myhashpipe | md5sum > MD5SUMs)&
    wait
    rm /tmp/my*pipe

All that is painfully verbose, leaves dangling FIFOs on errors, has security issues (FIFOs on /tmp?) and looks like a crutch. It appears that a number of shells (I think I remember using this on bash, sh, ksh) support the nifty and obvious shorthand

    cat >(subshell command line)

which will be replaced (like in command line, environment, glob and other expansion) by the proper file descriptor, like

    cat /dev/fd/23

Of course the actual number would be 'random', depending on shell, processes running, etc.
This makes your needed multi-tee a snap:

    cat my_log_file | tee >(gzip > my_log_file.gz) >(wc -l) >(md5sum) | sort | uniq -c

This will do all your heart's desires at once :)

Note how the >(subshell) notation allows you to do most anything your shell supports, including using aliases, functions, and redirection, exactly like you would in $(subshell) [1].

Well, I'll stop here, because I'm sure 'man $0' in your favourite shell will tell you more info more pertinent without requiring quite so many keystrokes on my part.

Cheers,
Seth

[1] Beware that it _is_ a subshell, so you cannot update shell variables, and certain things will not be inherited from the parent shell (especially in security-restricted environments).
Edward Ned Harvey wrote:
> Can you please give me an example to simultaneously generate md5sum and
> gzip?
>
> This is how I currently do it:
>     cat somefile | multipipe "md5sum > somefile.md5sum" "gzip > somefile.gz"

So that would be:

    cat somefile | tee >(md5sum > somefile.md5sum) | gzip > somefile.gz
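To convince yourself it works end to end, here is a self-contained check with a throwaway file (bash is required for the >() syntax; all names are made up):

```shell
#!/bin/bash
# One pass over the data: checksum via process substitution, compress via the pipe.
work=$(mktemp -d)
seq 1 1000 > "$work/somefile"
cat "$work/somefile" | tee >(md5sum | awk '{print $1}' > "$work/somefile.md5sum") \
    | gzip > "$work/somefile.gz"
sleep 1  # bash does not wait for process substitutions to finish; give the checksum a moment
gunzip -c "$work/somefile.gz" | md5sum | awk '{print $1}' \
    | cmp -s - "$work/somefile.md5sum" && echo "checksum matches"
```

The final line decompresses the archive, recomputes the checksum, and compares it against the one captured during the single pass, which is exactly the verification loop being discussed in the thread.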
>> Depending on your version of OS, I think the following post from Richard
>> Elling will be of great interest to you:

> Where exactly do you get zstreamdump?
> Does it matter what OS? Every reference I see for zstreamdump is about
> OpenSolaris. But I'm running Solaris.

OS means Operating System, or OpenSolaris. It is in the second sense that I wrote "OS" in my answer. It was not obvious you were using Solaris 10, though. Sorry about that.

(FYI, zstreamdump seems to be an addition to build 125.)

--
julien.
http://blog.thilelli.net/
> OS means Operating System, or OpenSolaris. It is in the second sense
> that I wrote "OS" in my answer. It was not obvious you were using
> Solaris 10, though. Sorry about that.
>
> (FYI, zstreamdump seems to be an addition to build 125.)

Oh - I never connected OS to OpenSolaris. ;-)

So I gather it's not a downloadable item. If zstreamdump is in your operating system then great, and if not, it's not available until you upgrade your operating system. Right?
> I see 3.6X less CPU consumption from 'lzop -3' than from 'gzip -3'.

Where do you get lzop from? I don't see any binaries on their site, nor Blastwave, nor OpenCSW. And I am having difficulty building it from source.
On Sun, 6 Dec 2009, Edward Ned Harvey wrote:
> Where do you get lzop from? I don't see any binaries on their site, nor
> Blastwave, nor OpenCSW. And I am having difficulty building it from
> source.

I just built it from source. :-)

First one has to build and install the lzo 2.03 library (from http://www.oberhumer.com/opensource/lzo/) and then build lzop. I used GCC, but not the archaic version that Sun provides with Solaris 10.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Dec 5, 2009, at 11:03 AM, Mike Gerdts wrote:
> It seems as though a similar filter could be created to create and
> inject an error-correcting code into the stream. That is:
>
>     zfs send $snap | ecc -i > /somestorage/mysnap.ecc
>     ecc -o < /somestorage/mysnap.ecc | zfs receive ...
>
> I'm not aware of an existing ecc program, but I can't imagine it would be
> hard to create one. There seems to already be an implementation of
> Reed-Solomon encoding in ON that could likely be used as a starting point.
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/vdev_raidz.c

It all depends on the failure you want to protect against. If you don't know the failure mode, you won't be very effective. For example, to protect against an unrecoverable read on a single disk sector, you need an ECC that can recover 512 bytes. It is this thought process that led to the original RAID work (and is one reason why nobody does RAID-2).
By contrast, if you are working at the media level, then it is not uncommon to have errors that affect a few contiguous bytes, and an ECC code can be effective (AIUI, 40% of the bits on a modern HDD are not data).
 -- richard
On Dec 6, 2009, at 3:35 PM, Edward Ned Harvey wrote:
> So I gather it's not a downloadable item. If zstreamdump is in your
> operating system then great, and if not, it's not available until you
> upgrade your operating system. Right?

... or use a virtual machine.
 -- richard
Oh well. I built LZO, and can't seem to link it in the lzop build, despite correctly setting the FLAGS variables they say in the INSTALL file. I'd love to provide an lzop comparison, but can't get it. I give up ... Also, can't build python-lzo. Also would be sweet, but hey.

For whoever cares, here is the comparison that I do have. I'm doing a "zfs send" of my rpool, piping through the named compressor, and dumping to /dev/null.

- rpool is on a 2-disk mirror, SATA 7200
- 2 sockets of 4-core Xeons (total 8 cores, capable of 16 threads)
- System idle in all respects, except this activity
- Threadzip is using zlib (similar or same as gzip), breaking the stream into 5M chunks and compressing those chunks in parallel threads

------------------------------------------- pass 1
9.52GB  2m14.578s   no compression
5.69GB  2m15.963s   threadzip 32 threads --fast
5.69GB  2m13.609s   threadzip 16 threads --fast
5.69GB  2m21.968s   threadzip 8 threads --fast
(Above, "zfs send" is the bottleneck. Don't know if the compressor could go faster.)
(Below, the compressor is the bottleneck.)
5.69GB  3m17.789s   threadzip 4 threads --fast
5.56GB  3m29.619s   threadzip 16 threads --best
5.56GB  4m24.761s   threadzip 8 threads --best
5.44GB  5m13.139s   pbzip2 auto
5.44GB  5m21.030s   pbzip2 16 processes
5.44GB  6m4.915s    pbzip2 8 processes
5.70GB  7m41.209s   gzip --fast

------------------------------------------- pass 2
9.52GB  2m17.858s   no compression
5.69GB  2m13.446s   threadzip 32 threads --fast
5.69GB  2m9.842s    threadzip 16 threads --fast
5.69GB  2m22.388s   threadzip 8 threads --fast
(Above, "zfs send" is the bottleneck. Don't know if the compressor could go faster.)
(Below, the compressor is the bottleneck.)
5.69GB  3m10.701s   threadzip 4 threads --fast
5.56GB  3m27.772s   threadzip 16 threads --best
5.56GB  4m22.409s   threadzip 8 threads --best
5.44GB  5m15.247s   pbzip2 auto
5.44GB  5m21.089s   pbzip2 16 processes
5.44GB  6m5.412s    pbzip2 8 processes
5.70GB  7m22.505s   gzip --fast