Mickaël CANÉVET
2012-Jan-24 09:05 UTC
[zfs-discuss] zfs send recv without uncompressing data stream
Hi,

Unless I have misunderstood something, zfs send on a volume that has compression enabled decompresses the data. So if I do a zfs send | zfs receive from a compressed volume to a compressed volume, my data is decompressed and then compressed again. Right?

Is there a more efficient way to do it (without decompression and recompression)?

Cheers,
Mickaël
Jim Klimov
2012-Jan-24 15:52 UTC
[zfs-discuss] zfs send recv without uncompressing data stream
2012-01-24 13:05, Mickaël CANÉVET wrote:
> Hi,
>
> Unless I misunderstood something, zfs send of a volume that has
> compression activated, uncompress it. So if I do a zfs send|zfs receive
> from a compressed volume to a compressed volume, my data are
> uncompressed and compressed again. Right ?
>
> Is there a more effective way to do it (without decompression and
> recompression) ?

While I cannot confirm or deny this statement, it was my impression as well. The rationale is that the two systems might demand different compression (e.g. "lzjb" or "none" on the original system and "gzip-9" on the backup one), just as they probably have different VDEV layouts, or perhaps even different encryption or dedup settings.

Compression, like many other components, lives on the layer "under" logical storage (userdata blocks), and gets applied to newly written blocks only (i.e. your datasets can have a mix of different compression levels for different files or even blocks within a file, if you switched the methods during the dataset's lifetime).

Actually, I would not be surprised if the zfs-send userdata stream is even above the block level (i.e. it would seem normal to me if many small userdata blocks of the original pool became one big block on the recipient).

So while some optimizations are possible, I think they would violate layering quite a bit.

But, for example, it might make sense for zfs-send to include the original compression algorithm information in the sent stream and send the compressed data (less network traffic or intermediate storage, to say the least, at zero cost of recompression to something perhaps more efficient), and if the recipient dataset's algorithm differs, unpack and recompress it on the receiving side.

If that's not done already :)

So far my over-the-net zfs sends are piped into gzip or pigz, ssh and gunzip, and that often speeds up the overall transfer. This can probably be done with less overhead by "ssh -C" for implementations that have it.

//Jim
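[Editor's sketch] The gzip/ssh/gunzip chain described above can be written down as a small command builder. This is only a sketch of the idea, not a tested backup script: the function name `build_send_pipeline`, the snapshot `tank/data@snap1`, the host `backuphost`, and the target `backup/data` are all placeholders invented for illustration, and it assumes `gzip` (or `pigz`) on the sender and `gunzip` on the receiver.

```python
import shlex


def build_send_pipeline(snapshot, remote_host, target_dataset, compressor="gzip"):
    """Return a shell command line that streams a snapshot to a remote pool.

    All names passed in are caller-chosen placeholders; nothing is executed here.
    """
    # zfs send writes the stream to stdout.
    send = ["zfs", "send", snapshot]
    # Compress on the sending side; "-c" makes gzip/pigz write to stdout.
    compress = [compressor, "-c"]
    # On the remote side, decompress and feed the stream to zfs receive.
    remote = "gunzip -c | zfs receive " + shlex.quote(target_dataset)
    ssh = ["ssh", remote_host, remote]
    # shlex.join quotes each argument, so the remote command stays one ssh argument.
    return " | ".join(shlex.join(part) for part in (send, compress, ssh))


if __name__ == "__main__":
    print(build_send_pipeline("tank/data@snap1", "backuphost", "backup/data"))
```

Passing `compressor="pigz"` swaps in the parallel compressor on the sending side. The "ssh -C" variant would drop the gzip and gunzip stages and add `-C` to the ssh arguments instead.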
Richard Elling
2012-Jan-24 17:53 UTC
[zfs-discuss] zfs send recv without uncompressing data stream
On Jan 24, 2012, at 7:52 AM, Jim Klimov wrote:
> 2012-01-24 13:05, Mickaël CANÉVET wrote:
>> Hi,
>>
>> Unless I misunderstood something, zfs send of a volume that has
>> compression activated, uncompress it. So if I do a zfs send|zfs receive
>> from a compressed volume to a compressed volume, my data are
>> uncompressed and compressed again. Right ?

Correct.

>> Is there a more effective way to do it (without decompression and
>> recompression) ?
>
> While I can not confirm or deny this statement, it was my
> impression as well. Rationale being that the two systems
> might demand different compression (i.e. "lzjb" or "none"
> on the original system and "gzip-9" on the backup one).
> Just like you probably have different VDEV layouts, etc.
> Or perhaps even different encryption or dedup settings.

That "feature" falls out of the implementation.

> Compression, like many other components, lives on the
> layer "under" logical storage (userdata blocks), and
> gets applied to newly written blocks only (i.e. your
> datasets can have a mix of different compression levels
> for different files or even blocks within a file, if
> you switched the methods during dataset lifetime).
>
> Actually I would not be surprised if zfs-send userdata
> stream is even above the block level (i.e. it would seem
> normal to me if many small userdata blocks of original
> pool might become one big block on the recipient).
>
> So while some optimizations are possible, I think they
> would violate layering quite much.

Data in the ARC is uncompressed. Compression/decompression occurs in the ZIO pipeline layer, below the DSL.

> But, for example, it might make sense for zfs-send to
> include the original compression algorithm information
> into the sent stream and send the compressed data (less
> network traffic or intermediate storage requirement,
> to say the least - at zero price of recompression to
> something perhaps more efficient), and if the recipient
> dataset's algorithm differs - unpack and recompress it
> on the receiving side.
>
> If that's not done already :)

The compression parameter value is sent but, as you mentioned above, blocks in a snapshot can be compressed with different algorithms, so you only actually get the last setting at the time of the snapshot.

> So far my over-the-net zfs sends are piped into gzip
> or pigz, ssh and gunzip, and that often speeds up the
> overall transfer. Probably can be done with less overhead
> by "ssh -C" for implementations that have it.

The UNIX philosophy is in play here :-) Sending the data uncompressed to stdout allows you to pipe it into various transport or transform programs.

 -- richard

--
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422
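[Editor's sketch] The layering described above can be imitated with a toy model: each block is compressed on its way to "disk" with whatever setting is active at write time, reads always return logical (uncompressed) data, and a send stream built from those reads is recompressed by the receiver under its own setting. Everything here is an illustrative assumption. `ToyDataset`, `send`, `receive`, and the codec names are invented, and `zlib` merely stands in for the real lzjb/gzip code in the ZIO pipeline.

```python
import zlib

# Toy codecs standing in for ZFS compression settings. Real ZFS implements
# lzjb, gzip-N, etc. inside the ZIO pipeline; zlib is only a stand-in here.
CODECS = {
    "none": (lambda data: data, lambda data: data),
    "fast": (lambda data: zlib.compress(data, 1), zlib.decompress),
    "gzip-9": (lambda data: zlib.compress(data, 9), zlib.decompress),
}


class ToyDataset:
    """A dataset whose blocks each remember the codec used when written."""

    def __init__(self, compression):
        self.compression = compression  # current setting, affects new writes only
        self.blocks = []                # list of (codec_name, on_disk_bytes)

    def write(self, data):
        compress, _ = CODECS[self.compression]
        self.blocks.append((self.compression, compress(data)))

    def read_logical(self):
        # Reads hand back uncompressed data, as the upper layers see it.
        for codec, stored in self.blocks:
            _, decompress = CODECS[codec]
            yield decompress(stored)


def send(dataset):
    """Build a stream from logical data, so it carries no per-block codec."""
    return list(dataset.read_logical())


def receive(stream, target):
    """Write each logical block under the receiver's own compression setting."""
    for block in stream:
        target.write(block)


if __name__ == "__main__":
    src = ToyDataset("fast")
    src.write(b"a" * 4096)
    src.compression = "none"      # changing the setting leaves old blocks alone
    src.write(b"b" * 4096)
    dst = ToyDataset("gzip-9")
    receive(send(src), dst)
    print([codec for codec, _ in src.blocks])
    print([codec for codec, _ in dst.blocks])
```

In this model the source ends up with a mix of codecs, while every block on the destination uses the destination's setting, which is the decompress-then-recompress behaviour confirmed in the reply above.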
Jim Klimov
2012-Jan-24 18:37 UTC
[zfs-discuss] zfs send recv without uncompressing data stream
2012-01-24 19:52, Jim Klimov wrote:
> 2012-01-24 13:05, Mickaël CANÉVET wrote:
>> Hi,
>>
>> Unless I misunderstood something, zfs send of a volume that has
>> compression activated, uncompress it. So if I do a zfs send|zfs receive
>> from a compressed volume to a compressed volume, my data are
>> uncompressed and compressed again. Right ?
>>
>> Is there a more effective way to do it (without decompression and
>> recompression) ?
>
> Rationale being that the two systems
> might demand different compression (i.e. "lzjb" or "none"
> on the original system and "gzip-9" on the backup one).

One more rationale: compatibility, including some future-proofing (the zfs-send format explicitly does not guarantee that it won't change incompatibly). I mean the transfer of data between systems that do not implement the same set of compression algorithms in ZFS.

Say, as a developer, I find a way to use bzip2 or 7zip to compress my local system's blocks (just like gzip appeared recently, after there were only lzjb and none). If I zfs-send the compressed blocks as they are, another system won't be able to interpret them unless it supports the same algorithm and format.

And since zfs-send can be used via files (i.e. distribution media with flar-like archives), there is no way for the zfs-sender and zfs-recipient to hold a dialog and agree on a common format, besides using a fixed predefined one: uncompressed.

Using external programs to wrap that in the Unix way is outside ZFS's scope and can be arranged by other software on the OSes.

HTH,
//Jim
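[Editor's sketch] The compatibility argument can be made concrete with two hypothetical receivers: one for a stream that ships blocks still compressed and tagged with a codec name, and one for a stream of plain logical data. Both functions and the codec table are invented for illustration and do not reflect the real zfs-send stream format.

```python
def receive_raw_block(codec, payload, supported):
    """Hypothetical receiver for a stream whose blocks are still compressed.

    `supported` maps codec names to decompress functions on this system.
    """
    if codec not in supported:
        # A stream stored in a file cannot negotiate; the receive simply fails.
        raise ValueError(f"stream uses unsupported codec: {codec}")
    return supported[codec](payload)


def receive_logical_block(payload):
    """Hypothetical receiver for an uncompressed stream: nothing to negotiate."""
    return payload


if __name__ == "__main__":
    # An older system that knows only the "none" codec.
    old_system = {"none": lambda data: data}
    print(receive_logical_block(b"userdata"))
    try:
        receive_raw_block("bzip2", b"opaque-bytes", old_system)
    except ValueError as err:
        print(err)
```

The uncompressed form is the only one every receiver can accept without knowing what the sender supports, which is the point made above about streams stored in files.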
David Magda
2012-Jan-24 19:16 UTC
[zfs-discuss] zfs send recv without uncompressing data stream
On Tue, January 24, 2012 13:37, Jim Klimov wrote:
> One more rationale - compatibility, including future-proof
> somewhat (the zfs-send format explicitly does not guarantee
> that it won't change incompatibly). I mean transfer of data
> between systems that do not implement the same set of
> compression algorithms in ZFS.

The format of 'zfs send' has now been committed:

> The format of the stream is committed. You will be able to receive your
> streams on future versions of ZFS.

http://docs.oracle.com/cd/E19253-01/816-5166/zfs-1m/index.html

This was fixed in some update of Solaris 10, though I can't find the exact one.

http://hub.opensolaris.org/bin/view/Community+Group+on/2008042301
Edward Ned Harvey
2012-Jan-25 14:05 UTC
[zfs-discuss] zfs send recv without uncompressing data stream
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Mickaël CANÉVET
>
> Unless I misunderstood something, zfs send of a volume that has
> compression activated, uncompress it. So if I do a zfs send|zfs receive
> from a compressed volume to a compressed volume, my data are
> uncompressed and compressed again. Right ?
>
> Is there a more effective way to do it (without decompression and
> recompression) ?

Better yet, zfs send is decompressing, and then you're probably piping to gzip or lzop or something, and piping to ssh, so it gets recompressed and encrypted. Then at the receiving end, it gets decrypted, decompressed, and recompressed. ;-)

While there are lots of reasons behind this, I think you'll find that it usually doesn't matter. It matters only if you have really fast disks, an underpowered processor, or super-duper massive compression (like gzip, or worse... gzip-9). Default compression is very fast and lightweight.