Henk Langeveld
2007-Jan-10 16:26 UTC
[zfs-discuss] zfs internal working - compression question
roland wrote:
> I created two ZFS filesystems based on image files used as devices,
> i.e. I created them on top of two empty files of exactly the same size.
>
> Then I enabled compression on one of them (zfs set compression=on compressedzfs).
>
> After copying a large file to both filesystems, I unmounted them,
> exported them, and ran gzip on those ZFS image files.
>
> After gzipping, the image file with compression=on is nearly twice as
> big as the image file with compression=off.
>
> This is something I wouldn't have expected.

Tomas Ögren wrote:
> The compression used in ZFS isn't as good as gzip's, because that would
> take too much CPU (and I've heard they just snatched the code from the
> kernel crash dump path, which isn't allowed to allocate memory, for
> instance). It's a simple form of Lempel-Ziv: a "quick and kinda good"
> compression algorithm. Before compression, data can be quite
> compressible (either with a fast algorithm and a larger result, or a
> slow one and a smaller result), but after compression (even by a
> non-ideal algorithm) the data is close to random data, which is very
> hard to re-compress.
>
> Try the difference between zfs -> zfs+gzip vs gzip -> gzip+gzip.

Another factor that contributes to the difference is that ZFS compression is per-block, while gzip remembers patterns across block boundaries (within its 32 KB sliding window), resulting in much smaller storage for later repetitions.

This principle has been used for comparing bodies of text for style analysis and text recognition: compressing a sample together with a larger corpus representative of a particular source (author, style) will result in smaller output for samples that better match the source.

-- 
Henk Langeveld <henk@hlangeveld.nl>

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
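The per-block point above can be demonstrated with nothing but zlib (used here purely for illustration; ZFS's own algorithm is LZJB, and the 128 KiB block size is an assumption matching ZFS's default recordsize). Data whose repetitions are visible to a whole-stream compressor must be re-learned in every block by a per-block compressor:

```python
import os
import zlib

# Illustration only: 4 KiB of incompressible bytes repeated 256 times,
# so every repetition after the first is pure redundancy.
record = os.urandom(4096)
data = record * 256  # 1 MiB

# Whole-stream compression: later copies of the record can be encoded
# as back-references to earlier ones.
whole = len(zlib.compress(data, 6))

# Per-block compression (128 KiB blocks): the dictionary restarts at
# every block boundary, so each block pays for one full raw copy of
# the random record again.
BLOCK = 128 * 1024
per_block = sum(len(zlib.compress(data[i:i + BLOCK], 6))
                for i in range(0, len(data), BLOCK))

print(f"whole stream: {whole} bytes, per block: {per_block} bytes")
```

On this input the per-block total comes out severalfold larger than the whole-stream result, even though the compressor and level are identical.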
Tomas Ögren
2007-Jan-10 16:26 UTC
[zfs-discuss] zfs internal working - compression question
On 29 December, 2006 - roland sent me these 1,0K bytes:

> Hello!
>
> I have come across some weirdness I would like to understand.
> It's not an issue; I'm just wondering about it.
>
> I created two ZFS filesystems based on image files used as devices,
> i.e. I created them on top of two empty files of exactly the same size.
>
> Then I enabled compression on one of them (zfs set compression=on compressedzfs).
>
> After copying a large file to both filesystems, I unmounted them,
> exported them, and ran gzip on those ZFS image files.
>
> After gzipping, the image file with compression=on is nearly twice as
> big as the image file with compression=off.
>
> This is something I wouldn't have expected. OK, I didn't expect the
> same size, but I never would have expected such a BIG difference,
> since we are basically re-compressing data which is already compressed.
>
> What's causing this effect? Can someone explain it?

The compression used in ZFS isn't as good as gzip's, because that would take too much CPU (and I've heard they just snatched the code from the kernel crash dump path, which isn't allowed to allocate memory, for instance). It's a simple form of Lempel-Ziv: a "quick and kinda good" compression algorithm. Before compression, data can be quite compressible (either with a fast algorithm and a larger result, or a slow one and a smaller result), but after compression (even by a non-ideal algorithm) the data is close to random data, which is very hard to re-compress.

Try the difference between zfs -> zfs+gzip vs gzip -> gzip+gzip.

/Tomas
-- 
Tomas Ögren, stric@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Hello!

I have come across some weirdness I would like to understand. It's not an issue; I'm just wondering about it.

I created two ZFS filesystems based on image files used as devices, i.e. I created them on top of two empty files of exactly the same size.

Then I enabled compression on one of them (zfs set compression=on compressedzfs).

After copying a large file to both filesystems, I unmounted them, exported them, and ran gzip on those ZFS image files.

After gzipping, the image file with compression=on is nearly twice as big as the image file with compression=off.

This is something I wouldn't have expected. OK, I didn't expect the same size, but I never would have expected such a BIG difference, since we are basically re-compressing data which is already compressed.

What's causing this effect? Can someone explain it?

Regards,
roland