On the heels of the LZO compression thread, I bring you a 7zip compression thread!

Shown here as the open source system with the best compression ratio:
http://en.wikipedia.org/wiki/Data_compression#Comparative

Shown here on a SPARC system with the best compression ratios and good CPU usage:
http://warp.povusers.org/ArchiverComparison/

Obviously 7zip is far more CPU-intensive than anything in use with ZFS today. But maybe with all these processor cores coming down the road, a high-end compression system is just the thing for ZFS to use.

This message posted from opensolaris.org
MC <rac <at> eastlink.ca> writes:
>
> Obviously 7zip is far more CPU-intensive than anything in use with ZFS
> today. But maybe with all these processor cores coming down the road,
> a high-end compression system is just the thing for ZFS to use.

I am not sure you realize the scale of things here. Assuming the worst case, that lzjb (the default ZFS compression algorithm) performs as badly as lha in [1], 7zip would compress your data only 20-30% better at the cost of being 4x-5x slower!

Also, in most cases, the bottleneck in data compression is the CPU, so switching to 7zip would reduce the I/O throughput by about 4x.

[1] http://warp.povusers.org/ArchiverComparison

-marc
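For anyone who wants to check the ratio-versus-speed trade-off on their own data rather than rely on the archiver comparison page, here is a rough sketch using only the Python standard library. Two assumptions to keep in mind: zlib at level 1 is merely a stand-in for a fast compressor (it is not lzjb, which Python does not ship), and the `lzma` module is used because LZMA is the default method of the .7z format. The `measure` and `compare` helpers and the sample data are made up for illustration.

```python
import lzma
import time
import zlib


def measure(name, compress, data):
    """Return (name, compressed size / original size, seconds taken)."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    return name, len(out) / len(data), elapsed


def compare(data):
    """Run a fast and a heavy compressor over the same bytes."""
    # Assumption: zlib level 1 only approximates "a fast compressor";
    # it is NOT lzjb and will not match lzjb's ratio or speed.
    return [
        measure("zlib-1 (fast stand-in)", lambda d: zlib.compress(d, 1), data),
        measure("lzma (7zip-style)", lzma.compress, data),
    ]


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample; real file data will behave differently.
    sample = b"".join(b"record %d: status=ok\n" % i for i in range(200_000))
    for name, ratio, seconds in compare(sample):
        print(f"{name:24s} ratio={ratio:.3f} time={seconds:.3f}s")
```

The numbers it prints depend entirely on the input and the machine, so treat them as a way to reason about the scale of the trade-off, not as a prediction of ZFS behaviour.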
Hello Marc,

Sunday, July 29, 2007, 9:57:13 PM, you wrote:

MB> MC <rac <at> eastlink.ca> writes:
>>
>> Obviously 7zip is far more CPU-intensive than anything in use with ZFS
>> today. But maybe with all these processor cores coming down the road,
>> a high-end compression system is just the thing for ZFS to use.

MB> I am not sure you realize the scale of things here. Assuming the worst case:
MB> that lzjb (default ZFS compression algorithm) performs as bad as lha in [1],
MB> 7zip would compress your data only 20-30% better at the cost of being 4x-5x
MB> slower !

MB> Also, in most cases, the bottleneck in data compression is the CPU, so
MB> switching to 7zip would reduce the I/O throughput by about 4x.

1. It depends on the specific case - sometimes the bottleneck is the CPU, sometimes not.

2. Sometimes you don't really care about CPU - you have hundreds of TBs of rarely used data, and then squeezing out 20-30% more space is a huge benefit - especially when you only read those files once they are written.

--
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
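The two positions above can be put into back-of-the-envelope form. This is a minimal sketch with entirely hypothetical numbers (disk and compressor speeds, a 300 TB archive), and it assumes "20-30% better" means the stored data ends up 20-30% smaller; none of these figures come from the thread or from measurements.

```python
def effective_throughput(disk_mb_s, compressor_mb_s):
    """Write throughput is limited by whichever stage is slower."""
    return min(disk_mb_s, compressor_mb_s)


def space_saved_tb(stored_tb, improvement):
    """TB freed if the same data ends up `improvement` (e.g. 0.25) smaller."""
    return stored_tb * improvement


if __name__ == "__main__":
    # Hypothetical speeds: a fast compressor and one that is 4x slower.
    fast, slow = 400, 100  # MB/s

    # With a fast disk the heavy compressor becomes the bottleneck.
    print(effective_throughput(200, fast), effective_throughput(200, slow))

    # With a slow disk the disk limits both, so the extra CPU cost is hidden.
    print(effective_throughput(50, fast), effective_throughput(50, slow))

    # Hypothetical archive of 300 TB, 25% smaller after heavier compression.
    print(space_saved_tb(300, 0.25), "TB saved")
```

Under these assumptions the slowdown only shows up in full when the CPU really is the limiting stage, which is the "it depends" in point 1, while the space saving in point 2 scales linearly with the size of the archive.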
MC <rac at eastlink.ca> wrote:

> On the heels of the LZO compression thread, I bring you a 7zip compression thread!
>
> Shown here as the open source system with the best compression ratio: http://en.wikipedia.org/wiki/Data_compression#Comparative
>
> Shown here on a SPARC system with the best compression ratios and good CPU usage: http://warp.povusers.org/ArchiverComparison/
>
>
> Obviously 7zip is far more CPU-intensive than anything in use with ZFS today. But maybe with all these processor cores coming down the road, a high-end compression system is just the thing for ZFS to use.

You would need a completely new, clean C implementation of the algorithm. If someone did this, I would be happy to also use the code for star ,-)

Jörg

--
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
> Hello Marc,
>
> Sunday, July 29, 2007, 9:57:13 PM, you wrote:
>
> MB> MC <rac <at> eastlink.ca> writes:
> >>
> >> Obviously 7zip is far more CPU-intensive than anything in use with ZFS
> >> today. But maybe with all these processor cores coming down the road,
> >> a high-end compression system is just the thing for ZFS to use.
>
> MB> I am not sure you realize the scale of things here. Assuming the worst case:
> MB> that lzjb (default ZFS compression algorithm) performs as bad as lha in [1],
> MB> 7zip would compress your data only 20-30% better at the cost of being 4x-5x
> MB> slower !
>
> MB> Also, in most cases, the bottleneck in data compression is the CPU, so
> MB> switching to 7zip would reduce the I/O throughput by about 4x.
>
> 1. it depends on a specific case - sometimes it's cpu sometimes not
>
> 2. sometimes you don't really care about cpu - you have hundreds TBs
> of data rarely used and then squeezing 20-30% more space is a huge
> benefit - especially when you only read those files once they are
> written

* disks are probably cheaper than CPUs

* it looks to me like 7z may also be RAM-hungry; and there are probably better ways to use the RAM, too

No doubt it's an option that would serve _someone_ well despite its shortcomings. But are there enough such someones to make it worthwhile?
"Richard L. Hamilton" <rlhamil at smart.net> wrote:> * disks are probably cheaper than CPUs > > * it looks to me like 7z may also be RAM-hungry; and there are probably > better ways to use the RAM, tooThe main problem with the currently available 7z implementation is that it has been written in C++ and not in a clean way. J?rg -- EMail:joerg at schily.isdn.cs.tu-berlin.de (home) J?rg Schilling D-13353 Berlin js at cs.tu-berlin.de (uni) schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily