We have groups generating terabytes a day of image data from lab
instruments and saving them to an X4500.

We have tried:

  lzjb    : compressratio = 1.13 in 11 hours,   1.3 TB -> 1.1 TB
  gzip -9 : compressratio = 1.68 in > 37 hours, 1.3 TB -> .75 TB

The filesystem performance was noticeably laggy (i.e. ls took > 10
seconds) while gzip -9 compression was in use.

Do you have any idea if lossless JPEG compression is being planned for
ZFS? We expect that of the 1.3 TB, > .8 TB will be images, and if
lossless JPEG compression could give us better or equivalent
compression with less impact on the filesystem than gzip -9, that would
be worthwhile, if it worked.
-- 
This message posted from opensolaris.org
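For reference, a minimal sketch of the per-dataset knobs behind numbers
like those above; the commands are standard ZFS administration, but the
dataset name (tank/images) is hypothetical:

  # Enable gzip-9 compression on the dataset holding the instrument data.
  # Only data written after this point gets compressed.
  zfs set compression=gzip-9 tank/images

  # After the data has been (re)written, check the achieved ratio.
  zfs get compression,compressratio tank/images

  # Revert to the lighter-weight default algorithm.
  zfs set compression=lzjb tank/images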
On Sep 4, 2009, at 12:23 PM, Len Zaifman wrote:

> We have groups generating terabytes a day of image data from lab
> instruments and saving them to an X4500.

Wouldn't it be easier to compress at the application, or between the
application and the archiving file system?

> We have tried:
>
>   lzjb    : compressratio = 1.13 in 11 hours,   1.3 TB -> 1.1 TB
>   gzip -9 : compressratio = 1.68 in > 37 hours, 1.3 TB -> .75 TB
>
> The filesystem performance was noticeably laggy (i.e. ls took > 10
> seconds) while gzip -9 compression was in use.
>
> Do you have any idea if lossless JPEG compression is being planned
> for ZFS? We expect that of the 1.3 TB, > .8 TB will be images, and if
> lossless JPEG compression could give us better or equivalent
> compression with less impact on the filesystem than gzip -9, that
> would be worthwhile, if it worked.

I don't know of anyone working on that specific compression scheme, but
I've put together some thoughts on the subject of adding a new
compressor to ZFS. Perhaps others could comment?
http://richardelling.blogspot.com/2009/08/justifying-new-compression-algorithms.html
 -- richard
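A minimal sketch of the "between the application and the archiving file
system" idea: compress the stream before it ever reaches ZFS. The
command names (instrument_dump, analysis_tool) and the paths are
placeholders for whatever actually produces and consumes the data:

  # Compress in flight and store the compressed stream on an
  # uncompressed ZFS dataset.
  instrument_dump --run 42 | gzip -1 > /tank/images/run42.raw.gz

  # Reading it back is the reverse pipe.
  gzip -dc /tank/images/run42.raw.gz | analysis_tool -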
Louis-Frédéric Feuillette
2009-Sep-04 21:12 UTC
[zfs-discuss] zfs compression algorithm : jpeg ??
On Fri, 2009-09-04 at 13:41 -0700, Richard Elling wrote:

> On Sep 4, 2009, at 12:23 PM, Len Zaifman wrote:
>
> > We have groups generating terabytes a day of image data from lab
> > instruments and saving them to an X4500.
>
> Wouldn't it be easier to compress at the application, or between the
> application and the archiving file system?

Preamble: I am actively doing research into image set compression,
specifically jpeg2000, so this is my point of reference.

I think it would be easier to compress at the application level. I
would suggest taking the image from the source, compressing it with
lossless jpeg2000, and saving the result to an uncompressed ZFS pool.

jpeg2000 uses arithmetic encoding to do the final compression step.
Arithmetic encoding has a higher compression ratio (in general) than
gzip -9, lzjb or others. There is an open-source implementation of
jpeg2000 called jasper [1]. Jasper is the reference implementation for
jpeg2000, meaning that all other jpeg2000 programs must verify their
output against that of jasper (kinda).

Saving the jpeg2000 images to an uncompressed ZFS partition will be the
fastest option. Since jpeg2000 data is already compressed, trying to
compress it again will not yield any storage space reduction; in fact
it may _increase_ the size of the data stored on disk. Good compression
algorithms produce output that looks like random data, so you can see
why running on a compressed pool would be bad for performance.

[1] http://www.ece.uvic.ca/~mdadams/jasper

On a side note, if you want to know how arithmetic encoding works,
Wikipedia [2] has a really nice explanation. Suffice it to say, in
theory (without considering implementation details) arithmetic encoding
can encode _any_ data at the rate of data_entropy * num_of_symbols +
data_symbol_table. In practice this doesn't happen, due to floating
point overflows and some other issues.

[2] http://en.wikipedia.org/wiki/Arithmetic_coding

-- 
Louis-Frédéric Feuillette <jebnor at gmail.com>
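A rough sketch of that workflow from the shell. The dataset name is
hypothetical and the jasper flags are quoted from memory (check
jasper --help on your system); as far as I recall, jasper's JPEG 2000
encoder is lossless when no target rate is requested:

  # Dataset with ZFS compression off, since the files arrive already
  # compressed (tank/images is a made-up name).
  zfs create -o compression=off tank/images

  # Convert a raw/PNM capture to lossless jpeg2000 with jasper.
  jasper --input scan0001.pnm --output /tank/images/scan0001.jp2 \
         --output-format jp2

  # Converting back for analysis is the same tool with formats swapped.
  jasper --input /tank/images/scan0001.jp2 --output scan0001.pnm \
         --output-format pnm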
On Fri, Sep 04, 2009 at 01:41:15PM -0700, Richard Elling wrote:

> On Sep 4, 2009, at 12:23 PM, Len Zaifman wrote:
>
> > We have groups generating terabytes a day of image data from lab
> > instruments and saving them to an X4500.
>
> Wouldn't it be easier to compress at the application, or between the
> application and the archiving file system?

Especially when it comes to reading the images back! ZFS compression is
transparent: you can't write uncompressed data and then read back
compressed data. And compression is at the block level, not for the
whole file, so even if you could read it back compressed, it wouldn't
be in a useful format. Most people want to transfer data compressed,
particularly images. So compressing at the application level seems best
to me in this case.

Nico
-- 
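A quick way to see that transparency, with a hypothetical file name:
the logical length the application reads and the blocks actually
charged on disk are reported separately, and only the latter shrinks
when ZFS compression is on:

  # Logical (uncompressed) length, which is what applications see:
  ls -l /tank/images/scan0001.raw

  # Kilobytes actually allocated on disk after block-level compression:
  du -k /tank/images/scan0001.raw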
On Fri, 4 Sep 2009, Louis-Frédéric Feuillette wrote:

> jpeg2000 uses arithmetic encoding to do the final compression step.
> Arithmetic encoding has a higher compression ratio (in general) than
> gzip -9, lzjb or others. There is an open-source implementation of
> jpeg2000 called jasper [1]. Jasper is the reference implementation for
> jpeg2000, meaning that all other jpeg2000 programs must verify their
> output against that of jasper (kinda).

Jasper is incredibly slow and consumes a large amount of memory. Other
JPEG2000 programs are validated by how many times faster they are than
Jasper. :-)

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/