Hello zfs-discuss,

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable-standalone.git;a=commit;h=eecfe5255c533fefd38072a04e4afb56c40d9719

"If compression for a given set of pages fails to make them smaller, the
file is flagged to avoid future compression attempts later."

Maybe that's a good one - so if a couple of blocks do not compress, flag it
in the file's metadata and do not try to compress any further blocks within
that file. Of course, for some files this will be suboptimal, so maybe make
it a dataset option?

-- 
Best regards,
Robert Milkowski
mailto:milek at task.gda.pl
http://milek.blogspot.com
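For illustration, a minimal C sketch of the heuristic being proposed here.
All names and the failure threshold are hypothetical - this is not ZFS or
btrfs code, just the shape of a per-file "stop trying" flag:

    /*
     * Hypothetical per-file hint (illustrative only).  After a few blocks
     * in a row fail to shrink, set a sticky flag so later writes to this
     * file skip the compression attempt entirely.
     */
    #include <stdbool.h>
    #include <stddef.h>

    #define INCOMPRESSIBLE_LIMIT 4          /* consecutive failures before giving up */

    struct file_hint {
        unsigned int failed_in_a_row;       /* blocks that did not shrink */
        bool         no_compress;           /* sticky "do not bother" flag */
    };

    /* Called once per block write; returns true if compression should be tried. */
    static bool should_try_compress(const struct file_hint *h)
    {
        return !h->no_compress;
    }

    /* Called after each compression attempt with the observed sizes. */
    static void record_compress_result(struct file_hint *h, size_t lsize, size_t csize)
    {
        if (csize >= lsize) {                       /* block did not get smaller */
            if (++h->failed_in_a_row >= INCOMPRESSIBLE_LIMIT)
                h->no_compress = true;              /* stop trying for this file */
        } else {
            h->failed_in_a_row = 0;                 /* reset on any success */
        }
    }

The obvious downside, which the follow-ups below point out, is that one run
of incompressible blocks at the start of a file would disable compression
for all of the rest of it.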
Robert Milkowski wrote:
> Maybe that's a good one - so if a couple of blocks do not compress then
> flag it in file metadata and do not try to compress any blocks within
> the file anymore. Of course for some files it will be suboptimal so
> maybe a dataset option?

I don't understand why having a couple of incompressible blocks in a file
should cause the whole file not to be compressed - that seems like a bad
idea to me. What if, for example, the file is a disk image and the first
couple of blocks aren't compressible but huge chunks of it are?

ZFS does compression at the block level and attempts it on every write. If
a given block doesn't compress sufficiently well (a hardcoded 12.5%
threshold) or at all, then that block is tagged as ZIO_COMPRESS_OFF in the
blkptr. That doesn't impact any other blocks, though.

So what would the dataset option you mention actually do? What problem do
you think needs to be solved here?

-- 
Darren J Moffat
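To make the block-level behaviour concrete, here is a simplified sketch of
the decision being described. It is not the actual ZFS source (the real
logic lives in the ZIO write pipeline and differs in detail), and the
structure and function names are invented, but the "must shrink by at least
12.5%" rule matches what is described above:

    /*
     * Illustrative per-block decision: keep the compressed copy only if it
     * saves at least 1/8 of the logical size; otherwise store the block
     * uncompressed and record MY_COMPRESS_OFF in its block pointer.  Other
     * blocks of the same file are not affected.
     */
    #include <stddef.h>

    enum { MY_COMPRESS_OFF = 0, MY_COMPRESS_LZJB = 1 };

    struct my_blkptr {
        int    compress;     /* compression function recorded on disk */
        size_t psize;        /* physical (allocated) size */
        size_t lsize;        /* logical size */
    };

    static void settle_block(struct my_blkptr *bp, size_t lsize, size_t csize)
    {
        if (csize <= lsize - (lsize >> 3)) {      /* saved at least 12.5% */
            bp->compress = MY_COMPRESS_LZJB;
            bp->psize    = csize;
        } else {
            bp->compress = MY_COMPRESS_OFF;       /* this block only */
            bp->psize    = lsize;
        }
        bp->lsize = lsize;
    }

Because the decision is per block, a mostly compressible disk image loses
nothing from a few incompressible blocks at its start.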
Hello Darren,

Monday, November 3, 2008, 12:44:29 PM, you wrote:

DJM> So what would the dataset option you mention actually do?
DJM> What problem do you think needs to be solved here?

Well, let's say you have a file server with lots of different documents,
pictures, etc. Some of these files are JPEGs, GIFs, ZIP files and so on -
they won't compress at all. Currently ZFS will try to compress every block
of these files anyway, each time finding that the saving is below 12.5% and
burning CPU cycles for no real advantage. Skipping those attempts could
save quite a lot of CPU cycles, especially with gzip compression.

I know that some files compress very badly at the beginning and very well
later on - that's why I believe the behaviour should be tunable.

A good filter could be to use magic numbers within files, or the approach
btrfs came up with, or maybe even both combined.

-- 
Best regards,
Robert Milkowski
mailto:milek at task.gda.pl
http://milek.blogspot.com
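As a rough illustration of the magic-number idea, a filter like the one
below could look at the first bytes written to a file and skip compression
for formats that are already compressed. This is purely hypothetical - ZFS
does not inspect file content like this, and the signature list is just a
sample:

    /*
     * Hypothetical content filter (illustrative only): report whether a
     * buffer starts with the signature of an already-compressed format.
     */
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    static bool looks_already_compressed(const unsigned char *buf, size_t len)
    {
        static const struct { const char *magic; size_t n; } sigs[] = {
            { "\xFF\xD8\xFF",      3 },   /* JPEG */
            { "GIF8",              4 },   /* GIF  */
            { "PK\x03\x04",        4 },   /* ZIP  */
            { "\x1F\x8B",          2 },   /* gzip */
            { "\x89PNG\r\n\x1A\n", 8 },   /* PNG  */
        };

        for (size_t i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++) {
            if (len >= sigs[i].n && memcmp(buf, sigs[i].magic, sigs[i].n) == 0)
                return true;              /* likely incompressible payload */
        }
        return false;
    }

As Bob notes later in the thread, this breaks down as soon as such content
is embedded inside a larger file, for example a JPEG stored in a database
BLOB.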
On Mon, 3 Nov 2008, Robert Milkowski wrote:
> Maybe that's a good one - so if a couple of blocks do not compress then
> flag it in file metadata and do not try to compress any blocks within
> the file anymore. Of course for some files it will be suboptimal so
> maybe a dataset option?

This is interesting but probably a bad idea. There are many files which
contain a mix of compressible and incompressible blocks, and they are quite
easy to create - one easy way is via the 'tar' command.

If compression is too slow, then another approach is to monitor the backlog
and skip compressing blocks while the backlog is too high, then use a
background scan which compresses blocks when the system is idle. This
background scan can have the positive effect that an uncompressed
filesystem can be fully converted to a compressed filesystem even if
compression is enabled after most files are already written. There would
need to be a flag which indicates whether the block has already been
evaluated for compression, was originally uncompressed, or was skipped due
to load.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
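A rough sketch of this load-based scheme, with the block states Bob
describes. Everything here is invented for illustration (the threshold, the
state names, the functions) - it is not how ZFS or any other filesystem
actually implements this:

    /*
     * Illustrative only: skip compression under load and remember why, so
     * an idle-time scan can revisit exactly the blocks that were deferred.
     */
    #include <stdbool.h>
    #include <stddef.h>

    enum block_state {
        BLK_UNEVALUATED,      /* never considered for compression */
        BLK_INCOMPRESSIBLE,   /* evaluated, did not shrink enough */
        BLK_COMPRESSED,       /* stored compressed */
        BLK_SKIPPED_LOAD      /* deferred because the system was busy */
    };

    #define BACKLOG_HIGH_WATER (64UL << 20)   /* 64 MiB, arbitrary for the sketch */

    /* Decide at write time whether to defer compression. */
    static enum block_state choose_on_write(size_t backlog_bytes)
    {
        if (backlog_bytes > BACKLOG_HIGH_WATER)
            return BLK_SKIPPED_LOAD;          /* write uncompressed now, revisit later */
        return BLK_UNEVALUATED;               /* let the normal compression path decide */
    }

    /* Idle-time scan: recompress what was skipped, not what genuinely failed. */
    static bool background_should_recompress(enum block_state s)
    {
        return s == BLK_SKIPPED_LOAD || s == BLK_UNEVALUATED;
    }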
On Mon, 3 Nov 2008, Robert Milkowski wrote:
> Now, the good filter could be to use MAGIC numbers within files or
> approach btrfs come up with, or maybe even both combined.

You are suggesting that ZFS should detect a GIF or JPEG image stored in a
database BLOB. That is pretty fancy functionality. ;-)

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:
> If compression is too slow, then another approach is to monitor the
> backlog and skip compressing blocks if the backlog is too high.

We kind of do that already, in that we stop compressing if we aren't
"converging to sync" quickly enough, because compressing requires us to do
new allocations as the block size gets smaller.

> Then use a background scan which compresses blocks when the system is
> idle.

There is already a plan for this type of functionality.

> This background scan can have the positive effect that an uncompressed
> filesystem can be fully converted to a compressed filesystem even if
> compression is enabled after most files are already written.

Or if the filesystem wasn't initially created with compression=on, or if it
was but the value of compression= was later changed.

> There would need to be a flag which indicates if the block has already
> been evaluated for compression or if it was originally uncompressed, or
> skipped due to load.

The blkptr_t (on disk) will have ZIO_COMPRESS_OFF if the block wasn't
compressed for any reason. That can easily be compared with the compression
property of the dataset. The only part that is missing is a reason code.

-- 
Darren J Moffat
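A small sketch of the gap Darren is pointing at - the names are made up and
this is not ZFS source. With only an "off" value recorded in the block
pointer, a later scan cannot tell whether a block genuinely failed to
compress or was merely skipped, so the only safe choice is to retry it:

    /*
     * Illustrative only: decide whether a rewrite scan should retry a block,
     * given just the on-disk compression field and the dataset property.
     */
    #include <stdbool.h>

    enum { MY_COMPRESS_OFF = 0 };

    struct my_blkptr { int compress; };

    /* dataset_compress is the current value of the dataset's compression= property */
    static bool scan_should_retry(const struct my_blkptr *bp, int dataset_compress)
    {
        if (dataset_compress == MY_COMPRESS_OFF)
            return false;                 /* compression disabled for the dataset */
        /*
         * Without a reason code we cannot distinguish "tried and failed"
         * from "skipped due to load", so we have to try again either way.
         */
        return bp->compress == MY_COMPRESS_OFF;
    }

A reason code stored alongside, as Darren suggests is missing, would let
such a scan skip blocks already known to be incompressible.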
przemolicc at poczta.fm wrote (2008-Nov-04 06:55 UTC), re: [zfs-discuss] zfs compression - btrfs compression:
On Mon, Nov 03, 2008 at 12:33:52PM -0600, Bob Friesenhahn wrote:
> On Mon, 3 Nov 2008, Robert Milkowski wrote:
> > Now, the good filter could be to use MAGIC numbers within files or
> > approach btrfs come up with, or maybe even both combined.
>
> You are suggesting that ZFS should detect a GIF or JPEG image stored
> in a database BLOB. That is pretty fancy functionality. ;-)

Maybe some general approach (not strictly GIF- or JPEG-oriented) could be
useful. Give people a choice and they will love ZFS even more.

Regards
Przemyslaw Bak (przemol)
-- 
http://przemol.blogspot.com/
przemolicc at poczta.fm wrote:
> Maybe some general approach (not strictly GIF- or JPEG-oriented)
> could be useful.
>
> Give people a choice and they will love ZFS even more.

But what is the choice you guys are actually asking for? You can already
control the compression setting on a per-dataset basis; what more do you
really need? Remember that the more knobs there are to turn, the harder it
is to reason about what is going on and to predict behaviour.

I just don't see what the problem is with the current situation. You either
want compression or you don't. It works at the block level in ZFS, and it
already has checks and balances in place to ensure that it doesn't
needlessly burn CPU (decompression is usually what is more expensive than
compression).

Can someone who cares about this put a proposal together to show what it
would look like from a ZFS properties view and what it would actually do,
because I'm not getting it.

-- 
Darren J Moffat