Hello zfs-discuss,

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable-standalone.git;a=commit;h=eecfe5255c533fefd38072a04e4afb56c40d9719

"If compression for a given set of pages fails to make them smaller, the
file is flagged to avoid future compression attempts later."

Maybe that's a good one - so if a couple of blocks do not compress, flag it
in the file's metadata and do not try to compress any further blocks within
that file. Of course, for some files this will be suboptimal, so maybe make
it a dataset option?

-- 
Best regards,
Robert Milkowski
mailto:milek at task.gda.pl
http://milek.blogspot.com
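For illustration, a minimal C sketch of the heuristic being proposed here.
All names and the failure threshold are hypothetical - this is not ZFS or
btrfs code, just the shape of a per-file "stop trying" flag:

    /*
     * Hypothetical per-file hint (illustrative only).  After a few blocks
     * in a row fail to shrink, set a sticky flag so later writes to this
     * file skip the compression attempt entirely.
     */
    #include <stdbool.h>
    #include <stddef.h>

    #define INCOMPRESSIBLE_LIMIT 4          /* consecutive failures before giving up */

    struct file_hint {
        unsigned int failed_in_a_row;       /* blocks that did not shrink */
        bool         no_compress;           /* sticky "do not bother" flag */
    };

    /* Called once per block write; returns true if compression should be tried. */
    static bool should_try_compress(const struct file_hint *h)
    {
        return !h->no_compress;
    }

    /* Called after each compression attempt with the observed sizes. */
    static void record_compress_result(struct file_hint *h, size_t lsize, size_t csize)
    {
        if (csize >= lsize) {                       /* block did not get smaller */
            if (++h->failed_in_a_row >= INCOMPRESSIBLE_LIMIT)
                h->no_compress = true;              /* stop trying for this file */
        } else {
            h->failed_in_a_row = 0;                 /* reset on any success */
        }
    }

The obvious downside, which the follow-ups below point out, is that one run
of incompressible blocks at the start of a file would disable compression
for all of the rest of it.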
Robert Milkowski wrote:
> Maybe that's a good one - so if a couple of blocks do not compress then
> flag it in file metadata and do not try to compress any blocks within
> the file anymore. Of course for some files it will be suboptimal so
> maybe a dataset option?

I don't understand why having a couple of incompressible blocks in a file
should cause the whole file not to be compressed - that seems like a bad
idea to me. What if, for example, the file is a disk image and the first
couple of blocks aren't compressible but huge chunks of it are?

ZFS does compression at the block level and attempts it on every write. If
a given block doesn't compress sufficiently well (a hardcoded 12.5%
threshold) or at all, then that block is tagged as ZIO_COMPRESS_OFF in the
blkptr. That doesn't impact any other blocks, though.

So what would the dataset option you mention actually do? What problem do
you think needs to be solved here?

-- 
Darren J Moffat
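To make the block-level behaviour concrete, here is a simplified sketch of
the decision being described. It is not the actual ZFS source (the real
logic lives in the ZIO write pipeline and differs in detail), and the
structure and function names are invented, but the "must shrink by at least
12.5%" rule matches what is described above:

    /*
     * Illustrative per-block decision: keep the compressed copy only if it
     * saves at least 1/8 of the logical size; otherwise store the block
     * uncompressed and record MY_COMPRESS_OFF in its block pointer.  Other
     * blocks of the same file are not affected.
     */
    #include <stddef.h>

    enum { MY_COMPRESS_OFF = 0, MY_COMPRESS_LZJB = 1 };

    struct my_blkptr {
        int    compress;     /* compression function recorded on disk */
        size_t psize;        /* physical (allocated) size */
        size_t lsize;        /* logical size */
    };

    static void settle_block(struct my_blkptr *bp, size_t lsize, size_t csize)
    {
        if (csize <= lsize - (lsize >> 3)) {      /* saved at least 12.5% */
            bp->compress = MY_COMPRESS_LZJB;
            bp->psize    = csize;
        } else {
            bp->compress = MY_COMPRESS_OFF;       /* this block only */
            bp->psize    = lsize;
        }
        bp->lsize = lsize;
    }

Because the decision is per block, a mostly compressible disk image loses
nothing from a few incompressible blocks at its start.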
Hello Darren,

Monday, November 3, 2008, 12:44:29 PM, you wrote:

DJM> So what would the dataset option you mention actually do?
DJM> What problem do you think needs to be solved here?

Well, let's say you have a file server with lots of different documents,
pictures, etc. Some of these files are JPEGs, GIFs, ZIP files and so on -
they won't compress at all. Currently ZFS will try to compress every block
of these files anyway, each time finding that the saving is below 12.5% and
burning CPU cycles for no real advantage. Skipping those attempts could
save quite a lot of CPU cycles, especially with gzip compression.

I know that some files compress very badly at the beginning and very well
later on - that's why I believe the behaviour should be tunable.

A good filter could be to use magic numbers within files, or the approach
btrfs came up with, or maybe even both combined.

-- 
Best regards,
Robert Milkowski
mailto:milek at task.gda.pl
http://milek.blogspot.com
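As a rough illustration of the magic-number idea, a filter like the one
below could look at the first bytes written to a file and skip compression
for formats that are already compressed. This is purely hypothetical - ZFS
does not inspect file content like this, and the signature list is just a
sample:

    /*
     * Hypothetical content filter (illustrative only): report whether a
     * buffer starts with the signature of an already-compressed format.
     */
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    static bool looks_already_compressed(const unsigned char *buf, size_t len)
    {
        static const struct { const char *magic; size_t n; } sigs[] = {
            { "\xFF\xD8\xFF",      3 },   /* JPEG */
            { "GIF8",              4 },   /* GIF  */
            { "PK\x03\x04",        4 },   /* ZIP  */
            { "\x1F\x8B",          2 },   /* gzip */
            { "\x89PNG\r\n\x1A\n", 8 },   /* PNG  */
        };

        for (size_t i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++) {
            if (len >= sigs[i].n && memcmp(buf, sigs[i].magic, sigs[i].n) == 0)
                return true;              /* likely incompressible payload */
        }
        return false;
    }

As Bob notes later in the thread, this breaks down as soon as such content
is embedded inside a larger file, for example a JPEG stored in a database
BLOB.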
On Mon, 3 Nov 2008, Robert Milkowski wrote:
> Maybe that's a good one - so if a couple of blocks do not compress then
> flag it in file metadata and do not try to compress any blocks within
> the file anymore. Of course for some files it will be suboptimal so
> maybe a dataset option?

This is interesting but probably a bad idea. There are many files which
contain a mix of compressible and incompressible blocks, and they are quite
easy to create - one easy way is via the 'tar' command.

If compression is too slow, then another approach is to monitor the backlog
and skip compressing blocks while the backlog is too high, then use a
background scan which compresses blocks when the system is idle. This
background scan can have the positive effect that an uncompressed
filesystem can be fully converted to a compressed filesystem even if
compression is enabled after most files are already written. There would
need to be a flag which indicates whether the block has already been
evaluated for compression, was originally uncompressed, or was skipped due
to load.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
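A rough sketch of this load-based scheme, with the block states Bob
describes. Everything here is invented for illustration (the threshold, the
state names, the functions) - it is not how ZFS or any other filesystem
actually implements this:

    /*
     * Illustrative only: skip compression under load and remember why, so
     * an idle-time scan can revisit exactly the blocks that were deferred.
     */
    #include <stdbool.h>
    #include <stddef.h>

    enum block_state {
        BLK_UNEVALUATED,      /* never considered for compression */
        BLK_INCOMPRESSIBLE,   /* evaluated, did not shrink enough */
        BLK_COMPRESSED,       /* stored compressed */
        BLK_SKIPPED_LOAD      /* deferred because the system was busy */
    };

    #define BACKLOG_HIGH_WATER (64UL << 20)   /* 64 MiB, arbitrary for the sketch */

    /* Decide at write time whether to defer compression. */
    static enum block_state choose_on_write(size_t backlog_bytes)
    {
        if (backlog_bytes > BACKLOG_HIGH_WATER)
            return BLK_SKIPPED_LOAD;          /* write uncompressed now, revisit later */
        return BLK_UNEVALUATED;               /* let the normal compression path decide */
    }

    /* Idle-time scan: recompress what was skipped, not what genuinely failed. */
    static bool background_should_recompress(enum block_state s)
    {
        return s == BLK_SKIPPED_LOAD || s == BLK_UNEVALUATED;
    }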
On Mon, 3 Nov 2008, Robert Milkowski wrote:
> Now, the good filter could be to use MAGIC numbers within files or
> approach btrfs come up with, or maybe even both combined.

You are suggesting that ZFS should detect a GIF or JPEG image stored in a
database BLOB. That is pretty fancy functionality. ;-)

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:
> If compression is too slow, then another approach is to monitor the
> backlog and skip compressing blocks if the backlog is too high.

We kind of do that already, in that we stop compressing if we aren't
"converging to sync" quickly enough, because compressing requires us to do
new allocations as the block size gets smaller.

> Then use a background scan which compresses blocks when the system is
> idle.

There is already a plan for this type of functionality.

> This background scan can have the positive effect that an uncompressed
> filesystem can be fully converted to a compressed filesystem even if
> compression is enabled after most files are already written.

Or if the filesystem wasn't initially created with compression=on, or if it
was but the value of compression= was later changed.

> There would need to be a flag which indicates if the block has already
> been evaluated for compression or if it was originally uncompressed, or
> skipped due to load.

The blkptr_t (on disk) will have ZIO_COMPRESS_OFF if the block wasn't
compressed for any reason. That can easily be compared with the compression
property of the dataset. The only part that is missing is a reason code.

-- 
Darren J Moffat
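A small sketch of the gap Darren is pointing at - the names are made up and
this is not ZFS source. With only an "off" value recorded in the block
pointer, a later scan cannot tell whether a block genuinely failed to
compress or was merely skipped, so the only safe choice is to retry it:

    /*
     * Illustrative only: decide whether a rewrite scan should retry a block,
     * given just the on-disk compression field and the dataset property.
     */
    #include <stdbool.h>

    enum { MY_COMPRESS_OFF = 0 };

    struct my_blkptr { int compress; };

    /* dataset_compress is the current value of the dataset's compression= property */
    static bool scan_should_retry(const struct my_blkptr *bp, int dataset_compress)
    {
        if (dataset_compress == MY_COMPRESS_OFF)
            return false;                 /* compression disabled for the dataset */
        /*
         * Without a reason code we cannot distinguish "tried and failed"
         * from "skipped due to load", so we have to try again either way.
         */
        return bp->compress == MY_COMPRESS_OFF;
    }

A reason code stored alongside, as Darren suggests is missing, would let
such a scan skip blocks already known to be incompressible.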
przemolicc at poczta.fm wrote (2008-Nov-04 06:55 UTC), re: [zfs-discuss] zfs compression - btrfs compression:
On Mon, Nov 03, 2008 at 12:33:52PM -0600, Bob Friesenhahn wrote:
> On Mon, 3 Nov 2008, Robert Milkowski wrote:
> > Now, the good filter could be to use MAGIC numbers within files or
> > approach btrfs come up with, or maybe even both combined.
>
> You are suggesting that ZFS should detect a GIF or JPEG image stored
> in a database BLOB. That is pretty fancy functionality. ;-)

Maybe some general approach (not strictly GIF- or JPEG-oriented) could be
useful. Give people a choice and they will love ZFS even more.

Regards
Przemyslaw Bak (przemol)
-- 
http://przemol.blogspot.com/
przemolicc at poczta.fm wrote:
> Maybe some general approach (not strictly GIF- or JPEG-oriented)
> could be useful.
>
> Give people a choice and they will love ZFS even more.

But what is the choice you guys are actually asking for? You can already
control the compression setting on a per-dataset basis; what more do you
really need? Remember that the more knobs there are to turn, the harder it
is to reason about what is going on and to predict behaviour.

I just don't see what the problem is with the current situation. You either
want compression or you don't. It works at the block level in ZFS, and it
already has checks and balances in place to ensure that it doesn't
needlessly burn CPU (decompression is usually what is more expensive than
compression).

Can someone who cares about this put a proposal together to show what it
would look like from a ZFS properties view and what it would actually do,
because I'm not getting it.

-- 
Darren J Moffat