Quick question about the interaction of ZFS filesystem compression and the
filesystem cache. We have an OpenSolaris (actually Nexenta alpha-6) box running
RRD collection. These files seem to be quite compressible: a test filesystem
containing about 3,000 of these files shows a compressratio of 12.5x.

My question is about how the filesystem cache works with compressed files. Does
the cache keep a copy of the compressed data, or of the uncompressed blocks? To
update one of these RRD files, I believe the whole contents are read into
memory, modified, and then written back out. If the filesystem cache maintained
a copy of the compressed data, a lot more of these files, maybe more than 10x
more, could be kept in the cache. That would mean we could have a lot more data
files without ever needing to do a physical read.

Looking at the source code overview, it looks like the compression happens
"underneath" the ARC layer, so I am assuming the uncompressed blocks are
cached, but I wanted to ask to be sure.

Thanks!
-Andy
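As a rough illustration of the arithmetic behind "more than 10x more", here is
a minimal sketch; the cache size and average file size below are made-up
assumptions for illustration, not measurements from the box in question.

    # Back-of-the-envelope sketch: how many RRD files fit in a fixed-size cache
    # if it holds uncompressed blocks versus compressed blocks.
    # The cache size and average file size are illustrative assumptions.

    cache_bytes   = 2 * 1024**3      # assume ~2 GiB of cache available for file data
    avg_rrd_bytes = 128 * 1024       # assume an average RRD file of ~128 KB (uncompressed)
    compressratio = 12.5             # the ratio reported for the test filesystem

    files_if_uncompressed = cache_bytes // avg_rrd_bytes
    files_if_compressed   = int(cache_bytes / (avg_rrd_bytes / compressratio))

    print(f"cache holds ~{files_if_uncompressed:,} files as uncompressed blocks")
    print(f"cache holds ~{files_if_compressed:,} files as compressed blocks")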
Andrew Miller wrote:
> Looking at the source code overview, it looks like the compression happens
> "underneath" the ARC layer, so I am assuming the uncompressed blocks are
> cached, but I wanted to ask to be sure.

Yup, your assumption is correct. We currently do compression below the
ARC. We have contemplated caching data in compressed form, but have not
really explored the idea fully yet.

-Mark
On 12/8/06, Mark Maybee <Mark.Maybee at sun.com> wrote:
> Yup, your assumption is correct. We currently do compression below the
> ARC. We have contemplated caching data in compressed form, but have not
> really explored the idea fully yet.

Hmm... interesting idea. That would incur CPU to do a decompress when the page
is reclaimed, but it would reduce memory pressure. What implications would
this have for encryption?

--
Just me, Wire ...
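To make the trade-off concrete, here is a toy sketch (using zlib as a stand-in
for the on-disk compression; this is not ZFS code): compressing below the
cache means a block is decompressed once on a miss, while caching compressed
data shrinks the memory footprint but pays the decompression cost on every
access.

    import zlib

    disk = {}  # block id -> compressed bytes, standing in for the on-disk pool

    def write_block(blkid, data):
        disk[blkid] = zlib.compress(data)        # compression happens at the "disk" layer

    # Current behaviour being described: the cache sits above compression,
    # so it holds uncompressed blocks and decompresses only on a miss.
    plain_cache = {}
    def read_with_uncompressed_cache(blkid):
        if blkid not in plain_cache:
            plain_cache[blkid] = zlib.decompress(disk[blkid])
        return plain_cache[blkid]

    # Contemplated alternative: cache the compressed bytes. Memory use per
    # block drops by roughly the compression ratio, but every hit pays CPU.
    packed_cache = {}
    def read_with_compressed_cache(blkid):
        if blkid not in packed_cache:
            packed_cache[blkid] = disk[blkid]
        return zlib.decompress(packed_cache[blkid])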
> Yup, your assumption is correct. We currently do compression below the
> ARC. We have contemplated caching data in compressed form, but have not
> really explored the idea fully yet.
>
> -Mark

Mark,

Thanks for the quick response! I imagine the compression will still help quite
a bit anyway, since ultimately there's a lot less data to write back to the
disk.

A compressed cache would be an interesting tuneable parameter - it would be
great for these types of files, and also for some of the things we keep here
in databases (a lot of text/blobs, and as such highly compressible).

My colleagues and I are really impressed with the design and performance of
ZFS - keep up the good work!

-Andy
On 12/7/06, Andrew Miller <z2amiller at gmail.com> wrote:
> Quick question about the interaction of ZFS filesystem compression and the
> filesystem cache. We have an OpenSolaris (actually Nexenta alpha-6) box
> running RRD collection. These files seem to be quite compressible: a test
> filesystem containing about 3,000 of these files shows a compressratio of
> 12.5x.

Be careful here. If you are using files that have no data in them yet, you
will get much better compression than later in life. Judging by the fact that
you got only 12.5x, I suspect that your files are at least partially
populated. Expect the compression to get worse over time.

Looking at some RRD files that come from a very active (e.g. numbers vary
frequently) server, with data filling about 2/3 of the configured time
periods, I see the following rates:

 1.8 mpstat.rrd
 1.8 vmstat.rrd
 1.9 exacct_PROJECT_user.oracle.rrd
 2.0 net-ce2.rrd
 2.1 iostat-c14.rrd
 2.1 iostat-c15.rrd
 2.1 iostat-c16.rrd
 .
 .
 .
 7.6 net-ce912005.rrd
 7.7 net-ce912016.rrd
 9.1 exacct_PROJECT_user.gemsadm.rrd
12.2 exacct_PROJECT_exacct_interval.rrd
18.1 exacct_PROJECT_user.patrol.rrd
18.1 exacct_PROJECT_user.precise.rrd
18.1 exacct_PROJECT_user.precise6.rrd
31.8 net-ce8.rrd
39.6 net-eri3.rrd
45.1 net-eri2.rrd

The first column is the compression ratio. The net-eri{2,3} files are almost
empty.

> My question is about how the filesystem cache works with compressed files.
> Does the cache keep a copy of the compressed data, or of the uncompressed
> blocks? To update one of these RRD files, I believe the whole contents are
> read into memory, modified, and then written back out.

Here is an insert of a value:

25450:  open("/opt/perfstat/rrd/somehost/iostat-c4.rrd", O_RDWR) = 3
25450:  fstat64(3, 0xFFBFF5E0) = 0
25450:  fstat64(3, 0xFFBFF640) = 0
25450:  fstat64(3, 0xFFBFF4E8) = 0
25450:  ioctl(3, TCGETA, 0xFFBFF5CC) Err#25 ENOTTY
25450:  read(3, " R R D\0 0 0 0 1\0\0\0\0".., 8192) = 8192
25450:  llseek(3, 0, SEEK_CUR) = 8192
25450:  lseek(3, 0xFFFFFC68, SEEK_CUR) = 7272
25450:  fcntl(3, F_SETLK, 0xFFBFF7D0) = 0
25450:  llseek(3, 0, SEEK_CUR) = 7272
25450:  lseek(3, 2230952, SEEK_SET) = 2230952
25450:  write(3, " @ x S = pA3D7\v ?E6 f f".., 64) = 64
25450:  lseek(3, 1864, SEEK_SET) = 1864
25450:  write(3, " E xA0 # U N K N\0\0\0\0".., 5408) = 5408
25450:  close(3) = 0

Notice that it does the following:

  Open the file
  Read the first 8K
  Seek to a particular spot
  Take a lock
  Seek
  Write 64 bytes
  Seek
  Write 5408 bytes
  Close

The rrd file in question is 8.6 MB. There was 8 KB of reads and 5472 bytes of
writes. This is one of the big wins of the current binary rrd format over the
original ASCII version that came with MRTG.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
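For reference, a minimal Python sketch of the update pattern visible in that
trace (read only the preamble, take a lock, write two small regions in place).
The offsets, lengths, and payloads below are placeholders lifted from the
trace, not the real RRD file format.

    import fcntl, os

    def rrd_style_update(path, point, header_len=8192,
                         point_off=2230952, head_off=1864, head_len=5408):
        """Mimic the syscall pattern above; all offsets are illustrative."""
        fd = os.open(path, os.O_RDWR)
        try:
            header = os.read(fd, header_len)   # read only the 8 KB preamble, not the whole file
            fcntl.lockf(fd, fcntl.LOCK_EX)     # take the write lock (F_SETLK in the trace)
            os.lseek(fd, point_off, os.SEEK_SET)
            os.write(fd, point)                # the 64-byte data point
            os.lseek(fd, head_off, os.SEEK_SET)
            os.write(fd, b"\0" * head_len)     # placeholder for the 5408-byte live-header update
        finally:
            os.close(fd)                       # closing the fd also drops the lock

The point is that the I/O stays at a few kilobytes no matter how large the RRD
file grows.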
> Be careful here. If you are using files that have no data in them yet, you
> will get much better compression than later in life. Judging by the fact
> that you got only 12.5x, I suspect that your files are at least partially
> populated. Expect the compression to get worse over time.

I do expect it to get somewhat worse over time -- I don't expect such
compression forever, but didn't want to get too detailed in my original
question. :-) A lot of the data points I'm collecting (40%+) are quite static
or change slowly over time - representing, for example, disk space, TCP errors
(hopefully always zero! :-)), or JVM jstats of development JVM instances that
get only small occasional bursts in activity.

(snip)

> Read the first 8K
> Seek to a particular spot
> Take a lock
> Seek
> Write 64 bytes
> Seek
> Write 5408 bytes
> Close

Interesting, that looks a lot different from what I'm seeing. Maybe something
is different in the implementation (I'm using perl RRDs and RRD 1.2.11). Note
that RRDFILE.rrd is 125504 bytes on disk. I'll have to look into it a little
deeper, as it would certainly help performance to just read the preamble and
modify only the pieces of the RRD that need to change. Maybe it's because my
RRDs are quite small: they contain only one DS, and I tuned down the number
and length of the RRAs while fighting the performance issues that ultimately
ended in me moving this to an OpenSolaris-based box.

open("RRDFILE.rrd", O_RDONLY) = 14
fstat64(14, 0x080473E0) = 0
fstat64(14, 0x08047310) = 0
ioctl(14, TCGETA, 0x080473AC) Err#25 ENOTTY
read(14, " R R D\0 0 0 0 3\0\0\0\0".., 125952) = 125504
llseek(14, 0, SEEK_CUR) = 125504
lseek(14, 21600, SEEK_SET) = 21600
lseek(14, 1600, SEEK_SET) = 1600
read(14, "\0\0\0\0\0\0F8FF\0\0\0\0".., 125952) = 123904
llseek(14, 0xFFFFFFFFFFFE24F8, SEEK_CUR) = 3896
close(14) = 0

open("RRDFILE.rrd", O_RDWR) = 14
fstat64(14, 0x08047320) = 0
fstat64(14, 0x08047250) = 0
ioctl(14, TCGETA, 0x080472EC) Err#25 ENOTTY
read(14, " R R D\0 0 0 0 3\0\0\0\0".., 125952) = 125504
llseek(14, 0, SEEK_CUR) = 125504
lseek(14, 0, SEEK_END) = 125504
llseek(14, 0, SEEK_CUR) = 125504
llseek(14, 0, SEEK_CUR) = 125504
lseek(14, 1504, SEEK_SET) = 1504
fcntl(14, F_SETLK, 0x08047430) = 0
mmap(0x00000000, 125504, PROT_READ|PROT_WRITE, MAP_SHARED, 14, 0) = 0xFE25F000
munmap(0xFE25F000, 125504) = 0
llseek(14, 0, SEEK_CUR) = 1504
lseek(14, 880, SEEK_SET) = 880
write(14, " :B0 x E\0\0\0\0 1 5\0\0".., 624) = 624
close(14) = 0

(I did an strace on Linux too, which is using RRD 1.0.49, and it looks about
the same - it appears to read the whole thing. Maybe it's something in RRDs or
the way I'm using it.)

Thanks for spending some of your time analyzing my problem. :-)

-Andy
Mark Maybee wrote:
> Yup, your assumption is correct. We currently do compression below the
> ARC. We have contemplated caching data in compressed form, but have not
> really explored the idea fully yet.

The same applies to encryption: the current plan is to encrypt just after
where we currently compress. There are advantages to caching the encrypted
rather than the clear content in some cases, and disadvantages in others (it
depends on what risk you are attempting to mitigate by using encryption). My
first cut at encryption for ZFS did actually encrypt what was in the ARC - but
that was a mistake, because I had no hook to decrypt it.

--
Darren J Moffat