Quick question about the interaction of ZFS filesystem compression and the
filesystem cache. We have an OpenSolaris (actually Nexenta alpha-6) box running
RRD collection. These files seem to be quite compressible: a test filesystem
containing about 3,000 of these files shows a compressratio of 12.5x.

My question is about how the filesystem cache works with compressed files. Does
the cache keep a copy of the compressed data, or of the uncompressed blocks? To
update one of these RRD files, I believe the whole contents are read into
memory, modified, and then written back out. If the filesystem cache maintained
a copy of the compressed data, a lot more of these files, maybe more than 10x
more, could be kept in the cache. That would mean we could have a lot more data
files without ever needing to do a physical read.

Looking at the source code overview, it looks like the compression happens
"underneath" the ARC layer, so I am assuming the uncompressed blocks are
cached, but I wanted to ask to be sure.

Thanks!
-Andy
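As a rough illustration of the arithmetic behind "more than 10x more", here is
a minimal sketch; the cache size and average file size below are made-up
assumptions for illustration, not measurements from the box in question.

    # Back-of-the-envelope sketch: how many RRD files fit in a fixed-size cache
    # if it holds uncompressed blocks versus compressed blocks.
    # The cache size and average file size are illustrative assumptions.

    cache_bytes   = 2 * 1024**3      # assume ~2 GiB of cache available for file data
    avg_rrd_bytes = 128 * 1024       # assume an average RRD file of ~128 KB (uncompressed)
    compressratio = 12.5             # the ratio reported for the test filesystem

    files_if_uncompressed = cache_bytes // avg_rrd_bytes
    files_if_compressed   = int(cache_bytes / (avg_rrd_bytes / compressratio))

    print(f"cache holds ~{files_if_uncompressed:,} files as uncompressed blocks")
    print(f"cache holds ~{files_if_compressed:,} files as compressed blocks")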
Andrew Miller wrote:
> Looking at the source code overview, it looks like the compression happens
> "underneath" the ARC layer, so I am assuming the uncompressed blocks are
> cached, but I wanted to ask to be sure.

Yup, your assumption is correct. We currently do compression below the
ARC. We have contemplated caching data in compressed form, but have not
really explored the idea fully yet.

-Mark
On 12/8/06, Mark Maybee <Mark.Maybee at sun.com> wrote:
> Yup, your assumption is correct. We currently do compression below the
> ARC. We have contemplated caching data in compressed form, but have not
> really explored the idea fully yet.

Hmm... interesting idea. That would incur CPU to do a decompress when the page
is reclaimed, but it would reduce memory pressure. What implications would
this have for encryption?

--
Just me, Wire ...
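To make the trade-off concrete, here is a toy sketch (using zlib as a stand-in
for the on-disk compression; this is not ZFS code): compressing below the
cache means a block is decompressed once on a miss, while caching compressed
data shrinks the memory footprint but pays the decompression cost on every
access.

    import zlib

    disk = {}  # block id -> compressed bytes, standing in for the on-disk pool

    def write_block(blkid, data):
        disk[blkid] = zlib.compress(data)        # compression happens at the "disk" layer

    # Current behaviour being described: the cache sits above compression,
    # so it holds uncompressed blocks and decompresses only on a miss.
    plain_cache = {}
    def read_with_uncompressed_cache(blkid):
        if blkid not in plain_cache:
            plain_cache[blkid] = zlib.decompress(disk[blkid])
        return plain_cache[blkid]

    # Contemplated alternative: cache the compressed bytes. Memory use per
    # block drops by roughly the compression ratio, but every hit pays CPU.
    packed_cache = {}
    def read_with_compressed_cache(blkid):
        if blkid not in packed_cache:
            packed_cache[blkid] = disk[blkid]
        return zlib.decompress(packed_cache[blkid])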
> Yup, your assumption is correct. We currently do compression below the
> ARC. We have contemplated caching data in compressed form, but have not
> really explored the idea fully yet.
>
> -Mark

Mark,

Thanks for the quick response! I imagine the compression will still help quite
a bit anyway, since ultimately there's a lot less data to write back to the
disk.

A compressed cache would be an interesting tuneable parameter - it would be
great for these types of files, and also for some of the things we keep here
in databases (a lot of text/blobs, and as such highly compressible).

My colleagues and I are really impressed with the design and performance of
ZFS - keep up the good work!

-Andy
On 12/7/06, Andrew Miller <z2amiller at gmail.com> wrote:
> Quick question about the interaction of ZFS filesystem compression and the
> filesystem cache. We have an OpenSolaris (actually Nexenta alpha-6) box
> running RRD collection. These files seem to be quite compressible: a test
> filesystem containing about 3,000 of these files shows a compressratio of
> 12.5x.

Be careful here. If you are using files that have no data in them yet, you
will get much better compression than later in life. Judging by the fact that
you got only 12.5x, I suspect that your files are at least partially
populated. Expect the compression to get worse over time.

Looking at some RRD files that come from a very active (e.g. numbers vary
frequently) server, with data filling about 2/3 of the configured time
periods, I see the following rates:

 1.8 mpstat.rrd
 1.8 vmstat.rrd
 1.9 exacct_PROJECT_user.oracle.rrd
 2.0 net-ce2.rrd
 2.1 iostat-c14.rrd
 2.1 iostat-c15.rrd
 2.1 iostat-c16.rrd
 .
 .
 .
 7.6 net-ce912005.rrd
 7.7 net-ce912016.rrd
 9.1 exacct_PROJECT_user.gemsadm.rrd
12.2 exacct_PROJECT_exacct_interval.rrd
18.1 exacct_PROJECT_user.patrol.rrd
18.1 exacct_PROJECT_user.precise.rrd
18.1 exacct_PROJECT_user.precise6.rrd
31.8 net-ce8.rrd
39.6 net-eri3.rrd
45.1 net-eri2.rrd

The first column is the compression ratio. The net-eri{2,3} files are almost
empty.

> My question is about how the filesystem cache works with compressed files.
> Does the cache keep a copy of the compressed data, or of the uncompressed
> blocks? To update one of these RRD files, I believe the whole contents are
> read into memory, modified, and then written back out.

Here is an insert of a value:

25450:  open("/opt/perfstat/rrd/somehost/iostat-c4.rrd", O_RDWR) = 3
25450:  fstat64(3, 0xFFBFF5E0) = 0
25450:  fstat64(3, 0xFFBFF640) = 0
25450:  fstat64(3, 0xFFBFF4E8) = 0
25450:  ioctl(3, TCGETA, 0xFFBFF5CC) Err#25 ENOTTY
25450:  read(3, " R R D\0 0 0 0 1\0\0\0\0".., 8192) = 8192
25450:  llseek(3, 0, SEEK_CUR) = 8192
25450:  lseek(3, 0xFFFFFC68, SEEK_CUR) = 7272
25450:  fcntl(3, F_SETLK, 0xFFBFF7D0) = 0
25450:  llseek(3, 0, SEEK_CUR) = 7272
25450:  lseek(3, 2230952, SEEK_SET) = 2230952
25450:  write(3, " @ x S = pA3D7\v ?E6 f f".., 64) = 64
25450:  lseek(3, 1864, SEEK_SET) = 1864
25450:  write(3, " E xA0 # U N K N\0\0\0\0".., 5408) = 5408
25450:  close(3) = 0

Notice that it does the following:

  Open the file
  Read the first 8K
  Seek to a particular spot
  Take a lock
  Seek
  Write 64 bytes
  Seek
  Write 5408 bytes
  Close

The rrd file in question is 8.6 MB. There was 8 KB of reads and 5472 bytes of
writes. This is one of the big wins of the current binary rrd format over the
original ASCII version that came with MRTG.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
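For reference, a minimal Python sketch of the update pattern visible in that
trace (read only the preamble, take a lock, write two small regions in place).
The offsets, lengths, and payloads below are placeholders lifted from the
trace, not the real RRD file format.

    import fcntl, os

    def rrd_style_update(path, point, header_len=8192,
                         point_off=2230952, head_off=1864, head_len=5408):
        """Mimic the syscall pattern above; all offsets are illustrative."""
        fd = os.open(path, os.O_RDWR)
        try:
            header = os.read(fd, header_len)   # read only the 8 KB preamble, not the whole file
            fcntl.lockf(fd, fcntl.LOCK_EX)     # take the write lock (F_SETLK in the trace)
            os.lseek(fd, point_off, os.SEEK_SET)
            os.write(fd, point)                # the 64-byte data point
            os.lseek(fd, head_off, os.SEEK_SET)
            os.write(fd, b"\0" * head_len)     # placeholder for the 5408-byte live-header update
        finally:
            os.close(fd)                       # closing the fd also drops the lock

The point is that the I/O stays at a few kilobytes no matter how large the RRD
file grows.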
> Be careful here. If you are using files that have no data in them yet, you
> will get much better compression than later in life. Judging by the fact
> that you got only 12.5x, I suspect that your files are at least partially
> populated. Expect the compression to get worse over time.

I do expect it to get somewhat worse over time -- I don't expect such
compression forever, but didn't want to get too detailed in my original
question. :-) A lot of the data points I'm collecting (40%+) are quite static
or change slowly over time - representing, for example, disk space, TCP errors
(hopefully always zero! :-)), or JVM jstats of development JVM instances that
get only small occasional bursts in activity.

(snip)

> Read the first 8K
> Seek to a particular spot
> Take a lock
> Seek
> Write 64 bytes
> Seek
> Write 5408 bytes
> Close

Interesting, that looks a lot different from what I'm seeing. Maybe something
is different in the implementation (I'm using perl RRDs and RRD 1.2.11). Note
that RRDFILE.rrd is 125504 bytes on disk. I'll have to look into it a little
deeper, as it would certainly help performance to just read the preamble and
modify only the pieces of the RRD that need to change. Maybe it's because my
RRDs are quite small: they contain only one DS, and I tuned down the number
and length of the RRAs while fighting the performance issues that ultimately
ended in me moving this to an OpenSolaris-based box.

open("RRDFILE.rrd", O_RDONLY) = 14
fstat64(14, 0x080473E0) = 0
fstat64(14, 0x08047310) = 0
ioctl(14, TCGETA, 0x080473AC) Err#25 ENOTTY
read(14, " R R D\0 0 0 0 3\0\0\0\0".., 125952) = 125504
llseek(14, 0, SEEK_CUR) = 125504
lseek(14, 21600, SEEK_SET) = 21600
lseek(14, 1600, SEEK_SET) = 1600
read(14, "\0\0\0\0\0\0F8FF\0\0\0\0".., 125952) = 123904
llseek(14, 0xFFFFFFFFFFFE24F8, SEEK_CUR) = 3896
close(14) = 0

open("RRDFILE.rrd", O_RDWR) = 14
fstat64(14, 0x08047320) = 0
fstat64(14, 0x08047250) = 0
ioctl(14, TCGETA, 0x080472EC) Err#25 ENOTTY
read(14, " R R D\0 0 0 0 3\0\0\0\0".., 125952) = 125504
llseek(14, 0, SEEK_CUR) = 125504
lseek(14, 0, SEEK_END) = 125504
llseek(14, 0, SEEK_CUR) = 125504
llseek(14, 0, SEEK_CUR) = 125504
lseek(14, 1504, SEEK_SET) = 1504
fcntl(14, F_SETLK, 0x08047430) = 0
mmap(0x00000000, 125504, PROT_READ|PROT_WRITE, MAP_SHARED, 14, 0) = 0xFE25F000
munmap(0xFE25F000, 125504) = 0
llseek(14, 0, SEEK_CUR) = 1504
lseek(14, 880, SEEK_SET) = 880
write(14, " :B0 x E\0\0\0\0 1 5\0\0".., 624) = 624
close(14) = 0

(I did an strace on Linux too, which is using RRD 1.0.49, and it looks about
the same - it appears to read the whole thing. Maybe it's something in RRDs or
the way I'm using it.)

Thanks for spending some of your time analyzing my problem. :-)

-Andy
Mark Maybee wrote:
> Yup, your assumption is correct. We currently do compression below the
> ARC. We have contemplated caching data in compressed form, but have not
> really explored the idea fully yet.

The same applies to encryption: the current plan is to encrypt just after
where we currently compress. There are advantages to caching the encrypted
rather than the clear content in some cases, and disadvantages in others (it
depends on what risk you are attempting to mitigate by using encryption). My
first cut at encryption for ZFS did actually encrypt what was in the ARC - but
that was a mistake, because I had no hook to decrypt it.

--
Darren J Moffat