Hi,

I just installed 2009.06 and found that compression isn't enabled by default when filesystems are created. Does it make sense to have an RFE open for this? (I'll open one tonight if need be.) We keep telling people to turn on compression. Are there any situations where turning on compression doesn't make sense, like rpool/swap? What about rpool/dump?

Thanks,
~~sa
----------------
Shannon A. Fiume
System Administrator, Infrastructure and Lab Management, Cloud Computing
shannon dot fiume at sun dot com
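For context, compression is already a per-dataset property, so it can be turned on by hand today. A minimal sketch; the dataset names are only examples from a default 2009.06 layout:

    # enable the default (lzjb) compression on a data filesystem;
    # only blocks written after this point are compressed
    zfs set compression=on rpool/export/home

    # see what it is actually saving
    zfs get compression,compressratio rpool/export/home

    # swap and dump are the usual candidates for leaving at the default
    zfs get compression rpool/swap rpool/dump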
Bob Friesenhahn
2009-Jun-15 20:00 UTC
[zfs-discuss] compression at zfs filesystem creation
On Mon, 15 Jun 2009, Shannon Fiume wrote:

> I just installed 2009.06 and found that compression isn't enabled by
> default when filesystems are created. Does it make sense to have an
> RFE open for this? (I'll open one tonight if need be.) We keep
> telling people to turn on compression. Are there any situations
> where turning on compression doesn't make sense, like rpool/swap?
> What about rpool/dump?

In most cases compression is not desirable. It consumes CPU and results in uneven system performance.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:

> On Mon, 15 Jun 2009, Shannon Fiume wrote:
>
>> I just installed 2009.06 and found that compression isn't enabled by
>> default when filesystems are created. Does it make sense to have an
>> RFE open for this? (I'll open one tonight if need be.) We keep telling
>> people to turn on compression. Are there any situations where turning
>> on compression doesn't make sense, like rpool/swap? What about
>> rpool/dump?
>
> In most cases compression is not desirable. It consumes CPU and
> results in uneven system performance.

IIRC there was a blog about I/O performance with ZFS stating that it was faster with compression ON as it didn't have to wait for so much data from the disks and that the CPU was fast at unpacking data. But sure, it uses more CPU (and probably memory).
dick hoogendijk
2009-Jun-15 20:59 UTC
[zfs-discuss] compression at zfs filesystem creation
On Mon, 15 Jun 2009 22:51:12 +0200
"Thommy M." <thommy.m.malmstrom at gmail.com> wrote:

> IIRC there was a blog about I/O performance with ZFS stating that it
> was faster with compression ON as it didn't have to wait for so much
> data from the disks and that the CPU was fast at unpacking data. But
> sure, it uses more CPU (and probably memory).

IF at all, it certainly should not be the DEFAULT. Compression is a choice, nothing more.

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | nevada / OpenSolaris 2009.06 release
+ All that's really worth doing is what we do for others (Lewis Carroll)
* Shannon Fiume (Shannon.Fiume at Sun.COM) wrote:

> Hi,
>
> I just installed 2009.06 and found that compression isn't enabled by
> default when filesystems are created. Does it make sense to have an
> RFE open for this? (I'll open one tonight if need be.) We keep telling
> people to turn on compression. Are there any situations where turning
> on compression doesn't make sense, like rpool/swap? What about
> rpool/dump?

That would be enhancement request #86.

http://defect.opensolaris.org/bz/show_bug.cgi?id=86

Cheers,
--
Glenn
On Mon, 15 Jun 2009, dick hoogendijk wrote:

> IF at all, it certainly should not be the DEFAULT.
> Compression is a choice, nothing more.

I respectfully disagree somewhat. Yes, compression should be a choice, but I think the default should be for it to be enabled.

--
Rich Teer, SCSA, SCNA, SCSECA
URLs: http://www.rite-group.com/rich
      http://www.linkedin.com/in/richteer
Bob Friesenhahn
2009-Jun-16 00:06 UTC
[zfs-discuss] compression at zfs filesystem creation
On Mon, 15 Jun 2009, Thommy M. wrote:

>> In most cases compression is not desirable. It consumes CPU and
>> results in uneven system performance.
>
> IIRC there was a blog about I/O performance with ZFS stating that it was
> faster with compression ON as it didn't have to wait for so much data
> from the disks and that the CPU was fast at unpacking data. But sure, it
> uses more CPU (and probably memory).

I'll believe this when I see it. :-)

With really slow disks and a fast CPU it is possible that reading data the first time is faster. However, Solaris is really good at caching data, so any often-accessed data is highly likely to be cached and therefore read just one time. The main point of using compression for the root pool would be so that the OS can fit on an abnormally small device such as a FLASH disk. I would use it for a read-mostly device or an archive (backup) device.

On desktop systems the influence of compression on desktop response is quite noticeable when writing, even with very fast CPUs and multiple cores.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Mon, 15 Jun 2009, Bob Friesenhahn wrote:

> In most cases compression is not desirable. It consumes CPU and results in
> uneven system performance.

You actually have that backwards. :-) In most cases, compression is very desirable. Performance studies have shown that today's CPUs can compress data faster than it takes for the uncompressed data to be read or written. That is, the time to read or write compressed data + the time to compress or decompress it is less than the time to read or write the uncompressed data. Such is the difference between CPUs and I/O!

You are correct that the compression/decompression uses CPU, but most systems have an abundance of CPU, especially when performing I/O.

--
Rich Teer, SCSA, SCNA, SCSECA
URLs: http://www.rite-group.com/rich
      http://www.linkedin.com/in/richteer
> On Mon, 15 Jun 2009, dick hoogendijk wrote:
>
>> IF at all, it certainly should not be the DEFAULT.
>> Compression is a choice, nothing more.
>
> I respectfully disagree somewhat. Yes, compression should be a
> choice, but I think the default should be for it to be enabled.

I agree that "Compression is a choice" and would add:

Compression is a choice and it is the default.

Just my feelings on the issue.

Dennis Clarke
Bob Friesenhahn
2009-Jun-16 01:07 UTC
[zfs-discuss] compression at zfs filesystem creation
On Mon, 15 Jun 2009, Rich Teer wrote:

> You actually have that backwards. :-) In most cases, compression is very
> desirable. Performance studies have shown that today's CPUs can compress
> data faster than it takes for the uncompressed data to be read or written.

Do you have a reference for such an analysis based on ZFS? I would be interested in linear read/write performance rather than random synchronous access.

Perhaps you are going to make me test this for myself.

> You are correct that the compression/decompression uses CPU, but most systems
> have an abundance of CPU, especially when performing I/O.

I assume that you are talking about single-user systems with little else to do?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Hello,

I would like to add one more point to this.

Everyone seems to agree that compression is useful for reducing load on the disks, and the disagreement is about the impact on CPU utilization, right?

What about when the compression is performed in dedicated hardware? Shouldn't compression be on by default in that case? How do I put in an RFE for that?

Monish

>
>> On Mon, 15 Jun 2009, dick hoogendijk wrote:
>>
>>> IF at all, it certainly should not be the DEFAULT.
>>> Compression is a choice, nothing more.
>>
>> I respectfully disagree somewhat. Yes, compression should be a
>> choice, but I think the default should be for it to be enabled.
>
> I agree that "Compression is a choice" and would add:
>
> Compression is a choice and it is the default.
>
> Just my feelings on the issue.
>
> Dennis Clarke
Robert Milkowski
2009-Jun-16 13:09 UTC
[zfs-discuss] compression at zfs filesystem creation
On Mon, 15 Jun 2009, Bob Friesenhahn wrote:

> On Mon, 15 Jun 2009, Thommy M. wrote:
>
>>> In most cases compression is not desirable. It consumes CPU and
>>> results in uneven system performance.
>>
>> IIRC there was a blog about I/O performance with ZFS stating that it was
>> faster with compression ON as it didn't have to wait for so much data
>> from the disks and that the CPU was fast at unpacking data. But sure, it
>> uses more CPU (and probably memory).
>
> I'll believe this when I see it. :-)
>
> With really slow disks and a fast CPU it is possible that reading data the
> first time is faster. However, Solaris is really good at caching data so any
> often-accessed data is highly likely to be cached and therefore read just one
> time. The main point of using compression for the root pool would be so that
> the OS can fit on an abnormally small device such as a FLASH disk. I would
> use it for a read-mostly device or an archive (backup) device.

Well, it depends on your working set and how much memory you have. I have come across systems with lots of CPU left to spare but a working set much bigger than the amount of memory, where enabling lzjb gave over a 2x compression ratio and made the application run faster. Seen it with ldap, mysql and a couple of other apps.
Bob Friesenhahn
2009-Jun-16 13:46 UTC
[zfs-discuss] compression at zfs filesystem creation
On Mon, 15 Jun 2009, Bob Friesenhahn wrote:

> On Mon, 15 Jun 2009, Rich Teer wrote:
>>
>> You actually have that backwards. :-) In most cases, compression is very
>> desirable. Performance studies have shown that today's CPUs can compress
>> data faster than it takes for the uncompressed data to be read or written.
>
> Do you have a reference for such an analysis based on ZFS? I would be
> interested in linear read/write performance rather than random
> synchronous access.
>
> Perhaps you are going to make me test this for myself.

Ok, I tested this for myself on a Solaris 10 system with 4 3GHz AMD64 cores and see that we were both right. I did an iozone run with compression and do see a performance improvement. I don't know what the data iozone produces looks like, but it clearly must be quite compressible. Testing was done with a 64GB file:

                        KB  reclen   write  rewrite     read    reread
uncompressed:     67108864     128  359965   354854   550869    554271
lzjb:             67108864     128  851336   924881  1289059   1362625

Unfortunately, during the benchmark run with lzjb the system desktop was essentially unusable, with a misbehaving mouse and keyboard as well as reported 55% CPU consumption. Without the compression the system is fully usable with very little CPU consumed.

With a slower disk subsystem the CPU overhead would surely be less since writing is still throttled by the disk.

It would be better to test with real data rather than iozone.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
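For anyone who wants to run a similar comparison, a rough sketch of the setup; the pool and dataset names are made up, and the iozone invocation is only an example (adjust record length and file size to your hardware):

    # two otherwise identical scratch filesystems
    zfs create -o compression=off  tank/bench-off
    zfs create -o compression=lzjb tank/bench-lzjb

    # run the same sequential write/read workload against each mount point
    iozone -i 0 -i 1 -r 128k -s 8g -f /tank/bench-off/testfile
    iozone -i 0 -i 1 -r 128k -s 8g -f /tank/bench-lzjb/testfile

    # see how compressible the generated data actually was
    zfs get compressratio tank/bench-lzjb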
Bob Friesenhahn wrote:

> On Mon, 15 Jun 2009, Thommy M. wrote:
>>>
>>> In most cases compression is not desirable. It consumes CPU and
>>> results in uneven system performance.
>>
>> IIRC there was a blog about I/O performance with ZFS stating that it was
>> faster with compression ON as it didn't have to wait for so much data
>> from the disks and that the CPU was fast at unpacking data. But sure, it
>> uses more CPU (and probably memory).
>
> I'll believe this when I see it. :-)
>
> With really slow disks and a fast CPU it is possible that reading data
> the first time is faster. However, Solaris is really good at caching
> data so any often-accessed data is highly likely to be cached and
> therefore read just one time.

One thing I'm curious about...

When reading compressed data, is it cached before or after it is uncompressed? If before, then while you've saved re-reading it from the disk, there is still (redundant) overhead for uncompressing it over and over. If the uncompressed data is cached, then I agree it sounds like a total win for read-mostly filesystems.

-Kyle

> The main point of using compression for the root pool would be so
> that the OS can fit on an abnormally small device such as a FLASH
> disk. I would use it for a read-mostly device or an archive (backup)
> device.
>
> On desktop systems the influence of compression on desktop response is
> quite noticeable when writing, even with very fast CPUs and multiple
> cores.
>
> Bob
> --
> Bob Friesenhahn
> bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Darren J Moffat
2009-Jun-16 15:01 UTC
[zfs-discuss] compression at zfs filesystem creation
Kyle McDonald wrote:

> Bob Friesenhahn wrote:
>> On Mon, 15 Jun 2009, Thommy M. wrote:
>>>>
>>>> In most cases compression is not desirable. It consumes CPU and
>>>> results in uneven system performance.
>>>
>>> IIRC there was a blog about I/O performance with ZFS stating that it was
>>> faster with compression ON as it didn't have to wait for so much data
>>> from the disks and that the CPU was fast at unpacking data. But sure, it
>>> uses more CPU (and probably memory).
>>
>> I'll believe this when I see it. :-)
>>
>> With really slow disks and a fast CPU it is possible that reading data
>> the first time is faster. However, Solaris is really good at caching
>> data so any often-accessed data is highly likely to be cached and
>> therefore read just one time.
>
> One thing I'm curious about...
>
> When reading compressed data, is it cached before or after it is
> uncompressed?

The decompressed (and decrypted) data is what is cached in memory.

Currently the L2ARC stores decompressed (but encrypted) data on the cache devices.

--
Darren J Moffat
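Not something covered in the thread, but if you want to watch how much of a read workload the ARC is absorbing while experimenting with compression, the arcstats kstat is one place to look. A rough sketch; the statistic names are my assumption about what the zfs arcstats kstat exposes:

    # current ARC size plus hit/miss counters
    kstat -p zfs:0:arcstats:size zfs:0:arcstats:hits zfs:0:arcstats:misses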
Monish Shah wrote:

> Hello,
>
> I would like to add one more point to this.
>
> Everyone seems to agree that compression is useful for reducing load
> on the disks and the disagreement is about the impact on CPU
> utilization, right?
>
> What about when the compression is performed in dedicated hardware?
> Shouldn't compression be on by default in that case? How do I put in
> an RFE for that?

Is there a bugs.intel.com? :-)

NB, Solaris already does this for encryption, which is often a more computationally intensive operation. I think the general cases are performed well by current hardware, and it is already multithreaded. The bigger issue is, as Bob notes, resource management. There is opportunity for people to work here, especially since the community has access to large amounts of varied hardware. Should we spin up a special interest group of some sort?

-- richard
Darren J Moffat wrote:

> Kyle McDonald wrote:
>> Bob Friesenhahn wrote:
>>> On Mon, 15 Jun 2009, Thommy M. wrote:
>>>>>
>>>>> In most cases compression is not desirable. It consumes CPU and
>>>>> results in uneven system performance.
>>>>
>>>> IIRC there was a blog about I/O performance with ZFS stating that
>>>> it was faster with compression ON as it didn't have to wait for so
>>>> much data from the disks and that the CPU was fast at unpacking
>>>> data. But sure, it uses more CPU (and probably memory).
>>>
>>> I'll believe this when I see it. :-)
>>>
>>> With really slow disks and a fast CPU it is possible that reading
>>> data the first time is faster. However, Solaris is really good at
>>> caching data so any often-accessed data is highly likely to be
>>> cached and therefore read just one time.
>>
>> One thing I'm curious about...
>>
>> When reading compressed data, is it cached before or after it is
>> uncompressed?
>
> The decompressed (and decrypted) data is what is cached in memory.
>
> Currently the L2ARC stores decompressed (but encrypted) data on the
> cache devices.

So the cache saves not only the time to access the disk but also the CPU time to decompress. Given this, I think it could be a big win.

-Kyle
On Tue, June 16, 2009 15:32, Kyle McDonald wrote:

> So the cache saves not only the time to access the disk but also the CPU
> time to decompress. Given this, I think it could be a big win.

Unless you're in GIMP working on JPEGs, or doing some kind of MPEG video editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of which are probably some of the largest files in most people's homedirs nowadays.

1 GB of e-mail is a lot (probably my entire personal mail collection for a decade) and will compress well; 1 GB of audio files is nothing, and won't compress at all.

Perhaps compressing /usr could be handy, but why bother enabling compression if the majority (by volume) of user data won't do anything but burn CPU?

So the correct answer on whether compression should be enabled by default is "it depends". (IMHO :) )
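One practical middle ground, since compression is set per dataset rather than per pool, is simply to mix settings. A small sketch with purely hypothetical dataset names:

    # text-heavy data compresses well, so turn it on there
    zfs create -o compression=on tank/home/mail

    # already-compressed media gains nothing, so leave it off
    zfs create -o compression=off tank/home/media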
Hello Richard,

> Monish Shah wrote:
>> What about when the compression is performed in dedicated hardware?
>> Shouldn't compression be on by default in that case? How do I put in an
>> RFE for that?
>
> Is there a bugs.intel.com? :-)

I may have misled you. I'm not asking for Intel to add hardware compression. Actually, we already have gzip compression boards that we have integrated into OpenSolaris / ZFS and they are also supported under NexentaStor. What I'm saying is that if such a card is installed, compression should be enabled by default.

> NB, Solaris already does this for encryption, which is often a more
> computationally intensive operation.

Actually, compression is more compute intensive than symmetric encryption (such as AES). Public key encryption, on the other hand, is horrendously compute intensive, much more than compression or symmetric encryption. But nobody uses public key encryption for bulk data encryption, so that doesn't apply.

Your mileage may vary. You can always come up with compression algorithms that don't do a very good job of compressing, but which are light on CPU utilization.

Monish

> I think the general cases are performed well by current hardware, and
> it is already multithreaded. The bigger issue is, as Bob notes, resource
> management. There is opportunity for people to work here, especially
> since the community has access to large amounts of varied hardware.
> Should we spin up a special interest group of some sort?
> -- richard
Kjetil Torgrim Homme
2009-Jun-17 10:03 UTC
[zfs-discuss] compression at zfs filesystem creation
"David Magda" <dmagda at ee.ryerson.ca> writes:> On Tue, June 16, 2009 15:32, Kyle McDonald wrote: > >> So the cache saves not only the time to access the disk but also >> the CPU time to decompress. Given this, I think it could be a big >> win. > > Unless you''re in GIMP working on JPEGs, or doing some kind of MPEG > video editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of > which are probably some of the largest files in most people''s > homedirs nowadays.indeed. I think only programmers will see any substantial benefit from compression, since both the code itself and the object files generated are easily compressible.> 1 GB of e-mail is a lot (probably my entire personal mail collection > for a decade) and will compress well; 1 GB of audio files is > nothing, and won''t compress at all. > > Perhaps compressing /usr could be handy, but why bother enabling > compression if the majority (by volume) of user data won''t do > anything but burn CPU? > > So the correct answer on whether compression should be enabled by > default is "it depends". (IMHO :) )I''d be interested to see benchmarks on MySQL/PostgreSQL performance with compression enabled. my *guess* would be it isn''t beneficial since they usually do small reads and writes, and there is little gain in reading 4 KiB instead of 8 KiB. what other uses cases can benefit from compression? -- Kjetil T. Homme Redpill Linpro AS - Changing the game
Fajar A. Nugraha
2009-Jun-17 10:15 UTC
[zfs-discuss] compression at zfs filesystem creation
On Wed, Jun 17, 2009 at 5:03 PM, Kjetil Torgrim Homme <kjetilho at linpro.no> wrote:

> indeed. I think only programmers will see any substantial benefit
> from compression, since both the code itself and the object files
> generated are easily compressible.

>> Perhaps compressing /usr could be handy, but why bother enabling
>> compression if the majority (by volume) of user data won't do
>> anything but burn CPU?

How do you define "substantial"? My opensolaris snv_111b installation has a 1.47x compression ratio for "/", with the default compression. It's well worth it for me.

--
Fajar
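That figure comes straight from the filesystem's own accounting; anyone can check their own install with something like the following (the dataset name is just the usual OpenSolaris boot environment layout, so adjust to taste):

    # ratio for the active boot environment
    zfs get compressratio rpool/ROOT/opensolaris

    # or list it for every dataset in the pool
    zfs get -r compressratio rpool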
Casper.Dik at Sun.COM
2009-Jun-17 10:35 UTC
[zfs-discuss] compression at zfs filesystem creation
> On Wed, Jun 17, 2009 at 5:03 PM, Kjetil Torgrim Homme <kjetilho at linpro.no> wrote:
>> indeed. I think only programmers will see any substantial benefit
>> from compression, since both the code itself and the object files
>> generated are easily compressible.
>
>>> Perhaps compressing /usr could be handy, but why bother enabling
>>> compression if the majority (by volume) of user data won't do
>>> anything but burn CPU?
>
> How do you define "substantial"? My opensolaris snv_111b installation
> has a 1.47x compression ratio for "/", with the default compression.
> It's well worth it for me.

Indeed; I've had a few systems with:

	UFS (boot env 1)
	UFS (boot env 2)
	swap

lucreate couldn't fit everything in one (old UFS) partition because of dump and swap; with compression I can fit multiple environments (more than two).

I still use "disk swap" because I have some bad experiences with ZFS swap. (ZFS appears to cache and that is very wrong)

Now I use:

	rpool (using both the UFS partitions, now concatenated into one slice)
	real swap

My ZFS/Solaris wish list is this:

	- when you convert from UFS to ZFS, zpool create fails and requires
	  "create -f"; I'd like zpool create to report *all* errors, not just
	  one, so you know exactly what collateral damage you would do:
	  "has a UFS filesystem", "s2 overlaps s0", etc.

	- zpool upgrade should fail if one of the available boot environments
	  doesn't support the new version (or upgrade to the lowest supported
	  zfs version)

Casper
Kjetil Torgrim Homme
2009-Jun-17 11:27 UTC
[zfs-discuss] compression at zfs filesystem creation
"Fajar A. Nugraha" <fajar at fajar.net> writes:> Kjetil Torgrim Homme wrote: >> indeed. ?I think only programmers will see any substantial benefit >> from compression, since both the code itself and the object files >> generated are easily compressible. > >>> Perhaps compressing /usr could be handy, but why bother enabling >>> compression if the majority (by volume) of user data won''t do >>> anything but burn CPU? > > How do you define "substantial"? My opensolaris snv_111b installation > has 1.47x compression ratio for "/", with the default compression. > It''s well worthed for me.I don''t really care if my "/" is 5 GB or 3 GB. how much faster is your system operating? what''s the compression rate on your data areas? -- Kjetil T. Homme Redpill Linpro AS - Changing the game
>> Unless you're in GIMP working on JPEGs, or doing some kind of MPEG
>> video editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of
>> which are probably some of the largest files in most people's
>> homedirs nowadays.
>
> indeed. I think only programmers will see any substantial benefit
> from compression, since both the code itself and the object files
> generated are easily compressible.

If we are talking about data on people's desktops and laptops, yes, it is not very common to see a lot of compressible data. There will be some other examples, such as desktops being used for engineering drawings. The CAD files do tend to be compressible and they tend to be big.

In any case, the really interesting case for compression is business data (databases, e-mail servers, etc.), which tends to be quite compressible.

...

> I'd be interested to see benchmarks on MySQL/PostgreSQL performance
> with compression enabled. my *guess* would be it isn't beneficial
> since they usually do small reads and writes, and there is little gain
> in reading 4 KiB instead of 8 KiB.

OK, now you have switched from compressibility of data to performance advantage. As I said above, this kind of data usually compresses pretty well.

I agree that for random reads, there wouldn't be any gain from compression. For random writes, in a copy-on-write file system, there might be gains, because the blocks may be arranged in sequential fashion anyway. We are in the process of doing some performance tests to prove or disprove this.

Now, if you are using SSDs for this type of workload, I'm pretty sure that compression will help writes. The reason is that the flash translation layer in the SSD has to re-arrange the data and write it page by page. If there is less data to write, there will be fewer program operations. Given that the write IOPS rating of an SSD is often much less than its read IOPS, using compression to improve that will surely be of great value.

At this point, this is educated guesswork. I'm going to see if I can get my hands on an SSD to prove this.

Monish

> what other use cases can benefit from compression?
> --
> Kjetil T. Homme
> Redpill Linpro AS - Changing the game
Kjetil Torgrim Homme
2009-Jun-17 13:14 UTC
[zfs-discuss] compression at zfs filesystem creation
"Monish Shah" <monish at indranetworks.com> writes:>> I''d be interested to see benchmarks on MySQL/PostgreSQL performance >> with compression enabled. my *guess* would be it isn''t beneficial >> since they usually do small reads and writes, and there is little >> gain in reading 4 KiB instead of 8 KiB. > > OK, now you have switched from compressibility of data to > performance advantage. As I said above, this kind of data usually > compresses pretty well.the thread has been about I/O performance since the first response, as far as I can tell.> I agree that for random reads, there wouldn''t be any gain from > compression. For random writes, in a copy-on-write file system, > there might be gains, because the blocks may be arranged in > sequential fashion anyway. We are in the process of doing some > performance tests to prove or disprove this. > > Now, if you are using SSDs for this type of workload, I''m pretty > sure that compression will help writes. The reason is that the > flash translation layer in the SSD has to re-arrange the data and > write it page by page. If there is less data to write, there will > be fewer program operations. > > Given that write IOPS rating in an SSD is often much less than read > IOPS, using compression to improve that will surely be of great > value.not necessarily, since a partial SSD write is much more expensive than a full block write (128 KiB?). in a write intensive application, that won''t be an issue since the data is flowing steadily, but for the right mix of random reads and writes, this may exacerbate the bottleneck.> At this point, this is educated guesswork. I''m going to see if I > can get my hands on an SSD to prove this.that''d be great! -- Kjetil T. Homme Redpill Linpro AS - Changing the game
On Wed, June 17, 2009 06:03, Kjetil Torgrim Homme wrote:

> I'd be interested to see benchmarks on MySQL/PostgreSQL performance
> with compression enabled. my *guess* would be it isn't beneficial
> since they usually do small reads and writes, and there is little gain
> in reading 4 KiB instead of 8 KiB.

MySQL works quite well supposedly:

http://tinyurl.com/42sy3v
http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/

SmugMug has several 7000-series appliances in production, and they sing its praises:

http://www.youtube.com/watch?v=2WEx_XTjPvE#t=20m45s
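The setup being discussed in those posts is ordinary per-dataset tuning. A sketch of the kind of dataset people run InnoDB on; the dataset name, the 16K recordsize (chosen to match InnoDB's page size) and atime=off are my own assumptions, not details taken from the links:

    # a dedicated dataset for InnoDB data files
    zfs create -o recordsize=16k -o compression=lzjb -o atime=off tank/mysql/innodb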
On Wed, June 17, 2009 06:15, Fajar A. Nugraha wrote:

>>> Perhaps compressing /usr could be handy, but why bother enabling
>>> compression if the majority (by volume) of user data won't do
>>> anything but burn CPU?
>
> How do you define "substantial"? My opensolaris snv_111b installation
> has a 1.47x compression ratio for "/", with the default compression.
> It's well worth it for me.

And how many GB is that? ~1.5x is quite good, but if you're talking about a 7.5 GB install using "only" 3 GB of space, but your homedir is 50 GB, it's not a lot in relative terms.
Bob Friesenhahn wrote:

> On Mon, 15 Jun 2009, Bob Friesenhahn wrote:
>
>> On Mon, 15 Jun 2009, Rich Teer wrote:
>>>
>>> You actually have that backwards. :-) In most cases, compression
>>> is very desirable. Performance studies have shown that today's
>>> CPUs can compress data faster than it takes for the uncompressed
>>> data to be read or written.
>>
>> Do you have a reference for such an analysis based on ZFS? I would
>> be interested in linear read/write performance rather than random
>> synchronous access.
>>
>> Perhaps you are going to make me test this for myself.
>
> Ok, I tested this for myself on a Solaris 10 system with 4 3GHz AMD64
> cores and see that we were both right. I did an iozone run with
> compression and do see a performance improvement. I don't know what
> the data iozone produces looks like, but it clearly must be quite
> compressible. Testing was done with a 64GB file:
>
>                         KB  reclen   write  rewrite     read    reread
> uncompressed:     67108864     128  359965   354854   550869    554271
> lzjb:             67108864     128  851336   924881  1289059   1362625
>
> Unfortunately, during the benchmark run with lzjb the system desktop
> was essentially unusable with misbehaving mouse and keyboard as well
> as reported 55% CPU consumption. Without the compression the system
> is fully usable with very little CPU consumed.

If the system is dedicated to serving files rather than also being used interactively, it should not matter much what the CPU usage is. CPU cycles can't be stored for later use. Ultimately, it (mostly*) does not matter if one option consumes more CPU resources than another if those CPU resources were otherwise going to go unused. Changes (increases) in latencies are a consideration but probably depend more on process scheduler choice and policies.

*Higher CPU usage will increase energy consumption, heat output, and cooling costs... these may be important considerations in some specialized dedicated file server applications, depending on operational considerations.

The interactivity hit may pose a greater challenge for any other processes/databases/virtual machines run on hardware that also serves files. The interactivity hit may also be evidence that the process scheduler is not fairly or effectively sharing CPU resources amongst the running processes. If scheduler tweaks aren't effective, perhaps dedicating a processor core (or cores) to interactive GUI stuff and the other cores to filesystem duties would help smooth things out. Maybe zones could be used for that?

> With a slower disk subsystem the CPU overhead would surely be less
> since writing is still throttled by the disk.
>
> It would be better to test with real data rather than iozone.

There are 4 sets of articles with links and snippets from their test data below. Follow the links for the full discussion:

First article: http://blogs.sun.com/dap/entry/zfs_compression#comments

Hardware: Sun Storage 7000
# The server is a quad-core 7410 with 1 JBOD (configured with mirrored storage) and 16GB of RAM. No SSD.
# The client machine is a quad-core 7410 with 128GB of DRAM.
Summary: text data set

Compression   Ratio   Total   Write   Read
off           1.00x   3:30    2:08    1:22
lzjb          1.47x   3:26    2:04    1:22
gzip-2        2.35x   6:12    4:50    1:22
gzip          2.52x   11:18   9:56    1:22
gzip-9        2.52x   12:16   10:54   1:22

Summary: media data set

Compression   Ratio   Total   Write   Read
off           1.00x   3:29    2:07    1:22
lzjb          1.00x   3:31    2:09    1:22
gzip-2        1.01x   6:59    5:37    1:22
gzip          1.01x   7:18    5:57    1:21
gzip-9        1.01x   7:37    6:15    1:22

Second article/discussion:
http://ekschi.com/technology/2009/04/28/zfs-compression-a-win-win/
http://blogs.sun.com/observatory/entry/zfs_compression_a_win_win

Third article summary: ZFS and MySQL/InnoDB shows that gzip is often cpu-bound on current processors; lzjb improves performance.
http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/

Hardware (http://blogs.smugmug.com/don/2007/04/11/sun-honeymoon-update-servers/):
SunFire X2200 M2 w/64GB of RAM and 2 x dual-core 2.6GHz Opterons
Dell MD3000 w/15 x 15K SCSI disks and mirrored 512MB battery-backed write caches
"Also note that this is writing to two DAS enclosures with 15 x 15K SCSI disks apiece (28 spindles in a striped+mirrored configuration) with 512MB of write cache apiece."

TABLE1

compression    size   ratio   time
uncompressed   172M   1       0.207s
lzjb           79M    2.18X   0.234s
gzip-1         50M    3.44X   0.24s
gzip-9         46M    3.73X   0.217s

Notes on TABLE1:
* This dataset seems to be small enough that much of the time is probably spent in system internals, rather than actually reading, compressing, and writing data, so I view this as only an interesting size datapoint, rather than size and time. Feel free to correct me, though. :)

TABLE2

compression    size   ratio   time     ratio
uncompressed   631M   1       1.064s   1
lzjb           358M   1.76X   0.668    1.59X
gzip-1         253M   2.49X   1.302    0.82X
gzip-9         236M   3.73X   11.1s    0.10X

Notes on TABLE2:
* gzip-9 is massively slower on this particular hunk of data. I'm no expert on gzip, so I have no idea why this would be, but you can see the tradeoff is probably rarely worth it, even if we were using precious storage commodities (say, flash or RAM rather than hard disks). I ran this one extra times just to make sure. Seems valid (or a bug).

TABLE3

compression    size    ratio    time      ratio
uncompressed   2675M   1        15.041s   1
lzjb           830M    3.22X    5.274     2.85X
gzip-1         246M    10.87X   44.287    0.34X
gzip-9         220M    12.16X   52.475    0.29X

Notes on TABLE3:
* LZJB really shines here, performance wise. It delivers roughly 3X faster performance while also chewing up roughly 3X less bytes. Awesome.
* gzip's compression ratios are crazy great on this hunk of data, but the performance is pretty awful. Definitely CPU-bound, not IO-bound.

TABLE4

compression    size    ratio   time      ratio
uncompressed   2828M   1       17.09s    1
lzjb           1814M   1.56X   14.495s   1.18X
gzip-1         1384M   2.04X   48.895s   0.35X
gzip-9         1355M   2.09X   54.672s   0.31X

Notes on TABLE4:
* Again, LZJB performs quite well. 1.5X bytes saved while remaining faster. Nice!
* gzip is again very obviously CPU bound, rather than IO-bound. Dang.
Fourth article: zfs-fuse on Linux:
http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg08219.html

Hardware: "this test was done on an ancient suse 9.1 box, fuse 2.5.3, 1,25 GB RAM (1 gb in use for other apps), 2x80gb 3ware raid1"

linux kernel source (239M), time tar xf linux-2.6.20.3.tar

compression   time-real   time-user   time-sys   compressratio
lzo           6m39.603s   0m1.288s    0m6.055s   2.99x
gzip          7m46.875s   0m1.275s    0m6.312s   3.41x
lzjb          7m7.600s    0m1.227s    0m6.033s   1.79x
off           7m26.735s   0m1.348s    0m6.230s   1.00x
David Magda wrote:

> On Tue, June 16, 2009 15:32, Kyle McDonald wrote:
>
>> So the cache saves not only the time to access the disk but also the CPU
>> time to decompress. Given this, I think it could be a big win.
>
> Unless you're in GIMP working on JPEGs, or doing some kind of MPEG video
> editing--or ripping audio (MP3 / AAC / FLAC) stuff. All of which are
> probably some of the largest files in most people's homedirs nowadays.
>
> 1 GB of e-mail is a lot (probably my entire personal mail collection for a
> decade) and will compress well; 1 GB of audio files is nothing, and won't
> compress at all.
>
> Perhaps compressing /usr could be handy, but why bother enabling
> compression if the majority (by volume) of user data won't do anything but
> burn CPU?
>
> So the correct answer on whether compression should be enabled by default
> is "it depends". (IMHO :) )

The performance tests I've found almost universally show LZJB as not being cpu-bound on recent equipment. A few years from now GZIP may get away from being cpu-bound. As performance tests on current hardware show that enabling LZJB improves overall performance, it would make sense to enable it by default. In the future, when GZIP is no longer cpu-bound, it might become the default (or there could be another algorithm). There is a long history of previously formidable tasks starting out as cpu-bound but quickly progressing to an 'easily handled in the background' task. Decoding MP3 and MPEG1, MPEG2 (DVD resolutions), softmodems (and other host signal processor devices), and RAID are all tasks that can easily be handled by recent equipment.

Another option/idea to consider is using LZJB as the default compression method, and then performing a background scrub-recompress during otherwise idle times.

Technique ideas:

1.) A performance neutral/performance enhancing technique: use any algorithm that is not CPU bound on your hardware, and rarely if ever has worse performance than the uncompressed state.

2.) Adaptive technique 1: rarely used blocks could be given the strongest compression (using an algorithm tuned for the data type detected), while frequently used blocks would be compressed at performance neutral or performance improving levels.

3.) Adaptive technique 2: the same as above, but as the storage device gets closer to its native capacity, start applying compression both proactively (to new data) and retroactively (to old data), progressively using more powerful compression techniques as the maximum native capacity is approached. Compression could delay users from reaching the 80-95% capacity point where system performance curves often have their knees (a massive performance degradation with each additional unit).

4.) Maximize space technique: detect the data type and use the best available algorithm for the block.

As a counterpoint, if drive capacities keep growing at their current pace it seems they ultimately risk obviating the need to give much thought to the compression algorithm, except to choose one that boosts system performance. (I.e. in hard drives, compression may primarily be used to improve performance rather than gain extra storage space, as drive capacity has grown many times faster than drive performance.)

JPEGs often CAN be /losslessly/ compressed further by useful amounts (e.g. 25% space savings).
There is more on this here:

Tests:
http://www.maximumcompression.com/data/jpg.php
http://compression.ca/act/act-jpeg.html
http://www.downloadsquad.com/2008/09/11/winzip-12-supports-lossless-jpg-compression/
http://download.cnet.com/8301-2007_4-10038172-12.html
http://www.online-tech-tips.com/software-reviews/winzip-vs-7-zip-best-compression-method/

These have source code available:
http://sylvana.net/jpeg-ari/
PAQ8R http://www.cs.fit.edu/~mmahoney/compression/ (general info: http://en.wikipedia.org/wiki/PAQ)

This one says source code is "not yet available" (implying it may become available):
http://www.elektronik.htw-aalen.de/packjpg/packjpg_m.htm
Bob Friesenhahn
2009-Jun-18 02:29 UTC
[zfs-discuss] compression at zfs filesystem creation
On Wed, 17 Jun 2009, Haudy Kazemi wrote:

>> usable with very little CPU consumed.
>
> If the system is dedicated to serving files rather than also being used
> interactively, it should not matter much what the CPU usage is. CPU cycles
> can't be stored for later use. Ultimately, it (mostly*) does not matter if

Clearly you have not heard of the software flywheel:

http://www.simplesystems.org/users/bfriesen/software_flywheel.html

If I understand the blog entry correctly, for text data the task took up to 3.5X longer to complete, and for media data, the task took about 2.2X longer to complete with a maximum storage compression ratio of 2.52X.

For my backup drive using lzjb compression I see a compression ratio of only 1.53x.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bob Friesenhahn wrote:

> On Wed, 17 Jun 2009, Haudy Kazemi wrote:
>>> usable with very little CPU consumed.
>>
>> If the system is dedicated to serving files rather than also being
>> used interactively, it should not matter much what the CPU usage is.
>> CPU cycles can't be stored for later use. Ultimately, it (mostly*)
>> does not matter if
>
> Clearly you have not heard of the software flywheel:
>
> http://www.simplesystems.org/users/bfriesen/software_flywheel.html

I had not heard of such a device, however from the description it appears to be made from virtual unobtanium... :)

My line of reasoning is that unused CPU cycles are to some extent a wasted resource, paralleling the idea that having system RAM sitting empty/unused is also a waste and it should be used for caching until the system needs that RAM for other purposes (how the ZFS cache is supposed to work). This isn't a perfect parallel, as CPU power consumption and heat output vary by load much more than RAM's do. I'm sure someone could come up with a formula for the optimal CPU loading to maximize energy efficiency. There has been work on this in the paper 'Dynamic Data Compression in Multi-hop Wireless Networks' at http://enl.usc.edu/~abhishek/sigmpf03-sharma.pdf .

> If I understand the blog entry correctly, for text data the task took
> up to 3.5X longer to complete, and for media data, the task took about
> 2.2X longer to complete with a maximum storage compression ratio of
> 2.52X.
>
> For my backup drive using lzjb compression I see a compression ratio
> of only 1.53x.

I linked to several blog posts. It sounds like you are referring to http://blogs.sun.com/dap/entry/zfs_compression#comments ?

This blog's test results show that on their quad-core platform (Sun 7410s have quad-core 2.3 GHz AMD Opteron CPUs*):

* http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/7410/spec

for text data, LZJB compression had negligible performance benefits (task times were unchanged or marginally better) and less storage space was consumed (1.47:1).
for media data, LZJB compression had negligible performance benefits (task times were unchanged or marginally worse) and storage space consumed was unchanged (1:1).
Take away message: as currently configured, their system has nothing to lose from enabling LZJB.

for text data, GZIP compression at any setting had a significant negative impact on write times (CPU bound), no performance impact on read times, and significant positive improvements in compression ratio.
for media data, GZIP compression at any setting had a significant negative impact on write times (CPU bound), no performance impact on read times, and marginal improvements in compression ratio.
Take away message: with GZIP, as their system is currently configured, write performance would suffer in exchange for a higher compression ratio. This may be acceptable if the system fulfills a role that has a read-heavy usage profile of compressible content. (An archive.org backend would be such an example.) This is similar to the tradeoff made when comparing RAID1 or RAID10 vs RAID5.

Automatic benchmarks could be used to detect and select the optimal compression settings for best performance, with the basic case assuming the system is a dedicated file server and more advanced cases accounting for the CPU needs of other processes run on the same platform. Another way would be to ask the administrator what the usage profile for the machine will be and preconfigure compression settings suitable for that use case.
Single and dual core systems are more likely to become CPU bound from enabling compression than a quad core. All systems have bottlenecks in them somewhere by virtue of design decisions. One or more of these bottlenecks will be the rate limiting factor for any given workload, such that even if you speed up the rest of the system the process will still take the same amount of time to complete. The LZJB compression benchmarks on the quad core above demonstrate that LZJB is not the rate limiter either in writes or reads. The GZIP benchmarks show that it is a rate limiter, but only during writes. On a more powerful platform (6x faster CPU), GZIP writes may no longer be the bottleneck (assuming that the network bandwidth and drive I/O bandwidth remain unchanged).

System component balancing also plays a role. If the server is connected via a 100 Mbps CAT5e link, and all I/O activity is from client computers on that link, does it make any difference if the server is actually capable of GZIP writes at 200 Mbps, 500 Mbps, or 1500 Mbps? If the network link is later upgraded to Gigabit ethernet, now only the system capable of GZIPing at 1500 Mbps can keep up. The rate limiting factor changes as different components are upgraded.

In many systems for many workloads, hard drive I/O bandwidth is the rate limiting factor that has the most significant performance impact, such that a 20% boost in drive I/O is more noticeable than a 20% boost in CPU performance (or even a doubling of CPU performance). Many systems are now becoming quite unbalanced in terms of I/O bandwidth vs CPU performance. Trading CPU cycles for I/O bandwidth is one way of compensating for the imbalance, if the task is not already CPU-bound. (A CPU-bound process has the CPU as the rate-limiting factor. A common characteristic of CPU-bound processes is that they run the CPU at 100%, and would benefit from a faster processor. Non CPU-bound processes have a different rate-limiting factor which remains unchanged even if a faster CPU is used. An example of a non CPU-bound process is MP3 decoding for live playback.)

An example of balancing a system is to compare a recent netbook to a stock configuration Pentium 3 laptop from 2002. They both have CPUs of similar capability but the netbooks come with more RAM and some with flash memory rather than hard drives. The performance boost from extra RAM and flash memory storage helps compensate for what by 2009 standards are slow CPUs. As a result, the netbooks tend to have a better balance of CPU/RAM/permanent storage capacity and performance than the stock configuration Pentium 3 laptops (an upgraded ultraportable Pentium 3 laptop can match a netbook quite well).
Bob Friesenhahn
2009-Jun-18 15:18 UTC
[zfs-discuss] compression at zfs filesystem creation
On Thu, 18 Jun 2009, Haudy Kazemi wrote:

> for text data, LZJB compression had negligible performance benefits (task
> times were unchanged or marginally better) and less storage space was
> consumed (1.47:1).
> for media data, LZJB compression had negligible performance benefits (task
> times were unchanged or marginally worse) and storage space consumed was
> unchanged (1:1).
> Take away message: as currently configured, their system has nothing to lose
> from enabling LZJB.

My understanding is that these tests were done with NFS and one client over gigabit ethernet (a file server scenario). So in this case, the system is able to keep up with NFS over gigabit ethernet when LZJB is used.

In a stand-alone power-user "desktop" scenario, the situation may be quite different. In this case application CPU usage may be competing with storage CPU usage. Since ZFS often defers writes, it may be that the compression is performed at the same time as application compute cycles.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bill Sommerfeld
2009-Jun-18 17:30 UTC
[zfs-discuss] compression at zfs filesystem creation
On Wed, 2009-06-17 at 12:35 +0200, Casper.Dik at Sun.COM wrote:

> I still use "disk swap" because I have some bad experiences
> with ZFS swap. (ZFS appears to cache and that is very wrong)

I'm experimenting with running zfs swap with the primarycache attribute set to "metadata" instead of the default "all". aka:

zfs set primarycache=metadata rpool/swap

seems like that would be more likely to behave appropriately.

- Bill
Darren J Moffat
2009-Jun-19 14:43 UTC
[zfs-discuss] compression at zfs filesystem creation
Bill Sommerfeld wrote:

> On Wed, 2009-06-17 at 12:35 +0200, Casper.Dik at Sun.COM wrote:
>> I still use "disk swap" because I have some bad experiences
>> with ZFS swap. (ZFS appears to cache and that is very wrong)
>
> I'm experimenting with running zfs swap with the primarycache attribute
> set to "metadata" instead of the default "all". aka:
>
> zfs set primarycache=metadata rpool/swap
>
> seems like that would be more likely to behave appropriately.

Agreed, and for the "just in case" scenario, secondarycache=none - but then again, using an SSD as swap could be interesting....

--
Darren J Moffat
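Pulling those suggestions together, a sketch of how a swap volume might be created with those properties; the size, the volume name, and leaving compression off are example choices, not anything prescribed in the thread:

    # a 4G swap zvol that caches only metadata and skips the L2ARC
    zfs create -V 4G -o primarycache=metadata -o secondarycache=none -o compression=off rpool/swap2

    # add it as swap space
    swap -a /dev/zvol/dsk/rpool/swap2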