Jason Usher
2012-Sep-20 23:34 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
Hi,

I have a ZFS filesystem with compression turned on. Does the "used" property show me the actual data size, or the compressed data size? If it shows me the compressed size, where can I see the actual data size?

I also wonder about checking the status of dedupe - I created my pool without dedupe, and continue to NOT enable dedupe. From zpool history, we see:

  zpool create -f -O atime=off -O setuid=off -O exec=off -m /mnt/pool pool raidz3 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11

Later, I enabled dedup for just a single filesystem on this pool:

  zfs set dedup=on pool/dataset

and now, I see in 'zpool list' a value for dedupratio:

  pool  dedupratio  1.65x  -

Why do I see a value here? Isn't dedupe still OFF for the pool as a whole? I do NOT want to enable dedupe for the entire pool.

Also, why do I not see any dedupe stats for the individual filesystem? I see compressratio, and I see dedup=on, but I don't see any dedupratio for the filesystem itself...

Did turning on dedupe for a single filesystem turn it on for the entire pool?
Sašo Kiselkov
2012-Sep-21 08:20 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
On 09/21/2012 01:34 AM, Jason Usher wrote:
> Hi,
>
> I have a ZFS filesystem with compression turned on. Does the "used"
> property show me the actual data size, or the compressed data size?
> If it shows me the compressed size, where can I see the actual data
> size?

It shows the allocated number of bytes used by the filesystem, i.e. after compression. To get the uncompressed size, multiply "used" by "compressratio" (so for example if used=65G and compressratio=2.00x, then your decompressed size is 2.00 x 65G = 130G).

> I also wonder about checking the status of dedupe - I created my pool
> without dedupe, and continue to NOT enable dedupe - from zpool
> history, we see:
>
> zpool create -f -O atime=off -O setuid=off -O exec=off -m /mnt/pool pool raidz3 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11
>
> Later, I enabled dedup for just a single filesystem on this pool:
>
> zfs set dedup=on pool/dataset
>
> and now, I see in 'zpool list' a value for dedupratio:
>
> pool  dedupratio  1.65x  -
>
> Why do I see a value here? Isn't dedupe still OFF for the pool as a
> whole? I do NOT want to enable dedupe for the entire pool.

That's because dedup operates at the block level, not the filesystem object level, i.e. it kicks into effect once the data passes through the filesystem layers and gets subdivided into disk blocks. The point is that de-duplication (in a sense) allows you to de-duplicate blocks across multiple filesystems. Take for instance the following example:

  NAME         DEDUP
  ---------    -----
  /tank/fsA    on
  /tank/fsB    off
  /tank/fsC    on
  /tank/fsD    off
  /tank/fsE    off

Here ZFS will try to deduplicate the blocks in fsA not only in regards to other blocks in fsA, but also in regards to fsC.

> Also, why do I not see any dedupe stats for the individual
> filesystem? I see compressratio, and I see dedup=on, but I don't see
> any dedupratio for the filesystem itself...

Because, as explained above, once deduplication for a particular block is requested (that's controlled by the dedup setting in the particular filesystem where the block originated), the dedup mechanism will try to look for matching blocks across all blocks in all filesystems on the given pool that have dedup enabled, not only in the originating filesystem. This is to improve efficiency.

> Did turning on dedupe for a single filesystem turn it on for the
> entire pool?

In a sense, yes. The dedup machinery is pool-wide, but only writes from filesystems which have dedup enabled enter it. The rest simply pass it by and work as usual.

Cheers,
--
Saso
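To make that split concrete, a minimal sketch using the placeholder names from this thread - the dedup knob is read per dataset, while the ratio is read from the pool; both are standard zfs/zpool commands:

  # per-dataset setting: only this dataset's writes enter the dedup table
  zfs get dedup pool/dataset

  # pool-wide statistic, computed across all dedup-enabled writes
  zpool get dedupratio pool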
Jason Usher
2012-Sep-21 20:06 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
--- On Fri, 9/21/12, Sašo Kiselkov <skiselkov.ml at gmail.com> wrote:

>> I have a ZFS filesystem with compression turned on. Does the "used"
>> property show me the actual data size, or the compressed data size?
>> If it shows me the compressed size, where can I see the actual data
>> size?
>
> It shows the allocated number of bytes used by the filesystem, i.e.
> after compression. To get the uncompressed size, multiply "used" by
> "compressratio" (so for example if used=65G and compressratio=2.00x,
> then your decompressed size is 2.00 x 65G = 130G).

Ok, thank you. The problem with this is, the compressratio only goes to two significant digits, which means if I do the math, I'm only getting an approximation. Since we may use these numbers to compute billing, it is important to get it right.

Is there any way at all to get the real *exact* number?

>> Later, I enabled dedup for just a single filesystem on this pool:
>>
>> zfs set dedup=on pool/dataset
>>
>> and now, I see in 'zpool list' a value for dedupratio:
>>
>> pool  dedupratio  1.65x  -
>>
>> Why do I see a value here? Isn't dedupe still OFF for the pool as a
>> whole? I do NOT want to enable dedupe for the entire pool.
>
> That's because dedup operates at the block level, not the filesystem
> object level, i.e. it kicks into effect once the data passes through
> the filesystem layers and gets subdivided into disk blocks. The point
> is that de-duplication (in a sense) allows you to de-duplicate blocks
> across multiple filesystems. Take for instance the following example:
>
> NAME         DEDUP
> ---------    -----
> /tank/fsA    on
> /tank/fsB    off
> /tank/fsC    on
> /tank/fsD    off
> /tank/fsE    off
>
> Here ZFS will try to deduplicate the blocks in fsA not only in
> regards to other blocks in fsA, but also in regards to fsC.

Ok. So the dedupratio I see for the entire pool is "dedupe ratio for filesystems in this pool that have dedupe enabled" ... yes?

>> Also, why do I not see any dedupe stats for the individual
>> filesystem? I see compressratio, and I see dedup=on, but I don't see
>> any dedupratio for the filesystem itself...

Ok, getting back to precise accounting ... if I turn on dedupe for a particular filesystem, and then I multiply the "used" property by the compressratio property, and calculate the real usage, do I need to do another calculation to account for the deduplication? Or does the "used" property not take into account deduping?

>> Did turning on dedupe for a single filesystem turn it on for the
>> entire pool?
>
> In a sense, yes. The dedup machinery is pool-wide, but only writes
> from filesystems which have dedup enabled enter it. The rest simply
> pass it by and work as usual.

Ok - but from a performance point of view, I am only using ram/cpu resources for the deduping of just the individual filesystems I enabled dedupe on, right? I hope that turning on dedupe for just one filesystem did not incur ram/cpu costs across the entire pool...

Thanks.
Jason Usher
2012-Sep-24 17:08 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
Oh, and one other thing ...

--- On Fri, 9/21/12, Jason Usher <jusher71 at yahoo.com> wrote:

>> It shows the allocated number of bytes used by the filesystem, i.e.
>> after compression. To get the uncompressed size, multiply "used" by
>> "compressratio" (so for example if used=65G and compressratio=2.00x,
>> then your decompressed size is 2.00 x 65G = 130G).
>
> Ok, thank you. The problem with this is, the compressratio only goes
> to two significant digits, which means if I do the math, I'm only
> getting an approximation. Since we may use these numbers to compute
> billing, it is important to get it right.
>
> Is there any way at all to get the real *exact* number?

I'm hoping the answer is yes - I've been looking but do not see it ...

> Ok. So the dedupratio I see for the entire pool is "dedupe ratio for
> filesystems in this pool that have dedupe enabled" ... yes?
>
>>> Also, why do I not see any dedupe stats for the individual
>>> filesystem? I see compressratio, and I see dedup=on, but I don't
>>> see any dedupratio for the filesystem itself...
>
> Ok, getting back to precise accounting ... if I turn on dedupe for a
> particular filesystem, and then I multiply the "used" property by the
> compressratio property, and calculate the real usage, do I need to do
> another calculation to account for the deduplication? Or does the
> "used" property not take into account deduping?

So if the answer to this is "yes, the used property is not only a compressed figure, but a deduped figure" then I think we have a bigger problem ...

You described dedupe as operating not only within the filesystem with dedup=on, but between all filesystems with dedupe enabled.

Doesn't that mean that if I enabled dedupe on more than one filesystem, I can never know how much total, raw space each of those is using? Because if the dedupe ratio is calculated across all of them, it's not the actual ratio for any one of them ... so even if I do the math, I can't decide what the total raw usage for one of them is ... right?

Again, if "used" does not reflect dedupe, and I don't need to do any math to get the "raw" storage figure, then it doesn't matter...

>>> Did turning on dedupe for a single filesystem turn it on for the
>>> entire pool?
>>
>> In a sense, yes. The dedup machinery is pool-wide, but only writes
>> from filesystems which have dedup enabled enter it. The rest simply
>> pass it by and work as usual.

I also wonder about this performance question...

Thanks.
Richard Elling
2012-Sep-24 22:09 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
On Sep 24, 2012, at 10:08 AM, Jason Usher <jusher71 at yahoo.com> wrote:

> Oh, and one other thing ...
>
> --- On Fri, 9/21/12, Jason Usher <jusher71 at yahoo.com> wrote:
>
>>> It shows the allocated number of bytes used by the filesystem, i.e.
>>> after compression. To get the uncompressed size, multiply "used" by
>>> "compressratio" (so for example if used=65G and compressratio=2.00x,
>>> then your decompressed size is 2.00 x 65G = 130G).
>>
>> Ok, thank you. The problem with this is, the compressratio only goes
>> to two significant digits, which means if I do the math, I'm only
>> getting an approximation. Since we may use these numbers to compute
>> billing, it is important to get it right.
>>
>> Is there any way at all to get the real *exact* number?
>
> I'm hoping the answer is yes - I've been looking but do not see it ...

none can hide from dtrace!

  # dtrace -qn 'dsl_dataset_stats:entry {
      this->ds = (dsl_dataset_t *)arg0;
      printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
          this->ds->ds_dir->dd_myname,
          this->ds->ds_phys->ds_compressed_bytes,
          this->ds->ds_phys->ds_uncompressed_bytes)
  }'
  openindiana-1   compressed size = 3667988992    uncompressed size=3759321088

  [zfs get all rpool/openindiana-1 in another shell]

For reporting, the number is rounded to 2 decimal places.

>> Ok. So the dedupratio I see for the entire pool is "dedupe ratio for
>> filesystems in this pool that have dedupe enabled" ... yes?
>>
>>>> Also, why do I not see any dedupe stats for the individual
>>>> filesystem? I see compressratio, and I see dedup=on, but I don't
>>>> see any dedupratio for the filesystem itself...
>>
>> Ok, getting back to precise accounting ... if I turn on dedupe for a
>> particular filesystem, and then I multiply the "used" property by
>> the compressratio property, and calculate the real usage, do I need
>> to do another calculation to account for the deduplication? Or does
>> the "used" property not take into account deduping?
>
> So if the answer to this is "yes, the used property is not only a
> compressed figure, but a deduped figure" then I think we have a
> bigger problem ...
>
> You described dedupe as operating not only within the filesystem with
> dedup=on, but between all filesystems with dedupe enabled.
>
> Doesn't that mean that if I enabled dedupe on more than one
> filesystem, I can never know how much total, raw space each of those
> is using? Because if the dedupe ratio is calculated across all of
> them, it's not the actual ratio for any one of them ... so even if I
> do the math, I can't decide what the total raw usage for one of them
> is ... right?

Correct. This is by design so that blocks shared amongst different datasets can be deduped -- the common case for things like virtual machine images.

> Again, if "used" does not reflect dedupe, and I don't need to do any
> math to get the "raw" storage figure, then it doesn't matter...
>
>>>> Did turning on dedupe for a single filesystem turn it on for the
>>>> entire pool?
>>>
>>> In a sense, yes. The dedup machinery is pool-wide, but only writes
>>> from filesystems which have dedup enabled enter it. The rest simply
>>> pass it by and work as usual.
>>
>> Ok - but from a performance point of view, I am only using ram/cpu
>> resources for the deduping of just the individual filesystems I
>> enabled dedupe on, right? I hope that turning on dedupe for just one
>> filesystem did not incur ram/cpu costs across the entire pool...
> I also wonder about this performance question...

It depends.
 -- richard

--
illumos Day & ZFS Day, Oct 1-2, 2012, San Francisco  www.zfsday.com
Richard.Elling at RichardElling.com
+1-760-896-4422
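As a worked example of what the exact byte counts buy you, take the numbers the dtrace probe printed above; the unrounded ratio is simply the quotient that the properties report rounded to two places:

  3759321088 / 3667988992 = 1.0249...   (reported as compressratio 1.02x)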
Jason Usher
2012-Sep-25 18:17 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
--- On Mon, 9/24/12, Richard Elling <richard.elling at gmail.com> wrote:

>> I'm hoping the answer is yes - I've been looking but do not see it ...
>
> none can hide from dtrace!
>
> # dtrace -qn 'dsl_dataset_stats:entry {
>     this->ds = (dsl_dataset_t *)arg0;
>     printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
>         this->ds->ds_dir->dd_myname,
>         this->ds->ds_phys->ds_compressed_bytes,
>         this->ds->ds_phys->ds_uncompressed_bytes)
> }'
> openindiana-1   compressed size = 3667988992   uncompressed size=3759321088
>
> [zfs get all rpool/openindiana-1 in another shell]
>
> For reporting, the number is rounded to 2 decimal places.
>
>> Ok. So the dedupratio I see for the entire pool is "dedupe ratio for
>> filesystems in this pool that have dedupe enabled" ... yes?

Thank you - appreciated.

>> Doesn't that mean that if I enabled dedupe on more than one
>> filesystem, I can never know how much total, raw space each of those
>> is using? Because if the dedupe ratio is calculated across all of
>> them, it's not the actual ratio for any one of them ... so even if I
>> do the math, I can't decide what the total raw usage for one of them
>> is ... right?
>
> Correct. This is by design so that blocks shared amongst different
> datasets can be deduped -- the common case for things like virtual
> machine images.

Ok, but what about accounting? If you have multiple deduped filesystems in a pool, you can *never know* how much space any single one of them is using? That seems unbelievable...

>> Ok - but from a performance point of view, I am only using ram/cpu
>> resources for the deduping of just the individual filesystems I
>> enabled dedupe on, right? I hope that turning on dedupe for just one
>> filesystem did not incur ram/cpu costs across the entire pool...
>
> It depends. -- richard

Can you elaborate at all? Dedupe can have fairly profound performance implications, and I'd like to know if I am paying a huge price just to get a dedupe on one little filesystem ...

Thanks again.
Volker A. Brandt
2012-Sep-25 18:43 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
> I'm hoping the answer is yes - I've been looking but do not see it ...

Well, he is telling you to run the dtrace program as root in one window, and run the "zfs get all" command on a dataset in your pool in another window, to trigger the dataset_stats variable to be filled.

> none can hide from dtrace!
>
> # dtrace -qn 'dsl_dataset_stats:entry {
>     this->ds = (dsl_dataset_t *)arg0;
>     printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
>         this->ds->ds_dir->dd_myname,
>         this->ds->ds_phys->ds_compressed_bytes,
>         this->ds->ds_phys->ds_uncompressed_bytes)
> }'
> openindiana-1   compressed size = 3667988992   uncompressed size=3759321088
>
> [zfs get all rpool/openindiana-1 in another shell]

HTH -- Volker
--
------------------------------------------------------------------------
Volker A. Brandt                    Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                 WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY    Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513  Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"
Jason Usher
2012-Sep-25 19:36 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
--- On Tue, 9/25/12, Volker A. Brandt <vab at bb-c.de> wrote:

> Well, he is telling you to run the dtrace program as root in one
> window, and run the "zfs get all" command on a dataset in your pool
> in another window, to trigger the dataset_stats variable to be
> filled.
>
>> none can hide from dtrace!
>>
>> # dtrace -qn 'dsl_dataset_stats:entry {
>>     this->ds = (dsl_dataset_t *)arg0;
>>     printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
>>         this->ds->ds_dir->dd_myname,
>>         this->ds->ds_phys->ds_compressed_bytes,
>>         this->ds->ds_phys->ds_uncompressed_bytes)
>> }'
>> openindiana-1   compressed size = 3667988992   uncompressed size=3759321088
>>
>> [zfs get all rpool/openindiana-1 in another shell]

Yes, he showed me that, I did it, it worked, and I thanked him.

The reason it's hard to make the thread out in that last response is that his email is in rich text, or HTML of some kind, so there's no >> formatting, etc.
Richard Elling
2012-Sep-25 20:27 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
On Sep 25, 2012, at 11:17 AM, Jason Usher <jusher71 at yahoo.com> wrote:

>>> Ok - but from a performance point of view, I am only using ram/cpu
>>> resources for the deduping of just the individual filesystems I
>>> enabled dedupe on, right? I hope that turning on dedupe for just
>>> one filesystem did not incur ram/cpu costs across the entire
>>> pool...
>>
>> It depends. -- richard
>
> Can you elaborate at all? Dedupe can have fairly profound performance
> implications, and I'd like to know if I am paying a huge price just
> to get a dedupe on one little filesystem ...

The short answer is: "deduplication transforms big I/Os into small I/Os, but does not eliminate I/O." The reason is that the deduplication table has to be updated when you write something that is deduplicated. This implies that storage devices which are inexpensive in $/GB but expensive in $/IOPS might not be the best candidates for deduplication (e.g. HDDs).

There is some additional CPU overhead for the sha-256 hash that might or might not be noticeable, depending on your CPU. But perhaps the most important factor is your data -- is it dedupable, and are the space savings worthwhile? There is no simple answer for that, but we generally recommend that you simulate dedup before committing to it.
 -- richard

--
illumos Day & ZFS Day, Oct 1-2, 2012, San Francisco  www.zfsday.com
Richard.Elling at RichardElling.com
+1-760-896-4422
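For anyone looking for the simulation Richard recommends: zdb can estimate dedup results for an existing pool without enabling the feature ("pool" is a placeholder name here). A minimal sketch:

  # read-only simulation: prints a dedup-table histogram and an
  # estimated dedup ratio for the pool's current data
  zdb -S pool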
Jim Klimov
2012-Sep-25 20:46 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
2012-09-24 21:08, Jason Usher wrote:
>> Ok, thank you. The problem with this is, the compressratio only goes
>> to two significant digits, which means if I do the math, I'm only
>> getting an approximation. Since we may use these numbers to compute
>> billing, it is important to get it right.
>>
>> Is there any way at all to get the real *exact* number?

Well, if you take into account snapshots and clones, you can see really small "used" numbers on datasets which reference a lot of data.

In fact, for accounting you might be better off with the "referenced" field instead of "used", but note that it is not "recursive" and you need to account each child dataset's byte references separately.

I am not sure if there is a simple way to get exact byte-counts instead of roundings like "422M"...

HTH,
//Jim
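A minimal sketch of the per-child accounting Jim describes, assuming a pool named "pool" - the standard -r flag makes zfs get walk every descendant dataset, so each one's references can be summed separately:

  # one "referenced" row per dataset, children included
  zfs get -r referenced pool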
Richard Elling
2012-Sep-25 22:52 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
On Sep 25, 2012, at 1:46 PM, Jim Klimov <jimklimov at cos.ru> wrote:

> 2012-09-24 21:08, Jason Usher wrote:
>>> Ok, thank you. The problem with this is, the compressratio only
>>> goes to two significant digits, which means if I do the math, I'm
>>> only getting an approximation. Since we may use these numbers to
>>> compute billing, it is important to get it right.
>>>
>>> Is there any way at all to get the real *exact* number?
>
> Well, if you take into account snapshots and clones, you can see
> really small "used" numbers on datasets which reference a lot of
> data.
>
> In fact, for accounting you might be better off with the "referenced"
> field instead of "used", but note that it is not "recursive" and you
> need to account each child dataset's byte references separately.
>
> I am not sure if there is a simple way to get exact byte-counts
> instead of roundings like "422M"...

zfs get -p
 -- richard

--
illumos Day & ZFS Day, Oct 1-2, 2012, San Francisco  www.zfsday.com
Richard.Elling at RichardElling.com
+1-760-896-4422
Jim Klimov
2012-Sep-26 00:28 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
2012-09-26 2:52, Richard Elling wrote:
>> I am not sure if there is a simple way to get exact byte-counts
>> instead of roundings like "422M"...
>
> zfs get -p
> -- richard

Thanks to all who corrected me, never too old to learn ;)

  # zfs get referenced rpool/export/home
  NAME               PROPERTY    VALUE    SOURCE
  rpool/export/home  referenced  5.41M    -

  # zfs get -p referenced rpool/export/home
  NAME               PROPERTY    VALUE    SOURCE
  rpool/export/home  referenced  5677056  -

Thanks,
//Jim
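Pulling the thread's answers together, a sketch of a billing-friendly query ("pool" is a placeholder): -r recurses into child datasets, -p prints exact byte counts, and -H drops headers for scripting - all standard zfs get flags:

  # exact per-dataset byte counts, tab-separated, ready for a billing script
  zfs get -rHp -o name,value referenced pool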