Jason Usher
2012-Sep-20 23:34 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
Hi,

I have a ZFS filesystem with compression turned on. Does the "used" property show me the actual data size, or the compressed data size? If it shows me the compressed size, where can I see the actual data size?

I also wonder about checking the status of dedupe - I created my pool without dedupe, and continue to NOT enable dedupe. From zpool history, we see:

  zpool create -f -O atime=off -O setuid=off -O exec=off -m /mnt/pool pool raidz3 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11

Later, I enabled dedup for just a single filesystem on this pool:

  zfs set dedup=on pool/dataset

and now, I see in 'zpool list' a value for dedupratio:

  pool  dedupratio  1.65x  -

Why do I see a value here? Isn't dedupe still OFF for the pool as a whole? I do NOT want to enable dedupe for the entire pool.

Also, why do I not see any dedupe stats for the individual filesystem? I see compressratio, and I see dedup=on, but I don't see any dedupratio for the filesystem itself...

Did turning on dedupe for a single filesystem turn it on for the entire pool?
Sašo Kiselkov
2012-Sep-21 08:20 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
On 09/21/2012 01:34 AM, Jason Usher wrote:
> Hi,
>
> I have a ZFS filesystem with compression turned on. Does the "used"
> property show me the actual data size, or the compressed data size?
> If it shows me the compressed size, where can I see the actual data
> size?

It shows the allocated number of bytes used by the filesystem, i.e. after compression. To get the uncompressed size, multiply "used" by "compressratio" (so for example if used=65G and compressratio=2.00x, then your decompressed size is 2.00 x 65G = 130G).

> I also wonder about checking the status of dedupe - I created my pool
> without dedupe, and continue to NOT enable dedupe - from zpool
> history, we see:
>
> zpool create -f -O atime=off -O setuid=off -O exec=off -m /mnt/pool pool raidz3 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11
>
> Later, I enabled dedup for just a single filesystem on this pool:
>
> zfs set dedup=on pool/dataset
>
> and now, I see in 'zpool list' a value for dedupratio:
>
> pool  dedupratio  1.65x  -
>
> Why do I see a value here? Isn't dedupe still OFF for the pool as a
> whole? I do NOT want to enable dedupe for the entire pool.

That's because dedup operates at the block level, not the filesystem object level, i.e. it kicks into effect once the data passes through the filesystem layers and gets subdivided into disk blocks. The point is that de-duplication (in a sense) allows you to de-duplicate blocks across multiple filesystems. Take for instance the following example:

  NAME         DEDUP
  ---------    -----
  /tank/fsA    on
  /tank/fsB    off
  /tank/fsC    on
  /tank/fsD    off
  /tank/fsE    off

Here ZFS will try to deduplicate the blocks in fsA not only in regards to other blocks in fsA, but also in regards to fsC.

> Also, why do I not see any dedupe stats for the individual
> filesystem? I see compressratio, and I see dedup=on, but I don't see
> any dedupratio for the filesystem itself...

Because, as explained above, once deduplication for a particular block is requested (that's controlled by the dedup setting in the particular filesystem where the block originated), the dedup mechanism will try to look for matching blocks across all blocks in all filesystems on the given pool that have dedup enabled, not only in the originating filesystem. This is to improve efficiency.

> Did turning on dedupe for a single filesystem turn it on for the
> entire pool?

In a sense, yes. The dedup machinery is pool-wide, but only writes from filesystems which have dedup enabled enter it. The rest simply pass it by and work as usual.

Cheers,
--
Saso
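To make that split concrete, a minimal sketch using the placeholder names from this thread - the dedup knob is read per dataset, while the ratio is read from the pool; both are standard zfs/zpool commands:

  # per-dataset setting: only this dataset's writes enter the dedup table
  zfs get dedup pool/dataset

  # pool-wide statistic, computed across all dedup-enabled writes
  zpool get dedupratio pool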
Jason Usher
2012-Sep-21 20:06 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
--- On Fri, 9/21/12, Sašo Kiselkov <skiselkov.ml at gmail.com> wrote:

>> I have a ZFS filesystem with compression turned on. Does the "used"
>> property show me the actual data size, or the compressed data size?
>> If it shows me the compressed size, where can I see the actual data
>> size?
>
> It shows the allocated number of bytes used by the filesystem, i.e.
> after compression. To get the uncompressed size, multiply "used" by
> "compressratio" (so for example if used=65G and compressratio=2.00x,
> then your decompressed size is 2.00 x 65G = 130G).

Ok, thank you. The problem with this is, the compressratio only goes to two significant digits, which means if I do the math, I'm only getting an approximation. Since we may use these numbers to compute billing, it is important to get it right.

Is there any way at all to get the real *exact* number?

>> Later, I enabled dedup for just a single filesystem on this pool:
>>
>> zfs set dedup=on pool/dataset
>>
>> and now, I see in 'zpool list' a value for dedupratio:
>>
>> pool  dedupratio  1.65x  -
>>
>> Why do I see a value here? Isn't dedupe still OFF for the pool as a
>> whole? I do NOT want to enable dedupe for the entire pool.
>
> That's because dedup operates at the block level, not the filesystem
> object level, i.e. it kicks into effect once the data passes through
> the filesystem layers and gets subdivided into disk blocks. The point
> is that de-duplication (in a sense) allows you to de-duplicate blocks
> across multiple filesystems. Take for instance the following example:
>
> NAME         DEDUP
> ---------    -----
> /tank/fsA    on
> /tank/fsB    off
> /tank/fsC    on
> /tank/fsD    off
> /tank/fsE    off
>
> Here ZFS will try to deduplicate the blocks in fsA not only in
> regards to other blocks in fsA, but also in regards to fsC.

Ok. So the dedupratio I see for the entire pool is "dedupe ratio for filesystems in this pool that have dedupe enabled" ... yes?

>> Also, why do I not see any dedupe stats for the individual
>> filesystem? I see compressratio, and I see dedup=on, but I don't see
>> any dedupratio for the filesystem itself...

Ok, getting back to precise accounting ... if I turn on dedupe for a particular filesystem, and then I multiply the "used" property by the compressratio property, and calculate the real usage, do I need to do another calculation to account for the deduplication? Or does the "used" property not take into account deduping?

>> Did turning on dedupe for a single filesystem turn it on for the
>> entire pool?
>
> In a sense, yes. The dedup machinery is pool-wide, but only writes
> from filesystems which have dedup enabled enter it. The rest simply
> pass it by and work as usual.

Ok - but from a performance point of view, I am only using ram/cpu resources for the deduping of just the individual filesystems I enabled dedupe on, right? I hope that turning on dedupe for just one filesystem did not incur ram/cpu costs across the entire pool...

Thanks.
Jason Usher
2012-Sep-24 17:08 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
Oh, and one other thing ...

--- On Fri, 9/21/12, Jason Usher <jusher71 at yahoo.com> wrote:

>> It shows the allocated number of bytes used by the filesystem, i.e.
>> after compression. To get the uncompressed size, multiply "used" by
>> "compressratio" (so for example if used=65G and compressratio=2.00x,
>> then your decompressed size is 2.00 x 65G = 130G).
>
> Ok, thank you. The problem with this is, the compressratio only goes
> to two significant digits, which means if I do the math, I'm only
> getting an approximation. Since we may use these numbers to compute
> billing, it is important to get it right.
>
> Is there any way at all to get the real *exact* number?

I'm hoping the answer is yes - I've been looking but do not see it ...

> Ok. So the dedupratio I see for the entire pool is "dedupe ratio for
> filesystems in this pool that have dedupe enabled" ... yes?
>
>>> Also, why do I not see any dedupe stats for the individual
>>> filesystem? I see compressratio, and I see dedup=on, but I don't
>>> see any dedupratio for the filesystem itself...
>
> Ok, getting back to precise accounting ... if I turn on dedupe for a
> particular filesystem, and then I multiply the "used" property by the
> compressratio property, and calculate the real usage, do I need to do
> another calculation to account for the deduplication? Or does the
> "used" property not take into account deduping?

So if the answer to this is "yes, the used property is not only a compressed figure, but a deduped figure" then I think we have a bigger problem ...

You described dedupe as operating not only within the filesystem with dedup=on, but between all filesystems with dedupe enabled.

Doesn't that mean that if I enabled dedupe on more than one filesystem, I can never know how much total, raw space each of those is using? Because if the dedupe ratio is calculated across all of them, it's not the actual ratio for any one of them ... so even if I do the math, I can't decide what the total raw usage for one of them is ... right?

Again, if "used" does not reflect dedupe, and I don't need to do any math to get the "raw" storage figure, then it doesn't matter...

>>> Did turning on dedupe for a single filesystem turn it on for the
>>> entire pool?
>>
>> In a sense, yes. The dedup machinery is pool-wide, but only writes
>> from filesystems which have dedup enabled enter it. The rest simply
>> pass it by and work as usual.

I also wonder about this performance question...

Thanks.
Richard Elling
2012-Sep-24 22:09 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
On Sep 24, 2012, at 10:08 AM, Jason Usher <jusher71 at yahoo.com> wrote:

> Oh, and one other thing ...
>
> --- On Fri, 9/21/12, Jason Usher <jusher71 at yahoo.com> wrote:
>
>>> It shows the allocated number of bytes used by the filesystem, i.e.
>>> after compression. To get the uncompressed size, multiply "used" by
>>> "compressratio" (so for example if used=65G and compressratio=2.00x,
>>> then your decompressed size is 2.00 x 65G = 130G).
>>
>> Ok, thank you. The problem with this is, the compressratio only goes
>> to two significant digits, which means if I do the math, I'm only
>> getting an approximation. Since we may use these numbers to compute
>> billing, it is important to get it right.
>>
>> Is there any way at all to get the real *exact* number?
>
> I'm hoping the answer is yes - I've been looking but do not see it ...

none can hide from dtrace!

  # dtrace -qn 'dsl_dataset_stats:entry {
      this->ds = (dsl_dataset_t *)arg0;
      printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
          this->ds->ds_dir->dd_myname,
          this->ds->ds_phys->ds_compressed_bytes,
          this->ds->ds_phys->ds_uncompressed_bytes)
  }'
  openindiana-1   compressed size = 3667988992    uncompressed size=3759321088

  [zfs get all rpool/openindiana-1 in another shell]

For reporting, the number is rounded to 2 decimal places.

>> Ok. So the dedupratio I see for the entire pool is "dedupe ratio for
>> filesystems in this pool that have dedupe enabled" ... yes?
>>
>>>> Also, why do I not see any dedupe stats for the individual
>>>> filesystem? I see compressratio, and I see dedup=on, but I don't
>>>> see any dedupratio for the filesystem itself...
>>
>> Ok, getting back to precise accounting ... if I turn on dedupe for a
>> particular filesystem, and then I multiply the "used" property by
>> the compressratio property, and calculate the real usage, do I need
>> to do another calculation to account for the deduplication? Or does
>> the "used" property not take into account deduping?
>
> So if the answer to this is "yes, the used property is not only a
> compressed figure, but a deduped figure" then I think we have a
> bigger problem ...
>
> You described dedupe as operating not only within the filesystem with
> dedup=on, but between all filesystems with dedupe enabled.
>
> Doesn't that mean that if I enabled dedupe on more than one
> filesystem, I can never know how much total, raw space each of those
> is using? Because if the dedupe ratio is calculated across all of
> them, it's not the actual ratio for any one of them ... so even if I
> do the math, I can't decide what the total raw usage for one of them
> is ... right?

Correct. This is by design so that blocks shared amongst different datasets can be deduped -- the common case for things like virtual machine images.

> Again, if "used" does not reflect dedupe, and I don't need to do any
> math to get the "raw" storage figure, then it doesn't matter...
>
>>>> Did turning on dedupe for a single filesystem turn it on for the
>>>> entire pool?
>>>
>>> In a sense, yes. The dedup machinery is pool-wide, but only writes
>>> from filesystems which have dedup enabled enter it. The rest simply
>>> pass it by and work as usual.
>>
>> Ok - but from a performance point of view, I am only using ram/cpu
>> resources for the deduping of just the individual filesystems I
>> enabled dedupe on, right? I hope that turning on dedupe for just one
>> filesystem did not incur ram/cpu costs across the entire pool...
> I also wonder about this performance question...

It depends.
 -- richard

--
illumos Day & ZFS Day, Oct 1-2, 2012, San Francisco  www.zfsday.com
Richard.Elling at RichardElling.com
+1-760-896-4422
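As a worked example of what the exact byte counts buy you, take the numbers the dtrace probe printed above; the unrounded ratio is simply the quotient that the properties report rounded to two places:

  3759321088 / 3667988992 = 1.0249...   (reported as compressratio 1.02x)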
Jason Usher
2012-Sep-25 18:17 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
--- On Mon, 9/24/12, Richard Elling <richard.elling at gmail.com> wrote:

>> I'm hoping the answer is yes - I've been looking but do not see it ...
>
> none can hide from dtrace!
>
> # dtrace -qn 'dsl_dataset_stats:entry {
>     this->ds = (dsl_dataset_t *)arg0;
>     printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
>         this->ds->ds_dir->dd_myname,
>         this->ds->ds_phys->ds_compressed_bytes,
>         this->ds->ds_phys->ds_uncompressed_bytes)
> }'
> openindiana-1   compressed size = 3667988992   uncompressed size=3759321088
>
> [zfs get all rpool/openindiana-1 in another shell]
>
> For reporting, the number is rounded to 2 decimal places.
>
>> Ok. So the dedupratio I see for the entire pool is "dedupe ratio for
>> filesystems in this pool that have dedupe enabled" ... yes?

Thank you - appreciated.

>> Doesn't that mean that if I enabled dedupe on more than one
>> filesystem, I can never know how much total, raw space each of those
>> is using? Because if the dedupe ratio is calculated across all of
>> them, it's not the actual ratio for any one of them ... so even if I
>> do the math, I can't decide what the total raw usage for one of them
>> is ... right?
>
> Correct. This is by design so that blocks shared amongst different
> datasets can be deduped -- the common case for things like virtual
> machine images.

Ok, but what about accounting? If you have multiple deduped filesystems in a pool, you can *never know* how much space any single one of them is using? That seems unbelievable...

>> Ok - but from a performance point of view, I am only using ram/cpu
>> resources for the deduping of just the individual filesystems I
>> enabled dedupe on, right? I hope that turning on dedupe for just one
>> filesystem did not incur ram/cpu costs across the entire pool...
>
> It depends. -- richard

Can you elaborate at all? Dedupe can have fairly profound performance implications, and I'd like to know if I am paying a huge price just to get a dedupe on one little filesystem ...

Thanks again.
Volker A. Brandt
2012-Sep-25 18:43 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
> I'm hoping the answer is yes - I've been looking but do not see it ...

Well, he is telling you to run the dtrace program as root in one window, and run the "zfs get all" command on a dataset in your pool in another window, to trigger the dataset_stats variable to be filled.

> none can hide from dtrace!
>
> # dtrace -qn 'dsl_dataset_stats:entry {
>     this->ds = (dsl_dataset_t *)arg0;
>     printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
>         this->ds->ds_dir->dd_myname,
>         this->ds->ds_phys->ds_compressed_bytes,
>         this->ds->ds_phys->ds_uncompressed_bytes)
> }'
> openindiana-1   compressed size = 3667988992   uncompressed size=3759321088
>
> [zfs get all rpool/openindiana-1 in another shell]

HTH -- Volker
--
------------------------------------------------------------------------
Volker A. Brandt                    Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                 WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY    Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513  Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"
Jason Usher
2012-Sep-25 19:36 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
--- On Tue, 9/25/12, Volker A. Brandt <vab at bb-c.de> wrote:

> Well, he is telling you to run the dtrace program as root in one
> window, and run the "zfs get all" command on a dataset in your pool
> in another window, to trigger the dataset_stats variable to be
> filled.
>
>> none can hide from dtrace!
>>
>> # dtrace -qn 'dsl_dataset_stats:entry {
>>     this->ds = (dsl_dataset_t *)arg0;
>>     printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
>>         this->ds->ds_dir->dd_myname,
>>         this->ds->ds_phys->ds_compressed_bytes,
>>         this->ds->ds_phys->ds_uncompressed_bytes)
>> }'
>> openindiana-1   compressed size = 3667988992   uncompressed size=3759321088
>>
>> [zfs get all rpool/openindiana-1 in another shell]

Yes, he showed me that, I did it, it worked, and I thanked him.

The reason it's hard to make the thread out in that last response is that his email is in rich text, or HTML of some kind, so there's no >> formatting, etc.
Richard Elling
2012-Sep-25 20:27 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
On Sep 25, 2012, at 11:17 AM, Jason Usher <jusher71 at yahoo.com> wrote:

>>> Ok - but from a performance point of view, I am only using ram/cpu
>>> resources for the deduping of just the individual filesystems I
>>> enabled dedupe on, right? I hope that turning on dedupe for just
>>> one filesystem did not incur ram/cpu costs across the entire
>>> pool...
>>
>> It depends. -- richard
>
> Can you elaborate at all? Dedupe can have fairly profound performance
> implications, and I'd like to know if I am paying a huge price just
> to get a dedupe on one little filesystem ...

The short answer is: "deduplication transforms big I/Os into small I/Os, but does not eliminate I/O." The reason is that the deduplication table has to be updated when you write something that is deduplicated. This implies that storage devices which are inexpensive in $/GB but expensive in $/IOPS might not be the best candidates for deduplication (e.g. HDDs).

There is some additional CPU overhead for the sha-256 hash that might or might not be noticeable, depending on your CPU. But perhaps the most important factor is your data -- is it dedupable, and are the space savings worthwhile? There is no simple answer for that, but we generally recommend that you simulate dedup before committing to it.
 -- richard

--
illumos Day & ZFS Day, Oct 1-2, 2012, San Francisco  www.zfsday.com
Richard.Elling at RichardElling.com
+1-760-896-4422
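For anyone looking for the simulation Richard recommends: zdb can estimate dedup results for an existing pool without enabling the feature ("pool" is a placeholder name here). A minimal sketch:

  # read-only simulation: prints a dedup-table histogram and an
  # estimated dedup ratio for the pool's current data
  zdb -S pool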
Jim Klimov
2012-Sep-25 20:46 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
2012-09-24 21:08, Jason Usher wrote:
>> Ok, thank you. The problem with this is, the compressratio only goes
>> to two significant digits, which means if I do the math, I'm only
>> getting an approximation. Since we may use these numbers to compute
>> billing, it is important to get it right.
>>
>> Is there any way at all to get the real *exact* number?

Well, if you take into account snapshots and clones, you can see really small "used" numbers on datasets which reference a lot of data.

In fact, for accounting you might be better off with the "referenced" field instead of "used", but note that it is not "recursive" and you need to account each child dataset's byte references separately.

I am not sure if there is a simple way to get exact byte-counts instead of roundings like "422M"...

HTH,
//Jim
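A minimal sketch of the per-child accounting Jim describes, assuming a pool named "pool" - the standard -r flag makes zfs get walk every descendant dataset, so each one's references can be summed separately:

  # one "referenced" row per dataset, children included
  zfs get -r referenced pool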
Richard Elling
2012-Sep-25 22:52 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
On Sep 25, 2012, at 1:46 PM, Jim Klimov <jimklimov at cos.ru> wrote:

> 2012-09-24 21:08, Jason Usher wrote:
>>> Ok, thank you. The problem with this is, the compressratio only
>>> goes to two significant digits, which means if I do the math, I'm
>>> only getting an approximation. Since we may use these numbers to
>>> compute billing, it is important to get it right.
>>>
>>> Is there any way at all to get the real *exact* number?
>
> Well, if you take into account snapshots and clones, you can see
> really small "used" numbers on datasets which reference a lot of
> data.
>
> In fact, for accounting you might be better off with the "referenced"
> field instead of "used", but note that it is not "recursive" and you
> need to account each child dataset's byte references separately.
>
> I am not sure if there is a simple way to get exact byte-counts
> instead of roundings like "422M"...

zfs get -p
 -- richard

--
illumos Day & ZFS Day, Oct 1-2, 2012, San Francisco  www.zfsday.com
Richard.Elling at RichardElling.com
+1-760-896-4422
Jim Klimov
2012-Sep-26 00:28 UTC
[zfs-discuss] ZFS stats output - used, compressed, deduped, etc.
2012-09-26 2:52, Richard Elling wrote:
>> I am not sure if there is a simple way to get exact byte-counts
>> instead of roundings like "422M"...
>
> zfs get -p
> -- richard

Thanks to all who corrected me, never too old to learn ;)

  # zfs get referenced rpool/export/home
  NAME               PROPERTY    VALUE    SOURCE
  rpool/export/home  referenced  5.41M    -

  # zfs get -p referenced rpool/export/home
  NAME               PROPERTY    VALUE    SOURCE
  rpool/export/home  referenced  5677056  -

Thanks,
//Jim
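Pulling the thread's answers together, a sketch of a billing-friendly query ("pool" is a placeholder): -r recurses into child datasets, -p prints exact byte counts, and -H drops headers for scripting - all standard zfs get flags:

  # exact per-dataset byte counts, tab-separated, ready for a billing script
  zfs get -rHp -o name,value referenced pool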