Hi all,

On my oi_148a system I'm now in the process of "evacuating" data from my "dcpool" (an iSCSI device with a ZFS pool inside), which is hosted in my physical "pool" on hard disks (6-disk raidz2). The "dcpool" was configured to dedup all data inside it, while the hosting volume "pool/dcpool" was compressed, so as to separate the two processes. I decided to scrap this experiment, and now I'm copying my data back by reading files from "dcpool" and writing them into compressed+deduped datasets in "pool".

I often see two interesting conditions in this setup:

1) The process is rather slow (I think due to the dedup involved - even though, by my calculations, the whole DDT can fit in my 8GB of RAM). However, kernel processing time often peaks at close to 50%, and there is often quite a bit of idle time. I have a dual-core box, so it looks as if some system-side cycle is not using more than one core. Does anyone know whether the DDT tree walk, the search for available block ranges in metaslabs, or whatever other lengthy cycles there may be, are done in a sequential (single-threaded) fashion?

Below is my current DDT sizing. I still do not know which value to trust as the DDT entry size in RAM - the one returned by MDB or the ones reported by ZDB (and what exactly are those "in core" and "on disk" values? I've asked before but got no replies...)

# zdb -D -e 1601233584937321596
DDT-sha256-zap-ditto: 68 entries, size 1807 on disk, 240 in core
DDT-sha256-zap-duplicate: 1970815 entries, size 1134 on disk, 183 in core
DDT-sha256-zap-unique: 4376290 entries, size 1158 on disk, 187 in core

dedup = 1.38, compress = 1.07, copies = 1.01, dedup * compress / copies = 1.46

# zdb -D -e dcpool
DDT-sha256-zap-ditto: 388 entries, size 380 on disk, 200 in core
DDT-sha256-zap-duplicate: 5421787 entries, size 311 on disk, 176 in core
DDT-sha256-zap-unique: 16841361 entries, size 284 on disk, 145 in core

dedup = 1.34, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.34

# echo ::sizeof ddt_entry_t | mdb -k
sizeof (ddt_entry_t) = 0x178

Since I'm writing to "pool" (queried by its GUID number above), my box's performance primarily depends on its DDT - I guess. In the worst case that's 6.4 million entries times 376 bytes = 2.4GB, which is well below my computer's 8GB of RAM (and fits the ARC metadata report below). However, "dcpool"'s current DDT is clearly big: about 23 million entries * 376 bytes = 8.6GB.

2) As seen below, the ARC including metadata currently takes up 3.7GB. According to prstat, all of the global-zone processes together use 180MB. ZFS is the only filesystem on this box. So the second question is: who uses the other 4GB of system RAM? This picture occurs consistently during every system uptime, as long as I use the pool for extensive reading and/or writing, and it seems to be some sort of kernel buffering or workspace memory (cached metaslab allocation tables, maybe?), and it is not part of the ARC - yet it is even bigger. What is it? Can it be controlled (so that it does not hurt performance when the ARC and/or DDT need more RAM) or at least queried?
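(In case it helps to answer that last question: the only rough way I can think of to query the kernel-side breakdown - assuming the stock mdb dcmds are present on this build - is something like:

# echo ::memstat | mdb -k     # page-level summary: Kernel, ZFS File Data, Anon, Exec/libs, Page cache, Free
# echo ::kmastat | mdb -k     # per-kmem-cache usage; big zio_*, dmu_* or arc_* caches would point at ZFS

I am not sure how exactly the ARC and the rest of the ZFS kernel allocations get split between the "Kernel" and "ZFS File Data" lines of ::memstat, so treat this as a sketch rather than an authoritative accounting.)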
# ./tuning/arc_summary.pl | egrep -v 'mdb|set zfs:' | head -18 | grep ": "; echo ::arc | mdb -k | grep meta_
Physical RAM: 8183 MB
Free Memory : 993 MB
LotsFree: 127 MB
Current Size: 3705 MB (arcsize)
Target Size (Adaptive): 3705 MB (c)
Min Size (Hard Limit): 3072 MB (zfs_arc_min)
Max Size (Hard Limit): 6656 MB (zfs_arc_max)
Most Recently Used Cache Size: 90% 3342 MB (p)
Most Frequently Used Cache Size: 9% 362 MB (c-p)
arc_meta_used = 2617 MB
arc_meta_limit = 6144 MB
arc_meta_max = 4787 MB

Thanks for any insights,
//Jim Klimov
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> 1) The process is rather slow (I think due to the dedup involved -
> even though, by my calculations, the whole DDT can fit in
> my 8GB of RAM).

Please see:
http://opensolaris.org/jive/thread.jspa?messageID=516567

In particular:

> New problem:
> I'm following all the advice I summarized into the OP of this thread, and

[In other words, the complete DDT fits in RAM]

> testing on a test system. (A laptop). And it's just not working. I am
> jumping into the dedup performance abyss far, far earlier than predicted...

and: I have another post, which doesn't seem to have found its way to this list. So I just resent it. Here's a snippet:

> This is a workstation with 6 core processor, 16G ram, and a single 1TB
> hard disk.
> In the default configuration, arc_meta_limit is 3837MB. And as I increase
> the number of unique blocks in the data pool, it is perfectly clear that
> performance jumps off a cliff when arc_meta_used starts to reach that
> level, which is approx 880,000 to 1,030,000 unique blocks. FWIW, this
> means, without evil tuning, a 16G server is only sufficient to run dedup
> on approx 33GB to 125GB unique data without severe performance
> degradation.

> # zdb -D -e 1601233584937321596
> DDT-sha256-zap-ditto: 68 entries, size 1807 on disk, 240 in core
> DDT-sha256-zap-duplicate: 1970815 entries, size 1134 on disk, 183 in core
> DDT-sha256-zap-unique: 4376290 entries, size 1158 on disk, 187 in core
>
> dedup = 1.38, compress = 1.07, copies = 1.01, dedup * compress / copies
> = 1.46
>
> # zdb -D -e dcpool
> DDT-sha256-zap-ditto: 388 entries, size 380 on disk, 200 in core
> DDT-sha256-zap-duplicate: 5421787 entries, size 311 on disk, 176 in core
> DDT-sha256-zap-unique: 16841361 entries, size 284 on disk, 145 in core
>
> dedup = 1.34, compress = 1.00, copies = 1.00, dedup * compress / copies
> = 1.34
>
> # echo ::sizeof ddt_entry_t | mdb -k
> sizeof (ddt_entry_t) = 0x178

As you can see in that other thread, I am exploring dedup performance too, and finding that this method of calculation is totally ineffective. Number of blocks times the size of ddt_entry, as you have seen, produces a reasonable-looking number, but the experimentally measured results are nowhere near it.
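If you want to see where that cliff sits on your own box, one rough way - assuming the standard zfs:0:arcstats kstats, which should be present on any recent build - is simply to watch arc_meta_used climb toward arc_meta_limit while you keep writing unique data, something like:

# while :; do kstat -p zfs:0:arcstats | grep arc_meta_; sleep 10; done
(prints arc_meta_used, arc_meta_limit and arc_meta_max every 10 seconds)

That should make it fairly obvious when you are approaching the point where performance falls off, as described in the snippet above.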