Hi all,

On my oi_148a system I'm now in the process of "evacuating" data from my "dcpool" (an iSCSI device with a ZFS pool inside), which is hosted in my physical "pool" on hard disks (6-disk raidz2). The "dcpool" was configured to dedup all data inside it, while the hosting volume "pool/dcpool" was compressed, so as to separate the two processes. I decided to scrap this experiment, and now I'm copying my data back by reading files from "dcpool" and writing them into compressed+deduped datasets in "pool".

I often see two interesting conditions in this setup:

1) The process is rather slow (I think due to the dedup involved - even though, by my calculations, the whole DDT can fit in my 8GB of RAM). However, kernel processing time often peaks at close to 50%, and there is often quite a bit of idle time. I have a dual-core box, so it looks as if some system-side cycle is not using more than one core. Does anyone know whether the DDT tree walk, the search for available block ranges in metaslabs, or whatever other lengthy cycles there may be, are done in a sequential (single-threaded) fashion?

Below is my current DDT sizing. I still do not know which value to trust as the DDT entry size in RAM - the one returned by MDB or the ones reported by ZDB (and what exactly are those "in core" and "on disk" values? I've asked before but got no replies...)

# zdb -D -e 1601233584937321596
DDT-sha256-zap-ditto: 68 entries, size 1807 on disk, 240 in core
DDT-sha256-zap-duplicate: 1970815 entries, size 1134 on disk, 183 in core
DDT-sha256-zap-unique: 4376290 entries, size 1158 on disk, 187 in core

dedup = 1.38, compress = 1.07, copies = 1.01, dedup * compress / copies = 1.46

# zdb -D -e dcpool
DDT-sha256-zap-ditto: 388 entries, size 380 on disk, 200 in core
DDT-sha256-zap-duplicate: 5421787 entries, size 311 on disk, 176 in core
DDT-sha256-zap-unique: 16841361 entries, size 284 on disk, 145 in core

dedup = 1.34, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.34

# echo ::sizeof ddt_entry_t | mdb -k
sizeof (ddt_entry_t) = 0x178

Since I'm writing to "pool" (queried by its GUID number above), my box's performance primarily depends on its DDT - I guess. In the worst case that's 6.4 million entries times 376 bytes = 2.4GB, which is well below my computer's 8GB of RAM (and fits the ARC metadata report below). However, "dcpool"'s current DDT is clearly big: about 23 million entries * 376 bytes = 8.6GB.

2) As seen below, the ARC including metadata currently takes up 3.7GB. According to prstat, all of the global-zone processes together use 180MB. ZFS is the only filesystem on this box. So the second question is: who uses the other 4GB of system RAM? This picture occurs consistently during every system uptime, as long as I use the pool for extensive reading and/or writing, and it seems to be some sort of kernel buffering or workspace memory (cached metaslab allocation tables, maybe?), and it is not part of the ARC - yet it is even bigger. What is it? Can it be controlled (so that it does not hurt performance when the ARC and/or DDT need more RAM) or at least queried?
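(In case it helps to answer that last question: the only rough way I can think of to query the kernel-side breakdown - assuming the stock mdb dcmds are present on this build - is something like:

# echo ::memstat | mdb -k     # page-level summary: Kernel, ZFS File Data, Anon, Exec/libs, Page cache, Free
# echo ::kmastat | mdb -k     # per-kmem-cache usage; big zio_*, dmu_* or arc_* caches would point at ZFS

I am not sure how exactly the ARC and the rest of the ZFS kernel allocations get split between the "Kernel" and "ZFS File Data" lines of ::memstat, so treat this as a sketch rather than an authoritative accounting.)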
# ./tuning/arc_summary.pl | egrep -v 'mdb|set zfs:' | head -18 | grep ": "; echo ::arc | mdb -k | grep meta_
Physical RAM: 8183 MB
Free Memory : 993 MB
LotsFree: 127 MB
Current Size: 3705 MB (arcsize)
Target Size (Adaptive): 3705 MB (c)
Min Size (Hard Limit): 3072 MB (zfs_arc_min)
Max Size (Hard Limit): 6656 MB (zfs_arc_max)
Most Recently Used Cache Size: 90% 3342 MB (p)
Most Frequently Used Cache Size: 9% 362 MB (c-p)
arc_meta_used = 2617 MB
arc_meta_limit = 6144 MB
arc_meta_max = 4787 MB

Thanks for any insights,
//Jim Klimov
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jim Klimov
>
> 1) The process is rather slow (I think due to the dedup involved -
> even though, by my calculations, the whole DDT can fit in
> my 8GB of RAM).

Please see:
http://opensolaris.org/jive/thread.jspa?messageID=516567

In particular:

> New problem:
> I'm following all the advice I summarized into the OP of this thread, and

[In other words, the complete DDT fits in RAM]

> testing on a test system. (A laptop). And it's just not working. I am
> jumping into the dedup performance abyss far, far earlier than predicted...

and: I have another post, which doesn't seem to have found its way to this list. So I just resent it. Here's a snippet:

> This is a workstation with 6 core processor, 16G ram, and a single 1TB
> hard disk.
> In the default configuration, arc_meta_limit is 3837MB. And as I increase
> the number of unique blocks in the data pool, it is perfectly clear that
> performance jumps off a cliff when arc_meta_used starts to reach that
> level, which is approx 880,000 to 1,030,000 unique blocks. FWIW, this
> means, without evil tuning, a 16G server is only sufficient to run dedup
> on approx 33GB to 125GB unique data without severe performance
> degradation.

> # zdb -D -e 1601233584937321596
> DDT-sha256-zap-ditto: 68 entries, size 1807 on disk, 240 in core
> DDT-sha256-zap-duplicate: 1970815 entries, size 1134 on disk, 183 in core
> DDT-sha256-zap-unique: 4376290 entries, size 1158 on disk, 187 in core
>
> dedup = 1.38, compress = 1.07, copies = 1.01, dedup * compress / copies
> = 1.46
>
> # zdb -D -e dcpool
> DDT-sha256-zap-ditto: 388 entries, size 380 on disk, 200 in core
> DDT-sha256-zap-duplicate: 5421787 entries, size 311 on disk, 176 in core
> DDT-sha256-zap-unique: 16841361 entries, size 284 on disk, 145 in core
>
> dedup = 1.34, compress = 1.00, copies = 1.00, dedup * compress / copies
> = 1.34
>
> # echo ::sizeof ddt_entry_t | mdb -k
> sizeof (ddt_entry_t) = 0x178

As you can see in that other thread, I am exploring dedup performance too, and finding that this method of calculation is totally ineffective. Number of blocks times the size of ddt_entry, as you have seen, produces a reasonable-looking number, but the experimentally measured results are nowhere near it.
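If you want to see where that cliff sits on your own box, one rough way - assuming the standard zfs:0:arcstats kstats, which should be present on any recent build - is simply to watch arc_meta_used climb toward arc_meta_limit while you keep writing unique data, something like:

# while :; do kstat -p zfs:0:arcstats | grep arc_meta_; sleep 10; done
(prints arc_meta_used, arc_meta_limit and arc_meta_max every 10 seconds)

That should make it fairly obvious when you are approaching the point where performance falls off, as described in the snippet above.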