Lutz Schumann
2010-Jul-01 08:33 UTC
[zfs-discuss] dedup accounting anomaly / dedup experiments
Hello list,

I wanted to test deduplication a little and did an experiment. My question was: can I dedupe infinitely, or is there an upper limit?

For that I did a very basic test:
- I created a ramdisk pool (1 GB)
- enabled dedup, and
- wrote zeros to it (in one single file) until an error was returned.

The size of the pool was 1046 MB. I was able to write 62 GB to it before it said "no space left on device". The block size was 128k, so I was able to write ~507,000 blocks to the pool.

With the device being full, I see the following:
1) zfs list reports that no space is left (AVAIL=0)
2) zpool reports that the dedup factor was ~507,000x
3) zpool also reports that 8.6 MB of space were allocated in the pool (0% used)

So to me it looks like something is broken in ZFS accounting with dedup:
- zpool and zfs free-space reporting do not align
- the real deduplication factor was not 507,000 (that would mean I could have written 507,000 x 1 GB = a lot to the pool)
- calculating 1046 MB / 507,000 = 2.1 KB: somehow, for each 128k block, 2.1 KB of data has been written (assuming zfs list is correct). What is this? Metadata? That would mean approximately 1.6% metadata overhead in ZFS (1/(128k/2.1k)).

I repeated the same thing with a recordsize of 32k. The funny thing is:
- again, 60 GB could be written before "no space left"
- 31 MB of space were allocated in the pool (zpool list)

The version of the pool is 25.

During the experiment I could nicely see:
- that performance on ramdisk is CPU bound, doing ~125 MB/sec per core
- that performance scales linearly with added CPU cores (125 MB/s for 1 core, 253 MB/s for 2 cores, 408 MB/s for 4 cores)
- that the upper size of the deduplication table is blocks x ~150 bytes, independent of the dedup factor
- that the DDT does not grow for deduplicatable blocks (zdb -D)
- that performance drops by a factor of ~4 when the allocation policy switches from "closest" to "best fit" as the pool fills (the rate drops from 250 MB/s to 67 MB/s). I suspect even worse results for spinning media because of the head movements (>10x slowdown).

Does anyone know why the dedup factor is wrong? Any insight into what has actually been written (compressed metadata, deduped metadata, etc.) would be greatly appreciated.

Regards,
Robert
--
This message posted from opensolaris.org
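[For anyone wanting to repeat this, a minimal sketch of that kind of setup on OpenSolaris; the device, pool, and file names below are assumptions, not taken from the post:

  # 1 GB ramdisk as the backing device, dedup enabled on the pool
  ramdiskadm -a rdisk0 1g
  zpool create ramtank /dev/ramdisk/rdisk0
  zfs set dedup=on ramtank

  # write zeros into one file until the pool returns ENOSPC
  # ("no space left on device"); default recordsize is 128k
  dd if=/dev/zero of=/ramtank/zeros bs=128k

  # compare the two views of space, plus the dedup table statistics
  zfs list ramtank      # AVAIL as the filesystem sees it
  zpool list ramtank    # ALLOC/FREE and the DEDUP ratio
  zdb -D ramtank        # DDT summary (add more D's for a histogram)]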
Will Murnane
2010-Jul-02 07:14 UTC
[zfs-discuss] dedup accounting anomaly / dedup experiments
On Thu, Jul 1, 2010 at 04:33, Lutz Schumann <presales at storageconcepts.de> wrote:
> Hello list,
>
> I wanted to test deduplication a little and did an experiment.
>
> My question was: can I dedupe infinitely, or is there an upper limit?
>
> So for that I did a very basic test:
> - I created a ramdisk pool (1 GB)
> - enabled dedup, and
> - wrote zeros to it (in one single file) until an error was returned.

I don't know about the rest of your test, but writing zeroes to a ZFS filesystem is probably not a very good test, because ZFS recognizes these blocks of zeroes and doesn't actually write anything. Unless maybe encryption is on, but maybe not even then.

I'd write a little program that initializes 128k of memory to a particular pattern, then writes it to disk until it gets ENOSPC (or some other error code, I suppose). That should force the first block to actually be written, and all the others to point to it.

> During the experiment I could nicely see:
> - that performance on ramdisk is CPU bound, doing ~125 MB/sec per core

Ouch! That's all?

Will
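[The same idea can also be approximated from the shell instead of a dedicated little program; a rough sketch, with made-up pool and file names, that seeds one 128 KiB block of non-zero data and streams it out until the write fails:

  # one 128 KiB block of (almost certainly unique) non-zero data
  dd if=/dev/urandom of=/tmp/pattern bs=128k count=1

  # repeat that block endlessly and write it into the dedup'd pool;
  # dd stops when the pool returns ENOSPC ("No space left on device")
  while :; do cat /tmp/pattern; done | dd of=/ramtank/pattern.dat bs=128k

Because the file content repeats with a period equal to the 128k recordsize, every record of the file is identical, so with dedup=on only the first one should take real space.]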
Lutz Schumann
2010-Jul-02 16:56 UTC
[zfs-discuss] dedup accounting anomaly / dedup experiments
Hi,

> I don't know about the rest of your test, but writing zeroes to a ZFS
> filesystem is probably not a very good test, because ZFS recognizes
> these blocks of zeroes and doesn't actually write anything. Unless
> maybe encryption is on, but maybe not even then.

Not true. If I want ZFS to write zeros, ZFS does write zeros. You can simply check this by doing a dd. ZFS does not filter "zero" writes.

>> During the experiment I could nicely see:
>> - that performance on ramdisk is CPU bound, doing ~125 MB/sec per core
>
> Ouch! That's all?

Per core! So for an 8-core system that is ~1 GB/sec - more than most disks can handle. If you use 50% of it (you must save some for scrub), you are OK.

Robert
--
This message posted from opensolaris.org
Darren J Moffat
2010-Jul-02 18:45 UTC
[zfs-discuss] dedup accounting anomaly / dedup experiments
On 02/07/2010 17:56, Lutz Schumann wrote:
>> I don't know about the rest of your test, but writing zeroes to a ZFS
>> filesystem is probably not a very good test, because ZFS recognizes
>> these blocks of zeroes and doesn't actually write anything. Unless
>> maybe encryption is on, but maybe not even then.
>
> Not true. If I want ZFS to write zeros, ZFS does write zeros. You can simply check this by doing a dd. ZFS does not filter "zero" writes.

Actually it does if you have compression turned on and the blocks compress away to 0 bytes.

See http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zio.c#zio_write_bp_init

Specifically line 1005:

1005         if (psize == 0) {
1006                 zio->io_pipeline = ZIO_INTERLOCK_PIPELINE;
1007         } else {

--
Darren J Moffat
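[A quick way to see the effect Darren describes from the shell; dataset and file names are assumed: write zeros into a compression-enabled dataset and compare logical vs. on-disk size:

  zfs set compression=on ramtank
  zfs set dedup=off ramtank
  dd if=/dev/zero of=/ramtank/zeros.cmp bs=128k count=8192   # ~1 GB of zeros
  sync
  ls -l /ramtank/zeros.cmp   # logical size is ~1 GB
  du -h /ramtank/zeros.cmp   # on-disk size stays tiny: the zero blocks
                             # compress to psize 0 and are never written]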
Lutz Schumann
2010-Jul-03 08:44 UTC
[zfs-discuss] dedup accounting anomaly / dedup experiments
> Actually it does if you have compression turned on and the blocks
> compress away to 0 bytes.
>
> See
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zio.c#zio_write_bp_init
>
> Specifically line 1005:
>
> 1005         if (psize == 0) {
> 1006                 zio->io_pipeline = ZIO_INTERLOCK_PIPELINE;
> 1007         } else {

Interesting. I did a quick test writing zeros asynchronously:

- dedup=on,  compression=off -> 60 MB/sec
- dedup=off, compression=on  -> 480 MB/sec
- dedup=off, compression=off -> 12 MB/sec

It seems that if I do a sync write with dedup=off, compression=on, I get 12 MB/sec. In both cases I see disk I/O. So to me it looks like metadata writes - the I/O being limited to 480 MB/s seems to be limited by the async metadata updates (this is a VM, so slow).

So does that mean that the zero data blocks are not written, but metadata blocks are?

Robert
--
This message posted from opensolaris.org
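[Roughly, the three async cases above correspond to toggling the dataset properties between runs; dataset and file names are assumed, and note that the settings only affect blocks written after the change:

  zfs set dedup=on ramtank;  zfs set compression=off ramtank   # run 1
  zfs set dedup=off ramtank; zfs set compression=on ramtank    # run 2
  zfs set dedup=off ramtank; zfs set compression=off ramtank   # run 3

  # then for each run:
  dd if=/dev/zero of=/ramtank/zeros bs=128k count=8192 && rm /ramtank/zeros]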
Brandon High
2010-Jul-09 08:45 UTC
[zfs-discuss] dedup accounting anomaly / dedup experiments
On Thu, Jul 1, 2010 at 1:33 AM, Lutz Schumann <presales at storageconcepts.de> wrote:
> Does anyone know why the dedup factor is wrong? Any insight into what has
> actually been written (compressed metadata, deduped metadata, etc.)
> would be greatly appreciated.

Metadata and ditto blocks. Even with dedup, zfs will write multiple copies of blocks after reaching a certain threshold.

-B

--
Brandon High : bhigh at freaks.com
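[The threshold Brandon refers to is presumably the pool-level dedupditto property, which makes ZFS keep an extra physical copy of a deduped block once its reference count passes the configured value; the pool name below is assumed:

  zpool get dedupditto ramtank        # defaults to 0, i.e. no extra copies
  zpool set dedupditto=100 ramtank    # write a 2nd copy once a block has
                                      # 100+ references

Metadata blocks also get their own ditto copies regardless of dedup.]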