I created a ZFS pool with dedup using the following settings:

  zpool create data c8t1d0
  zfs create data/shared
  zfs set dedup=on data/shared

What I was wondering about is that ZFS seems to dedup only at the file level and not at the block level. When I make multiple copies of a file onto the store I see an increase in the dedup ratio, but when I copy similar files the ratio stays at 1.00x.
--
This message posted from opensolaris.org
On Fri, Jan 28, 2011 at 01:38:11PM -0800, Igor P wrote:
> I created a ZFS pool with dedup using the following settings:
>   zpool create data c8t1d0
>   zfs create data/shared
>   zfs set dedup=on data/shared
>
> What I was wondering about is that ZFS seems to dedup only at the file
> level and not at the block level. When I make multiple copies of a file
> onto the store I see an increase in the dedup ratio, but when I copy
> similar files the ratio stays at 1.00x.

Dedup is done at the block level, not the file level. "Similar files" does not mean that they actually share common blocks. You'll have to look more closely to determine whether they do.

Nico
--
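One way to look more closely, as a minimal sketch (using the pool name "data" from the original post; the exact columns and output vary between builds):

  $ zpool get dedupratio data    # pool-wide ratio of referenced data to data actually allocated
  $ zpool list data              # the DEDUP column shows the same ratio
  $ zdb -DD data                 # dedup table (DDT) statistics: how many blocks have more than one reference

If two "similar" files really shared blocks, the DDT histogram from zdb would show entries with a reference count above one.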
On 01/28/11 02:38 PM, Igor P wrote:
> I created a ZFS pool with dedup using the following settings:
>   zpool create data c8t1d0
>   zfs create data/shared
>   zfs set dedup=on data/shared
>
> What I was wondering about is that ZFS seems to dedup only at the file
> level and not at the block level. When I make multiple copies of a file
> onto the store I see an increase in the dedup ratio, but when I copy
> similar files the ratio stays at 1.00x.

Igor,

ZFS does indeed perform dedup at the block level. Identical files have identical blocks, of course, but "similar" files may have data inserted, deleted or changed, so each block ends up different. The same data also has to fall on the same block alignment for the blocks to be duplicates. It's also important to have lots of RAM or high-speed devices for quick access to metadata, or removing data will take a very long time, so please use an appropriately sized system. That's been discussed a lot on this list. See Jeff Bonwick's blog for a very good description: http://blogs.sun.com/bonwick/entry/zfs_dedup

I hope that's helpful,
Jeff (a different Jeff)
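For the "high speed devices" part, the usual approach is to attach an SSD as a cache (L2ARC) device, and optionally another as a log device; a sketch, with the device names below purely hypothetical placeholders:

  $ zpool add data cache c9t0d0    # SSD used as L2ARC, where the dedup table can be cached
  $ zpool add data log c9t1d0      # separate log device; helps synchronous writes, not dedup itself

Whether this is enough depends on how large the dedup table grows, which later messages in this thread go into.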
On Fri, Jan 28, 2011 at 1:38 PM, Igor P <igor at godlike.org> wrote:
> I created a ZFS pool with dedup using the following settings:
>   zpool create data c8t1d0
>   zfs create data/shared
>   zfs set dedup=on data/shared
>
> What I was wondering about is that ZFS seems to dedup only at the file
> level and not at the block level. When I make multiple copies of a file
> onto the store I see an increase in the dedup ratio, but when I copy
> similar files the ratio stays at 1.00x.

The easiest way to test it is to create a 10 MB file full of random data:

  $ dd if=/dev/random of=random.10M bs=1M count=10

Copy that to the pool a few times under different names and watch the dedupe ratio increase, basically linearly. Then open the file in a text editor and change the last few lines. Copy that to the pool a few times under new names, and watch the dedupe ratio increase again, but not linearly, since the last block or three of the file will be different. Repeat, changing different lines in the file, and watch as disk usage increases only a little, since the files still "share" (have in common) a lot of blocks.

ZFS dedupe happens at the block layer, not the file layer.

--
Freddie Cash
fjwcash at gmail.com
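Put together as a sketch (the /data/shared path comes from the original post, the file names are placeholders, /dev/urandom is used here because /dev/random can be painfully slow for 10 MB, and the default 128k recordsize is assumed):

  $ dd if=/dev/urandom of=/data/shared/random.10M bs=1M count=10
  $ cp /data/shared/random.10M /data/shared/copy1
  $ cp /data/shared/random.10M /data/shared/copy2
  $ sync
  $ zpool get dedupratio data    # close to 3.00x: the copies add references, not blocks

  $ cp /data/shared/random.10M /data/shared/tail-changed
  $ dd if=/dev/urandom of=/data/shared/tail-changed bs=128k count=1 seek=79 conv=notrunc
  $ sync
  $ zpool get dedupratio data    # rises again, but less: 79 of the 80 blocks still match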
Roy Sigurd Karlsbakk
2011-Jan-28 22:24 UTC
[zfs-discuss] ZFS not usable (was ZFS Dedup question)
> I created a ZFS pool with dedup using the following settings:
>   zpool create data c8t1d0
>   zfs create data/shared
>   zfs set dedup=on data/shared
>
> What I was wondering about is that ZFS seems to dedup only at the file
> level and not at the block level. When I make multiple copies of a file
> onto the store I see an increase in the dedup ratio, but when I copy
> similar files the ratio stays at 1.00x.

I've done some rather intensive tests of ZFS dedup on a 12TB test system we have. I have concluded that with some 150GB worth of L2ARC and 8GB of ARC, ZFS dedup is unusable even for volumes of 2TB. It works, but it's dead slow in terms of writes, and the time to remove a dataset is still very long. I wouldn't recommend using ZFS dedup unless your name were Ahmed Nazif or Silvio Berlusconi, where the damage might be put to some good use.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
On 1/28/2011 1:48 PM, Nicolas Williams wrote:
> On Fri, Jan 28, 2011 at 01:38:11PM -0800, Igor P wrote:
>> I created a ZFS pool with dedup using the following settings:
>>   zpool create data c8t1d0
>>   zfs create data/shared
>>   zfs set dedup=on data/shared
>>
>> What I was wondering about is that ZFS seems to dedup only at the file
>> level and not at the block level. When I make multiple copies of a file
>> onto the store I see an increase in the dedup ratio, but when I copy
>> similar files the ratio stays at 1.00x.
>
> Dedup is done at the block level, not the file level. "Similar files" does
> not mean that they actually share common blocks. You'll have to look
> more closely to determine whether they do.
>
> Nico

What Nico said. The big reason here is that blocks have to be ALIGNED on the same block boundaries to be dedup'd. That is, say I have a file which contains:

  AAABBCCCCCCDD

With 4-character-wide blocks, if I copy the file and prepend an "X", making it look like:

  XAAABBCCCCCCDD

there will be NO dedup in that case, because every block boundary now falls on different data. This is what trips people up most of the time: they see "similar" files, but don't realize that "similar" for dedup means identical data aligned on block boundaries, not just "I've got the same 3k of data in both files".

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
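A quick way to see the alignment effect on a real pool, as a sketch only (pool and dataset names are taken from the original post, file names are placeholders, and the default 128k recordsize is assumed):

  $ dd if=/dev/urandom of=/data/shared/base bs=128k count=80       # 10 MB of random data
  $ cp /data/shared/base /data/shared/identical                    # identical copy: every block dedups
  $ { printf 'X'; cat /data/shared/base; } > /data/shared/shifted  # same bytes shifted by one: no block matches
  $ sync
  $ zpool get dedupratio data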
On 1/28/2011 2:24 PM, Roy Sigurd Karlsbakk wrote:
>> I created a ZFS pool with dedup using the following settings:
>>   zpool create data c8t1d0
>>   zfs create data/shared
>>   zfs set dedup=on data/shared
>>
>> What I was wondering about is that ZFS seems to dedup only at the file
>> level and not at the block level. When I make multiple copies of a file
>> onto the store I see an increase in the dedup ratio, but when I copy
>> similar files the ratio stays at 1.00x.
>
> I've done some rather intensive tests of ZFS dedup on a 12TB test system
> we have. I have concluded that with some 150GB worth of L2ARC and 8GB of
> ARC, ZFS dedup is unusable even for volumes of 2TB. It works, but it's
> dead slow in terms of writes, and the time to remove a dataset is still
> very long. I wouldn't recommend using ZFS dedup unless your name were
> Ahmed Nazif or Silvio Berlusconi, where the damage might be put to some
> good use.
>
> Vennlige hilsener / Best regards
>
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 97542685
> roy at karlsbakk.net
> http://blogg.karlsbakk.net/
> --

If you want dedup to perform well, you *absolutely* must have an L2ARC device which can hold the *entire* dedup table. Remember, the size of the DDT depends not on the size of your data pool but on the number of ZFS slabs contained in that pool (slab = record, for this purpose). Thus, 12TB worth of DVD ISO images (record size about 128k) will consume 256 times less DDT space than will 12TB filled with text configuration files (average record size < 512b).

And I doubt 8GB of ARC is sufficient, either, for a DDT consuming over 100GB of space.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
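A rough way to size the DDT before (or after) turning dedup on, as a sketch — "data" is the pool from the original post, and the ~320 bytes per in-core DDT entry used below is only a commonly quoted rule of thumb, not an exact figure:

  $ zdb -S data     # simulates dedup on the existing data and prints the DDT it would need
  $ zdb -DD data    # on a pool that already has dedup enabled, prints actual DDT entry counts and sizes

Back of the envelope: 12TB of 128k records is roughly 100 million blocks; at ~320 bytes per entry that is on the order of 30GB of dedup table that has to stay reachable from ARC/L2ARC for writes and frees to remain fast. The same 12TB in 512-byte records would need 256 times that.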
Roy Sigurd Karlsbakk
2011-Jan-29 14:32 UTC
[zfs-discuss] ZFS not usable (was ZFS Dedup question)
> If you want dedup to perform well, you *absolutely* must have an L2ARC
> device which can hold the *entire* dedup table. Remember, the size of
> the DDT depends not on the size of your data pool but on the number of
> ZFS slabs contained in that pool (slab = record, for this purpose).
> Thus, 12TB worth of DVD ISO images (record size about 128k) will consume
> 256 times less DDT space than will 12TB filled with text configuration
> files (average record size < 512b).
>
> And I doubt 8GB of ARC is sufficient, either, for a DDT consuming over
> 100GB of space.

The test was run on a test machine with a small (160GB IIRC) root disk and 7x2TB disks in RAIDz2, plus two 80GB Intel X25-M gen 2 SSDs. The box has 8GB RAM/ARC and was configured with two mirrored 4GB partitions on the SSDs for the SLOG and the rest for L2ARC. The data stored on the system was Bacula output, meaning more or less streaming writes to large files, and the recordsize was set to 128kB. The initial performance was good, but after filling up about 2TB, performance was down to about 25% of the initial rate, and AFAICS with only 2TB stored, even an 8GB ARC should suffice for the DDT.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
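A back-of-envelope check on that assumption (a sketch; ~320 bytes per in-core DDT entry is a commonly quoted rule of thumb, not an exact number):

  2TB / 128kB per record   =  ~16 million DDT entries
  16 million x ~320 bytes  =  ~5GB of dedup table

That is most of an 8GB ARC before any file data or other metadata is cached, and by default only a portion of the ARC may be used for metadata, so much of the table ends up on L2ARC, where each lookup costs an extra SSD read (and each L2ARC entry still consumes some ARC header space). Comparing the estimate against the actual table is one way to confirm it:

  $ zdb -DD <poolname>    # prints DDT entry counts and in-core/on-disk entry sizes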