Hi all

As I've said here on the list a few times earlier, most recently in the thread 'ZFS not usable (was ZFS Dedup question)', I've been doing some rather thorough testing of ZFS dedup, and as you can see from those posts, the results weren't very satisfactory. The docs claim 1-2GB of memory usage per terabyte stored, ARC or L2ARC, but from what I've seen, I don't find this very likely.

So, is there anyone in here using dedup for large storage (2TB? 10TB? more?) who can document sustained high performance?

The reason I ask is that if this is the case, something is badly wrong with my test setup.

The test box is a Supermicro thing with a Core2duo CPU, 8 gigs of RAM, 4 gigs of mirrored SLOG and some 150 gigs of L2ARC on 80GB x25-M drives. The data drives are 7 2TB drives in RAIDz2. We're getting down to 10-20MB/s on Bacula backups to this system, and that's streaming, which should be a good fit for RAIDz2. Since the writes are local (bacula-sd running on the box), async writes will be the main thing. Initial results showed pretty good I/O performance, but after about 2TB used, the I/O speed is down to the numbers I mentioned.

PS: I know those drives aren't optimal for this, but the box is a year old or so. Still, they should help out a bit.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
I'm not sure about *docs*, but here are my rough estimates: assume 1TB of actual used storage; assume a 64K block/slab size (not sure how realistic that is -- it depends totally on your data set); assume 300 bytes per DDT entry.

So we have (1024^4 / 65536) * 300 = 5033164800, or about 5GB of RAM for one TB of used disk space.

Dedup is *hungry* for RAM. 8GB is most likely not enough for your configuration! First guess: double the RAM and then you might have better luck.

The other takeaway here: dedup is the wrong technology for the typical small home server (e.g. systems that max out at 4 or even 8 GB). Look into compression and snapshot clones as better alternatives to reduce your disk space needs without incurring the huge RAM penalties associated with dedup.

Dedup is *great* for a certain type of data set on configurations that are extremely RAM-heavy. For everyone else, it's almost universally the wrong solution. Ultimately, disk is usually cheaper than RAM -- think hard before you enable dedup: are you making the right trade-off?

- Garrett

On Sun, 2011-01-30 at 22:53 +0100, Roy Sigurd Karlsbakk wrote:
> The docs claim 1-2GB memory usage per terabyte stored, ARC or L2ARC, but as
> you can read from the post, I don't find this very likely.
> [...]
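For anyone who wants to plug in their own numbers, here is a minimal shell sketch of the same back-of-the-envelope arithmetic; the 64K average block size and the 300 bytes per DDT entry are assumptions carried over from the estimate above, not measured values:

    # Back-of-the-envelope DDT memory estimate (POSIX shell / ksh arithmetic):
    #   (referenced bytes / average block size) * bytes per DDT entry
    # All three inputs below are assumptions -- adjust them for your own data set.
    USED_TB=1            # terabytes of referenced (pre-dedup) data
    BLOCKSIZE=65536      # assumed average block/slab size, in bytes
    ENTRY=300            # assumed in-core bytes per DDT entry

    USED_BYTES=$((USED_TB * 1024 * 1024 * 1024 * 1024))
    DDT_BYTES=$((USED_BYTES / BLOCKSIZE * ENTRY))
    echo "Estimated DDT footprint: ${DDT_BYTES} bytes (~$((DDT_BYTES / 1024 / 1024)) MB)"

With the defaults above this prints roughly 4800 MB per terabyte of referenced data, in line with the "about 5GB" figure.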
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
>
> The test box is a supermicro thing with a Core2duo CPU, 8 gigs of RAM, 4 gigs
> of mirrored SLOG and some 150 gigs of L2ARC on 80GB x25-M drives. The data
> drives are 7 2TB drives in RAIDz2. We're getting down to 10-20MB/s on Bacula
> backup to this system, meaning streaming, which should be good for RAIDz2.
> Since the writes are local (bacula-sd running), async writes will be the main
> thing. Initial results show pretty good I/O performance, but after about 2TB
> used, the I/O speed is down to the numbers I mentioned

You probably know this already, but while you're doing async writes, neither the slog nor the L2ARC offers any benefit to you.

Also, your problem might be completely unrelated to your pool. You might try writing to /dev/null instead and just see what the performance is. If it's still slow, you know you're waiting on something else that isn't zpool related. Also, you might try making an old backup file available on something like an external disk, and simply copying it to the zpool in question. If it goes fast, once again you're eliminating the possibility that the problem is zpool related.

I don't know what Bacula uses in the background, but I know I've had terrible performance using dd to write to tape while dd would perform just fine writing to anything else ... and anything else would work fine writing to tape. My point is only that you should question precisely *what* is causing the performance bottleneck.
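A concrete sketch of that isolation test; the file and mountpoint names below are placeholders, not anything from Roy's actual setup:

    # 1) Can the source side (Bacula spool, network, client) deliver data quickly
    #    at all? Writing to /dev/null takes the zpool out of the picture entirely.
    dd if=/backup/spool/some-old-volume of=/dev/null bs=1024k

    # 2) Can the pool absorb a plain streaming write without Bacula involved?
    #    /tank is a placeholder for the dedup-enabled pool's mountpoint.
    dd if=/backup/spool/some-old-volume of=/tank/ddtest.out bs=1024k

    # Fast in (1) but slow in (2) points at the pool; slow in (1) points elsewhere.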
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
>
> We're getting down to 10-20MB/s on

Oh, one more thing. How are you measuring the speed? Because if you have data which is highly compressible, or highly duplicated, you could be virtually writing tons of data really, really fast while the disks are barely active at all. For example, if you run

    dd if=/dev/zero bs=1024k count=1024 | pv | gzip | pv > zerofile.gz

then the data rate going through the first pv is about 1000 times higher than the data rate going through the second pv. If I were only looking at the data rate after compression, I might falsely conclude that I was getting bad performance.
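One way to check the physical side of that is to watch what actually reaches the vdevs while a backup job runs and compare it against whatever rate Bacula reports. A sketch, with 'tank' as a placeholder pool name:

    # Per-vdev bandwidth and IOPS, refreshed every 5 seconds, while the job runs.
    # If Bacula reports 10-20MB/s but the vdevs are far busier (or far idler),
    # the bottleneck is probably not raw disk throughput.
    zpool iostat -v tank 5

    # Device-level view from the OS side (Solaris iostat):
    iostat -xn 5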
> I'm not sure about *docs*, but my rough estimations:
>
> Assume 1TB of actual used storage. Assume 64K block/slab size. (Not
> sure how realistic that is -- it depends totally on your data set.)
> Assume 300 bytes per DDT entry.
>
> So we have (1024^4 / 65536) * 300 = 5033164800 or about 5GB RAM for
> one TB of used disk space.
>
> Dedup is *hungry* for RAM. 8GB is not enough for your configuration,
> most likely! First guess: double the RAM and then you might have
> better luck.

I know... that's why I use L2ARC.

> The other takeaway here: dedup is the wrong technology for typical
> small home server (e.g. systems that max out at 4 or even 8 GB).

This isn't a home server test.

> Look into compression and snapshot clones as better alternatives to
> reduce your disk space needs without incurring the huge RAM penalties
> associated with dedup.
>
> Dedup is *great* for a certain type of data set with configurations
> that are extremely RAM heavy. For everyone else, it's almost universally
> the wrong solution. Ultimately, disk is usually cheaper than RAM --
> think hard before you enable dedup -- are you making the right trade off?

Just what sort of configurations are you thinking of? I've been testing dedup on rather large ones, and the sum of it is that ZFS dedup doesn't scale well as of now.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
Roy Sigurd Karlsbakk
2011-Feb-01 00:48 UTC
[zfs-discuss] ZFS dedup success stories (take two)
> As I've said here on the list a few times earlier, the last on the
> thread 'ZFS not usable (was ZFS Dedup question)', I've been doing some
> rather thorough testing on zfs dedup, and as you can see from the
> posts, it wasn't very satisfactory. The docs claim 1-2GB memory usage
> per terabyte stored, ARC or L2ARC, but as you can read from the post,
> I don't find this very likely.

Sorry about the initial post - it was wrong. The hardware configuration was right, but for the initial tests I used NFS, meaning sync writes. This obviously stresses the ARC/L2ARC more than async writes would, but the result remains the same.

With 140GB of L2ARC on two X25-Ms, and the SLOG on 4GB partitions on the same devices, mirrored, the write speed was reduced to something like 20% of the original speed. This was with about 2TB used on the zpool and a single data stream, no parallelism whatsoever. Even with 8GB of ARC and 140GB of L2ARC on two SSDs, this speed is fairly low. I could not see substantially high CPU or I/O load during this test.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
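One sketch of a way to separate the sync-write (NFS) penalty from the dedup penalty; the paths below are placeholders, and since /dev/urandom is itself slow at generating a couple of GB, a pre-existing large, non-dedupable file works just as well:

    # Make some incompressible, non-dedupable test data (placeholder path):
    dd if=/dev/urandom of=/var/tmp/testdata bs=1024k count=2048

    # Local (async) write onto the dedup-enabled filesystem:
    dd if=/var/tmp/testdata of=/tank/backup/local.out bs=1024k

    # The same write over the NFS mount from a client (placeholder path).
    # A large gap between the two runs suggests sync-write/ZIL overhead rather
    # than DDT overhead, since the data and dedup settings are identical.
    dd if=/var/tmp/testdata of=/mnt/tank-over-nfs/nfs.out bs=1024k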
On 01/31/11 04:48 PM, Roy Sigurd Karlsbakk wrote:
> Sorry about the initial post - it was wrong. The hardware configuration was
> right, but for the initial tests I used NFS, meaning sync writes. This
> obviously stresses the ARC/L2ARC more than async writes, but the result
> remains the same.
>
> With 140GB of L2ARC on two X25-Ms and mirrored 4GB SLOG partitions on the
> same devices, the write speed was reduced to something like 20% of the
> original speed. This was with about 2TB used on the zpool and a single data
> stream, no parallelism whatsoever. Still, with 8GB ARC and 140GB of L2ARC on
> two SSDs, this speed is fairly low. I could not see substantially high CPU
> or I/O load during this test.

I would not expect good write performance with dedup... dedup isn't going to make writes fast - it's something you want on a system with a lot of duplicated data that sustains a lot of reads. (That said, highly duplicated data with a DDT that fits entirely in RAM might see a benefit from not having to write metadata frequently. But I suspect an SLOG here is going to be critical to get good performance, since you'll still have a lot of synchronous metadata writes.)

- Garrett
> > Dedup is *hungry* for RAM. 8GB is not enough for your configuration,
> > most likely! First guess: double the RAM and then you might have
> > better luck.
>
> I know... that's why I use L2ARC

What is zdb -D showing? Does this give you any clue:
http://blogs.sun.com/roch/entry/dedup_performance_considerations1

br,
syljua
--
This message posted from opensolaris.org
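For reference, a sketch of the zdb side of that question ('tank' is a placeholder pool name); zdb reads live pool metadata, so treat its numbers as approximate on a busy system:

    # Summary of the dedup table: entry counts, on-disk and in-core sizes,
    # and the overall dedup ratio for the pool.
    zdb -D tank

    # The same, plus a histogram of how many blocks are referenced 1x, 2x, 4x, ...
    zdb -DD tank

    # On a pool where dedup is NOT enabled, simulate it to see what the DDT
    # would look like before committing to turning dedup on.
    zdb -S tank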
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
>
> > Dedup is *hungry* for RAM. 8GB is not enough for your configuration,
> > most likely! First guess: double the RAM and then you might have
> > better luck.
>
> I know... that's why I use L2ARC

L2ARC is not a substitute for RAM. In some cases it can improve disk performance in the absence of RAM, but it cannot be used for in-memory applications and the kernel. At best, what you're describing would be swap space on an SSD, and swap space is a substitute for RAM. Be aware that SSD performance is 1/100th the performance of RAM (or worse).

Garrett is right. Add more RAM if it is physically possible. And if it is not physically possible, think long and hard about upgrading your server so you can add more RAM.
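A sketch of how to see how much ARC the box actually has to work with, and how much of it is metadata (which is where the DDT would have to live). The kstat names below are the usual arcstats counters on Solaris-derived systems, so double-check them on your particular build:

    # Total ARC size and the metadata portion of it, in bytes:
    kstat -p zfs:0:arcstats:size
    kstat -p zfs:0:arcstats:arc_meta_used
    kstat -p zfs:0:arcstats:arc_meta_limit

    # A more readable summary, if mdb and the right privileges are available:
    echo ::arc | mdb -k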
Edward Ned Harvey
2011-Feb-01 12:41 UTC
[zfs-discuss] ZFS dedup success stories (take two)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
>
> Sorry about the initial post - it was wrong. The hardware configuration was
> right, but for initial tests, I used NFS, meaning sync writes. This obviously
> stresses the ARC/L2ARC more than async writes, but the result remains the
> same.

I'm sorry, that's not correct. The L2ARC is a read cache. The ZIL is what handles sync writes, and the ZIL always exists: if there is no dedicated ZIL log device, then blocks in the main storage pool are used for the ZIL.
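A quick way to confirm what the pool actually has in the way of dedicated log and cache devices ('tank' is a placeholder):

    # The 'logs' section lists the dedicated ZIL device (slog); if it is absent,
    # sync writes land on the main pool vdevs. The 'cache' section is the L2ARC.
    zpool status tank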
Two caveats inline ...

On 1 Feb 2011, at 01:05, Garrett D'Amore wrote:

> I would not expect good performance on dedup with write... dedup isn't
> going to make writes fast - it's something you want on a system with a
> lot of duplicated data that sustains a lot of reads. (That said, highly
> duplicated data with a DDT that fits entirely in RAM might see a benefit
> from not having to write metadata frequently. But I suspect an SLOG here
> is going to be critical to get good performance since you'll still have
> a lot of synchronous metadata writes.)

There is one circumstance where the write path could see an improvement: on a system whose data is highly dedupable *and* which is under heavy write load, it may be useful to forgo the large data write and instead convert it into smaller (and more frequent) metadata writes. SLOGs would then show more benefit, and we'd relieve pressure on the back end for throughput.

On a system with a high read ratio, deduped data currently would be quite efficient, but there is one pathology in current ZFS which impacts this somewhat: last time I looked, each ARC reference to a deduped block leads to an inflated ARC copy of the data, so a highly referenced block (20x, for instance) could exist 20 times in an inflated state in the ARC after reads of each occurrence. Dedup of inflated data in the ARC was a pending ZFS optimisation ...

Craig

--
Craig Morgan
Cinnabar Solutions Ltd
t: +44 (0)791 338 3190
f: +44 (0)870 705 1726
e: craig at cinnabar-solutions.com
w: www.cinnabar-solutions.com
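Related to the ARC pressure Craig describes, one commonly suggested tuning -- offered here only as a sketch, not something verified on Roy's box -- is to stop data blocks from competing for the L2ARC so the SSDs are free to hold metadata such as the DDT ('tank' is a placeholder):

    # Keep only metadata (including the dedup table) in the L2ARC; data blocks
    # are still cached in the in-RAM ARC as usual.
    zfs set secondarycache=metadata tank

    # Verify the current cache policy:
    zfs get primarycache,secondarycache tank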