There are a number of threads (this one[1] for example) that describe memory requirements for deduplication. They're pretty high.

I'm trying to get a better understanding... on our NetApps we use 4K block sizes with their post-process deduplication and get pretty good dedupe ratios for VM content.

Using ZFS we are using 128K record sizes by default, which nets us less impressive savings... however, to drop to a 4K record size would theoretically require that we have nearly 40GB of memory for only 1TB of storage (based on 150 bytes per block for the DDT).

This obviously becomes prohibitively higher for 10+ TB file systems.

I will note that our NetApps are using only 2TB FlexVols, but would like to better understand ZFS's (apparently) higher memory requirements... or maybe I'm missing something entirely.

Thanks,
Ray

[1] http://markmail.org/message/wile6kawka6qnjdw
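[A minimal Python sketch of where the "nearly 40GB" figure above comes from, using the ~150 bytes-per-DDT-entry rule of thumb cited in the message (later posts in the thread quote larger in-core sizes); it assumes every block is unique:]

    # Rough DDT memory estimate: assumes every block in the pool is unique
    # and ~150 bytes of RAM per DDT entry (the rule of thumb cited above).
    def ddt_ram_bytes(pool_bytes, record_size, bytes_per_entry=150):
        blocks = pool_bytes // record_size
        return blocks * bytes_per_entry

    TiB = 1 << 40
    GiB = float(1 << 30)
    for rs in (4 * 1024, 128 * 1024):
        est = ddt_ram_bytes(1 * TiB, rs)
        print("recordsize %6d: %5.1f GiB of DDT per TiB stored" % (rs, est / GiB))

    # recordsize   4096:  37.5 GiB of DDT per TiB stored  (the "nearly 40GB" above)
    # recordsize 131072:   1.2 GiB of DDT per TiB stored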
On 5/4/2011 9:57 AM, Ray Van Dolson wrote:
> There are a number of threads (this one[1] for example) that describe
> memory requirements for deduplication. They're pretty high.
>
> I'm trying to get a better understanding... on our NetApps we use 4K
> block sizes with their post-process deduplication and get pretty good
> dedupe ratios for VM content.
> [...]
> I will note that our NetApps are using only 2TB FlexVols, but would
> like to better understand ZFS's (apparently) higher memory
> requirements... or maybe I'm missing something entirely.

I'm not familiar with NetApp's implementation, so I can't speak to why it might appear to use fewer resources. However, there are a couple of possible issues here:

(1) Pre-write vs post-write deduplication.
ZFS does pre-write dedup, where it looks for duplicates before it writes anything to disk. In order to do pre-write dedup, you really have to store the ENTIRE deduplication block lookup table in some sort of fast (random) access media, realistically Flash or RAM. The win is that you get significantly lower disk utilization (i.e. better I/O performance), as (potentially) much less data is actually written to disk.
Post-write dedup is done via batch processing - that is, such a design has the system periodically scan the saved data, looking for duplicates. While this method also greatly benefits from being able to store the dedup table in fast random storage, it's not anywhere near as critical. The downside here is that you see much higher disk utilization - the system must first write all new data to disk (without looking for dedup), and then must also perform significant I/O later on to do the dedup.

(2) Block size: a 4k block size will yield better dedup than a 128k block size, presuming reasonable data turnover. This is inherent, as any single bit change in a block will make it non-duplicated. With 32x the block size, there is a much greater chance that a small change in data will result in a large loss of dedup ratio. That is, 4k blocks should almost always yield much better dedup ratios than larger ones. Also, remember that the ZFS block size is a SUGGESTION for zfs filesystems (i.e. it will use UP TO that block size, but not always that size), but is FIXED for zvols.

(3) Method of storing (and data stored in) the dedup table.
ZFS's current design is (IMHO) rather piggy on DDT and L2ARC lookup requirements. Right now, ZFS requires a record in the ARC (RAM) for each L2ARC (cache) entry, PLUS the actual L2ARC entry. So, it boils down to 500+ bytes of combined L2ARC & RAM usage per block entry in the DDT. Also, the actual DDT entry itself is perhaps larger than absolutely necessary.

I suspect that NetApp does the following to limit their resource usage: they presume the presence of some sort of cache that can be dedicated to the DDT (and, since they also control the hardware, they can make sure there is always one present). Thus, they can make their code completely avoid the need for an equivalent to the ARC-based lookup. In addition, I suspect they have a smaller DDT entry itself. Which boils down to probably needing 50% of the total resource consumption of ZFS, and NO (or extremely small, and fixed) RAM requirement.

Honestly, ZFS's cache (L2ARC) requirements aren't really a problem. The big issue is the ARC requirements, which, until they can be seriously reduced (or, best case, simply eliminated), really are a significant barrier to adoption of ZFS dedup.

Right now, ZFS treats DDT entries like any other data or metadata in how it ages from ARC to L2ARC to gone. IMHO, the better way to do this is simply to require the DDT to be entirely stored on the L2ARC (if present), and not ever keep any DDT info in the ARC at all (that is, the ARC should contain a pointer to the DDT in the L2ARC, and that's it, regardless of the amount or frequency of access of the DDT). Frankly, at this point, I'd almost change the design to REQUIRE an L2ARC device in order to turn on dedup.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On Wed, May 04, 2011 at 12:29:06PM -0700, Erik Trimble wrote:
> (1) Pre-write vs post-write deduplication.
> [...] Post-write dedup is done via batch processing - that is, such a
> design has the system periodically scan the saved data, looking for
> duplicates. While this method also greatly benefits from being able to
> store the dedup table in fast random storage, it's not anywhere near as
> critical. The downside here is that you see much higher disk utilization
> - the system must first write all new data to disk (without looking for
> dedup), and then must also perform significant I/O later on to do the dedup.

Makes sense.

> (3) Method of storing (and data stored in) the dedup table.
> ZFS's current design is (IMHO) rather piggy on DDT and L2ARC
> lookup requirements. Right now, ZFS requires a record in the ARC (RAM)
> for each L2ARC (cache) entry, PLUS the actual L2ARC entry. So, it
> boils down to 500+ bytes of combined L2ARC & RAM usage per block entry
> in the DDT. Also, the actual DDT entry itself is perhaps larger than
> absolutely necessary.

So the addition of L2ARC doesn't necessarily reduce the need for memory (at least not much if you're talking about 500 bytes combined)? I was hoping we could slap in 80GB's of SSD L2ARC and get away with "only" 16GB of RAM, for example.

Thanks for your response, Erik. Very helpful.

Ray
On Wed, May 4, 2011 at 12:29 PM, Erik Trimble <erik.trimble at oracle.com> wrote:
> I suspect that NetApp does the following to limit their resource
> usage: they presume the presence of some sort of cache that can be
> dedicated to the DDT (and, since they also control the hardware, they can
> make sure there is always one present). Thus, they can make their code

AFAIK, NetApp has more restrictive requirements about how much data can be dedup'd on each type of hardware.

See page 29 of http://media.netapp.com/documents/tr-3505.pdf - smaller pieces of hardware can only dedup 1TB volumes, and even the big-daddy filers will only dedup up to 16TB per volume, even if the volume size is 32TB (the largest volume available for dedup).

NetApp solves the problem by putting rigid constraints around the problem, whereas ZFS lets you enable dedup for any size dataset. Both approaches have limitations, and it sucks when you hit them.

-B

--
Brandon High : bhigh at freaks.com
On 5/4/2011 2:54 PM, Ray Van Dolson wrote:
> So the addition of L2ARC doesn't necessarily reduce the need for
> memory (at least not much if you're talking about 500 bytes combined)?
> I was hoping we could slap in 80GB's of SSD L2ARC and get away with
> "only" 16GB of RAM for example.

It reduces *somewhat* the need for RAM. Basically, if you have no L2ARC cache device, the DDT must be stored in RAM. That's about 376 bytes per dedup block.

If you have an L2ARC cache device, then the ARC must contain a reference to every DDT entry stored in the L2ARC, which consumes 176 bytes per DDT entry reference.

So, adding an L2ARC reduces the ARC consumption by about 55%.

Of course, the other benefit from an L2ARC is the data/metadata caching, which is likely worth it just by itself.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
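[A minimal sketch putting the per-entry figures above in context. The 376-byte and 176-byte numbers are taken from the message above and are approximate; the on-L2ARC entry size is assumed to be roughly the same as the in-RAM entry, which is consistent with the earlier "500+ bytes combined" estimate.]

    # Sketch: DDT footprint using the per-entry figures quoted above --
    # ~376 bytes of ARC per entry with no cache device, ~176 bytes of ARC
    # per entry plus the full entry in L2ARC when a cache device is present.
    GiB = float(1 << 30)
    TiB = 1 << 40

    def ddt_footprint(pool_bytes, record_size,
                      ram_only=376, arc_ref=176, l2_entry=376):
        entries = pool_bytes // record_size
        return entries * ram_only, entries * arc_ref, entries * l2_entry

    # Ray's example: 1 TiB of unique data at 4K records
    ram_no_l2, ram_with_l2, l2_space = ddt_footprint(1 * TiB, 4096)
    print("no L2ARC:   %5.1f GiB of ARC" % (ram_no_l2 / GiB))               # ~94 GiB
    print("with L2ARC: %5.1f GiB of ARC + %5.1f GiB of L2ARC" %
          (ram_with_l2 / GiB, l2_space / GiB))                              # ~44 + ~94 GiB

[By these figures, the 80GB L2ARC / 16GB RAM box mentioned earlier would be comfortable at 128K records (roughly 1.4 GiB of ARC and 3 GiB of L2ARC per TiB of unique data) but not at a 4K record size.]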
On Wed, May 04, 2011 at 02:55:55PM -0700, Brandon High wrote:
> NetApp solves the problem by putting rigid constraints around the
> problem, whereas ZFS lets you enable dedup for any size dataset. Both
> approaches have limitations, and it sucks when you hit them.

That is very true, although it's worth mentioning you can have quite a few of the dedupe/SIS-enabled FlexVols on even the lower-end filers (our FAS2050 has a bunch of 2TB SIS-enabled FlexVols). The FAS2050 of course has a fairly small memory footprint...

I do like the additional flexibility you have with ZFS, just trying to get a handle on the memory requirements.

Are any of you out there using deduped ZFS file systems to store VMware VMDK (or any VM tech, really)? Curious what recordsize you use and what your hardware specs / experiences have been.

Ray
On Wed, May 04, 2011 at 03:49:12PM -0700, Erik Trimble wrote:
> It reduces *somewhat* the need for RAM. Basically, if you have no L2ARC
> cache device, the DDT must be stored in RAM. That's about 376 bytes per
> dedup block.
>
> If you have an L2ARC cache device, then the ARC must contain a reference
> to every DDT entry stored in the L2ARC, which consumes 176 bytes per DDT
> entry reference.
>
> So, adding an L2ARC reduces the ARC consumption by about 55%.
>
> Of course, the other benefit from an L2ARC is the data/metadata caching,
> which is likely worth it just by itself.

Great info. Thanks Erik.

For dedupe workloads on larger file systems (8TB+), I wonder if it makes sense to use SLC / enterprise-class SSD (or better) devices for L2ARC instead of lower-end MLC stuff? Seems like we'd be seeing more writes to the device than in a non-dedupe scenario.

Thanks,
Ray
On 5/4/2011 4:14 PM, Ray Van Dolson wrote:
> That is very true, although worth mentioning you can have quite a few
> of the dedupe/SIS enabled FlexVols on even the lower-end filers (our
> FAS2050 has a bunch of 2TB SIS enabled FlexVols).

Stupid question - can you hit all the various SIS volumes at once, and not get horrid performance penalties?

If so, I'm almost certain NetApp is doing post-write dedup. That way, the strictly controlled max FlexVol size helps with keeping the resource limits down, as it will be able to round-robin the post-write dedup to each FlexVol in turn.

ZFS's problem is that it needs ALL the resources for EACH pool ALL the time, and can't really share them well if it expects to keep performance from tanking... (no pun intended)

> The FAS2050 of course has a fairly small memory footprint...
>
> I do like the additional flexibility you have with ZFS, just trying to
> get a handle on the memory requirements.
>
> Are any of you out there using dedupe ZFS file systems to store VMware
> VMDK (or any VM tech, really)? Curious what recordsize you use and
> what your hardware specs / experiences have been.
>
> Ray

Right now, I use it for my Solaris 8 containers and VirtualBox images. The VB images are mostly Windows (XP and Win2003). I tend to put the OS image in one VMdisk, and my scratch disks in another. That is, I generally don't want my apps writing much to my OS images. My scratch/data disks aren't deduped.

Overall, I'm running about 30 deduped images served out over NFS. My recordsize is set to 128k, but, given that they're OS images, my actual disk block usage has a significant 4k presence. One way I reduced this initially was to have the VMdisk image stored on local disk, then copy the *entire* image to the ZFS server, so the server saw a single large file, which meant it tended to write full 128k blocks. Do note that my 30 images only take about 20GB of actual space, after dedup. I figure about 5GB of dedup space per OS type (and I have 4 different setups).

My data VMdisks, however, chew through about 4TB of disk space, which is nondeduped. I'm still trying to determine if I'm better off serving those data disks as NFS mounts to my clients, or as VMdisk images available over iSCSI or NFS. Right now, I'm doing VMdisks over NFS.

The setup I'm using is an older X4200 (non-M2), with 3rd-party SSDs as L2ARC, hooked to an old 3500FC array. It has 8GB of RAM in total, and runs just fine with that.
I definitely am going to upgrade to something much larger in the near future, since I expect to up my number of VM images by at least a factor of 5.

That all said, if you're relatively careful about separating OS installs from active data, you can get really impressive dedup ratios using a relatively small amount of actual space. In my case, I expect to eventually be serving about 10 different configs out to a total of maybe 100 clients, and probably never exceed 100GB max on the deduped end. Which means that I'll be able to get away with 16GB of RAM for the whole server, comfortably.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On Wed, May 4, 2011 at 6:36 PM, Erik Trimble <erik.trimble at oracle.com> wrote:
> Stupid question - can you hit all the various SIS volumes at once, and
> not get horrid performance penalties?
>
> If so, I'm almost certain NetApp is doing post-write dedup. That way, the
> strictly controlled max FlexVol size helps with keeping the resource limits
> down, as it will be able to round-robin the post-write dedup to each FlexVol
> in turn.
>
> ZFS's problem is that it needs ALL the resources for EACH pool ALL the time,
> and can't really share them well if it expects to keep performance from
> tanking... (no pun intended)

On a 2050? Probably not. It's got a single-core mobile Celeron CPU and 2GB of RAM. You couldn't even run ZFS on that box, much less ZFS+dedup. Can you do it on a model that isn't 4 years old without tanking performance? Absolutely.

Outside of those two 2000 series, the reason there are dedup limits isn't performance.

--Tim
On 5/4/2011 4:17 PM, Ray Van Dolson wrote:
> For dedupe workloads on larger file systems (8TB+), I wonder if it makes
> sense to use SLC / enterprise-class SSD (or better) devices for L2ARC
> instead of lower-end MLC stuff? Seems like we'd be seeing more writes
> to the device than in a non-dedupe scenario.

I'm using enterprise-class MLC drives (without a supercap), and they work fine with dedup. I'd have to test, but I don't think that the increase in writes is that much, so I don't expect an SLC to really make much of a difference. (The fill rate of the L2ARC is limited, so I can't imagine we'd bump up against the MLC's limits.)

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
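[A rough sketch to sanity-check the point about the limited L2ARC fill rate. It assumes the old OpenSolaris-era defaults of l2arc_write_max = 8 MiB with a one-second feed interval, and a hypothetical ~70 TB write-endurance rating; actual tunables and ratings vary, and real write volume is normally far below the cap.]

    # Worst-case L2ARC write volume if the feed thread wrote its maximum
    # every interval, compared against a drive's rated write endurance.
    # ASSUMPTIONS: l2arc_write_max = 8 MiB, 1-second feed interval (old
    # OpenSolaris defaults), and a hypothetical ~70 TB endurance rating.
    MiB = 1 << 20
    TiB = float(1 << 40)

    l2arc_write_max = 8 * MiB                  # bytes per feed interval
    feed_interval = 1                          # seconds
    per_day = l2arc_write_max / feed_interval * 86400.0

    endurance_bytes = 70 * TiB                 # hypothetical MLC endurance rating
    days = endurance_bytes / per_day

    print("worst-case L2ARC writes: %.2f TiB/day" % (per_day / TiB))       # ~0.66
    print("~70 TB endurance lasts at least %d days at that rate" % days)   # ~106

[In practice the feed thread writes far less than its cap, which is the basis of the remark above that MLC write limits are unlikely to be an issue.]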
On 5/4/2011 4:44 PM, Tim Cook wrote:
> On a 2050? Probably not. It's got a single-core mobile Celeron CPU
> and 2GB of RAM. You couldn't even run ZFS on that box, much less
> ZFS+dedup. Can you do it on a model that isn't 4 years old without
> tanking performance? Absolutely.
>
> Outside of those two 2000 series, the reason there are dedup limits
> isn't performance.
>
> --Tim

Indirectly, yes, it's performance, since NetApp has plainly chosen post-write dedup as a method to restrict the required hardware capabilities. The dedup limits on volume size are almost certainly driven by the local RAM requirements for post-write dedup.

It also looks like NetApp isn't providing for a dedicated DDT cache, which means that when the NetApp is doing dedup, it's consuming the normal filesystem cache (i.e. chewing through RAM). Frankly, I'd be very surprised if you didn't see a noticeable performance hit during the period that the NetApp appliance is performing the dedup scans.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On Wed, May 04, 2011 at 04:51:36PM -0700, Erik Trimble wrote:
> It also looks like NetApp isn't providing for a dedicated DDT cache,
> which means that when the NetApp is doing dedup, it's consuming the
> normal filesystem cache (i.e. chewing through RAM). Frankly, I'd be
> very surprised if you didn't see a noticeable performance hit during
> the period that the NetApp appliance is performing the dedup scans.

Yep, when the dedupe process runs, there is a drop in performance (hence we usually schedule it to run during off-peak hours). Obviously this is a luxury that wouldn't be an option in every environment...

During normal operations outside of the dedupe period we haven't noticed a performance hit. I don't think we hit the filer too hard, however -- it's acting as a VMware datastore and only a few of the VMs have higher I/O footprints.
It is a 2050C, however, so we spread the load across the two filer heads (although we occasionally run everything on one head when performing maintenance on the other).

Ray
On Wed, May 4, 2011 at 4:36 PM, Erik Trimble <erik.trimble at oracle.com> wrote:
> If so, I'm almost certain NetApp is doing post-write dedup. That way, the
> strictly controlled max FlexVol size helps with keeping the resource limits
> down, as it will be able to round-robin the post-write dedup to each FlexVol
> in turn.

They are; it's in their docs. A volume is dedup'd when 20% of non-deduped data is added to it, or something similar. 8 volumes can be processed at once though, I believe, and it could be that weaker systems are not able to do as many in parallel.

> block usage has a significant 4k presence. One way I reduced this initially
> was to have the VMdisk image stored on local disk, then copied the *entire*
> image to the ZFS server, so the server saw a single large file, which meant
> it tended to write full 128k blocks. Do note that my 30 images only take

Wouldn't you have been better off cloning datasets that contain an unconfigured install and customizing from there?

-B

--
Brandon High : bhigh at freaks.com
On Wed, May 4, 2011 at 6:51 PM, Erik Trimble <erik.trimble at oracle.com> wrote:
> Indirectly, yes, it's performance, since NetApp has plainly chosen
> post-write dedup as a method to restrict the required hardware
> capabilities. The dedup limits on volume size are almost certainly driven
> by the local RAM requirements for post-write dedup.
>
> It also looks like NetApp isn't providing for a dedicated DDT cache, which
> means that when the NetApp is doing dedup, it's consuming the normal
> filesystem cache (i.e. chewing through RAM). Frankly, I'd be very surprised
> if you didn't see a noticeable performance hit during the period that the
> NetApp appliance is performing the dedup scans.

Again, it depends on the model/load/etc. The smallest models will see performance hits for sure. If the volume size limits are strictly a matter of RAM, why exactly would they jump from 4TB to 16TB on a 3140 by simply upgrading ONTAP? If the limits haven't gone up on, at the very least, every one of the x2xx systems 12 months from now, feel free to dig up the thread and give an I-told-you-so. I'm quite confident that won't be the case.
The 16TB limit SCREAMS to me that it's a holdover from the same 32-bit limit that causes 32-bit volumes to have a 16TB limit. I'm quite confident they're just taking the cautious approach on moving to 64-bit dedup code.

--Tim
On 5/4/2011 5:11 PM, Brandon High wrote:
> They are; it's in their docs. A volume is dedup'd when 20% of
> non-deduped data is added to it, or something similar. 8 volumes can
> be processed at once though, I believe, and it could be that weaker
> systems are not able to do as many in parallel.

Sounds rational.

> Wouldn't you have been better off cloning datasets that contain an
> unconfigured install and customizing from there?

Given that my "OS" installs include a fair amount of 3rd-party add-ons (compilers, SDKs, et al), I generally find the best method for me is to fully configure a client (with the VMdisk on local storage), then copy that VMdisk to the ZFS server as a "golden image". I can then clone that image for my other clients of that type, and only have to change the network information. Initially, each new VM image consumes about 1MB of space. :-)

Overall, I've found that as I have to patch each image, it's worthwhile to take a new golden-image snapshot every so often, and then reconfigure each client machine again from that new golden image. I'm sure I could do some optimization here, but the method works well enough. What you want to avoid is having the OS image written to; waiting to do configuration and customization AFTER it has been placed on the ZFS server is sub-optimal.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Erik Trimble
>
> ZFS's problem is that it needs ALL the resources for EACH pool ALL the
> time, and can't really share them well if it expects to keep performance
> from tanking... (no pun intended)

That's true, but on the flipside, if you don't have adequate resources dedicated all the time, it means performance is unsustainable. Anything which is going to do post-write dedup will necessarily have degraded performance on a periodic basis. This is in *addition* to all your scrubs and backups and so on.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Ray Van Dolson
>
> Are any of you out there using dedupe ZFS file systems to store VMware
> VMDK (or any VM tech, really)? Curious what recordsize you use and
> what your hardware specs / experiences have been.

Generally speaking, dedup doesn't work on VM images. (Same is true for ZFS or NetApp or anything else.) Because the VM images are all going to have their own filesystems internally, with whatever blocksize is relevant to the guest OS. If the virtual blocks in the VM don't align with the ZFS (or whatever FS) host blocks... then even when you write duplicated data inside the guest, the host won't see it as a duplicated block.

There are some situations where dedup may help on VM images... For example, if you're not using sparse files and you have a zero-filled disk... but in that case, you should probably just use a sparse file instead... Or... if you have a "golden" image that you're copying all over the place... but in that case, you should probably just use clones instead...

Or if you're intimately familiar with both the guest & host filesystems, and you choose blocksizes carefully to make them align. But that seems complicated and likely to fail.
On Wed, May 4, 2011 at 10:15 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
> That's true, but on the flipside, if you don't have adequate resources
> dedicated all the time, it means performance is unsustainable. Anything
> which is going to do post-write dedup will necessarily have degraded
> performance on a periodic basis. This is in *addition* to all your scrubs
> and backups and so on.

AGAIN, you're assuming that all system resources are used all the time and can't possibly go anywhere else. This is absolutely false. If someone is running a system at 99% capacity 24/7, perhaps that might be a factual statement. I'd argue that if someone is running the system at 99% all of the time, the system is grossly undersized for the workload. How can you EVER expect a highly available system to run at 99% on both nodes (all nodes in a vmax/vsp scenario) and ever be able to fail over? Either a home-brew OpenSolaris cluster, Oracle 7000 cluster, or NetApp?

I'm gathering that this list in general has a lack of understanding of how NetApp does things. If you don't know for a fact how it works, stop jumping to conclusions on how you think it works. I know for a fact that, short of the guys currently/previously writing the code at NetApp, there's a handful of people in the entire world who know (factually) how they're allocating resources from soup to nuts. As far as this discussion is concerned, there are only two points that matter: they've got dedup on primary storage, and it works in the field. The rest is just static that doesn't matter. Let's focus on how to make ZFS better instead of trying to guess how others are making it work, especially when they've got a completely different implementation.

--Tim
On Wed, May 4, 2011 at 10:23 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
> Generally speaking, dedup doesn't work on VM images. (Same is true for ZFS
> or NetApp or anything else.) Because the VM images are all going to have
> their own filesystems internally, with whatever blocksize is relevant to the
> guest OS. If the virtual blocks in the VM don't align with the ZFS (or
> whatever FS) host blocks... then even when you write duplicated data inside
> the guest, the host won't see it as a duplicated block.

That's patently false. VM images are the absolute best use-case for dedup outside of backup workloads. I'm not sure who told you/where you got the idea that VM images are not ripe for dedup, but it's wrong.

--Tim
> From: Tim Cook [mailto:tim at cook.ms]
>
> AGAIN, you're assuming that all system resources are used all the time and
> can't possibly go anywhere else. This is absolutely false. If someone is
> running a system at 99% capacity 24/7, perhaps that might be a factual
> statement. I'd argue if someone is running the system 99% all of the time,
> the system is grossly undersized for the workload.

Well, here is my situation: I do IT for a company whose workload is very spiky. For weeks at a time, the system will be 99% idle. Then when the engineers have a deadline to meet, they will expand and consume all available resources, no matter how much you give them. So they will keep all systems 99% busy for a month at a time. After the deadline passes, they drop back down to 99% idle. The work is IO intensive, so it's not appropriate for something like the cloud.

> I'm gathering that this list in general has a lack of understanding of how
> NetApp does things. If you don't know for a fact how it works, stop jumping
> to conclusions on how you think it works. I know for a fact that short of the

I'm a little confused by this rant. Cuz I didn't say anything about NetApp.
> From: Tim Cook [mailto:tim at cook.ms]
>
> That's patently false. VM images are the absolute best use-case for dedup
> outside of backup workloads. I'm not sure who told you/where you got the
> idea that VM images are not ripe for dedup, but it's wrong.

Well, I got that idea from this list. I said a little bit about why I believed it was true... about dedup being ineffective for VMs...

Would you care to describe a use case where dedup would be effective for a VM? Or perhaps cite something specific, instead of just wiping the whole thing away and saying "patently false"? I don't feel like this comment was productive...
We have customers using dedup with lots of VM images... in one extreme case they are getting dedup ratios of over 200:1!

You don't need dedup or sparse files for zero filling. Simple zle compression will eliminate those for you far more efficiently and without needing massive amounts of RAM.

Our customers have the ability to access our systems engineers to design the solution for their needs. If you are serious about doing this stuff right, work with someone like Nexenta that can engineer a complete solution instead of trying to figure out which of us on this forum are quacks and which are cracks. :)

Tim Cook <tim at cook.ms> wrote:
> On Wed, May 4, 2011 at 10:23 PM, Edward Ned Harvey wrote:
>> There are some situations where dedup may help on VM images... For example,
>> if you're not using sparse files and you have a zero-filled disk... but in
>> that case, you should probably just use a sparse file instead...
>
> That's patently false. VM images are the absolute best use-case for dedup
> outside of backup workloads. I'm not sure who told you/where you got the
> idea that VM images are not ripe for dedup, but it's wrong.
> From: Garrett D'Amore [mailto:garrett at nexenta.com]
>
> We have customers using dedup with lots of vm images... in one extreme
> case they are getting dedup ratios of over 200:1!

I assume you're talking about a situation where there is an initial VM image, and then to clone the machine, the customers copy the VM, correct? If that is correct, have you considered ZFS cloning instead?

When I said dedup wasn't good for VMs, what I'm talking about is: if there is data inside the VM which is cloned... For example, if somebody logs into the guest OS and then does a "cp" operation... then dedup on the host is unlikely to be able to recognize that data as cloned data inside the virtual disk.

> Our customers have the ability to access our systems engineers to design the
> solution for their needs. If you are serious about doing this stuff right, work
> with someone like Nexenta that can engineer a complete solution instead of
> trying to figure out which of us on this forum are quacks and which are
> cracks. :)

Is this a zfs discussion list, or a Nexenta sales & promotion list?
Hi,

On 05/ 5/11 03:02 PM, Edward Ned Harvey wrote:
> I assume you're talking about a situation where there is an initial VM image,
> and then to clone the machine, the customers copy the VM, correct?
> If that is correct, have you considered ZFS cloning instead?
>
> When I said dedup wasn't good for VMs, what I'm talking about is: if there is
> data inside the VM which is cloned... For example, if somebody logs into the
> guest OS and then does a "cp" operation... then dedup on the host is unlikely
> to be able to recognize that data as cloned data inside the virtual disk.

ZFS cloning and ZFS dedup are solving two problems that are related, but different:

- Through cloning, a lot of space can be saved in situations where it is known beforehand that data is going to be used multiple times from multiple different "views". Virtualization is a perfect example of this.

- Through dedup, space can be saved in situations where the duplicate nature of data is not known, or not known beforehand. Again, in virtualization scenarios, this could be common modifications to VM images that are performed multiple times but not anticipated, such as extra software, OS patches, or simply many users saving the same files to their local desktops.

To go back to the "cp" example: If someone logs into a VM that is backed by ZFS with dedup enabled, then copies a file, the extra space that the file will take will be minimal. The act of copying the file will break down into a series of blocks that will be recognized as duplicate blocks. This is completely independent of the clone nature of the underlying VM's backing store.

But I agree that the biggest savings are to be expected from cloning first, as they typically translate into n GB (for the base image) x # of users, which is a _lot_. Dedup is still the icing on the cake for all those data blocks that were unforeseen. And that can be a lot, too, as everyone who has seen cluttered desktops full of downloaded files can probably confirm.

Cheers,
  Constantin

--
Constantin Gonzalez Schmitz, Sales Consultant, Oracle Hardware Presales Germany
Phone: +49 89 460 08 25 91 | Mobile: +49 172 834 90 30
Blog: http://constantin.glez.de/ | Twitter: zalez
ORACLE Deutschland B.V. & Co. KG, Sonnenallee 1, 85551 Kirchheim-Heimstetten
On Thu, 2011-05-05 at 09:02 -0400, Edward Ned Harvey wrote:
> I assume you're talking about a situation where there is an initial VM image,
> and then to clone the machine, the customers copy the VM, correct?
> If that is correct, have you considered ZFS cloning instead?

No. Obviously if you can clone, it's better. But sometimes you can't do this even with v12n, and we have this situation at customer sites today. (I have always said, zfs clone is far easier, far more proven, and far more efficient, *if* you can control the "ancestral" relationship to take advantage of the clone.)

For example, one area where cloning can't help is with patches and updates. In some instances these can get quite large, and across 1000s of VMs the space required can be considerable.

> When I said dedup wasn't good for VMs, what I'm talking about is: if there is
> data inside the VM which is cloned... For example, if somebody logs into the
> guest OS and then does a "cp" operation... then dedup on the host is unlikely
> to be able to recognize that data as cloned data inside the virtual disk.

I disagree. I believe that within the VMDKs data is aligned nicely, since these are disk images. At any rate, we are seeing real (and large) dedup ratios in the field when used with v12n. In fact, this is the killer app for dedup.

> Is this a zfs discussion list, or a Nexenta sales & promotion list?

My point here was that there is a lot of half-baked advice being given... the idea that you should only use dedup if you have a bunch of zeros on your disk images is absolutely and totally nuts, for example. It doesn't match real-world experience, and it doesn't match the theory either.

And sometimes real-world experience trumps the theory. I've been shown on numerous occasions that ideas that I thought were half-baked turned out to be very effective in the field, and vice versa. (I'm a developer, not a systems engineer. Fortunately I have a very close working relationship with a couple of awesome systems engineers.)

Folks come here looking for advice. I think the advice that, if you're contemplating these kinds of solutions, you should get someone with real-world experience solving these kinds of problems every day, is very sound advice. Trying to pull out the truths from the myths I see stated here nearly every day is going to be difficult for the average reader here, I think.

 - Garrett
> I assume you're talking about a situation where there is an initial VM image, and then to clone the machine, the customers copy the VM, correct?
> If that is correct, have you considered ZFS cloning instead?
>
> When I said dedup wasn't good for VMs, what I'm talking about is: if there is data inside the VM which is cloned... For example, if somebody logs into the guest OS and then does a "cp" operation... then dedup on the host is unlikely to be able to recognize that data as cloned data inside the virtual disk.

I have the same opinion. When talking with customers about the use of dedup and cloning, the answer is simple: when you know that duplicates will occur but don't know when, use dedup; when you know that duplicates will occur and that they are there from the beginning, use cloning. Thus VM images practically cry out for cloning. I'm not a fan of dedup for VMs.

I have heard the argument "but what about VM patching" once. Aside from the problem of detecting the clones, I wouldn't patch each VM; I would patch the master image and regenerate the clones, especially for a general patching session. Just saving a gigabyte because a patch landed on 2 or 3 of 100 servers isn't worth spending a lot of memory on dedup. The simple reason: patching each VM on its own is likely to increase VM sprawl. All I save is some iron, but I'm not simplifying administration. However, this needs good administrative processes. You can use dedup for VMs, but I'm not sure you should...

> Is this a zfs discussion list, or a nexenta sales & promotion list?

Well... I have an opinion on how he sees that... however, it's just my own ;)

--
Joerg Moellenkamp | Sales Consultant, Oracle Hardware Presales - Nord
ORACLE Deutschland B.V. & Co. KG | Nagelsweg 55 | 20097 Hamburg
On Wed, May 4, 2011 at 8:23 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
> Generally speaking, dedup doesn't work on VM images. (Same is true for ZFS
> or netapp or anything else.) Because the VM images are all going to have
> their own filesystems internally with whatever blocksize is relevant to the
> guest OS. If the virtual blocks in the VM don't align with the ZFS (or
> whatever FS) host blocks... Then even when you write duplicated data inside
> the guest, the host won't see it as a duplicated block.

A zvol with 4k blocks should give you decent results with Windows guests. Recent versions use 4k alignment by default and 4k blocks, so there should be lots of duplicates for a base OS image.

> There are some situations where dedup may help on VM images... For example
> if you're not using sparse files and you have a zero-filled disk... But in

compression=zle works even better for these cases, since it doesn't require DDT resources.

> Or if you're intimately familiar with both the guest & host filesystems, and
> you choose blocksizes carefully to make them align. But that seems
> complicated and likely to fail.

Using a 4k block size is a safe bet, since most OSs use a block size that is a multiple of 4k. It's the same reason that the new "Advanced Format" drives use 4k sectors.

Windows uses 4k alignment and 4k (or larger) clusters.

ext3/ext4 uses 1k, 2k, or 4k blocks. Filesystems over 512MB should use 4k blocks by default. The block alignment is determined by the partitioning, so some care needs to be taken there.

zfs uses 'ashift'-size blocks. I'm not sure what ashift works out to be when using a zvol though, so it could be as small as 512B but may be set to the same as the volblocksize property.

ufs is 4k or 8k on x86 and 8k on sun4u. As with ext4, block alignment is determined by partitioning and slices.

-B

--
Brandon High : bhigh at freaks.com
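[Illustrative aside, not from the thread: a quick way to sanity-check the partition-alignment point Brandon raises. The sketch treats a partition's starting LBA as 512-byte sectors and checks whether it lands on a 4 KiB or 8 KiB boundary. The specific offsets (sector 63 vs. sector 2048) are assumed examples of the old and new Windows partitioning defaults, not values measured on anyone's system.]

#!/usr/bin/env python3
# Check whether a guest partition's start offset is aligned to the host block size.

SECTOR = 512  # bytes per LBA as presented to the guest

def aligned(start_lba, blocksize):
    """True if a partition starting at start_lba falls on a blocksize boundary."""
    return (start_lba * SECTOR) % blocksize == 0

# Classic XP-era offset (sector 63) vs. Vista/2008+ default (sector 2048 = 1 MiB)
for start in (63, 2048):
    for bs in (4096, 8192):
        print("start LBA %5d, host block %5d: %s"
              % (start, bs, "aligned" if aligned(start, bs) else "MISALIGNED"))

The sector-63 case comes out misaligned for every power-of-two block size, which is why older guests tend to dedup (and perform) poorly on 4k-block backing stores regardless of what the guest filesystem does.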
On May 5, 2011, at 2:58 PM, Brandon High wrote:
> On Wed, May 4, 2011 at 8:23 PM, Edward Ned Harvey wrote:
>> Or if you're intimately familiar with both the guest & host filesystems, and
>> you choose blocksizes carefully to make them align. But that seems
>> complicated and likely to fail.
>
> Using a 4k block size is a safe bet, since most OSs use a block size
> that is a multiple of 4k. It's the same reason that the new "Advanced
> Format" drives use 4k sectors.

Yes, 4KB block sizes are replacing the 512B blocks of yesteryear. However, the real reason the HDD manufacturers headed this way is that they can get more usable bits per platter. The tradeoff is that your workload may consume more real space on the platter than before. TANSTAAFL.

The trick for best performance and the best opportunity for dedup (alignment notwithstanding) is to have a block size that is smaller than your workload. Or: don't bring a 128KB block to a 4KB block battle. For this reason, the default 8KB block size for a zvol is a reasonable choice, but perhaps 4KB is better for many workloads.
 -- richard
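[Illustrative aside on the "block smaller than your workload" point: a back-of-the-envelope calculation of how much data loses its duplicate status when one small guest write lands inside a host record, at several record sizes. The 10 GiB image size and the single 4 KiB guest write are assumptions chosen only for the example.]

#!/usr/bin/env python3
# How much duplicate data a single small write can destroy at different record sizes.

GIB = 1 << 30
image_size = 10 * GIB
guest_write = 4096          # one 4 KiB write inside the guest

for recsize in (4096, 8192, 131072):
    records = image_size // recsize
    dirtied = -(-guest_write // recsize)   # ceiling division: records touched
    lost = dirtied * recsize               # each touched record becomes unique in full
    print("recsize %6d: %8d records in the image, %6d bytes lose dedup per 4 KiB write"
          % (recsize, records, lost))

With 128K records, that one 4 KiB write makes 128 KiB of formerly shared data unique; with 4K records only 4 KiB is lost, which is the arithmetic behind "don't bring a 128KB block to a 4KB block battle".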
On May 5, 2011, at 6:02 AM, Edward Ned Harvey wrote:
> Is this a zfs discussion list, or a nexenta sales & promotion list?

Obviously, this is a Nexenta sales & promotion list. And Oracle. And OSX. And BSD. And Linux. And anyone who needs help or can offer help with ZFS technology :-) This list has never been more diverse.

The only sad part is the unnecessary assassination of the OpenSolaris brand. But life moves on, and so does good technology.
 -- richard-who-is-proud-to-work-at-Nexenta
> From: Brandon High [mailto:bhigh at freaks.com]
>
> On Wed, May 4, 2011 at 8:23 PM, Edward Ned Harvey
> <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
>> Generally speaking, dedup doesn't work on VM images. (Same is true for ZFS
>> or netapp or anything else.) Because the VM images are all going to have
>> their own filesystems internally with whatever blocksize is relevant to the
>> guest OS. If the virtual blocks in the VM don't align with the ZFS (or
>> whatever FS) host blocks... Then even when you write duplicated data inside
>> the guest, the host won't see it as a duplicated block.
>
> A zvol with 4k blocks should give you decent results with Windows
> guests. Recent versions use 4k alignment by default and 4k blocks, so
> there should be lots of duplicates for a base OS image.

I agree with everything Brandon said. The one thing I would add is: the "correct" recordsize for each guest machine depends on the filesystem that the guest machine is using. Without knowing a specific filesystem on a specific guest OS, a 4k recordsize sounds like a reasonable general-purpose setting. But if you know more details of the guest, you could hopefully use a larger recordsize and therefore consume less RAM on the host.

If you have to use the 4k recordsize, it is likely to consume 32x more memory than the default 128k recordsize of ZFS. At this rate, it becomes increasingly difficult to justify enabling dedup. But it's certainly possible.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>
> If you have to use the 4k recordsize, it is likely to consume 32x more
> memory than the default 128k recordsize of ZFS. At this rate, it becomes
> increasingly difficult to justify enabling dedup. But it's
> certainly possible.

Sorry, I didn't realize... RE just said (and I take his word for it) that the default block size (volblocksize) for a zvol is 8k, while the default recordsize for a ZFS filesystem is 128k. The emphasis is that the memory requirement is a constant multiplied by the number of blocks, so smaller blocks ==> more blocks ==> more memory consumption.

This could be a major difference in implementation... If you use ZFS over NFS as your VM storage backend, it defaults to the 128k recordsize, while ZFS over iSCSI as your VM storage backend defaults to the 8k volblocksize. In either case, you really want to be aware of, and tune, your block size appropriately for the guest(s) you are running.
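[Illustrative aside: a rough sizing sketch for the "memory scales with block count" point. The ~320 bytes assumed per DDT entry is a ballpark figure chosen for illustration only; the real per-entry cost depends on the implementation and on whether L2ARC headers are counted as well, so treat the output as an order-of-magnitude estimate and check a real pool with "zdb -DD <pool>" rather than trusting the constant.]

#!/usr/bin/env python3
# Rough DDT footprint: entries scale with the number of unique blocks.

TIB = 1 << 40
BYTES_PER_ENTRY = 320  # assumed ballpark, not a measured figure

def ddt_bytes(data_bytes, blocksize, bytes_per_entry=BYTES_PER_ENTRY):
    blocks = data_bytes // blocksize
    return blocks * bytes_per_entry

for bs in (4096, 8192, 131072):
    est = ddt_bytes(1 * TIB, bs)
    print("1 TiB of unique data at %6d-byte blocks: ~%.1f GiB of DDT"
          % (bs, est / (1 << 30)))

Whatever constant you plug in, the ratio between the lines is fixed by the block counts: 4k needs 32x the entries of 128k and 2x the entries of the 8k zvol default, which is the relationship being discussed here.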
On Thu, May 5, 2011 at 8:50 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
> If you have to use the 4k recordsize, it is likely to consume 32x more
> memory than the default 128k recordsize of ZFS. At this rate, it becomes
> increasingly difficult to justify enabling dedup. But it's
> certainly possible.

You're forgetting that zvols use an 8k volblocksize by default. If you're currently exporting volumes with iSCSI, it's only a 2x increase. The tradeoff is that you should have more duplicate blocks, and reap the rewards there. I'm fairly certain that won't offset the large increase in the size of the DDT, however. Dedup with zvols is probably never a good idea as a result.

Only if you're hosting your VM images in .vmdk files will you get 128k blocks. Of course, your chance of getting many identical blocks then gets much, much smaller. You'll have to worry about the guests' block alignment in the context of the image file, since two identical files may not create identical blocks as seen from ZFS. This means you may get only fractional savings and still have an enormous DDT.

-B

--
Brandon High : bhigh at freaks.com
On Wed, May 04, 2011 at 08:49:03PM -0700, Edward Ned Harvey wrote:
>> From: Tim Cook [mailto:tim at cook.ms]
>>
>> That's patently false. VM images are the absolute best use-case for dedup
>> outside of backup workloads. I'm not sure who told you/where you got the
>> idea that VM images are not ripe for dedup, but it's wrong.
>
> Well, I got that idea from this list. I said a little bit about why I
> believed it was true... about dedup being ineffective for VMs... Would
> you care to describe a use case where dedup would be effective for a VM? Or
> perhaps cite something specific, instead of just wiping the whole thing and
> saying "patently false"? I don't feel like this comment was productive...

We use dedupe on our VMware datastores and typically see 50% savings, often times more. We do of course keep "like" VMs on the same volume (at this point nothing more than groups of Windows VMs, Linux VMs and so on).

Note that this isn't on ZFS (yet), but we hope to begin experimenting with it soon (using NexentaStor).

Apologies for devolving the conversation too much in the NetApp direction -- simply was a point of reference for me to get a better understanding of things on the ZFS side. :)

Ray
On Fri, May 6, 2011 at 9:15 AM, Ray Van Dolson <rvandolson at esri.com> wrote:
> We use dedupe on our VMware datastores and typically see 50% savings,
> often times more. We do of course keep "like" VMs on the same volume

I think NetApp uses 4k blocks by default, so the block size and alignment should match up for most filesystems and yield better savings.

Your server's resource requirements for ZFS and dedup will be much higher due to the large DDT, as you initially suspected.

If bp_rewrite is ever completed and released, this might change. It should allow for offline dedup, which may make dedup usable in more situations.

> Apologies for devolving the conversation too much in the NetApp
> direction -- simply was a point of reference for me to get a better
> understanding of things on the ZFS side. :)

It's good to compare the two, since they have a pretty large overlap in functionality but sometimes very different implementations.

-B

--
Brandon High : bhigh at freaks.com
On 05/ 6/11 07:21 PM, Brandon High wrote:
> On Fri, May 6, 2011 at 9:15 AM, Ray Van Dolson <rvandolson at esri.com> wrote:
>> We use dedupe on our VMware datastores and typically see 50% savings,
>> often times more. We do of course keep "like" VMs on the same volume
>
> I think NetApp uses 4k blocks by default, so the block size and
> alignment should match up for most filesystems and yield better
> savings.

That assumes the VMware datastores are on NFS? Otherwise the VMware filesystem, VMFS, uses its own block sizes from 1M to 8M, so the important point is to align the guest OS partition to 1M, and Windows guests starting with Vista/2008 do that by default now.

Regards,
On Mon, May 9, 2011 at 2:11 AM, Evaldas Auryla <evaldas.auryla at edqm.eu> wrote:
> On 05/ 6/11 07:21 PM, Brandon High wrote:
>> On Fri, May 6, 2011 at 9:15 AM, Ray Van Dolson <rvandolson at esri.com> wrote:
>>> We use dedupe on our VMware datastores and typically see 50% savings,
>>> often times more. We do of course keep "like" VMs on the same volume
>>
>> I think NetApp uses 4k blocks by default, so the block size and
>> alignment should match up for most filesystems and yield better
>> savings.
>
> That assumes the VMware datastores are on NFS? Otherwise the VMware filesystem,
> VMFS, uses its own block sizes from 1M to 8M, so the important point is to
> align the guest OS partition to 1M, and Windows guests starting with Vista/2008
> do that by default now.

The VMFS filesystem itself is aligned by NetApp at LUN creation time. You still align to a 4K block on the filer because there is no way to automatically align an encapsulated guest, especially when you could have different guest OS types on a LUN.

--Tim