Sorry I couldn't find this anywhere yet. For deduping it is best to have the lookup table in RAM, but I wasn't too sure how much RAM is suggested?

::Assuming 128KB block sizes and 100% unique data:
1TB = 1024*1024*1024 KB; /128KB = 8388608 blocks
::Each block needs an 8-byte pointer?
8388608*8 = 67108864 bytes
::RAM suggested per TB:
67108864/1024/1024 = 64MB

So if I understand correctly we should have a minimum of 64MB of RAM per TB for deduping? *hopes my math wasn't way off* Or is there significant extra overhead stored per block for the lookup table? For example, is there some kind of redundancy on the lookup table (relevant to the RAM space requirements) to counter corruption?

I read some articles and they all mention that there is a significant performance loss if the table isn't in RAM, but none really mentioned how much RAM one should have per TB of deduped data.

Thanks, hope someone can confirm my numbers *or give me the real ones*. I know the block size is variable; I'm most interested in the default ZFS setup right now.
-- 
This message posted from opensolaris.org
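A minimal Python sketch of the arithmetic above, assuming 128KB blocks and the 8-byte-per-block pointer from the post (which, as the replies below note, is not the real per-entry cost):

# Naive DDT size estimate: 8 bytes per unique 128 KiB block (the post's assumption).
TIB = 1024**4                       # 1 TiB of unique data, in bytes
BLOCK = 128 * 1024                  # 128 KiB recordsize

blocks = TIB // BLOCK               # 8,388,608 blocks per TiB
table_bytes = blocks * 8            # assumed 8-byte pointer per block
print(table_bytes // (1024 * 1024), "MiB")   # -> 64 MiB per TiB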
On 2010-Oct-20 08:36:30 +0800, Never Best <quickx at hotmail.com> wrote:
>Sorry I couldn't find this anywhere yet. For deduping it is best to
>have the lookup table in RAM, but I wasn't too sure how much RAM is
>suggested?

*Lots*

>::Assuming 128KB block sizes and 100% unique data:
>1TB = 1024*1024*1024 KB; /128KB = 8388608 blocks
>::Each block needs an 8-byte pointer?
>8388608*8 = 67108864 bytes
>::RAM suggested per TB:
>67108864/1024/1024 = 64MB
>
>So if I understand correctly we should have a minimum of 64MB of RAM
>per TB for deduping? *hopes my math wasn't way off* Or is there
>significant extra overhead stored per block for the lookup table?

The rule of thumb is 270 bytes per DDT entry - that means a minimum of 2.2GB of RAM (or fast L2ARC) per TB. And note that 128KB is the maximum block size - it's quite likely that you will have smaller blocks (which implies more RAM). I know my average block size is only a few KB.

-- 
Peter Jeremy
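A quick sanity check of that rule of thumb in Python (270 bytes/entry is the list's rule of thumb, not an exact on-disk figure):

# Rule of thumb from the thread: ~270 bytes of DDT per unique block.
TIB = 1024**4
blocks = TIB // (128 * 1024)        # 8,388,608 blocks at the 128 KiB maximum recordsize
ddt_bytes = blocks * 270
print(round(ddt_bytes / 1e9, 2), "GB per TiB of unique data")   # -> 2.26 GB per TiB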
Ouch. I was thinking a DDT entry basically just needs an 8-byte pointer to wherever the data is located on disk, with an O(1) hash table for lookup, and maybe some redundancy/error-correction data. Maybe that should get optimized; a lightweight version for NB ;). I guess it is doing more than I thought it was, maybe with some performance boosts at the cost of DDT size *will read up a bit more*?

Ah well, I can still use it for specific folders for now and look into an SSD for L2ARC (this is how it's done, I'm guessing) to dedup the entire RAID ;). Thanks
-- 
This message posted from opensolaris.org
Sometimes you read about people having low performance when deduping: it is because they have too little RAM.
-- 
This message posted from opensolaris.org
Orvar Korvar wrote:
> Sometimes you read about people having low performance when deduping: it is
> because they have too little RAM.

I mostly heard they have low performance when they start deleting deduplicated data, not before that. So do you think that with 2.2GB of RAM per 1TB of storage, with 128KB blocks, deduplication will have no performance impact when deleting deduped data? Or is it, as everyone was saying, that slow deletion of deduplicated data is something that is to be fixed in further ZFS development?
Never Best wrote:
> Sorry I couldn't find this anywhere yet. For deduping it is best to
> have the lookup table in RAM, but I wasn't too sure how much RAM is
> suggested?
> [...]

There were several detailed discussions about this over the past 6 months that should be in the archives. I believe most of the info came from Richard Elling.
On 10/22/2010 8:44 PM, Haudy Kazemi wrote:
> Never Best wrote:
>> Sorry I couldn't find this anywhere yet. For deduping it is best to
>> have the lookup table in RAM, but I wasn't too sure how much RAM is
>> suggested?
>> [...]
> There were several detailed discussions about this over the past 6
> months that should be in the archives. I believe most of the info
> came from Richard Elling.

Look for both my name and Richard's, going back about a year. In particular, this thread started a good discussion:

http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg35349.html

Bottom line: 270 bytes per record.

So, for a 4k record size, that works out to be 67GB per 1TB of unique data. A 128k record size means about 2GB per 1TB.

Dedup means buy a (big) SSD for L2ARC.

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
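A sketch of those numbers across a range of record sizes, using the 270-byte rule of thumb from this thread (Python):

# Approximate DDT RAM per TiB of unique data, at various record sizes
# (270 bytes per DDT entry, the rule of thumb from this thread).
TIB = 1024**4
DDT_ENTRY = 270

def ddt_gib_per_tib(recordsize_bytes):
    """DDT size in GiB per TiB of unique data for a given recordsize."""
    return (TIB // recordsize_bytes) * DDT_ENTRY / 2**30

for kib in (4, 8, 16, 32, 64, 128):
    print("%3d KiB records: %5.1f GiB DDT per TiB" % (kib, ddt_gib_per_tib(kib * 1024)))
#   4 KiB records:  67.5 GiB DDT per TiB
#   8 KiB records:  33.8 GiB DDT per TiB
#  16 KiB records:  16.9 GiB DDT per TiB
#  32 KiB records:   8.4 GiB DDT per TiB
#  64 KiB records:   4.2 GiB DDT per TiB
# 128 KiB records:   2.1 GiB DDT per TiB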
Comments inline below...

On Oct 23, 2010, at 1:48 AM, Erik Trimble wrote:
> On 10/22/2010 8:44 PM, Haudy Kazemi wrote:
>> Never Best wrote:
>>> Sorry I couldn't find this anywhere yet. For deduping it is best to
>>> have the lookup table in RAM, but I wasn't too sure how much RAM is
>>> suggested?
>>> [...]
>> There were several detailed discussions about this over the past 6
>> months that should be in the archives. I believe most of the info
>> came from Richard Elling.
>
> Look for both my name and Richard's, going back about a year. In
> particular, this thread started a good discussion:
>
> http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg35349.html
>
> Bottom line: 270 bytes per record.

Sometimes we see bigger sizes, but you have to have a lot of references before the DDT entry gets bigger than 512 bytes. Or, another way to look at this is: for every record, you will be updating 512 bytes (or the minimum sector size). This is why you'll hear me say that dedup changes big I/O into little I/O, but it doesn't eliminate I/O. Fortunately, modern SSDs do little I/O well. Unfortunately, HDDs are better optimized for big I/O and are lousy for little I/O.

> So, for a 4k record size, that works out to be 67GB per 1TB of unique
> data. A 128k record size means about 2GB per 1TB.

Divide by 4: the DDT is considered metadata, and the metadata limit is 1/4 of the ARC size, so only a quarter of the ARC can hold the DDT (put another way, you need roughly 4x that much ARC). Yes, there is an open bug on this. No, it didn't make b147. Yes, it is a trivial fix and can be tuned in the field.

> Dedup means buy a (big) SSD for L2ARC.

L2ARC directory entries take space, too. SWAG around 200 bytes for each L2ARC record.
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
USENIX LISA '10 Conference, November 7-12, San Jose, CA
ZFS and performance consulting
http://www.RichardElling.com
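Folding both caveats into the same back-of-envelope sketch (Python). The 1/4 metadata limit and the ~200 bytes per L2ARC header are the figures quoted in this thread, and the one-header-per-unique-block assumption below is only an illustration, not necessarily how the L2ARC fills in practice:

# Back-of-envelope ARC/L2ARC sizing for dedup, per TiB of unique data.
TIB = 1024**4
RECORDSIZE = 128 * 1024    # assume the default/maximum 128 KiB recordsize
DDT_ENTRY = 270            # rule-of-thumb bytes per DDT entry
L2ARC_HDR = 200            # rough bytes of in-RAM header per L2ARC record (thread's SWAG)

entries = TIB // RECORDSIZE
ddt = entries * DDT_ENTRY

# The DDT counts as metadata; with the default metadata limit of 1/4 of the
# ARC, the ARC must be roughly 4x the DDT size unless that limit is tuned.
print("DDT size:                 %5.1f GiB" % (ddt / 2**30))        # ~2.1 GiB
print("ARC needed at 1/4 limit:  %5.1f GiB" % (4 * ddt / 2**30))    # ~8.4 GiB

# If blocks are pushed to an L2ARC SSD instead, each L2ARC record still costs
# RAM for its header; illustrated here as one header per unique block.
print("L2ARC headers in RAM:     %5.1f GiB" % (entries * L2ARC_HDR / 2**30))  # ~1.6 GiB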