I'm finally at the point of adding an SSD to my system, so I can get reasonable dedup performance.

The question here goes to sizing of the SSD for use as an L2ARC device.

Noodling around, I found Richard's old posting on ARC->L2ARC memory requirements, which is mighty helpful in making sure I don't overdo the L2ARC side.

(http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg34677.html)

What I haven't found is a reasonable way to determine how big an L2ARC I'll need to fit all the relevant data for dedup. I've seen several postings back in January about this, and there wasn't much help, as was acknowledged at the time.

What I'm after is exactly what extra needs to be stored for the DDT. I'm looking at the 200-byte header in ARC per L2ARC entry, and assuming that covers all relevant info stored in the L2ARC, whether it's actual data or metadata. My question is this: how much space does the metadata for a slab (record) take up? With dedup turned on, I'm assuming that this metadata is larger than with it off (or is it now the same for both)?

There has to be some way to do a back-of-the-envelope calc that says (X) pool size -> (Y) min L2ARC size -> (Z) min ARC size.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On Feb 28, 2010, at 7:11 PM, Erik Trimble wrote:
> I'm finally at the point of adding an SSD to my system, so I can get reasonable dedup performance.
>
> The question here goes to sizing of the SSD for use as an L2ARC device.
>
> Noodling around, I found Richard's old posting on ARC->L2ARC memory requirements, which is mighty helpful in making sure I don't overdo the L2ARC side.
>
> (http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg34677.html)

I don't know of an easy way to see the number of blocks, which is what
you need to complete a capacity plan. OTOH, it doesn't hurt to have an
L2ARC; just beware of wasting space if you have a small RAM machine.

> What I haven't found is a reasonable way to determine how big an L2ARC I'll need to fit all the relevant data for dedup. I've seen several postings back in January about this, and there wasn't much help, as was acknowledged at the time.
>
> What I'm after is exactly what extra needs to be stored for the DDT. I'm looking at the 200-byte header in ARC per L2ARC entry, and assuming that covers all relevant info stored in the L2ARC, whether it's actual data or metadata. My question is this: how much space does the metadata for a slab (record) take up? With dedup turned on, I'm assuming that this metadata is larger than with it off (or is it now the same for both)?
>
> There has to be some way to do a back-of-the-envelope calc that says (X) pool size -> (Y) min L2ARC size -> (Z) min ARC size.

If you know the number of blocks and the size distribution you can
calculate this. In other words, it isn't very easy to do in advance unless
you have a fixed-size workload (e.g. a database that doesn't grow :-)

For example, if you have a 10 GB database with 8 KB blocks, then
you can calculate how much RAM would be required to hold the
headers for a 10 GB L2ARC device:
    headers = 10 GB / 8 KB
    RAM needed ~ 200 bytes * headers

For media, you can reasonably expect 128 KB blocks.

The DDT size can be measured with "zdb -D poolname", but you can
expect that to grow over time, too.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
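A minimal C sketch of the back-of-the-envelope calculation above, using only the numbers quoted in this thread: the ~200-byte ARC header per L2ARC entry is the figure cited here and may differ between builds, and the 10 GB / 8 KB values are just Richard's example inputs.

#include <stdio.h>
#include <stdint.h>

/*
 * Back-of-the-envelope ARC overhead for an L2ARC device, per the
 * formula above:  headers  = l2arc_size / avg_blocksize
 *                 RAM      ~ 200 bytes * headers
 * The 200-byte header size is the estimate quoted in this thread,
 * not a value taken from the source.
 */
#define L2ARC_HEADER_BYTES  200ULL

static uint64_t
l2arc_header_ram(uint64_t l2arc_bytes, uint64_t avg_blocksize)
{
        uint64_t headers = l2arc_bytes / avg_blocksize;
        return (headers * L2ARC_HEADER_BYTES);
}

int
main(void)
{
        /* Richard's example: a 10 GB L2ARC full of 8 KB database blocks. */
        uint64_t l2 = 10ULL << 30;      /* 10 GB L2ARC device   */
        uint64_t bs = 8ULL << 10;       /* 8 KB average block   */

        printf("headers: %llu\n",
            (unsigned long long)(l2 / bs));
        printf("ARC RAM for headers: ~%.1f MB\n",
            (double)l2arc_header_ram(l2, bs) / (1 << 20));
        return (0);
}

With those inputs it works out to about 1.3 million headers, or roughly 250 MB of ARC just to index a 10 GB L2ARC of 8 KB blocks; at 128 KB media-style blocks the same device needs only about 16 MB.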
Richard Elling wrote:
> On Feb 28, 2010, at 7:11 PM, Erik Trimble wrote:
>
>> I'm finally at the point of adding an SSD to my system, so I can get reasonable dedup performance.
>>
>> The question here goes to sizing of the SSD for use as an L2ARC device.
>>
>> Noodling around, I found Richard's old posting on ARC->L2ARC memory requirements, which is mighty helpful in making sure I don't overdo the L2ARC side.
>>
>> (http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg34677.html)
>
> I don't know of an easy way to see the number of blocks, which is what
> you need to complete a capacity plan. OTOH, it doesn't hurt to have an
> L2ARC; just beware of wasting space if you have a small RAM machine.

I haven't found a good way, either. And I've looked. ;-)

>> What I haven't found is a reasonable way to determine how big an L2ARC I'll need to fit all the relevant data for dedup. I've seen several postings back in January about this, and there wasn't much help, as was acknowledged at the time.
>>
>> What I'm after is exactly what extra needs to be stored for the DDT. I'm looking at the 200-byte header in ARC per L2ARC entry, and assuming that covers all relevant info stored in the L2ARC, whether it's actual data or metadata. My question is this: how much space does the metadata for a slab (record) take up? With dedup turned on, I'm assuming that this metadata is larger than with it off (or is it now the same for both)?
>>
>> There has to be some way to do a back-of-the-envelope calc that says (X) pool size -> (Y) min L2ARC size -> (Z) min ARC size.
>
> If you know the number of blocks and the size distribution you can
> calculate this. In other words, it isn't very easy to do in advance unless
> you have a fixed-size workload (e.g. a database that doesn't grow :-)
>
> For example, if you have a 10 GB database with 8 KB blocks, then
> you can calculate how much RAM would be required to hold the
> headers for a 10 GB L2ARC device:
>     headers = 10 GB / 8 KB
>     RAM needed ~ 200 bytes * headers
>
> For media, you can reasonably expect 128 KB blocks.
>
> The DDT size can be measured with "zdb -D poolname", but you can
> expect that to grow over time, too.
>  -- richard

That's good, but I'd like a way to pre-calculate my potential DDT size (which, I'm assuming, will sit in L2ARC, right?).

Once again, I'm assuming that each DDT entry corresponds to a record (slab), so to be exact, I would need to know the number of slabs (which doesn't currently seem possible). I'd be satisfied with a guesstimate based on what my expected average block size is. But what I need to know is how big a DDT entry is for each record. I'm trying to parse the code, and I don't have it in a sufficiently intelligent IDE right now to find all the cross-references.

I've got as far as this (in ddt.h):

struct ddt_entry {
        ddt_key_t       dde_key;
        ddt_phys_t      dde_phys[DDT_PHYS_TYPES];
        zio_t           *dde_lead_zio[DDT_PHYS_TYPES];
        void            *dde_repair_data;
        enum ddt_type   dde_type;
        enum ddt_class  dde_class;
        uint8_t         dde_loading;
        uint8_t         dde_loaded;
        kcondvar_t      dde_cv;
        avl_node_t      dde_node;
};

Any idea what the sizes of these structures actually are?

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On Feb 28, 2010, at 11:04 PM, Erik Trimble wrote:
> Richard Elling wrote:
>> On Feb 28, 2010, at 7:11 PM, Erik Trimble wrote:
>>
>>> I'm finally at the point of adding an SSD to my system, so I can get reasonable dedup performance.
>>>
>>> The question here goes to sizing of the SSD for use as an L2ARC device.
>>>
>>> Noodling around, I found Richard's old posting on ARC->L2ARC memory requirements, which is mighty helpful in making sure I don't overdo the L2ARC side.
>>>
>>> (http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg34677.html)
>>
>> I don't know of an easy way to see the number of blocks, which is what
>> you need to complete a capacity plan. OTOH, it doesn't hurt to have an
>> L2ARC; just beware of wasting space if you have a small RAM machine.
>
> I haven't found a good way, either. And I've looked. ;-)
>
>>> What I haven't found is a reasonable way to determine how big an L2ARC I'll need to fit all the relevant data for dedup. I've seen several postings back in January about this, and there wasn't much help, as was acknowledged at the time.
>>>
>>> What I'm after is exactly what extra needs to be stored for the DDT. I'm looking at the 200-byte header in ARC per L2ARC entry, and assuming that covers all relevant info stored in the L2ARC, whether it's actual data or metadata. My question is this: how much space does the metadata for a slab (record) take up? With dedup turned on, I'm assuming that this metadata is larger than with it off (or is it now the same for both)?
>>>
>>> There has to be some way to do a back-of-the-envelope calc that says (X) pool size -> (Y) min L2ARC size -> (Z) min ARC size.
>>
>> If you know the number of blocks and the size distribution you can
>> calculate this. In other words, it isn't very easy to do in advance unless
>> you have a fixed-size workload (e.g. a database that doesn't grow :-)
>>
>> For example, if you have a 10 GB database with 8 KB blocks, then
>> you can calculate how much RAM would be required to hold the
>> headers for a 10 GB L2ARC device:
>>     headers = 10 GB / 8 KB
>>     RAM needed ~ 200 bytes * headers
>>
>> For media, you can reasonably expect 128 KB blocks.
>>
>> The DDT size can be measured with "zdb -D poolname", but you can
>> expect that to grow over time, too.
>>  -- richard
>
> That's good, but I'd like a way to pre-calculate my potential DDT size (which, I'm assuming, will sit in L2ARC, right?).

It will be in the ARC/L2ARC just like other data.

> Once again, I'm assuming that each DDT entry corresponds to a record (slab), so to be exact, I would need to know the number of slabs (which doesn't currently seem possible). I'd be satisfied with a guesstimate based on what my expected average block size is. But what I need to know is how big a DDT entry is for each record. I'm trying to parse the code, and I don't have it in a sufficiently intelligent IDE right now to find all the cross-references.
>
> I've got as far as this (in ddt.h):
>
> struct ddt_entry {
>         ddt_key_t       dde_key;
>         ddt_phys_t      dde_phys[DDT_PHYS_TYPES];
>         zio_t           *dde_lead_zio[DDT_PHYS_TYPES];
>         void            *dde_repair_data;
>         enum ddt_type   dde_type;
>         enum ddt_class  dde_class;
>         uint8_t         dde_loading;
>         uint8_t         dde_loaded;
>         kcondvar_t      dde_cv;
>         avl_node_t      dde_node;
> };
>
> Any idea what the sizes of these structures actually are?

Around 270 bytes, or one 512-byte sector.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
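Taking the ~270 bytes in core and one 512-byte sector on disk at face value (both are estimates from this thread, not values pulled from the source), a rough DDT sizing sketch in C follows; the pool size and average block size are made-up inputs you would replace with your own guesses or with output from "zdb -D".

#include <stdio.h>
#include <stdint.h>

/*
 * Rough DDT sizing from the figures in this thread: ~270 bytes of
 * core per entry, one 512-byte sector per entry on disk, and one
 * DDT entry per unique (post-dedup) block.  Both per-entry constants
 * are the thread's estimates, not values from the source.
 */
#define DDT_CORE_BYTES  270ULL
#define DDT_DISK_BYTES  512ULL

int
main(void)
{
        uint64_t pool_bytes = 1ULL << 40;       /* 1 TB of unique data (example) */
        uint64_t avg_bs     = 64ULL << 10;      /* guessed 64 KB average block   */
        uint64_t entries    = pool_bytes / avg_bs;

        printf("unique blocks (DDT entries): %llu\n",
            (unsigned long long)entries);
        printf("DDT in ARC/L2ARC: ~%.1f GB\n",
            (double)(entries * DDT_CORE_BYTES) / (1ULL << 30));
        printf("DDT on disk:      ~%.1f GB\n",
            (double)(entries * DDT_DISK_BYTES) / (1ULL << 30));
        return (0);
}

With these made-up inputs, 1 TB of unique data at 64 KB average blocks is about 16.8 million DDT entries, i.e. on the order of 4 GB of ARC/L2ARC and 8 GB on disk; halve the average block size and both figures double.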
On Mon, Mar 01, 2010 at 09:22:38AM -0800, Richard Elling wrote:
> > Once again, I'm assuming that each DDT entry corresponds to a
> > record (slab), so to be exact, I would need to know the number of
> > slabs (which doesn't currently seem possible). I'd be satisfied
> > with a guesstimate based on what my expected average block size
> > is. But what I need to know is how big a DDT entry is for each
> > record. I'm trying to parse the code, and I don't have it in a
> > sufficiently intelligent IDE right now to find all the
> > cross-references.
> >
> > I've got as far as this (in ddt.h):
> >
> > struct ddt_entry {
[..]
> > };
> >
> > Any idea what the sizes of these structures actually are?
>
> Around 270 bytes, or one 512-byte sector.

Is the assumption above correct -- that the DDT stores one of these records per "block", and so the native recordsize of the DDT is just 512 bytes? Or are they aggregated somehow? Is the difference between the 270-byte and 512-byte figures just the gap between in-memory and on-disk sizes, due to sector alignment padding?

We got as far as showing that 512-byte records in the L2ARC are expensive in RAM overhead, but I still don't know for sure the recordsize of the DDT as seen by the ARC.

I'm still hoping someone will describe how to use zdb to find and inspect the DDT object on-disk. Last time we got stuck trying to determine the units used for the numbers printed by zdb -D.

This whole sizing business is getting to be quite a FAQ, and there hasn't really been a clear answer. Yes, there are many moving parts, and applying generic sizing recommendations is hard -- but at least being able to see more of the parts would help. If nothing else, it would help move these kinds of discussions along to more specific analysis.

So:
 - what are the units/sizes in bytes reported by zdb -D?
 - what is the in-memory size of a DDT entry, including overheads?
 - what is the on-disk size of a DDT entry, including overheads?
 - what is the recordsize of the DDT, as visible to the L2ARC?
 - what RAM overhead percentage does the L2ARC need for that recordsize?

With those, one can start from zdb stats on an existing pool, or from estimates about certain kinds of data, and add overheads and multiply down the list to model the totals. One can also then see clearly the benefit of extra L1ARC capacity vs. L2ARC with those overheads, and the cost of doing dedup at all.

--
Dan.
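To make the "multiply down the list" model concrete, here is a hedged sketch of the whole chain in C. Every per-entry constant is an assumption, since the questions above are still open, and the unique-block count is a placeholder you would get from "zdb -D" or estimate from your data.

#include <stdio.h>
#include <stdint.h>

/*
 * End-to-end model of the chain described above, using this thread's
 * estimates as placeholders (all of them open questions):
 *   - one DDT entry per unique block
 *   - ~512 bytes per DDT entry as cached in the L2ARC
 *   - ~200 bytes of ARC header per L2ARC entry
 */
#define DDT_L2_RECORD   512ULL          /* assumed DDT recordsize in L2ARC  */
#define ARC_HDR_BYTES   200ULL          /* assumed ARC header per L2 entry  */

int
main(void)
{
        uint64_t unique_blocks = 20000000ULL;   /* placeholder, e.g. from zdb -D */

        uint64_t l2_bytes  = unique_blocks * DDT_L2_RECORD;
        uint64_t arc_bytes = unique_blocks * ARC_HDR_BYTES;

        printf("L2ARC space for DDT:   ~%.1f GB\n",
            (double)l2_bytes / (1ULL << 30));
        printf("ARC RAM for headers:   ~%.1f GB\n",
            (double)arc_bytes / (1ULL << 30));
        printf("RAM overhead vs L2ARC: %.0f%%\n",
            100.0 * ARC_HDR_BYTES / DDT_L2_RECORD);
        return (0);
}

If those assumptions hold, caching the DDT in the L2ARC at 512-byte granularity costs about 200/512 ~ 39% of the consumed L2ARC space again in ARC RAM, which is exactly why the in-memory vs. on-disk entry size and the DDT recordsize questions matter.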