Roy Sigurd Karlsbakk
2010-Apr-02 00:39 UTC
[zfs-discuss] dedup and memory/l2arc requirements
Hi all

I've been told (on #opensolaris, irc.freenode.net) that OpenSolaris needs a lot of memory and/or L2ARC for dedup to function properly. How much memory or L2ARC should I get for a 12TB zpool (8x2TB in RAIDz2), and then, how much for 125TB (after RAIDz2 overhead)? Is there a function into which I can plug my recordsize and volume size to get the appropriate numbers?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Roy Sigurd Karlsbakk
2010-Apr-02 04:33 UTC
[zfs-discuss] dedup and memory/l2arc requirements
> > I might add some swap I guess. I will have to try it on another
> > machine with more RAM and less pool, and see how the size of the zdb
> > image compares to the calculated size of DDT needed. So long as zdb
> > is the same or a little smaller than the DDT it predicts, the tool's
> > still useful, just sometimes it will report "DDT too big but not sure
> > by how much", by coredumping/thrashing instead of finishing.
>
> In my experience, more swap doesn't help break through the 2GB memory
> barrier. As zdb is an intentionally unsupported tool, methinks a recompile
> may be required (or write your own).

I guess this tool might not work too well, then, with 20TiB in 47M files?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
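A rough sanity check of that worry, using the ~250 bytes per in-core DDT entry figure quoted later in this thread, and assuming at least one DDT entry per file (large files add more entries, one per unique block, so this is a lower bound):

  # Back-of-the-envelope only: 47M entries at ~250 bytes each
  awk 'BEGIN { entries = 47e6; printf "approx in-core DDT: %.1f GiB\n", entries * 250 / 2^30 }'

That is on the order of 11 GiB, well beyond what a 32-bit zdb can address, which matches the out-of-memory behaviour reported later in the thread.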
Roy Sigurd Karlsbakk
2010-Apr-02 04:34 UTC
[zfs-discuss] dedup and memory/l2arc requirements
> You can estimate the amount of disk space needed for the deduplication table
> and the expected deduplication ratio by using "zdb -S poolname" on your
> existing pool.

This is all good, but it doesn't work too well for planning. Is there a rule of thumb I can use for a general overview?

Say I want 125TB of space and I want to dedup that for backup use. The dedup will probably be quite efficient, as long as the alignment matches. By the way, is there a way to auto-align data for dedup in the backup case? Or does ZFS do this by itself?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
On Apr 1, 2010, at 5:39 PM, Roy Sigurd Karlsbakk wrote:
> Hi all
>
> I've been told (on #opensolaris, irc.freenode.net) that opensolaris needs a lot of memory and/or l2arc for dedup to function properly. How much memory or l2arc should I get for a 12TB zpool (8x2TB in RAIDz2), and then, how much for 125TB (after RAIDz2 overhead)? Is there a function into which I can plug my recordsize and volume size to get the appropriate numbers?

You can estimate the amount of disk space needed for the deduplication table and the expected deduplication ratio by using "zdb -S poolname" on your existing pool. Be patient: for an existing pool with lots of objects, this can take some time to run.

# ptime zdb -S zwimming
Simulated DDT histogram:

bucket             allocated                      referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.27M    239G    188G    194G    2.27M    239G    188G    194G
     2     327K   34.3G   27.8G   28.1G     698K   73.3G   59.2G   59.9G
     4    30.1K   2.91G   2.10G   2.11G     152K   14.9G   10.6G   10.6G
     8    7.73K    691M    529M    529M    74.5K   6.25G   4.79G   4.80G
    16      673   43.7M   25.8M   25.9M    13.1K    822M    492M    494M
    32      197   12.3M   7.02M   7.03M    7.66K    480M    269M    270M
    64       47   1.27M    626K    626K    3.86K    103M   51.2M   51.2M
   128       22    908K    250K    251K    3.71K    150M   40.3M   40.3M
   256        7    302K     48K   53.7K    2.27K   88.6M   17.3M   19.5M
   512        4    131K   7.50K   7.75K    2.74K    102M   5.62M   5.79M
    2K        1      2K      2K      2K    3.23K   6.47M   6.47M   6.47M
    8K        1    128K      5K      5K    13.9K   1.74G   69.5M   69.5M
 Total    2.63M    277G    218G    225G    3.22M    337G    263G    270G

dedup = 1.20, compress = 1.28, copies = 1.03, dedup * compress / copies = 1.50

real     8:02.391932786
user     1:24.231855093
sys        15.193256108

In this file system, 2.63 million blocks are allocated. The in-core size of a DDT entry is approximately 250 bytes. So the math is pretty simple:

    in-core size = 2.63M * 250 = 657.5 MB

If your dedup ratio is 1.0, then this number will scale linearly with size. If the dedup ratio is > 1.0, then it will not scale linearly; it will be less. So you can use the linear scale as a worst-case approximation.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
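A minimal sketch of that arithmetic, so the same worst-case estimate can be reproduced for any pool. It assumes the ~250 bytes per in-core DDT entry figure given above, and takes as input the allocated-blocks total that "zdb -S" reports (2.63M in the example):

  # Worst-case in-core DDT size from the "Total" allocated-blocks count of zdb -S.
  # Assumes ~250 bytes per DDT entry, per the figure quoted above.
  blocks=2630000        # 2.63M, from the example histogram
  awk -v blocks="$blocks" 'BEGIN {
      printf "worst-case in-core DDT: %.1f MB\n", blocks * 250 / 1e6
  }'

With the example numbers this reproduces the 657.5 MB figure; for planning, the block count can be scaled linearly with pool size as the worst case, as noted above.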
>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:

    re> # ptime zdb -S zwimming Simulated DDT histogram:
    re> refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE DSIZE
    re> Total 2.63M 277G 218G 225G 3.22M 337G 263G 270G

    re> in-core size = 2.63M * 250 = 657.5 MB

Thanks, that is really useful! It'll probably make the difference between trying dedup and not, for me.

It is not working for me yet. It got to this point in prstat:

  6754 root     2554M 1439M sleep   60    0   0:03:31 1.9% zdb/106

and then ran out of memory:

  $ pfexec ptime zdb -S tub
  out of memory -- generating core dump

I might add some swap I guess. I will have to try it on another machine with more RAM and less pool, and see how the size of the zdb image compares to the calculated size of DDT needed. So long as zdb is the same or a little smaller than the DDT it predicts, the tool's still useful; just sometimes it will report "DDT too big but not sure by how much", by coredumping/thrashing instead of finishing.
On Apr 2, 2010, at 2:03 PM, Miles Nordin wrote:
>>>>>> "re" == Richard Elling <richard.elling at gmail.com> writes:
>
> re> # ptime zdb -S zwimming Simulated DDT histogram:
> re> refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE DSIZE
> re> Total 2.63M 277G 218G 225G 3.22M 337G 263G 270G
>
> re> in-core size = 2.63M * 250 = 657.5 MB
>
> Thanks, that is really useful! It'll probably make the difference
> between trying dedup and not, for me.
>
> It is not working for me yet. It got to this point in prstat:
>
> 6754 root 2554M 1439M sleep 60 0 0:03:31 1.9% zdb/106
>
> and then ran out of memory:
>
> $ pfexec ptime zdb -S tub
> out of memory -- generating core dump

This is annoying. By default, zdb is compiled as a 32-bit executable and it can be a hog. Compiling it yourself is too painful for most folks :-(

> I might add some swap I guess. I will have to try it on another
> machine with more RAM and less pool, and see how the size of the zdb
> image compares to the calculated size of DDT needed. So long as zdb
> is the same or a little smaller than the DDT it predicts, the tool's
> still useful; just sometimes it will report "DDT too big but not sure
> by how much", by coredumping/thrashing instead of finishing.

In my experience, more swap doesn't help break through the 2GB memory barrier. As zdb is an intentionally unsupported tool, methinks a recompile may be required (or write your own).
 -- richard
On Apr 1, 2010, at 9:34 PM, Roy Sigurd Karlsbakk wrote:
>> You can estimate the amount of disk space needed for the deduplication table
>> and the expected deduplication ratio by using "zdb -S poolname" on your
>> existing pool.
>
> This is all good, but it doesn't work too well for planning. Is there a rule of thumb I can use for a general overview?

If you know the average record size for your workload, then you can calculate the average number of records when given the total space. This should get you in the ballpark.

> Say I want 125TB of space and I want to dedup that for backup use. It'll probably be quite efficient dedup, as long as the alignment matches. By the way, is there a way to auto-align data for dedup in the backup case? Or does ZFS do this by itself?

ZFS does not change alignment.
 -- richard
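Applying that rule of thumb to the 125TB case gives a rough worst-case figure. This is only a sketch: it assumes the default 128 KiB ZFS recordsize as the average record size (a smaller average record size raises the estimate proportionally) and the ~250 bytes per in-core DDT entry quoted earlier in the thread:

  # Planning estimate only: worst-case DDT (dedup ratio 1.0) for ~125 TiB of data.
  awk 'BEGIN {
      space   = 125 * 2^40      # ~125 TiB in bytes
      recsize = 128 * 1024      # assumed average record size
      entries = space / recsize
      printf "records: %.2e   worst-case in-core DDT: %.0f GiB\n", entries, entries * 250 / 2^30
  }'

That comes out to around a billion records and roughly 240-250 GiB of DDT in the worst case; the actual requirement shrinks with the dedup ratio achieved, and the DDT can be held in L2ARC rather than sitting entirely in RAM, which is the memory-and/or-L2ARC trade-off this thread starts from.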
On 03/04/2010 00:57, Richard Elling wrote:
>
> This is annoying. By default, zdb is compiled as a 32-bit executable and
> it can be a hog. Compiling it yourself is too painful for most folks :-(

/usr/sbin/zdb is actually a link to /usr/lib/isaexec

$ ls -il /usr/sbin/zdb /usr/lib/isaexec
300679 -r-xr-xr-x  92 root  bin    8248 Nov 16 10:26 /usr/lib/isaexec*
300679 -r-xr-xr-x  92 root  bin    8248 Nov 16 10:26 /usr/sbin/zdb*

$ ls -il /usr/sbin/i86/zdb /usr/sbin/amd64/zdb
200932 -r-xr-xr-x   1 root  bin  173224 Mar 15 10:20 /usr/sbin/amd64/zdb*
200933 -r-xr-xr-x   1 root  bin  159960 Mar 15 10:20 /usr/sbin/i86/zdb*

This means both 32- and 64-bit versions are already available, and if the kernel is 64-bit then the 64-bit version of zdb will be run when you invoke /usr/sbin/zdb.

-- 
Darren J Moffat
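To see which flavour actually ends up running, the kernel's instruction set can be checked and, if desired, the 64-bit binary invoked directly using the paths from the listing above. The pool name "tub" is just the one from Miles' example, and whether the 64-bit zdb then has enough address space for any given pool is not established in this thread:

  # Show the kernel's native instruction set (amd64 means isaexec picks the 64-bit zdb)
  isainfo -k
  # Or bypass isaexec and run the 64-bit binary explicitly
  pfexec ptime /usr/sbin/amd64/zdb -S tub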