I have some questions about the choice of SSDs to use for ZIL and L2ARC. I'm trying to build an OpenSolaris iSCSI SAN out of a whitebox system, which is intended to be used as a backup SAN during storage migration, so it's built on a tight budget.

The system currently has 4GB RAM, a 3GHz Core2-Quad and 8x 500GB WD REII SATA HDDs attached to an 8-port Areca ARC-1220 controller configured as RAID10.

For the ZIL and rpool I'm planning on ordering 2x 16GB Mtron server-grade SLC SSDs (Pro 7500 series) and attaching them, as a mirror, to a 2-port Areca ARC-1200 SATA RAID controller with BBU. The reason for using a RAID controller here is that the battery backup makes sure the ZIL is flushed to disk, while the controller cache helps counter the disabled drive cache.

For the L2ARC I've ordered 4x Intel X25-M G2 80GB SLC SSDs which I plan to connect to the Intel onboard AHCI ports.

- Is this a reasonable setup for the ZIL?
- Is it safe to run the L2ARC without battery backup with write cache enabled?
- For the tank on the 8-port Areca, do I need a battery backup on that controller as well, or will the ZIL suffice to assure correct data?
- Does it make sense to use HW RAID10 on the storage controller, or would I get better performance out of JBOD + ZFS RAIDZ2?

Best Regards,
Felix Buenemann
On Mon, Feb 08, 2010 at 04:58:38AM +0100, Felix Buenemann wrote:
> I have some questions about the choice of SSDs to use for ZIL and L2ARC.

I have one answer. The other questions are mostly related to your raid controller, which I can't answer directly.

> - Is it safe to run the L2ARC without battery backup with write cache
>   enabled?

Yes, it's just a cache; errors will be detected and re-fetched from the pool. Also, it is volatile-at-reboot (starts cold) at present anyway, so preventing data loss at power off is not worth spending any money or time over.

> - Does it make sense to use HW RAID10 on the storage controller, or would
>   I get better performance out of JBOD + ZFS RAIDZ2?

A more comparable alternative would be using the controller in jbod mode and a pool of zfs mirror vdevs. I'd expect that gives similar performance to the controller's mirroring (unless higher pci bus usage is a bottleneck) but gives you the benefits of zfs healing on disk errors.

Performance of RaidZ/5 vs mirrors is a much more workload-sensitive question, regardless of the additional implementation-specific wrinkles of either kind.

Your emphasis on lots of slog and l2arc suggests performance is a priority. Whether all this kit is enough to hide the IOPS penalty of raidz/5, or whether you need it even to make mirrors perform adequately, you'll have to decide yourself.

--
Dan.
Hi Daniel,

On 08.02.10 05:45, Daniel Carosone wrote:
>> - Is it safe to run the L2ARC without battery backup with write cache
>>   enabled?
>
> Yes, it's just a cache; errors will be detected and re-fetched from
> the pool. Also, it is volatile-at-reboot (starts cold) at present
> anyway, so preventing data loss at power off is not worth spending any
> money or time over.

Thanks for clarifying this.

> A more comparable alternative would be using the controller in jbod
> mode and a pool of zfs mirror vdevs. I'd expect that gives similar
> performance to the controller's mirroring (unless higher pci bus usage
> is a bottleneck) but gives you the benefits of zfs healing on disk
> errors.

I was under the impression that using HW RAID10 would save me 50% PCI bandwidth and allow the controller to handle its cache more intelligently, so I stuck with it. But I should run some benchmarks of RAID10 vs. JBOD with ZFS mirrors to see if this makes a difference.

> Your emphasis on lots of slog and l2arc suggests performance is a
> priority. Whether all this kit is enough to hide the IOPS penalty of
> raidz/5, or whether you need it even to make mirrors perform
> adequately, you'll have to decide yourself.

So it seems right to assume that RAIDZ1/2 has about the same performance hit as HW RAID5/6 with write cache. I wasn't aware that ZFS can do RAID10-style multiple mirrors, so that seems to be the better option anyway.

- Felix
On Mon, 8 Feb 2010, Felix Buenemann wrote:
> I was under the impression that using HW RAID10 would save me 50% PCI
> bandwidth and allow the controller to handle its cache more intelligently,
> so I stuck with it. But I should run some benchmarks of RAID10 vs. JBOD
> with ZFS mirrors to see if this makes a difference.

The answer to this is "it depends". If the PCI-E bus and controller have enough bandwidth capacity, then the write bottleneck will be the disks themselves. If there is insufficient controller bandwidth capacity, then the controller becomes the bottleneck. If the bottleneck is the disks, then there is hardly any write penalty from using zfs mirrors. If the bottleneck is the controller, then you may see 1/2 the write performance due to using zfs mirrors. If you are using modern computing hardware, then the disks should be the bottleneck.

Performance of HW RAID controllers is a complete unknown, and they tend to modify the on-disk data format so that it depends on the specific controller, which really sucks if the controller fails. It is usually better to run the controller in a JBOD mode (taking advantage of its write cache, if available) and use zfs mirrors.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
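[Editorial note: to put rough numbers on Bob's "it depends": with host-side (ZFS) mirroring every block crosses the bus twice, while a HW RAID10 controller duplicates it on the card. The sketch below is a back-of-the-envelope estimate with purely illustrative throughput assumptions, not figures from this thread.]

    # Rough write-bottleneck estimate: ZFS mirrors on a JBOD controller vs. HW RAID10.
    # All numbers are illustrative assumptions, not measurements.
    DISKS = 8            # 8x SATA drives arranged as 4 mirror pairs
    DISK_MBPS = 90       # assumed sustained write rate per drive
    BUSES_MBPS = {       # assumed usable host-to-controller bandwidth
        "PCI 32-bit/33MHz": 133,
        "PCIe x8 (gen1)": 1600,
    }

    for bus, bw in BUSES_MBPS.items():
        disk_limit = (DISKS // 2) * DISK_MBPS   # user data rate the spindles can absorb
        zfs_mirrors = min(bw / 2, disk_limit)   # host sends two copies across the bus
        hw_raid10 = min(bw, disk_limit)         # controller makes the second copy
        print("%-18s  zfs mirrors ~%.0f MB/s   HW RAID10 ~%.0f MB/s"
              % (bus, zfs_mirrors, hw_raid10))

On an old 32-bit PCI bus the controller is clearly the limit either way; on PCIe x8 the spindles are, which matches Bob's "modern hardware" remark.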
To add to Bob's notes...

On Feb 8, 2010, at 8:37 AM, Bob Friesenhahn wrote:
> The answer to this is "it depends". If the PCI-E bus and controller have
> enough bandwidth capacity, then the write bottleneck will be the disks
> themselves.

If you have HDDs, the write bandwidth bottleneck will be the disk.

> If there is insufficient controller bandwidth capacity, then the
> controller becomes the bottleneck.

We don't tend to see this for HDDs, but SSDs can crush a controller and channel.
-- richard
On Mon, 8 Feb 2010, Richard Elling wrote:
>> If there is insufficient controller bandwidth capacity, then the
>> controller becomes the bottleneck.
>
> We don't tend to see this for HDDs, but SSDs can crush a controller and
> channel.

It is definitely seen with older PCI hardware.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On 08.02.10 22:23, Bob Friesenhahn wrote:
> On Mon, 8 Feb 2010, Richard Elling wrote:
>> We don't tend to see this for HDDs, but SSDs can crush a controller and
>> channel.
>
> It is definitely seen with older PCI hardware.

Well, to make things short: using JBOD + ZFS striped mirrors instead of the controller's RAID10 dropped the maximum sequential read I/O from over 400 MByte/s to below 300 MByte/s. However, random I/O and sequential writes seemed to perform equally well. One thing, however, was much better with ZFS mirrors: random seek performance was about 4 times higher, so I guess for random I/O on a busy system the JBOD would win.

The controller can deliver 800 MByte/s on cache hits and is connected via PCIe x8, so theoretically it should have enough PCI bandwidth. Its CPU is the older 500MHz IOP333, so it has less power than the newer IOP348 controllers with 1.2GHz CPUs.

Too bad I have no choice but to use HW RAID, because the mainboard BIOS only supports 7 boot devices, so it can't boot from the right disk if the Areca is in JBOD mode, and I found no way to disable the controller's BIOS. Maybe I could flash the EFI BIOS to work around this... (I've done my tests by reconfiguring the controller at runtime.)

- Felix
On Tue, 9 Feb 2010, Felix Buenemann wrote:
> Well, to make things short: using JBOD + ZFS striped mirrors instead of the
> controller's RAID10 dropped the maximum sequential read I/O from over 400
> MByte/s to below 300 MByte/s. However, random I/O and sequential writes
> seemed to perform equally well.

Much of the difference is likely that your controller implements true RAID10 whereas ZFS "striped" mirrors are actually load-shared mirrors. Since zfs does not use true striping across vdevs, it relies on sequential prefetch requests to get the sequential read rate up. Sometimes zfs's prefetch is not aggressive enough.

I have observed that there may still be considerably more read performance available (to another program/thread) even while a benchmark program is reading sequentially as fast as it can.

Try running two copies of your benchmark program at once and see what happens.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On 09.02.10 02:30, Bob Friesenhahn wrote:
> I have observed that there may still be considerably more read
> performance available (to another program/thread) even while a benchmark
> program is reading sequentially as fast as it can.
>
> Try running two copies of your benchmark program at once and see what
> happens.

Yes, JBOD + ZFS load-balanced mirrors does seem to work better under heavy load. I tried rebooting a Windows VM from NFS, which took about 43 seconds with a hot cache in both cases. But when doing this during a bonnie++ benchmark run, the ZFS mirrors won big time, taking just 2 min 47 s instead of over 4 min to reboot the VM. So I think in a real-world scenario, the ZFS mirrors will win.

On a side note, however, I noticed that for small sequential I/O (copying a 150MB source tree to NFS), the ZFS mirrors were 50% slower than the controller's RAID10.

- Felix
On 09.02.10 09:58, Felix Buenemann wrote:
> Yes, JBOD + ZFS load-balanced mirrors does seem to work better under
> heavy load. [...]
>
> On a side note, however, I noticed that for small sequential I/O (copying a
> 150MB source tree to NFS), the ZFS mirrors were 50% slower than the
> controller's RAID10.

I had a hunch that the controller's volume read-ahead would interfere with the ZFS load-shared mirrors, and voilà: with it disabled, sequential reads jumped from 270 MByte/s to 420 MByte/s, which checks out nicely, because writes are about 200 MByte/s.

- Felix
Hi,

The Intel X25-M is MLC, not SLC; they are very good for L2ARC.

Next, you need more RAM: ZFS can't handle 4x 80 GB of L2ARC with only 4 GB of RAM, because ZFS uses memory to allocate and manage the L2ARC.

2010/2/10 Felix Buenemann <Felix.Buenemann at googlemail.com>:
> I had a hunch that the controller's volume read-ahead would interfere with
> the ZFS load-shared mirrors, and voilà: with it disabled, sequential reads
> jumped from 270 MByte/s to 420 MByte/s, which checks out nicely, because
> writes are about 200 MByte/s.
Hi Mickaël,

On 12.02.10 13:49, Mickaël Maillot wrote:
> The Intel X25-M is MLC, not SLC; they are very good for L2ARC.

Yes, I'm only using those for L2ARC; I'm planning on getting two Mtron Pro 7500 16GB SLC SSDs for the ZIL.

> Next, you need more RAM: ZFS can't handle 4x 80 GB of L2ARC with only
> 4 GB of RAM, because ZFS uses memory to allocate and manage the L2ARC.

Is there a guideline on how L2ARC size should relate to RAM?

I could upgrade the server to 8GB, but that's the maximum the i975X chipset can handle.

Best Regards,
Felix Buenemann
On Feb 12, 2010, at 8:20 AM, Felix Buenemann wrote:
> Is there a guideline on how L2ARC size should relate to RAM?

Approximately 200 bytes per record. I use the following example:

Suppose we use a Seagate LP 2 TByte disk for the L2ARC:
+ the disk has 3,907,029,168 512-byte sectors, guaranteed
+ the workload uses an 8 kByte fixed record size

RAM needed for arc_buf_hdr entries:
+ Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes

Don't underestimate the RAM needed for large L2ARCs.
-- richard
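[Editorial note: a minimal sketch of the same arithmetic, applied to the two devices discussed in this thread. It assumes the ~200 bytes per arc_buf_hdr quoted above; the exact header size depends on the ZFS build.]

    # Back-of-the-envelope L2ARC header RAM estimate (assumes ~200 bytes/header).
    def l2arc_header_ram(device_sectors, recordsize_bytes,
                         label_sectors=9232, header_bytes=200):
        sectors_per_record = recordsize_bytes // 512
        records = (device_sectors - label_sectors) // sectors_per_record
        return records * header_bytes

    # Richard's example: Seagate LP 2 TByte, 8 kByte records -> ~48 GBytes
    print(l2arc_header_ram(3907029168, 8192) / 1e9)         # ~48.8

    # Felix's case: ~300 GBytes of X25-M L2ARC, 8 kByte records -> ~7.3 GBytes
    print(l2arc_header_ram(300 * 10**9 // 512, 8192) / 1e9)  # ~7.3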
On 12.02.10 18:17, Richard Elling wrote:
> Approximately 200 bytes per record. I use the following example:
>
> Suppose we use a Seagate LP 2 TByte disk for the L2ARC:
> + the disk has 3,907,029,168 512-byte sectors, guaranteed
> + the workload uses an 8 kByte fixed record size
>
> RAM needed for arc_buf_hdr entries:
> + Need = ~(3,907,029,168 - 9,232) * 200 / 16 = ~48 GBytes
>
> Don't underestimate the RAM needed for large L2ARCs.

I'm not sure how your workload record size plays into the above formula (where does the -9,232 come from?), but given I've got ~300GB of L2ARC, I'd need about 7.2GB of RAM, so upgrading to 8GB would be enough to satisfy the L2ARC.

- Felix
On 02/12/10 09:36, Felix Buenemann wrote:
> given I've got ~300GB of L2ARC, I'd need about 7.2GB of RAM, so upgrading
> to 8GB would be enough to satisfy the L2ARC.

But that would only leave ~800MB free for everything else the server needs to do.

- Bill
On Feb 12, 2010, at 9:36 AM, Felix Buenemann wrote:
> I'm not sure how your workload record size plays into the above formula
> (where does the -9,232 come from?), but given I've got ~300GB of L2ARC,
> I'd need about 7.2GB of RAM, so upgrading to 8GB would be enough to
> satisfy the L2ARC.

recordsize = 8 kBytes = 16 sectors @ 512 bytes/sector; 9,232 is the number of sectors reserved for labels, around 4.75 MBytes.

Mathing around a bit, for a 300 GB L2ARC:

    size (GB)             300
    size (sectors)        585,937,500
    labels (sectors)      9,232
    available sectors     585,928,268
    bytes/L2ARC header    200

    recordsize   recordsize   L2ARC capacity   Header size
    (sectors)    (kBytes)     (records)        (MBytes)
           1          0.5     585,928,268      111,760
           2          1       292,964,134       55,880
           4          2       146,482,067       27,940
           8          4        73,241,033       13,970
          16          8        36,620,516        6,980
          32         16        18,310,258        3,490
          64         32         9,155,129        1,750
         128         64         4,577,564          870
         256        128         2,288,782          440

So, depending on the data, you need somewhere between 440 MBytes and 111 GBytes to hold the L2ARC headers. For a rule of thumb, somewhere between 0.15% and 40% of the total used size. OK, that rule really isn't very useful...

The next question is: what does my data look like? The answer is that there will most likely be a distribution of various record sizes. But the distribution isn't as interesting for this calculation as the actual number of records. I'm not sure there is an easy way to get that information, but I'll look around...
-- richard
On Fri, Feb 12, 2010 at 11:26:33AM -0800, Richard Elling wrote:
> So, depending on the data, you need somewhere between 440 MBytes and 111
> GBytes to hold the L2ARC headers. For a rule of thumb, somewhere between
> 0.15% and 40% of the total used size. OK, that rule really isn't very
> useful...

All that precision up-front for such a broad conclusion.. bummer :)

I'm interested in a better rule of thumb, for rough planning purposes. As previously noted, I'm especially interested in the combination with dedup, where DDT entries need to be cached. What's the recordsize for L2ARC-of-on-disk-DDT, and how does that bias the overhead percentage above?

I'm also interested in a more precise answer to a different question, later on. Let's say I already have an L2ARC, running and warm. How do I tell how much is being used? Presumably, if it's not full, RAM to manage it is the constraint - how can I confirm that, and how can I tell how much RAM is currently used?

If I can observe these figures, I can tell if I'm wasting ssd space that can't be used. Either I can reallocate that space, or know that adding RAM will have an even bigger benefit (increasing both primary and secondary cache sizes). Maybe I can even decide that L2ARC is not worth it for this box (especially if it can't fit any more RAM).

Finally, how smart is the L2ARC at optimising this usage? If it's under memory pressure, does it prefer to throw out smaller records in favour of larger, more efficient ones?

My current rule of thumb for all this, absent better information, is that you should just have gobs of RAM (no surprise there), but that if you can't, then dedup seems to be most worthwhile when the pool itself is on ssd, no l2arc. Say, a laptop. Here, you care most about saving space and the IO overhead costs least.

We need some thumbs in between these extremes. :-(

--
Dan.
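[Editorial note: one way to observe the figures Dan asks about is the arcstats kstat. The sketch below is illustrative only; it assumes the running build exposes l2_size and l2_hdr_size counters, and kstat names can differ between builds, so verify them on your system.]

    import subprocess

    # Parse `kstat -p zfs:0:arcstats`; -p prints "module:inst:name:stat<TAB>value".
    out = subprocess.run(["kstat", "-p", "zfs:0:arcstats"],
                         capture_output=True, text=True, check=True).stdout
    stats = {}
    for line in out.splitlines():
        if "\t" not in line:
            continue
        key, value = line.split("\t")
        try:
            stats[key.split(":")[-1]] = int(value)
        except ValueError:
            pass  # skip non-integer fields such as crtime/snaptime/class

    mb = 1024 * 1024
    # l2_size: bytes of data currently held on the L2ARC devices (assumed name).
    # l2_hdr_size: ARC (RAM) consumed by the headers tracking them (assumed name).
    print("L2ARC data cached:          %8.0f MB" % (stats.get("l2_size", 0) / mb))
    print("RAM used for L2ARC headers: %8.0f MB" % (stats.get("l2_hdr_size", 0) / mb))

Comparing l2_size against the device capacity shows whether the L2ARC is actually filling; comparing l2_hdr_size against the ARC size shows how much RAM the headers are costing.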
Brendan Gregg - Sun Microsystems wrote on 2010-Feb-12 22:20 UTC:
G'Day,

On Sat, Feb 13, 2010 at 09:02:58AM +1100, Daniel Carosone wrote:
> I'm interested in a better rule of thumb, for rough planning
> purposes. As previously noted, I'm especially interested in the
> combination with dedup, where DDT entries need to be cached.

I use 2.5% for an 8 Kbyte record size, i.e. for every 1 Gbyte of L2ARC, about 25 Mbytes of ARC is consumed (1 Gbyte / 8 Kbytes = 131,072 records x ~200 bytes = ~25 Mbytes). I don't recommend other record sizes since:

- the L2ARC is currently intended for random I/O workloads. Such workloads usually have small record sizes, such as 8 Kbytes. Larger record sizes (such as the 128 Kbyte default) are better for streaming workloads. The L2ARC doesn't currently touch streaming workloads (l2arc_noprefetch=1).

- the best performance from SSDs is with smaller I/O sizes, not larger. I get about 3200 x 8 Kbyte read I/O from my current L2ARC devices, yet only about 750 x 128 Kbyte read I/O from the same devices.

- smaller than 4 Kbyte record sizes lead to a lot of ARC headers and worse streaming performance. I wouldn't tune it smaller unless I had to for some reason.

So, from the table above I'd only really consider the 4 to 32 Kbyte size range: 4 Kbytes if you really wanted a smaller record size, and 32 Kbytes if you had limited DRAM you wanted to conserve (at the cost of some SSD performance).

Brendan
--
Brendan Gregg, Fishworks    http://blogs.sun.com/brendan