Are any of you using the Intel 320 as ZIL? It's MLC based, but I understand its wear and performance characteristics can be bumped up significantly by increasing the overprovisioning to 20% (dropping usable capacity to 80%).

Anyone have experience with this?

Ray
On 08/12/11 08:00 AM, Ray Van Dolson wrote:
> Are any of you using the Intel 320 as ZIL? It's MLC based, but I
> understand its wear and performance characteristics can be bumped up
> significantly by increasing the overprovisioning to 20% (dropping
> usable capacity to 80%).
>
A log device doesn't have to be larger than a few GB, so that shouldn't be a problem. I've found even low cost SSDs make a huge difference to the NFS write performance of a pool.

--
Ian.
On Thu, Aug 11, 2011 at 01:10:07PM -0700, Ian Collins wrote:
> On 08/12/11 08:00 AM, Ray Van Dolson wrote:
> > Are any of you using the Intel 320 as ZIL? It's MLC based, but I
> > understand its wear and performance characteristics can be bumped up
> > significantly by increasing the overprovisioning to 20% (dropping
> > usable capacity to 80%).
> >
> A log device doesn't have to be larger than a few GB, so that shouldn't
> be a problem. I've found even low cost SSDs make a huge difference to
> the NFS write performance of a pool.

We've been using the X-25E (SLC-based). It's getting hard to find, and since we're trying to stick to Intel drives (Nexenta certifies them), and Intel doesn't have a new SLC drive available until late September, we're hoping an overprovisioned 320 could fill the gap until then and perform at least as well as the X-25E.

Ray
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Ray Van Dolson
>
> Are any of you using the Intel 320 as ZIL? It's MLC based, but I
> understand its wear and performance characteristics can be bumped up
> significantly by increasing the overprovisioning to 20% (dropping
> usable capacity to 80%).
>
> Anyone have experience with this?

I think most purposes are actually better suited to disabling the ZIL completely. But of course you need to understand it and make an intelligent decision yourself in your particular case.

Figure it like this... Suppose you have a 6 Gbit/s bus. Suppose you have an old OS which flushes TXGs at most every 30 sec (as opposed to the more current 5 sec)... that means the absolute max data you could possibly have sitting in the log device is 6 Gbit/s * 30 sec = 180 Gbit = 22.5 GB. Leave yourself some breathing room, and figure a comfortable size is 30G usable. Intel 320s look like they start at 40G, so you're definitely safe overprovisioning 25% or higher.

I cannot speak to any actual performance increase resulting from this tweak.
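The back-of-the-envelope sizing above can be written out as a small calculation. This is only a sketch of that arithmetic: it assumes the bus runs at its full nominal rate for the whole interval, and the function name is made up for illustration.

```python
def max_log_bytes(bus_gbit_per_s: float, txg_interval_s: float) -> float:
    """Upper bound, in bytes, on data that can arrive between two TXG flushes.

    Assumes the bus is saturated at its nominal rate for the whole interval,
    which is a deliberately pessimistic simplification.
    """
    bits = bus_gbit_per_s * 1e9 * txg_interval_s
    return bits / 8


# Compare the older 30 s flush interval with the more current 5 s one.
for interval in (30, 5):
    gigabytes = max_log_bytes(6, interval) / 1e9
    print(f"{interval:>2} s TXG interval: {gigabytes:.2f} GB")
```

With a 5 sec interval the same bound shrinks by a factor of six, which is consistent with the earlier remark that a log device rarely needs more than a few GB.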
Which 320 series drive are you targeting, specifically? The ~$100 80GB variant should perform as well as the more expensive versions if your workload is more random, from what I've seen/read.

--
This message posted from opensolaris.org
On Thu, Aug 11, 2011 at 09:17:38PM -0700, Cooper Hubbell wrote:
> Which 320 series drive are you targeting, specifically? The ~$100
> 80GB variant should perform as well as the more expensive versions if
> your workload is more random from what I've seen/read.

ESX NFS-attached datastore activity. Probably up to 100 VMs (about the same as we did with the X-25E).

Larger drives would let us set overcommit pretty high :) For ZIL, I suppose we could get the 300GB drive and overcommit to 95%!

Ray
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Ray Van Dolson
>
> For ZIL, I
> suppose we could get the 300GB drive and overcommit to 95%!

What kind of benefit does that offer? I suppose, if you have a 300G drive and the OS can only see 30G of it, then the drive can essentially treat all the other 270G as having been TRIM'd implicitly, even if your OS doesn't support TRIM. It is certainly conceivable this could make a big difference.

Have you already tested it? Anybody? Or is it still just a theoretical performance enhancement, compared to using a "normal" sized drive in a normal mode?
On 08/13/11 01:53 PM, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Ray Van Dolson
>>
>> For ZIL, I
>> suppose we could get the 300GB drive and overcommit to 95%!
> What kind of benefit does that offer? I suppose, if you have a 300G drive
> and the OS can only see 30G of it, then the drive can essentially treat all
> the other 270G as having been TRIM'd implicitly, even if your OS doesn't
> support TRIM. It is certainly conceivable this could make a big difference.
>
>
> Have you already tested it? Anybody? Or is it still just theoretical
> performance enhancement, compared to using a "normal" sized drive in a
> normal mode?
>
How would you test it? I guess you would need two pools, 2 SSDs (to compare) and a lot of small, sync writes.

--
Ian.
> From: Ian Collins [mailto:ian at ianshome.com]
> Sent: Friday, August 12, 2011 11:24 PM
>
> >> For ZIL, I
> >> suppose we could get the 300GB drive and overcommit to 95%!
> > What kind of benefit does that offer? I suppose, if you have a 300G drive
> > and the OS can only see 30G of it, then the drive can essentially treat all
> > the other 270G as having been TRIM'd implicitly, even if your OS doesn't
> > support TRIM. It is certainly conceivable this could make a big difference.
> >
> >
> > Have you already tested it? Anybody? Or is it still just theoretical
> > performance enhancement, compared to using a "normal" sized drive in a
> > normal mode?
> >
> How would you test it? I guess you would need two pools, 2 SSDs (to
> compare) and a lot of small, sync writes.

I would say, one pool, one SSD. Run iozone.
Then swap the SSD, repeat iozone.

Some people will say bonnie instead of iozone. In this case, I think either one is fine.
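For a quick sanity check before setting up iozone or bonnie, the "lot of small, sync writes" workload can be approximated in a few lines of Python. This is a rough sketch, not a replacement for either tool: the function name, block size and write count are arbitrary choices, and it relies on each fsync() forcing a synchronous commit, which is the kind of write a log device is meant to absorb.

```python
import os
import tempfile
import time


def sync_write_rate(directory: str, count: int = 200, size: int = 4096) -> float:
    """Write `count` blocks of `size` bytes, fsync after each one,
    and return the achieved rate in writes per second."""
    payload = b"\0" * size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # force each small write to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)  # remove the scratch file again
    return count / elapsed


if __name__ == "__main__":
    # Placeholder target: point this at a directory on the pool under test.
    target = tempfile.gettempdir()
    print(f"{sync_write_rate(target):.0f} sync writes/s")
```

Run it once per log device configuration against the same pool and compare the rates. A short run like this only shows fresh-drive behaviour, so it says nothing about how the drive holds up after sustained writing.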
Overprovisioning does not directly increase flash performance, but allows for greater reliability as the drive ages by improving garbage collection efforts and reducing write amplification. This article doesn't provide any sources, but it explains the concept at a very basic level:
http://thessdreview.com/ssd-guides/optimization-guides/ssd-performance-loss-and-its-solution/

This thread contains quite a bit of testing and analysis regarding performance of several different SSDs under constant, 100% write workloads. Some of the drives have had close to 300TiB of writes and are still kicking:
http://www.xtremesystems.org/forums/showthread.php?271063-SSD-Write-Endurance-25nm-Vs-34nm

The tests were all conducted under Windows with TRIM, however, so this isn't directly applicable to using an SSD for a ZIL.

On Fri, Aug 12, 2011 at 8:53 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
> > From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> > bounces at opensolaris.org] On Behalf Of Ray Van Dolson
> >
> > For ZIL, I
> > suppose we could get the 300GB drive and overcommit to 95%!
>
> What kind of benefit does that offer? I suppose, if you have a 300G drive
> and the OS can only see 30G of it, then the drive can essentially treat all
> the other 270G as having been TRIM'd implicitly, even if your OS doesn't
> support TRIM. It is certainly conceivable this could make a big
> difference.
>
>
> Have you already tested it? Anybody? Or is it still just theoretical
> performance enhancement, compared to using a "normal" sized drive in a
> normal mode?
On 08/14/11 12:51 AM, Edward Ned Harvey wrote:
>> From: Ian Collins [mailto:ian at ianshome.com]
>>> Have you already tested it? Anybody? Or is it still just theoretical
>>> performance enhancement, compared to using a "normal" sized drive in a
>>> normal mode?
>> How would you test it? I guess you would need two pools, 2 SSDs (to
>> compare) and a lot of small, sync writes.
> I would say, one pool, one SSD. Run iozone.
> Then swap the SSD, repeat iozone.
>
> Some people will say bonnie instead of iozone. In this case, I think either
> one is fine.
>
But would that be an effective test? Unless your data volumes and rates are huge, only a few GB of the log device will be written, so any benefits from overprovisioning would not be seen.

I thought wear and performance issues only show up over time, so a lengthy test would be required.

I have a low cost Corsair SSD in use as a ZIL using an 8GB partition. I'll try bonnie before and after filling the rest as a cache.

--
Ian.
On Fri, Aug 12, 2011 at 06:53:22PM -0700, Edward Ned Harvey wrote:
> > From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> > bounces at opensolaris.org] On Behalf Of Ray Van Dolson
> >
> > For ZIL, I
> > suppose we could get the 300GB drive and overcommit to 95%!
>
> What kind of benefit does that offer? I suppose, if you have a 300G drive
> and the OS can only see 30G of it, then the drive can essentially treat all
> the other 270G as having been TRIM'd implicitly, even if your OS doesn't
> support TRIM. It is certainly conceivable this could make a big difference.

Perhaps this is it. Pulled the recommendation from Intel's "Solid-State Drive 320 Series in Server Storage Applications" whitepaper.

Section 4.1:

  A small reduction in an SSD's usable capacity can provide a large
  increase in random write performance and endurance.

  All Intel SSDs have more NAND capacity than what is available for
  user data. The unused capacity is called spare capacity. This area is
  reserved for internal operations. The larger the spare capacity, the
  more efficiently the SSD can perform random write operations and the
  higher the random write performance.

  On the Intel SSD 320 Series, the spare capacity reserved at the
  factory is 7% to 11% (depending on the SKU) of the full NAND
  capacity. For better random write performance and endurance, the
  spare capacity can be increased by reducing the usable capacity of
  the drive; this process is called over-provisioning.

>
>
> Have you already tested it? Anybody? Or is it still just theoretical
> performance enhancement, compared to using a "normal" sized drive in a
> normal mode?
>

Haven't yet tested it, but hope to shortly.

Ray
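The whitepaper excerpt defines spare capacity as a percentage of the full NAND capacity, so the relationship between usable size and spare percentage can be sketched as below. The function names are invented for illustration, and the 100 GB NAND figure is a placeholder rather than the real capacity of any 320 SKU.

```python
def spare_fraction(nand_bytes: float, usable_bytes: float) -> float:
    """Spare capacity as a fraction of full NAND capacity."""
    if not 0 < usable_bytes <= nand_bytes:
        raise ValueError("usable capacity must be positive and no larger than NAND")
    return 1 - usable_bytes / nand_bytes


def usable_for_spare(nand_bytes: float, target_spare: float) -> float:
    """Usable capacity that leaves `target_spare` (0..1) of the NAND as spare."""
    if not 0 <= target_spare < 1:
        raise ValueError("target spare must be in [0, 1)")
    return nand_bytes * (1 - target_spare)


# Placeholder NAND size, not a real SKU.
nand = 100e9
print(f"factory spare: {spare_fraction(nand, 93e9):.1%}")
print(f"usable at 20% spare: {usable_for_spare(nand, 0.20) / 1e9:.0f} GB")
```

Note that "20% overprovisioning" is sometimes quoted relative to the advertised capacity instead of the raw NAND, which gives a slightly different usable size. The sketch follows the whitepaper's wording only.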
On Aug 11, 2011, at 1:16 PM, Ray Van Dolson wrote:
> On Thu, Aug 11, 2011 at 01:10:07PM -0700, Ian Collins wrote:
>> On 08/12/11 08:00 AM, Ray Van Dolson wrote:
>>> Are any of you using the Intel 320 as ZIL? It's MLC based, but I
>>> understand its wear and performance characteristics can be bumped up
>>> significantly by increasing the overprovisioning to 20% (dropping
>>> usable capacity to 80%).
>>>
>> A log device doesn't have to be larger than a few GB, so that shouldn't
>> be a problem. I've found even low cost SSDs make a huge difference to
>> the NFS write performance of a pool.
>
> We've been using the X-25E (SLC-based). It's getting hard to find, and
> since we're trying to stick to Intel drives (Nexenta certifies them),
> and Intel doesn't have a new SLC drive available until late September,
> we're hoping an overprovisioned 320 could fill the gap until then and
> perform at least as well as the X-25E.

The 320 has not yet passed qualification testing at Nexenta.
 -- richard
On Mon, August 15, 2011 12:25, Ray Van Dolson wrote:
> Perhaps this is it. Pulled the recommendation from Intel's Solid-State
> Drive 320 Series in Server Storage Applications whitepaper.
>
> Section 4.1:
[...]
> On the Intel SSD 320 Series, the spare capacity reserved at the
> factory is 7% to 11% (depending on the SKU) of the full NAND
> capacity. For better random write performance and endurance, the
> spare capacity can be increased by reducing the usable capacity of
> the drive; this process is called over-provisioning.

So this is hard-coded at the factory, and one must 'decode' the SKU to determine how much is set aside? Are the values for the different SKUs documented somewhere?
On Thu, Aug 11, 2011 at 1:00 PM, Ray Van Dolson <rvandolson at esri.com> wrote:
> Are any of you using the Intel 320 as ZIL? It's MLC based, but I
> understand its wear and performance characteristics can be bumped up
> significantly by increasing the overprovisioning to 20% (dropping
> usable capacity to 80%).

Intel recently added the 311, a small SLC-based drive for use as a temp cache with their Z68 platform. It's limited to 20GB, but it might be a better fit for use as a ZIL than the 320.

-B

--
Brandon High : bhigh at freaks.com
On Mon, Aug 15, 2011 at 01:38:36PM -0700, Brandon High wrote:
> On Thu, Aug 11, 2011 at 1:00 PM, Ray Van Dolson <rvandolson at esri.com> wrote:
> > Are any of you using the Intel 320 as ZIL? It's MLC based, but I
> > understand its wear and performance characteristics can be bumped up
> > significantly by increasing the overprovisioning to 20% (dropping
> > usable capacity to 80%).
>
> Intel recently added the 311, a small SLC-based drive for use as a
> temp cache with their Z68 platform. It's limited to 20GB, but it might
> be a better fit for use as a ZIL than the 320.
>
> -B

Looks interesting... specs around the same as the old X-25E. We have heard, however, that Intel will be announcing a true successor to their X-25E line shortly.

Ray
> From: Ray Van Dolson [mailto:rvandolson at esri.com]
> Sent: Monday, August 15, 2011 12:26 PM
>
> On the Intel SSD 320 Series, the spare capacity reserved at the
> factory is 7% to 11% (depending on the SKU) of the full NAND
> capacity. For better random write performance and endurance, the
> spare capacity can be increased by reducing the usable capacity of
> the drive; this process is called over-provisioning.

I have a sneaking suspicion that you'll see the greatest performance when it's more than 50% overprovisioned. (Say, 55% or so.) That will guarantee that at all times there's plenty of unused space available, which the drive can do GC on, even though the OS doesn't say anything like "TRIM" to the drive.

Specifically I say over 50% because of 8k pages and 4k blocks.
On Mon, Aug 15, 2011 at 2:07 PM, Ray Van Dolson <rvandolson at esri.com> wrote:
> Looks interesting... specs around the same as the old X-25E. We have
> heard however, that Intel will be announcing a true successor to their
> X-25E line shortly.

I think it's the 710 and 720 that you're referring to. The 710 is MLC-HET (high endurance) and will be in 100/200/300GB capacities. The 720 is SLC, but has a PCIe interface and will be in 200/400GB capacities. I don't imagine either will be very cheap.

-B

--
Brandon High : bhigh at freaks.com
On Mon, Aug 15, 2011 at 01:38:36PM -0700, Brandon High wrote:
> On Thu, Aug 11, 2011 at 1:00 PM, Ray Van Dolson <rvandolson at esri.com> wrote:
> > Are any of you using the Intel 320 as ZIL? It's MLC based, but I
> > understand its wear and performance characteristics can be bumped up
> > significantly by increasing the overprovisioning to 20% (dropping
> > usable capacity to 80%).
>
> Intel recently added the 311, a small SLC-based drive for use as a
> temp cache with their Z68 platform. It's limited to 20GB, but it might
> be a better fit for use as a ZIL than the 320.

Works fine over here (Nexenta Core 3.1).

--
Eugen Leitl http://leitl.org
On 8/15/11 12:50 PM, David Magda wrote:
> On Mon, August 15, 2011 12:25, Ray Van Dolson wrote:
>> On the Intel SSD 320 Series, the spare capacity reserved at the
>> factory is 7% to 11% (depending on the SKU) of the full NAND
>> capacity. For better random write performance and endurance, the
>> spare capacity can be increased by reducing the usable capacity of
>> the drive; this process is called over-provisioning.
>
> So this is hard-coded at the factory, and one must 'decode' the SKU to
> determine how much is set aside? Are the different SKU's values documented
> somewhere?

Usually you can guess, given that there are 10 channels and flash chips are powers of two: the 80 GB model has 80 GiB of flash, the 300 GB model has 320 GiB, etc. Apparently the 120 GB model has something like 137 GB (?!), though. If you really want to know for sure you can find a teardown on the Web.

You can then over-over-provision down to any size you want by setting the HPA (I've only done this under Linux using hdparm; not sure how it's done in Solaris).

Wes Felter
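hdparm's -N option takes the maximum visible size as a sector count, so the HPA step needs a byte-to-sector conversion first. The sketch below only builds the command string and never runs it. The helper names, the 512-byte sector size and the /dev/sdX device name are assumptions for illustration. Setting the HPA can destroy data, and some hdparm versions ask for an extra confirmation flag, so check the man page before running anything.

```python
SECTOR_BYTES = 512  # assumption: logical sector size reported by the drive


def hpa_sectors(visible_bytes: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Number of whole sectors that fit in `visible_bytes` (rounded down)."""
    if visible_bytes <= 0 or sector_bytes <= 0:
        raise ValueError("sizes must be positive")
    return visible_bytes // sector_bytes


def hdparm_command(device: str, visible_bytes: int) -> str:
    """Build (but do not execute) an hdparm call that sets the max visible
    sector count. The 'p' prefix asks hdparm to make the setting permanent."""
    return f"hdparm -N p{hpa_sectors(visible_bytes)} {device}"


# Example: expose 30 GB of a larger drive; /dev/sdX is a placeholder.
print(hdparm_command("/dev/sdX", 30 * 10**9))
```

Printing the command instead of executing it leaves room to double-check the device name and sector count by hand first.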