Hi all,

I've run into the classic NFS performance bottleneck that comes with the ZIL enabled but no fast, dedicated ZIL device. Being on a budget, I concluded that an X25-E would be my best option, but there is still the concern that its write cache is not battery-backed and that a corrupt ZIL is a big problem. The server's power supply is backed by a UPS, but of course the PSU itself could still burn out. (I am also considering ACARD, but don't yet have total confidence in its reliability.)

What about powering the X25-E from an external power source, one that is also solid-state and backed by a UPS? In my experience, smaller power supplies tend to be much more reliable than typical ATX supplies. For example, something like this:

http://www.addonics.com/products/power_adapter/aasaps.asp

or, even more reliable, a PicoPSU with a hack to make sure that the power is always on.

Has anyone tried something like this? Powering ZILs using a second, more reliable PSU? Thoughts?

Thanks,
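P.S. For reference, once I do settle on a device, attaching it as a dedicated log looks simple enough. A minimal sketch, assuming a pool named "tank"; the device names are placeholders:

    # add a dedicated log (slog) device to an existing pool
    zpool add tank log c4t1d0

    # or mirror the slog, which hedges against the device itself dying
    zpool add tank log mirror c4t1d0 c4t2d0

    # confirm it appears under the "logs" section
    zpool status tank

A mirrored slog seems doubly worthwhile when the device has no power-loss protection to begin with.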
> What about powering the X25-E by an external power source, one that is
> also solid-state and backed by a UPS? In my experience, smaller power
> supplies tend to be much more reliable than typical ATX supplies.

I don't think the different PSU would be an issue; the supply you've linked doesn't seem to care about linking grounds together.

> or even more reliable would be a PicoPSU w/ a hack to make sure that
> the power is always on.
>
> Has anyone tried something like this? Powering ZILs using a second,
> more reliable PSU? Thoughts?

I hacked up a PicoPSU for robotics use (running off +24V and providing +5V/+3.3V); your "always-on" should be as easy as shorting the green and black wires (shorting pin 14, PS_ON#, to ground) with a little solder jumper.

But wouldn't you need some type of reset trigger for when the system is reset? Or is that handled by the SATA controller?
On 11/26/2010 1:11 PM, Krunal Desai wrote:
> I hacked up a PicoPSU for robotics use [...] But wouldn't you need
> some type of reset trigger for when the system is reset? Or is that
> handled by the SATA controller?

Frankly, adding something to the controller card (and that's where you'd have to put it, since just providing UPS power to the SSD wouldn't be sufficient) is going to be a nightmare, and I suspect it would ultimately create more unreliability and failure than it solves.

I've gone to using an OCZ Vertex 2 EX (OCZSSD2-2VTXEX50G), which has an on-board supercapacitor to enable full consistency in case of a power outage. It's not cheap ($800 / 50GB), and you *really* want to make sure you get the 4k alignment right, but I haven't had any real problems with it.

I haven't had a chance to test a Vertex 2 PRO against my 2 EX, and I'd be interested if anyone else has. The EX is SLC-based and the PRO is MLC-based, but the claimed performance numbers are similar. If the PRO works well, it's less than half the cost and would be a nice solution for most users who don't need ultra-super-performance from their ZIL.

The DDRdrive is still the way to go for the ultimate ZIL acceleration, but it's pricey as hell.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
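On checking that 4k alignment: one quick sanity check on Solaris is to look at each slice's starting sector. A sketch, assuming a 512-byte-sector label; I'm also assuming prtvtoc's fourth column is the first sector (verify on your build), and the device name is a placeholder:

    # a 4 KiB-aligned slice starts on a sector that is a multiple of 8
    # (8 x 512 B = 4 KiB)
    prtvtoc /dev/rdsk/c4t1d0s2 | \
        awk '/^ *[0-9]/ { print "slice " $1 ": " ($4 % 8 ? "MISALIGNED" : "4k-aligned") }'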
> I haven't had a chance to test a Vertex 2 PRO against my 2 EX, and I'd
> be interested if anyone else has.

I recently presented at the OpenStorage Summit 2010 and compared exactly the three devices you mention in your post (Vertex 2 EX, Vertex 2 Pro, and the DDRdrive X1) as ZIL accelerators.

Jump to slide 37 for the write IOPS benchmarks:

http://www.ddrdrive.com/zil_accelerator.pdf

> and you *really* want to make sure you get the 4k alignment right

Excellent point; starting on slide 66, the performance impact of partition misalignment is illustrated. Considering the results, longevity might be an even greater concern than decreased IOPS performance, as ZIL acceleration is a worst-case scenario for a flash-based SSD.

> The DDRdrive is still the way to go for the ultimate ZIL acceleration,
> but it's pricey as hell.

In addition to product cost, I believe IOPS/$ is a relevant point of comparison.

Google Products gives the price range for the OCZ 50GB SSDs:

    Vertex 2 EX  (OCZSSD2-2VTXEX50G: $870 - $1,011 USD)
    Vertex 2 Pro (OCZSSD2-2VTXP50G:  $399 - $525 USD)

4KB sustained and aligned mixed-write IOPS results (see PDF above):

    Vertex 2 EX   (6,325 IOPS)
    Vertex 2 Pro  (3,252 IOPS)
    DDRdrive X1  (38,701 IOPS)

Using the lowest online price for both the Vertex 2 EX and Vertex 2 Pro, and the full list price (SRP) of the DDRdrive X1:

IOPS per dollar:

    Vertex 2 EX   (6325 IOPS / $870)    =  7.27
    Vertex 2 Pro  (3252 IOPS / $399)    =  8.15
    DDRdrive X1  (38701 IOPS / $1,995)  = 19.40

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
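Those IOPS/$ ratios are easy to recompute as street prices move — a throwaway sketch using only the figures quoted above:

    awk 'BEGIN {
        # device        IOPS / price (USD), from the post above
        printf "Vertex 2 EX   %5.2f IOPS/$\n",  6325 / 870
        printf "Vertex 2 Pro  %5.2f IOPS/$\n",  3252 / 399
        printf "DDRdrive X1   %5.2f IOPS/$\n", 38701 / 1995
    }'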
On Sat, Nov 27, 2010 at 9:34 AM, Christopher George <cgeorge at ddrdrive.com> wrote:

> IOPS per dollar:
> Vertex 2 EX   (6325 IOPS / $870)    =  7.27
> Vertex 2 Pro  (3252 IOPS / $399)    =  8.15
> DDRdrive X1  (38701 IOPS / $1,995)  = 19.40
> [...]

Why would you disable TRIM on an SSD benchmark? I can't imagine anyone intentionally crippling their drive in the real world.

Furthermore, I don't think "1 hour sustained" is a very accurate benchmark. Most workloads are bursty in nature. If you're doing sustained high-IOPS workloads like that, the back-end is going to fall over and die long before the hour time limit. Your 38k IOPS would need nearly 500 drives to sustain that workload with any kind of decent latency. And if you've got 500 drives, you're going to want a hell of a lot more ZIL space than the DDRdrive currently provides.

I'm all for benchmarks, but try doing something a bit more realistic.

--Tim
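For the curious, the "nearly 500 drives" figure falls out of the usual per-spindle rule of thumb — a back-of-envelope sketch, where the ~80 IOPS/drive ceiling for a 7,200 rpm SATA disk is my assumption, not a measured number:

    awk 'BEGIN {
        target = 38701       # sustained 4KB write IOPS from the deck
        per_drive = 80       # rough random-write IOPS of one 7200rpm disk
        printf "~%.0f drives\n", target / per_drive   # -> ~484
    }'

And that is before any RAID write penalty, which only pushes the count higher.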
That's a great deck, Chris.

-marc

Sent from my iPhone

On 2010-11-27, at 10:34 AM, Christopher George <cgeorge at ddrdrive.com> wrote:
> I recently presented at the OpenStorage Summit 2010 and compared
> exactly the three devices you mention in your post (Vertex 2 EX,
> Vertex 2 Pro, and the DDRdrive X1) as ZIL accelerators.
> [...]
> Why would you disable TRIM on an SSD benchmark?

Because ZFS does *not* support TRIM, the benchmarks are configured to replicate actual ZIL accelerator workloads.

> If you're doing sustained high-IOPS workloads like that, the
> back-end is going to fall over and die long before the hour time limit.

The reason the graphs are drawn as a timeline is so you can look at any point in the one-hour series and see how each device performs.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
On Sat, Nov 27, 2010 at 2:24 PM, Christopher George <cgeorge at ddrdrive.com> wrote:

> Because ZFS does *not* support TRIM, the benchmarks are configured
> to replicate actual ZIL accelerator workloads.

TRIM was putback in July... You're telling me it didn't make it into S11 Express?

http://mail.opensolaris.org/pipermail/onnv-notify/2010-July/012674.html

--Tim
http://bugs.opensolaris.org/view_bug.do?bug_id=6866610

Based on this, it should be in OpenIndiana already. But just because support exists does not mean ZFS (specifically the ZIL) takes advantage of it, right?

I do think that Chris's slides, while doing a good job of showing how the DDRdrive is better than some good flash SSDs, also suggest that TRIM support would be very helpful to SATA SSDs as slog devices.

-Will

________________________________
From: Tim Cook [tim at cook.ms]
Sent: Saturday, November 27, 2010 3:30 PM
Subject: Re: [zfs-discuss] Ext. UPS-backed SATA SSD ZIL?

> TRIM was putback in July... You're telling me it didn't make it into
> S11 Express?
>
> http://mail.opensolaris.org/pipermail/onnv-notify/2010-July/012674.html
> [...]
> TRIM was putback in July... You're telling me it didn't make it into
> S11 Express?

Without top-level ZFS TRIM support, SATA framework (sata.c) support has no bearing on this discussion.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Christopher George
>
> Jump to slide 37 for the write IOPS benchmarks:
>
> http://www.ddrdrive.com/zil_accelerator.pdf

Anybody who designs or works with NAND (flash) at a low level knows it can't possibly come close to the sustainable speed of RAM, except in corner cases where all the stars are aligned perfectly in favor of the NAND. Think how fast your system can fill its system RAM, and then think how fast it can fill an equivalently sized hard drive.

If bus speed were actually the limiting factor (and it isn't for any SSD that I know of)... you've got NUMA to system RAM, NUMA to PCIe for the DDRdrive, and NUMA to PCIe to SATA for the SSD. Yet you can't even fully utilize the SATA bus, because the SSD can't keep up.

The above result isn't the slightest bit surprising to me. The SSD manufacturers report maximum statistics that aren't typical or sustainable under anything resembling typical usage. I think the SSDs can actually live up to their claims if (a) they have a read-mostly workload, and either (b)(1) they have mostly large sequential operations, or (b)(2) they have random operations sized to match the geometry of the NAND cells internally.
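To put a rough number on "the bus isn't the limit" — a back-of-envelope sketch, where the 3 Gb/s link and the ~80% usable-after-overhead factor are my own approximations:

    awk 'BEGIN {
        bus_MBps = 300 * 0.8   # 3Gb/s ~= 300MB/s after 8b/10b; ~80% after protocol overhead
        io_KB = 4
        printf "~%.0f theoretical 4KB IOPS on the bus\n", bus_MBps * 1024 / io_KB
    }'
    # -> ~61,000 IOPS; the tested SSDs sustain ~3-6k 4KB writes,
    #    so the drive, not the bus, is the bottleneck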
> Furthermore, I don't think "1 hour sustained" is a very accurate
> benchmark. Most workloads are bursty in nature.

The IOPS degradation is additive; the length of the first and second one-hour sustained periods is completely arbitrary. The takeaway from slides 1 and 2 is that drive inactivity has no effect on the eventual outcome. So with either a bursty or a sustained workload, the end result is always the same: dramatic write IOPS degradation after unpackaging or secure erase of the tested flash-based SSDs.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
On 11/27/2010 6:50 PM, Christopher George wrote:
> So with either a bursty or a sustained workload, the end result is
> always the same: dramatic write IOPS degradation after unpackaging or
> secure erase of the tested flash-based SSDs.
> [...]

Without commenting on the other threads, I often see sustained IO in my setups for extended periods of time — particularly small IO, which eats up my IOPS. At the moment I run with the ZIL turned off for that pool, as it's a scratch pool and I don't care if it gets corrupted. I suspect that a DDRdrive or one of the STEC Zeus drives might help me, but I can overwhelm any other SSD quickly.

I'm doing compiles of the JDK, with a single back-end ZFS system handling the files for 20-30 clients, each trying to compile a 15-million-line JDK at the same time.

Lots and lots of small I/O. :-)

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
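A side note on "ZIL turned off for that pool": on bits recent enough to have it (I believe post-snv_140, so S11 Express should qualify — worth verifying on your build), the per-dataset sync property is a cleaner switch than the old global zil_disable tunable:

    # disable synchronous-write semantics for one scratch dataset only;
    # the dataset name is a placeholder
    zfs set sync=disabled tank/scratch

That confines the risk to the dataset you actually don't care about.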
On Sat, Nov 27, 2010 at 9:29 PM, Erik Trimble <erik.trimble at oracle.com> wrote:

> I'm doing compiles of the JDK, with a single back-end ZFS system
> handling the files for 20-30 clients [...]
>
> Lots and lots of small I/O. :-)

Sounds like you need lots and lots of 15krpm drives instead of 7200rpm SATA ;)

--Tim
>> TRIM was putback in July... You're telling me it didn't make it into
>> S11 Express?
>>
>> http://mail.opensolaris.org/pipermail/onnv-notify/2010-July/012674.html

It looks like that putback covers the ability to issue the TRIM command, but ZFS doesn't use it:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6957655
> I'm doing compiles of the JDK, with a single back-end ZFS system
> handling the files for 20-30 clients, each trying to compile a
> 15-million-line JDK at the same time.

Very cool application!

Can you share any metrics, such as the aggregate size of the source files compiled and the size of the resultant binaries?

Thanks,

Christopher George
Founder/CTO
www.ddrdrive.com
On 11/27/2010 7:42 PM, Tim Cook wrote:
> Sounds like you need lots and lots of 15krpm drives instead of
> 7200rpm SATA ;)
>
> --Tim

That's the scary part. I've got 24 2.5" 15k SAS drives with a 512MB caching RAID controller. It still gets hammered on my workload.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
On 11/27/2010 11:08 PM, Christopher George wrote:
> Can you share any metrics, such as the aggregate size of the source
> files compiled and the size of the resultant binaries?

My biggest issue is that I eventually flood my network bandwidth. I've got 4 bonded GigE links into my NFS server, and I'll still overwhelm them all with my clients.

It's the JDK. Figure: copy a 700MB tarball to each client machine, then explode that in an NFS-mounted directory — about 50,000 files, averaging under 4k each. The final binary size is not that big (figure 400MB total), but the intermediate build tree is ~4GB. Tar up the results and save them elsewhere; erase the whole filesystem after the build is complete.

Figure one build on 8 platforms over 20 machines total takes 3 hours.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
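That many-tiny-files pattern is easy enough to approximate when testing a candidate slog — a rough generator sketch to run from one NFS client (the mount path and counts are placeholders, not the actual build):

    #!/bin/sh
    # create 50,000 ~4KB files on an NFS mount; each close() makes the
    # client flush, so the server's ZIL eats one small sync write per file
    dir=/mnt/nfs/scratch/stress.$$
    mkdir -p "$dir"
    i=0
    while [ "$i" -lt 50000 ]; do
        dd if=/dev/zero of="$dir/f$i" bs=4k count=1 2>/dev/null
        i=$((i + 1))
    done

Run one copy per client to mimic the 20-30 builders hitting the server at once.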
On Sat, Nov 27, 2010 at 03:04:27AM -0800, Erik Trimble wrote:

Hi,

> I haven't had a chance to test a Vertex 2 PRO against my 2 EX, and I'd
> be interested if anyone else has.

Well, we'll get some toys to play with in the next couple of weeks (i.e. before X-mas), which will probably let us mimic your environment and do some testing before they go into production (probably in March 2011). So if you or anybody else has a particular setup to test, feel free to send me a note...

HW details: a new X4540 + 9x OCZSSD2-2VTXP50G + 3x OCZSSD2-2VTXP50G + an SM server with an LSI 620J and 24x 15K SAS2 drives (see http://iws.cs.uni-magdeburg.de/~elkner/supermicro/server.html), as well as a bunch of HP Z400 Xeon W3680-based workstations. Estimated delivery of the 10G components is end of January 2011 (a Nexus 5010 + some 3560X-nT-Ls).

> The DDRdrive is still the way to go for the ultimate ZIL acceleration,

Well, not for us: full-height cards are a "no go" here, and it's only PCIe 1.x x1...

Regards,
jel.
--
Otto-von-Guericke University      http://www.cs.uni-magdeburg.de/
Department of Computer Science    Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany          Tel: +49 391 67 12768