I'm looking for alternative SSD options to the Intel X25-E and the ZEUS IOPS. The ZEUS IOPS would probably cost as much as my entire current disk system (80 15k SAS drives)- and that's just silly. The Intel is much less expensive, and while fast- pales in comparison to the ZEUS. I've allocated 4 disk slots in my array for ZIL SSDs and I'm trying to find the best performance for my dollar.

With that in mind- Is anyone using the new OCZ Vertex 2 SSDs as a ZIL?
http://www.ocztechnology.com/products/solid-state-drives/2-5--sata-ii/performance-enterprise-solid-state-drives/ocz-vertex-2-sata-ii-2-5--ssd.html

They're claiming 50k IOPS (4k Write- Aligned), 2 million hour MTBF, TRIM support, etc. That's more write IOPS than the ZEUS (40k IOPS, $$$$$) but at half the price of an Intel X25-E (3.3k IOPS, $400).

Needless to say I'd love to know if anyone has evaluated these drives to see if they make sense as a ZIL- for example- do they honor cache flush requests? Are those sustained IOPS numbers?
-- This message posted from opensolaris.org
40k IOPS sounds like "best in case, you''ll never see it in the real world" marketing to me. There are a few benchmarks if you google and they all seem to indicate the performance is probably +/- 10% of an intel x25-e. I would personally trust intel over one of these drives. Is it even possible to buy a zeus iops anywhere? I haven''t been able to find one. I get the impression they mostly sell to other vendors like sun? I''d be curious what the price is on a 9GB zeus iops is these days? -- This message posted from opensolaris.org
Don wrote:
> With that in mind- Is anyone using the new OCZ Vertex 2 SSDs as a ZIL?
>
> They're claiming 50k IOPS (4k Write- Aligned), 2 million hour MTBF, TRIM support, etc. That's more write IOPS than the ZEUS (40k IOPS, $$$$$) but at half the price of an Intel X25-E (3.3k IOPS, $400).
>
> Needless to say I'd love to know if anyone has evaluated these drives to see if they make sense as a ZIL- for example- do they honor cache flush requests? Are those sustained IOPS numbers?

In my understanding nearly the only relevant number is the number of cache flushes a drive can handle per second, as this determines my single thread performance. Has anyone an idea what numbers I can expect from an Intel X25-E or an OCZ Vertex 2?

-Arne
On Tue, May 18, 2010 at 4:28 PM, Don <don at blacksun.org> wrote:
> With that in mind- Is anyone using the new OCZ Vertex 2 SSDs as a ZIL?

The current SandForce drives out don't have an ultra-capacitor on them, so they could lose data if the system crashed. Enterprise-class drives based on the chipset that do have an ultra-cap are supposed to be released "any day now".

> Needless to say I'd love to know if anyone has evaluated these drives to see if they make sense as a ZIL- for example- do they honor cache flush requests? Are those sustained IOPS numbers?

I don't think they do; the chipset was designed to use an ultra-cap to avoid having to honor flushes. Then again, the X25-E has the same problem.

-B
-- Brandon High : bhigh at freaks.com
On 2010-05-19 08.32, sensille wrote:
> Don wrote:
>> With that in mind- Is anyone using the new OCZ Vertex 2 SSDs as a ZIL?
>>
>> They're claiming 50k IOPS (4k Write- Aligned), 2 million hour MTBF, TRIM support, etc. That's more write IOPS than the ZEUS (40k IOPS, $$$$$) but at half the price of an Intel X25-E (3.3k IOPS, $400).
>>
>> Needless to say I'd love to know if anyone has evaluated these drives to see if they make sense as a ZIL- for example- do they honor cache flush requests? Are those sustained IOPS numbers?
>
> In my understanding nearly the only relevant number is the number of cache flushes a drive can handle per second, as this determines my single thread performance.
> Has anyone an idea what numbers I can expect from an Intel X25-E or an OCZ Vertex 2?

I don't know about the OCZ Vertex 2, but the Intel X25-E roughly halves its IOPS number when you disable its write cache (IIRC, it was in the range of 1300-1600 writes/s or so). Since it ignores Cache Flush command and it doesn't have any persistent buffer storage, disabling the write cache is the best you can do.

Note that there were reports of the Intel X25-E losing a write even though you had the write cache disabled! Since they still haven't fixed this, after more than a year on the market, I believe it rather qualifies for the "hardly usable toy" class.

I am very disappointed; I had hopes for a new class of cheap but usable flash drives. Maybe some day...

/ragge
Well- 40k IOPS is the current claim from ZEUS- and they're the benchmark. They used to be 17k IOPS. How real any of these numbers are from any manufacturer is a guess. Given the Intel's refusal to honor a cache flush, and their performance problems with the cache disabled- I don't trust them any more than anyone else right now.

As for the Vertex drives- if they are within +-10% of the Intel they're still doing it for half of what the Intel drive costs- so it's an option- not a great option- but still an option.
-- This message posted from opensolaris.org
> As for the Vertex drives- if they are within +-10% of the Intel they're still doing it for half of what the Intel drive costs- so it's an option- not a great option- but still an option.

Yes, but Intel is SLC. Much more endurance.
On Wed, May 19, 2010 02:09, thomas wrote:
> Is it even possible to buy a zeus iops anywhere? I haven't been able to find one. I get the impression they mostly sell to other vendors like sun? I'd be curious what the price on a 9GB zeus iops is these days?

Correct, their Zeus products are only available to OEMs.
Well, the larger size of the Vertex, coupled with their smaller claimed write amplification, should result in sufficient service life for my needs. Their claimed MTBF also matches the Intel X25-E's.
-- This message posted from opensolaris.org
"Since it ignores Cache Flush command and it doesn''t have any persistant buffer storage, disabling the write cache is the best you can do." This actually brings up another question I had: What is the risk, beyond a few seconds of lost writes, if I lose power, there is no capacitor and the cache is not disabled? My ZFS system is shared storage for a large VMWare based QA farm. If I lose power then a few seconds of writes are the least of my concerns. All of the QA tests will need to be restarted and all of the file systems will need to be checked. A few seconds of writes won''t make any difference unless it has the potential to affect the integrity of the pool itself. Considering the performance trade-off, I''d happily give up a few seconds worth of writes for significantly improved IOPS. -- This message posted from opensolaris.org
On Wed, May 19, 2010 at 02:29:24PM -0700, Don wrote:
> "Since it ignores Cache Flush command and it doesn't have any persistent buffer storage, disabling the write cache is the best you can do."
>
> This actually brings up another question I had: What is the risk, beyond a few seconds of lost writes, if I lose power, there is no capacitor and the cache is not disabled?

You can lose all writes from the last committed transaction (i.e., the one before the currently open transaction). (You also lose writes from the currently open transaction, but that's unavoidable in any system.)

Nowadays the system will let you know at boot time that the last transaction was not committed properly and you'll have a chance to go back to the previous transaction.

For me, getting much-better-than-disk performance out of an SSD with cache disabled is enough to make that SSD worthwhile, provided the price is right of course.

Nico
--
On May 19, 2010, at 2:29 PM, Don wrote:
> "Since it ignores Cache Flush command and it doesn't have any persistent buffer storage, disabling the write cache is the best you can do."
>
> This actually brings up another question I had: What is the risk, beyond a few seconds of lost writes, if I lose power, there is no capacitor and the cache is not disabled?

The data risk is a few moments of data loss. However, if the order of the uberblock updates is not preserved (which is why the caches are flushed) then recovery from a reboot may require manual intervention. The amount of manual intervention could be significant for builds prior to b128.

> My ZFS system is shared storage for a large VMware-based QA farm. If I lose power then a few seconds of writes are the least of my concerns. All of the QA tests will need to be restarted and all of the file systems will need to be checked. A few seconds of writes won't make any difference unless it has the potential to affect the integrity of the pool itself.
>
> Considering the performance trade-off, I'd happily give up a few seconds' worth of writes for significantly improved IOPS.

Space, dependability, performance: pick two :-)
-- richard

--
Richard Elling
richard at nexenta.com  +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
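(For reference, the post-b128 recovery Richard mentions is exposed through zpool import's rewind option. A minimal sketch, assuming a pool named "tank" that refuses to import after the crash - the pool name and the dry-run-first workflow are illustrative, not something specified in this thread:

    # ask whether the pool could be made importable by discarding the
    # last few transactions, without actually doing it
    zpool import -F -n tank

    # do the rewind: roll back to the newest consistent transaction group
    zpool import -F tank

The point is that on b128 and later the "manual intervention" is a supported rewind command rather than hand surgery on uberblocks; the cost is still the last few seconds of writes.)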
"You can lose all writes from the last committed transaction (i.e., the one before the currently open transaction)." And I don''t think that bothers me. As long as the array itself doesn''t go belly up- then a few seconds of lost transactions are largely irrelevant- all of the QA virtual machines are going to have to be rolled back to their initial states anyway. -- This message posted from opensolaris.org
"You can lose all writes from the last committed transaction (i.e., the one before the currently open transaction)." I''ll pick one- performance :) Honestly- I wish I had a better grasp on the real world performance of these drives. 50k IOPS is nice- and considering the incredible likelihood of data duplication in my environment- the SandForce controller seems like a win. That said- does anyone have a good set of real world performance numbers for these drives that you can link to? -- This message posted from opensolaris.org
On 20 maj 2010, at 00.20, Don wrote:
> "You can lose all writes from the last committed transaction (i.e., the one before the currently open transaction)."
>
> And I don't think that bothers me. As long as the array itself doesn't go belly up- then a few seconds of lost transactions are largely irrelevant- all of the QA virtual machines are going to have to be rolled back to their initial states anyway.

Ok - then you are in the dream situation, and your solution could be free of charge, a one-liner command, and perform better than any SSD on the market: Disable the ZIL.

You will lose up to 30 seconds of the most recently written data, and if you use it as an NFS server your clients may get confused after a crash since the server is not in the state it should be in. You could also turn down the ZFS transaction timeout to lose less than 30 seconds if you want. Your pool will always be in a consistent shape on disk (if you have hardware that behaves).

Remember to NEVER use this pool for anything that actually wants better data persistency - this is a pool tuned specifically for a very special case.

In very recent opensolaris there is a zpool property for this, earlier you had to set a kernel flag when mounting the pool (and have it unset when mounting other pools, if you want them to have the ZIL enabled).

/ragge
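(For anyone wanting to try this, a minimal sketch of the one-liner - the dataset name is made up, and which knob exists depends on your build, so treat this as an illustration rather than gospel:

    # recent builds: per-dataset property (if memory serves it is a zfs
    # dataset property rather than a pool property)
    zfs set sync=disabled tank/qa-vmstore

    # older builds: global tunable in /etc/system; affects every pool on
    # the host and only filesystems mounted after it takes effect
    set zfs:zil_disable = 1

The per-dataset property is the nicer option where available, since the /etc/system tunable cannot be scoped to one pool.)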
> On May 19, 2010, at 2:29 PM, Don wrote:
> The data risk is a few moments of data loss. However, if the order of the uberblock updates is not preserved (which is why the caches are flushed) then recovery from a reboot may require manual intervention. The amount of manual intervention could be significant for builds prior to b128.

This risk is mostly mitigated by UPS backup and auto-shutdown when the UPS detects power loss, correct? Outside of pulling the plug that should solve power related problems. Kernel panics should only be caused by hardware issues, which might corrupt the disk data anyway. Obviously software can and does fail, but the biggest problem I hear about with ZIL devices is behavior in a sudden power loss situation.

It seems to me that UPS backup along with starting a shutdown cycle before complete power failure should prevent most issues. Seems like that should help with issues like the X25-E not honoring cache flush as well; the UPS would give it time to finish the writes. Again, barring a firmware issue in the drive itself. Should be about the same as a supercap anyway.
-- This message posted from opensolaris.org
On Thu, May 20, 2010 14:12, Travis Tabbal wrote:
>> On May 19, 2010, at 2:29 PM, Don wrote:
>
>> The data risk is a few moments of data loss. However, if the order of the uberblock updates is not preserved (which is why the caches are flushed) then recovery from a reboot may require manual intervention. The amount of manual intervention could be significant for builds prior to b128.
>
> This risk is mostly mitigated by UPS backup and auto-shutdown when the UPS detects power loss, correct?

Unless you have a contractor working in the server room that bumps into the UPS and causes a power glitch which causes a whole bunch of equipment to cycle.

Happened at $WORK (in another office) just two weeks ago.

It all depends on your level of paranoia.
On 20 maj 2010, at 20.35, David Magda wrote:
> On Thu, May 20, 2010 14:12, Travis Tabbal wrote:
>>> On May 19, 2010, at 2:29 PM, Don wrote:
>>
>>> The data risk is a few moments of data loss. However, if the order of the uberblock updates is not preserved (which is why the caches are flushed) then recovery from a reboot may require manual intervention. The amount of manual intervention could be significant for builds prior to b128.
>>
>> This risk is mostly mitigated by UPS backup and auto-shutdown when the UPS detects power loss, correct?
>
> Unless you have a contractor working in the server room that bumps into the UPS and causes a power glitch which causes a whole bunch of equipment to cycle.
>
> Happened at $WORK (in another office) just two weeks ago.

Or any of a zillion other failure modes with that setup, from problems with the UPS, to the auto-shutdown communication signaling system, to a problem with the computer system, the electrical distribution, or anything else.

Building complex solutions to solve critical issues is IMHO seldom a very good solution. If you care about data integrity, buy stuff that does what it is supposed to do, and keep everything simple. Redundancy is often good, but keep the switchover mechanisms as simple and as few as possible. Choose mechanisms that can and will be tested regularly - and don't use systems that are almost never used and/or tested. Complex systems tend to fail, especially after some time when things have changed a bit, or even cause more outages themselves. They are hard to test, maintain and understand, and they are often costly to buy too. KISS, you know.

In the Intel X25 case - bug them until they release new firmware - they have sold you a defective product that they still haven't fixed. If they don't fix it and you need it, get another drive.

> It all depends on your level of paranoia.

Either that, or you may have some kind of protocol, policy, contract, SLA or similar that you have to follow. (In any case it is often really hard to even guess how much a certain change gives or takes in availability numbers.)

Just my 5 öre.

/ragge
>>>>> "d" == Don <don at blacksun.org> writes:d> "Since it ignores Cache Flush command and it doesn''t have any d> persistant buffer storage, disabling the write cache is the d> best you can do." This actually brings up another question I d> had: What is the risk, beyond a few seconds of lost writes, if d> I lose power, there is no capacitor and the cache is not d> disabled? why use a slog at all if it''s not durable? You should disable the ZIL instead. Compared to a slog that ignores cache flush, disabling the ZIL will provide the same guarantees to the application w.r.t. write ordering preserved, and the same problems with NFS server reboots, replicated databases, mail servers. It''ll be faster than the fake-slog. It''ll be less risk of losing the pool because the slog went bad and then you accidentally exported the pool while trying to fix things. The only case where you are ahead with the fake-slog, is the host''s going down because of kernel panics rather than power loss. I don''t know, though, what to do about these reports of devices that almost respect cache flushes but seem to lose exactly one transaction. AFAICT this should be a works/doesntwork situation, not a continuum. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100520/7a5ffa8b/attachment.bin>
On 05/20/10 12:26, Miles Nordin wrote:
> I don't know, though, what to do about these reports of devices that almost respect cache flushes but seem to lose exactly one transaction. AFAICT this should be a works/doesntwork situation, not a continuum.

But there's so much brokenness out there. I've seen similar "tail drop" behavior before -- the last write or two before a hardware reset goes into the bit bucket, but the ones before that are durable.

So, IMHO, a cheap consumer ssd used as a zil may still be worth it (for some use cases) to narrow the window of data loss from ~30 seconds to a sub-second value.

- Bill

Miles Nordin - 2010-May-20 20:35 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

>>>>> "rsk" == Roy Sigurd Karlsbakk <roy at karlsbakk.net> writes: >>>>> "dm" == David Magda <dmagda at ee.ryerson.ca> writes: >>>>> "tt" == Travis Tabbal <travis at tabbal.net> writes:rsk> Disabling ZIL is, according to ZFS best practice, NOT rsk> recommended. dm> As mentioned, you do NOT want to run with this in production, dm> but it is a quick way to check. REPEAT: I disagree. Once you associate the disasterizing and dire warnings from the developer''s advice-wiki with the specific problems that ZIL-disabling causes for real sysadmins rather than abstract notions of ``POSIX'''' or ``the application'''', a lot more people end up wanting to disable their ZIL''s. In fact, most of the SSD''s sold seem to be relying on exactly the trick disabled-ZIL ZFS does for much of their high performance, if not their feasibility within their price bracket period: provide a guarantee of write ordering without durability, and many applications are just, poof, happy. If the SSD''s arrange that no writes are reordered across a SYNC CACHE, but don''t bother actually providing durability, end uzarZ will ``OMG windows fast and no corruption.'''' --> ssd sales. The ``do-not-disable-buy-SSD!!!1!'''' advice thus translates to ``buy one of these broken SSD''s, and you will be basically happy. Almost everyone is. When you aren''t, we can blame the SSD instead of ZFS.'''' all that bottlenecked SATA traffic host<->SSD is just CYA and of no real value (except for kernel panics). Now, if someone would make a Battery FOB, that gives broken SSD 60 seconds of power, then we could use the consumer crap SSD''s in servers again with real value instead of CYA value. FOB should work like this: == RUNNING = battery ,-------> SATA port: pass -----. recharged? / power to SSD: on \ input / \ power ( . lost | | . input ,---\ v power / v restored / =power lost =power restored= . =hold-down =hold down = -- SATA port: block power to SSD: off power to SSD: on ^ | | | . . 60 seconds input \ / elapsed power . =power off= , restored -------- power to SSD: off <- The device must know when its battery has gone bad and stick itself in ``power restored hold down'''' state. Knowing when the battery is bad may require more states to test the battery, but this is the general idea. I think it would be much cheaper to build an SSD with supercap, and simpler because you can assume the supercap is good forever instead of testing it. However because of ``market forces'''' the FOB approach might sell for cheaper because the FOB cannot be tied to the SSD and used as a way to segment the market. If there are 2 companies making only FOB''s and not making SSD''s, only then competition will work like people want it to. Otherwise FOBs will be $1000 or something because only ``enterprise'''' users are smart/dumb enough to demand them. Normally I would have a problem that the FOB and SSD are separable, but see, the FOB and SSD can be put together with double-sided tape: the tape only has to hold for 60 seconds after $event, and there''s no way to separate the two by tripping over a cord. You can safely move SSD+FOB from one chassis to another without fearing all is lost if you jiggle the connection. I think it''s okay overall. tt> This risk is mostly mitigated by UPS backup and auto-shutdown tt> when the UPS detects power loss, correct? no no it''s about cutting off a class of failure cases and constraining ourselves to relatively sane forms of failure. We are not haggling about NO FAILURES EVAR yet. 
First, for STEP 1 we isolate the insane kinds of failure that cost us days or months of data rather than just a few seconds, the kinds that call for crazy unplannable ad-hoc recovery methods like ``Viktor plz help me'' and ``is anyone here a Postgres data recovery expert?'' and ``is there a way I can invalidate the batch of billing auth requests I uploaded yesterday so I can rerun it without double-billing anyone?'' For STEP 1 we make the insane failures almost impossible through clever software and planning. A UPS never, never, ever qualifies as ``almost impossible''.

Then, once that's done, we come back for STEP 2 where we try to minimize the sane failures also, and for STEP 2 things like a UPS might be useful. For STEP 2 it makes sense to talk about percent availability, probability of failure, length of time to recover from Scenario X. But in STEP 1 all the failures are insane ones, so you cannot measure any of these things.

A UPS is not about how ``paranoid'' you are or how far you want to take STEP 1. You take STEP 1 all the way to completion before worrying about STEP 2.

For NFS, the STEP 1 risk on the table is ``server reboots, client does not.'' It is okay if both reboot at once. It is okay if neither reboots. But if you disable the ZIL OR have a broken SSD like the X25, AND the NFS server reboots and the client doesn't, then you have a STEP 1 insane failure case that can cause corrupted database files or virtual disk images on the NFS clients.

For example, if you fail to complete STEP 1, and then you plug the NFS clients into a more expensive UPS with proper transfer switches for maintenance and A/B power, and the server into a rather ordinary UPS, then you will be at greater risk of this particular NFS problem than if you used no UPS at all. That's not intuitive! But it's true! This comes from putting STEP 2 before STEP 1. You must do them in order if you want to stay sane.

If you do not care about this NFS problem (or the others) then maybe you can just disable the ZIL. It is a matter of working through STEP 1. Working through STEP 1 might be ``doesn't affect us. Disable ZIL.'' Or it might be ``get slog with supercap''. STEP 1 will never be ``plug in OCZ Vertex cheaposlog that ignores cacheflush'' if you are doing it right. And STEP 2 has nothing to do with anything yet until we finish STEP 1 and the insane failure cases.

Miika Vesti - 2010-May-20 21:23 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

> If you do not care about this NFS problem (or the others) then maybe you can just disable the ZIL. It is a matter of working through STEP 1. Working through STEP 1 might be ``doesn't affect us. Disable ZIL.'' Or it might be ``get slog with supercap''. STEP 1 will never be ``plug in OCZ Vertex cheaposlog that ignores cacheflush'' if you are doing it right. And STEP 2 has nothing to do with anything yet until we finish STEP 1 and the insane failure cases.

AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant.

There has been previous discussion about this:
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702

"I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."

Also:
http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html

"Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity."

"So if I interpret them correctly, what they chose to do with the current incarnation of the architecture is actually reserve some of the primary memory capacity for I/O transaction management."

"In plain English, if the system gets interrupted either by power or by a crash, when it initializes the next time, it can read from its transaction space and "resume" where it left off. This makes it durable."

So, OCZ Vertex 2 seems to be a good choice for ZIL.
> use a slog at all if it's not durable? You should
> disable the ZIL
> instead.

This is basically where I was going. There only seems to be one SSD that is considered "working", the Zeus IOPS. Even if I had the money, I can't buy it. As my application is a home server, not a datacenter, things like NFS breaking if I don't reboot the clients is a non-issue. As long as the on-disk data is consistent so I don't have to worry about the entire pool going belly-up, I'm happy enough. I might lose 30 seconds of data, worst case, as a result of running without ZIL. Considering that I can't buy a proper ZIL at a cost I can afford, and an improper ZIL is not worth much, I don't see a reason to bother with ZIL at all. I'll just get a cheap large SSD for L2ARC, disable ZIL, and call it a day.

For my use, I'd want a device in the $200 range to even consider an slog device. As nothing even remotely close to that price range exists that will work properly at all, let alone with decent performance, I see no point in ZIL for my application. The performance hit is just too severe to continue using it without an slog, and there's no slog device I can afford that works properly, even if I ignore performance.
-- This message posted from opensolaris.org
On May 20, 2010, at 6:25 PM, Travis Tabbal <travis at tabbal.net> wrote:
>> use a slog at all if it's not durable? You should
>> disable the ZIL
>> instead.
>
> This is basically where I was going. There only seems to be one SSD that is considered "working", the Zeus IOPS. Even if I had the money, I can't buy it. As my application is a home server, not a datacenter, things like NFS breaking if I don't reboot the clients is a non-issue. As long as the on-disk data is consistent so I don't have to worry about the entire pool going belly-up, I'm happy enough. I might lose 30 seconds of data, worst case, as a result of running without ZIL. Considering that I can't buy a proper ZIL at a cost I can afford, and an improper ZIL is not worth much, I don't see a reason to bother with ZIL at all. I'll just get a cheap large SSD for L2ARC, disable ZIL, and call it a day.
>
> For my use, I'd want a device in the $200 range to even consider an slog device. As nothing even remotely close to that price range exists that will work properly at all, let alone with decent performance, I see no point in ZIL for my application. The performance hit is just too severe to continue using it without an slog, and there's no slog device I can afford that works properly, even if I ignore performance.

Just buy a caching RAID controller and run it in JBOD mode and have the ZIL integrated with the pool.

A 512MB-1024MB card with battery backup should do the trick. It might not have the capacity of an SSD, but in my experience it works well in the 1TB data moderately loaded range.

Have more data/activity, then try more cards and more pools; otherwise pony up the $$$$ for a capacitor backed SSD.

-Ross
On 21 maj 2010, at 00.53, Ross Walker wrote:
> On May 20, 2010, at 6:25 PM, Travis Tabbal <travis at tabbal.net> wrote:
>>> use a slog at all if it's not durable? You should
>>> disable the ZIL
>>> instead.
>>
>> This is basically where I was going. There only seems to be one SSD that is considered "working", the Zeus IOPS. Even if I had the money, I can't buy it. As my application is a home server, not a datacenter, things like NFS breaking if I don't reboot the clients is a non-issue. As long as the on-disk data is consistent so I don't have to worry about the entire pool going belly-up, I'm happy enough. I might lose 30 seconds of data, worst case, as a result of running without ZIL. Considering that I can't buy a proper ZIL at a cost I can afford, and an improper ZIL is not worth much, I don't see a reason to bother with ZIL at all. I'll just get a cheap large SSD for L2ARC, disable ZIL, and call it a day.
>>
>> For my use, I'd want a device in the $200 range to even consider an slog device. As nothing even remotely close to that price range exists that will work properly at all, let alone with decent performance, I see no point in ZIL for my application. The performance hit is just too severe to continue using it without an slog, and there's no slog device I can afford that works properly, even if I ignore performance.
>
> Just buy a caching RAID controller and run it in JBOD mode and have the ZIL integrated with the pool.
>
> A 512MB-1024MB card with battery backup should do the trick. It might not have the capacity of an SSD, but in my experience it works well in the 1TB data moderately loaded range.
>
> Have more data/activity, then try more cards and more pools; otherwise pony up the $$$$ for a capacitor backed SSD.

It - again - depends on what problem you are trying to solve.

If the RAID controller goes bad on you so that you lose the data in the write cache, your file system could be in pretty bad shape. Most RAID controllers can't be mirrored. That would hardly make a good replacement for a mirrored ZIL.

As far as I know, there is no single silver bullet to this issue.

/ragge
On May 20, 2010, at 1:12 PM, Bill Sommerfeld wrote:
> On 05/20/10 12:26, Miles Nordin wrote:
>> I don't know, though, what to do about these reports of devices that almost respect cache flushes but seem to lose exactly one transaction. AFAICT this should be a works/doesntwork situation, not a continuum.
>
> But there's so much brokenness out there. I've seen similar "tail drop" behavior before -- the last write or two before a hardware reset goes into the bit bucket, but the ones before that are durable.
>
> So, IMHO, a cheap consumer ssd used as a zil may still be worth it (for some use cases) to narrow the window of data loss from ~30 seconds to a sub-second value.

+1
-- richard

--
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
> So, IMHO, a cheap consumer ssd used as a zil may still be worth it (for some use cases) to narrow the window of data loss from ~30 seconds to a sub-second value.

There are lots of reasons to enable the ZIL now- I can throw four very inexpensive SSDs in there now in a pair of mirrors, and then when a better drive comes along I can replace each half of the mirror without bringing anything down. My slots are already allocated and it would be nice to save a few extra seconds of writes- just in case. It's not a great solution- but nothing is.

I don't have access to a ZEUS- and even if I did- I wouldn't pay that kind of money for what amounts to a Vertex 2 Pro but with SLC flash. I'm kind of flabbergasted that no one has simply stuck a capacitor on a more reasonable drive. I guess the market just isn't big enough- but I find that hard to believe. Right now it seems like the options are all or nothing. There's just no %^$#^ middle ground.
-- This message posted from opensolaris.org
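(For anyone following along, the "replace each half of the mirror" workflow Don describes maps onto ordinary zpool commands. A rough sketch with made-up pool and device names - the exact c#t#d# names are illustrative only:

    # add a mirrored log vdev built from two cheap SSDs
    zpool add tank log mirror c4t0d0 c4t1d0

    # later, swap one side of the log mirror for a better drive,
    # without taking the pool down
    zpool replace tank c4t0d0 c5t0d0

Both operations are online; the pool keeps servicing synchronous writes while the replacement side resilvers.)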
On the PCIe side, I noticed there's a new card coming from LSI that claims 150,000 4k random writes. Unfortunately this might end up being an OEM-only card. I also notice on the ddrdrive site that they now have an opensolaris driver and are offering it in a beta program.
-- This message posted from opensolaris.org
On May 20, 2010, at 7:17 PM, Ragnar Sundblad <ragge at csc.kth.se> wrote:
> On 21 maj 2010, at 00.53, Ross Walker wrote:
>> On May 20, 2010, at 6:25 PM, Travis Tabbal <travis at tabbal.net> wrote:
>>>> use a slog at all if it's not durable? You should
>>>> disable the ZIL
>>>> instead.
>>>
>>> This is basically where I was going. There only seems to be one SSD that is considered "working", the Zeus IOPS. Even if I had the money, I can't buy it. As my application is a home server, not a datacenter, things like NFS breaking if I don't reboot the clients is a non-issue. As long as the on-disk data is consistent so I don't have to worry about the entire pool going belly-up, I'm happy enough. I might lose 30 seconds of data, worst case, as a result of running without ZIL. Considering that I can't buy a proper ZIL at a cost I can afford, and an improper ZIL is not worth much, I don't see a reason to bother with ZIL at all. I'll just get a cheap large SSD for L2ARC, disable ZIL, and call it a day.
>>>
>>> For my use, I'd want a device in the $200 range to even consider an slog device. As nothing even remotely close to that price range exists that will work properly at all, let alone with decent performance, I see no point in ZIL for my application. The performance hit is just too severe to continue using it without an slog, and there's no slog device I can afford that works properly, even if I ignore performance.
>>
>> Just buy a caching RAID controller and run it in JBOD mode and have the ZIL integrated with the pool.
>>
>> A 512MB-1024MB card with battery backup should do the trick. It might not have the capacity of an SSD, but in my experience it works well in the 1TB data moderately loaded range.
>>
>> Have more data/activity, then try more cards and more pools; otherwise pony up the $$$$ for a capacitor backed SSD.
>
> It - again - depends on what problem you are trying to solve.
>
> If the RAID controller goes bad on you so that you lose the data in the write cache, your file system could be in pretty bad shape. Most RAID controllers can't be mirrored. That would hardly make a good replacement for a mirrored ZIL.
>
> As far as I know, there is no single silver bullet to this issue.

That is true, and there are finite budgets as well, and as with all things in life one must make a trade-off somewhere.

If you have 2 mirrored SSDs that don't support cache flush and your power goes out, your file system will be in the same bad shape. Difference is in the first place you paid a lot less to have your data hosed.

-Ross

Attila Mravik - 2010-May-21 14:09 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant.
>
> There has been previous discussion about this:
> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>
> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."
>
> Also:
> http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html
>
> "Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity."
>
> "So if I interpret them correctly, what they chose to do with the current incarnation of the architecture is actually reserve some of the primary memory capacity for I/O transaction management."
>
> "In plain English, if the system gets interrupted either by power or by a crash, when it initializes the next time, it can read from its transaction space and "resume" where it left off. This makes it durable."

Here is a detailed explanation of the SandForce controllers:
http://www.anandtech.com/show/3661/understanding-sandforces-sf1200-sf1500-not-all-drives-are-equal

So the SF-1500 is enterprise class and relies on a supercap, while the SF-1200 is consumer class and does not rely on a supercap.

"The SF-1200 firmware on the other hand doesn't assume the presence of a large capacitor to keep the controller/NAND powered long enough to complete all writes in the event of a power failure. As such it does more frequent check pointing and doesn't guarantee the write in progress will complete before it's acknowledged."

As I understand it, the SF-1200 will ack the sync write only after it is written to flash, thus reducing write performance.

There is an interesting part about firmwares: OCZ has an exclusive firmware in the Vertex 2 series which is based on the SF-1200 but whose random write IOPS is not capped at 10K (while other vendors and other SSDs from OCZ using the SF-1200 are capped, unless they sell the drive with the RC firmware, which is for OEM evaluation and not production ready but does not contain the IOPS cap).

Miika Vesti - 2010-May-21 15:14 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

This is interesting. I thought all Vertex 2 SSDs are good choices for ZIL, but this does not seem to be the case.

According to http://www.legitreviews.com/article/1208/1/ the Vertex 2 LE, Vertex 2 Pro and Vertex 2 EX are SF-1500 based but the Vertex 2 (without any suffix) is SF-1200 based. Here is the table:

  Model          Controller   Max Read   Max Write   IOPS
  Vertex 2       SF-1200      270MB/s    260MB/s      9500
  Vertex 2 LE    SF-1500      270MB/s    250MB/s         ?
  Vertex 2 Pro   SF-1500      280MB/s    270MB/s     19000
  Vertex 2 EX    SF-1500      280MB/s    270MB/s     25000

21.05.2010 17:09, Attila Mravik kirjoitti:
>> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant.
>>
>> There has been previous discussion about this:
>> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>>
>> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."
>>
>> Also:
>> http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html
>>
>> "Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity."
>>
>> "So if I interpret them correctly, what they chose to do with the current incarnation of the architecture is actually reserve some of the primary memory capacity for I/O transaction management."
>>
>> "In plain English, if the system gets interrupted either by power or by a crash, when it initializes the next time, it can read from its transaction space and "resume" where it left off. This makes it durable."
>
> Here is a detailed explanation of the SandForce controllers:
> http://www.anandtech.com/show/3661/understanding-sandforces-sf1200-sf1500-not-all-drives-are-equal
>
> So the SF-1500 is enterprise class and relies on a supercap, while the SF-1200 is consumer class and does not rely on a supercap.
>
> "The SF-1200 firmware on the other hand doesn't assume the presence of a large capacitor to keep the controller/NAND powered long enough to complete all writes in the event of a power failure. As such it does more frequent check pointing and doesn't guarantee the write in progress will complete before it's acknowledged."
>
> As I understand it, the SF-1200 will ack the sync write only after it is written to flash, thus reducing write performance.
>
> There is an interesting part about firmwares: OCZ has an exclusive firmware in the Vertex 2 series which is based on the SF-1200 but whose random write IOPS is not capped at 10K (while other vendors and other SSDs from OCZ using the SF-1200 are capped, unless they sell the drive with the RC firmware, which is for OEM evaluation and not production ready but does not contain the IOPS cap).

Bob Friesenhahn - 2010-May-21 15:19 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

On Fri, 21 May 2010, Miika Vesti wrote:
> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant.
>
> There has been previous discussion about this:
> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>
> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."
>
> So, OCZ Vertex 2 seems to be a good choice for ZIL.

There seem to be quite a lot of blind assumptions in the above. The only good choice for ZIL is when you know for a certainty, not assumptions based on 3rd party articles and blog postings. Otherwise it is like assuming that if you jump through an open window there will be firemen down below to catch you.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

David Dyer-Bennet - 2010-May-21 16:45 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

On Fri, May 21, 2010 10:19, Bob Friesenhahn wrote:
> On Fri, 21 May 2010, Miika Vesti wrote:
>> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant.
>>
>> There has been previous discussion about this:
>> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>>
>> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."
>>
>> So, OCZ Vertex 2 seems to be a good choice for ZIL.
>
> There seem to be quite a lot of blind assumptions in the above. The only good choice for ZIL is when you know for a certainty, not assumptions based on 3rd party articles and blog postings. Otherwise it is like assuming that if you jump through an open window there will be firemen down below to catch you.

Just how DOES one know something for a certainty, anyway? I've seen LOTS of people mess up performance testing in ways that gave them very wrong answers; relying solely on your own testing is as foolish as relying on a couple of random blog posts.

To be comfortable (I don't ask for "know for a certainty"; I'm not sure that exists outside of "faith"), I want a claim by the manufacturer and multiple outside tests in "significant" journals -- which could be the blog of somebody I trusted, as well as actual magazines and such. Ideally, certainly if it's important, I'd then verify the tests myself. There aren't enough hours in the day, so I often get by with less.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

Brandon High - 2010-May-21 18:29 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

On Thu, May 20, 2010 at 2:23 PM, Miika Vesti <miika.vesti at trivore.com> wrote:
> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."

I've read conflicting reports that the controller contains a small DRAM cache. So while it doesn't rely on an external DRAM cache, it does have one:

http://www.legitreviews.com/article/1299/2/
"As we noted, the Vertex 2 doesn't have any cache chips on it as that is because the SandForce controller itself is said to carry a small cache inside that is a number of megabytes in size."

> "Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity."

Again, conflicting reports indicate otherwise.

http://www.legitreviews.com/article/1299/2/
"That adds up to 128GB of storage space, but only 93.1GB of it will be usable space! The 'hidden' capacity is used for wear leveling, which is crucial to keeping SSDs running as long as possible."

My understanding is that the controller contains enough cache to buffer enough data to write a complete erase block, eliminating the read / erase / write that a partial block write entails. It's reported to do copy-on-write, so it doesn't need to read existing blocks when making changes, which gives it such high iops - even random writes are turned into sequential writes (much like how ZFS works) of entire erase blocks. The excessive spare area is used to ensure that there are always full pages free to write to. (Some vendors are releasing consumer drives with 60/120/240 GB, using 7% reserved space rather than the 27% that the original drives ship with.)

With an unexpected power loss, you could still lose any data that's cached in the controller, or any uncommitted changes that have been partially written to the NAND.

I hate having to rely on sites like Legit Reviews and Anandtech for technical data, but there don't seem to be non-fanboy sites doing comprehensive reviews of the drives ...

-B
-- Brandon High : bhigh at freaks.com

Miles Nordin - 2010-May-21 18:36 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

>>>>> "dd" == David Dyer-Bennet <dd-b at dd-b.net> writes:dd> Just how DOES one know something for a certainty, anyway? science. Do a test like Lutz did on X25M G2. see list archives 2010-01-10. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100521/7575baeb/attachment.bin>

Don - 2010-May-21 18:48 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

> Now, if someone would make a Battery FOB, that gives broken SSD 60
> seconds of power, then we could use the consumer **** SSDs in servers
> again with real value instead of CYA value.

You know- it would probably be sufficient to provide the SSD with _just_ a big capacitor bank. If the host lost power it would stop writing, and if the SSD still had power it would probably use the idle time to flush its buffers. Then there would be world peace!

Yeah- got a little carried away there. Still, this seems like an experiment I'm going to have to try on my home server out of curiosity more than anything else :)
-- This message posted from opensolaris.org
On Thu, May 20, 2010 at 8:46 PM, Don <don at blacksun.org> wrote:
> I'm kind of flabbergasted that no one has simply stuck a capacitor on a more reasonable drive. I guess the market just isn't big enough- but I find that hard to believe.

I just spoke with a co-worker about doing something about it.

He says he can design a small in-line "UPS" that will deliver 20-30 seconds of 3.3V, 5V, and 12V to the SATA power connector for about $50 in parts. It would be even less if only one voltage was needed. That should be enough for most any SSD to finish any pending writes.

Any design that we come up with will be made publicly available under a Creative Commons or other similar license.

-B
-- Brandon High : bhigh at freaks.com
> I just spoke with a co-worker about doing something about it.
>
> He says he can design a small in-line "UPS" that will deliver 20-30 seconds of 3.3V, 5V, and 12V to the SATA power connector for about $50 in parts. It would be even less if only one voltage was needed. That should be enough for most any SSD to finish any pending writes.

Oh, I wasn't kidding when I said I was going to have to try this with my home server. I actually do some circuit board design and this would be an amusing project. All you probably need is 5V- I'll look into it.
-- This message posted from opensolaris.org
On 05/22/10 12:31 PM, Don wrote:
>> I just spoke with a co-worker about doing something about it.
>>
>> He says he can design a small in-line "UPS" that will deliver 20-30 seconds of 3.3V, 5V, and 12V to the SATA power connector for about $50 in parts. It would be even less if only one voltage was needed. That should be enough for most any SSD to finish any pending writes.
>
> Oh, I wasn't kidding when I said I was going to have to try this with my home server. I actually do some circuit board design and this would be an amusing project. All you probably need is 5V- I'll look into it.

Two supercaps should do the trick. Drive connectors only have 5 and 12V.

--
Ian.
On Fri, May 21, 2010 at 5:31 PM, Don <don at blacksun.org> wrote:
> Oh, I wasn't kidding when I said I was going to have to try this with my home server. I actually do some circuit board design and this would be an amusing project. All you probably need is 5V- I'll look into it.

The SATA power connector supplies 3.3, 5 and 12V. A "complete" solution will have all three. Most drives use just the 5V, so you can probably ignore 3.3V and 12V.

You'll need to use a step-up DC-DC converter and be able to supply ~100mA at 5V. (I can't find any specific numbers on power consumption. Intel claims 75mW - 150mW for the X25-M. USB is rated at 500mA at 5V, and all drives that I've seen can run in an un-powered USB case.)

It's actually easier/cheaper to use a LiPoly battery & charger and get a few minutes of power than to use an ultracap for a few seconds of power. Most ultracaps are ~2.5V and LiPoly is 3.7V, so you'll need a step-up converter in either case. If you're supplying more than one voltage, you should use a microcontroller to shut off all the charge pumps at once when the battery / ultracap runs low. If you're only supplying 5V, it doesn't matter.

Cost for a 5V-only system should be $30 - $35 in one-off prototype-ready components with a 1100mAh battery (using prices from Sparkfun.com), plus the cost for an enclosure, etc. A larger buy, a custom PCB, and a smaller battery would probably reduce the cost 20-50%.

-B
-- Brandon High : bhigh at freaks.com
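(A rough sanity check on runtime, with assumed numbers: drawing 5V at 100mA through a step-up converter from a 3.7V cell at, say, 85% efficiency pulls about 5 x 0.1 / (3.7 x 0.85) = ~0.16A from the battery, so an 1100mAh pack gives on the order of 1100 / 160 = ~7 hours of hold-up. The 85% efficiency figure is a guess, but even at 70% you still get several hours - vastly more than the few seconds the SSD needs to quiesce.)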
> The SATA power connector supplies 3.3, 5 and 12v. A "complete"
> solution will have all three. Most drives use just the 5v, so you can
> probably ignore 3.3v and 12v.

I'm not interested in building something that's going to work for every possible drive config- just my config :) Both the Intel X25-E and the OCZ only use the 5V rail.

> You'll need to use a step up DC-DC converter and be able to supply ~100mA at 5v.
> It's actually easier/cheaper to use a LiPoly battery & charger and get a few minutes of power than to use an ultracap for a few seconds of power. Most ultracaps are ~2.5v and LiPoly is 3.7v, so you'll need a step up converter in either case.

Ultracapacitors are available in voltage ratings beyond 12 volts so there is no reason to use a boost converter with them. That eliminates high frequency switching transients right next to our SSD, which is always helpful.

In this case we have lots of room. We have a 3.5" x 1" drive bay, but a 2.5" x 1/4" hard drive. There is ample room for several of the 6.3V ELNA 1F capacitors (and our SATA power rail is a 5V regulated rail so they should suffice)- either in series or parallel (depending on voltage or runtime requirements).
http://www.elna.co.jp/en/capacitor/double_layer/catalog/pdf/dk_e.pdf

You could put 2 caps in series for better voltage tolerance or in parallel for longer runtimes. Either way you probably don't need a charge controller, a boost or buck converter, or in fact any ICs at all. It's just a small board with some caps on it.

> Cost for a 5v only system should be $30 - $35 in one-off prototype-ready components with a 1100mAH battery (using prices from Sparkfun.com),

You could literally split a SATA cable and add in some capacitors for just the cost of the caps themselves. The issue there is whether the caps would present too large a current drain on initial charge up- If they do then you need to add in charge controllers and you've got the same problems as with a LiPo battery- although without the shorter service life.

At the end of the day the real problem is whether we believe the drives themselves will actually use the quiet period on the now dead bus to write out their caches. This is something we should ask the manufacturers, and test for ourselves.
-- This message posted from opensolaris.org
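(For the series/parallel trade-off with two of those 1F / 6.3V parts, the standard capacitor-combination arithmetic applies: in series you get 0.5F rated for roughly 12.6V, in parallel 2F still rated 6.3V. On a regulated 5V rail the parallel arrangement doubles the stored charge, while the series one mainly buys voltage headroom you don't need. These figures follow from the arithmetic, not from anything in the datasheet.)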
On 22 maj 2010, at 07.40, Don wrote:
>> The SATA power connector supplies 3.3, 5 and 12v. A "complete"
>> solution will have all three. Most drives use just the 5v, so you can
>> probably ignore 3.3v and 12v.
>
> I'm not interested in building something that's going to work for every possible drive config- just my config :) Both the Intel X25-E and the OCZ only use the 5V rail.
>
>> You'll need to use a step up DC-DC converter and be able to supply ~100mA at 5v.
>> It's actually easier/cheaper to use a LiPoly battery & charger and get a few minutes of power than to use an ultracap for a few seconds of power. Most ultracaps are ~2.5v and LiPoly is 3.7v, so you'll need a step up converter in either case.
>
> Ultracapacitors are available in voltage ratings beyond 12 volts so there is no reason to use a boost converter with them. That eliminates high frequency switching transients right next to our SSD, which is always helpful.
>
> In this case we have lots of room. We have a 3.5" x 1" drive bay, but a 2.5" x 1/4" hard drive. There is ample room for several of the 6.3V ELNA 1F capacitors (and our SATA power rail is a 5V regulated rail so they should suffice)- either in series or parallel (depending on voltage or runtime requirements).
> http://www.elna.co.jp/en/capacitor/double_layer/catalog/pdf/dk_e.pdf
>
> You could put 2 caps in series for better voltage tolerance or in parallel for longer runtimes. Either way you probably don't need a charge controller, a boost or buck converter, or in fact any ICs at all. It's just a small board with some caps on it.

I know they have a certain internal resistance, but I am not familiar with the characteristics; is it high enough so you don't need to limit the inrush current, and is it low enough so that you don't need a voltage booster for output?

>> Cost for a 5v only system should be $30 - $35 in one-off prototype-ready components with a 1100mAH battery (using prices from Sparkfun.com),
>
> You could literally split a SATA cable and add in some capacitors for just the cost of the caps themselves. The issue there is whether the caps would present too large a current drain on initial charge up- If they do then you need to add in charge controllers and you've got the same problems as with a LiPo battery- although without the shorter service life.
>
> At the end of the day the real problem is whether we believe the drives themselves will actually use the quiet period on the now dead bus to write out their caches. This is something we should ask the manufacturers, and test for ourselves.

Indeed!

/ragge
Basic electronics, go! The linked capacitor from Elna ( http://www.elna.co.jp/en/capacitor/double_layer/catalog/pdf/dk_e.pdf) has an internal resistance of 30 ohms. Intel rate their 32GB X25-E at 2.4W active (we aren''t interested in idle power usage, if its idle, we don''t need the capacitor in the first place) on the +5V rail, thats 0.48A. (P=VI) V=IR, supply is 5V, current through load is 480mA, hence R=10.4 ohms. The resistance of the X25-E under load is 10.4 ohms. Now if you have a capacitor discharge circuit with the charged Elna DK-6R3D105T - the largest and most suitable from that datasheet - you have 40.4 ohms around the loop (cap and load). +5V over 40.4 ohms. The maximum current you can pull from that is I=V/R = 124mA. Around a quarter what the X25-E wants in order to write. The setup won''t work. I''d suggest something more along the lines of: http://www.cap-xx.com/products/products.htm Which have an ESR around 3 orders of magnitude lower. t On 22 May 2010 18:58, Ragnar Sundblad <ragge at csc.kth.se> wrote:> > On 22 maj 2010, at 07.40, Don wrote: > > >> The SATA power connector supplies 3.3, 5 and 12v. A "complete" > >> solution will have all three. Most drives use just the 5v, so you can > >> probably ignore 3.3v and 12v. > > I''m not interested in building something that''s going to work for every > possible drive config- just my config :) Both the Intel X25-e and the OCZ > only uses the 5V rail. > > > >> You''ll need to use a step up DC-DC converter and be able to supply ~ > >> 100mA at 5v. > >> It''s actually easier/cheaper to use a LiPoly battery & charger and get a > >> few minutes of power than to use an ultracap for a few seconds of > >> power. Most ultracaps are ~ 2.5v and LiPoly is 3.7v, so you''ll need a > >> step up converter in either case. > > Ultracapacitors are available in voltage ratings beyond 12volts so there > is no reason to use a boost converter with them. That eliminates high > frequency switching transients right next to our SSD which is always > helpful. > > > > In this case- we have lots of room. We have a 3.5" x 1" drive bay, but a > 2.5" x 1/4" hard drive. There is ample room for several of the 6.3V ELNA 1F > capacitors (and our SATA power rail is a 5V regulated rail so they should > suffice)- either in series or parallel (Depending on voltage or runtime > requirements). > > http://www.elna.co.jp/en/capacitor/double_layer/catalog/pdf/dk_e.pdf > > > > You could 2 caps in series for better voltage tolerance or in parallel > for longer runtimes. Either way you probably don''t need a charge controller, > a boost or buck converter, or in fact any IC''s at all. It''s just a small > board with some caps on it. > > I know they have a certain internal resistance, but I am not familiar > with the characteristics; is it high enough so you don''t need to > limit the inrush current, and is it low enough so that you don''t need > a voltage booster for output? > > >> Cost for a 5v only system should be $30 - $35 in one-off > >> prototype-ready components with a 1100mAH battery (using prices from > >> Sparkfun.com), > > You could literally split a sata cable and add in some capacitors for > just the cost of the caps themselves. The issue there is whether the caps > would present too large a current drain on initial charge up- If they do > then you need to add in charge controllers and you''ve got the same problems > as with a LiPo battery- although without the shorter service life. 
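As a quick sanity check of the arithmetic above, the small Python sketch below reproduces the numbers. The assumptions are mine, not from the thread or any datasheet beyond the figures already quoted: the X25-E is modeled as a fixed 2.4 W load on the 5 V rail, the supercap as an ideal capacitor in series with its ESR, and 0.03 ohm stands in for the "three orders of magnitude lower" ESR claim.

# Back-of-the-envelope check of the cap-vs-load numbers above.
V_RAIL = 5.0      # volts, SATA 5 V rail
P_DRIVE = 2.4     # watts, Intel's "active" figure for the 32 GB X25-E
R_LOAD = V_RAIL ** 2 / P_DRIVE          # ~10.4 ohm equivalent load resistance

def max_current(esr_ohms):
    """Current the charged cap can push through the drive: 5 V across the
    series loop of cap ESR plus load resistance."""
    return V_RAIL / (esr_ohms + R_LOAD)

print(f"drive wants            : {P_DRIVE / V_RAIL * 1000:.0f} mA")   # ~480 mA
print(f"Elna DK (30 ohm ESR)   : {max_current(30.0) * 1000:.0f} mA")  # ~124 mA
print(f"low-ESR part (0.03 ohm): {max_current(0.03) * 1000:.0f} mA")  # ~479 mA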
Bob Friesenhahn
2010-May-22 14:41 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, David Dyer-Bennet wrote:

> To be comfortable (I don't ask for "know for a certainty"; I'm not sure that exists outside of "faith"), I want a claim by the manufacturer and multiple outside tests in "significant" journals -- which could be the blog of somebody I trusted, as well as actual magazines and such. Ideally, certainly if it's important, I'd then verify the tests myself.

For me, "know for a certainty" means that the feature is clearly specified in the formal specification sheet for the product, and the vendor has historically published reliable specification sheets. This may not be the same as money in the bank, but it is better than relying on thoughts from some blog posting.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bob Friesenhahn
2010-May-22 14:50 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, Brandon High wrote:

> My understanding is that the controller contains enough cache to buffer a complete erase block's worth of data, eliminating the read / erase / write cycle that a partial block write entails. It's reported to do copy-on-write, so it doesn't need to read existing blocks when making changes, which gives it such high IOPS - even random writes are turned into sequential writes of entire erase blocks (much like how ZFS works). The excessive spare area is used to ensure that there are always full pages free to write to. (Some vendors are releasing consumer drives with 60/120/240 GB, using 7% reserved space rather than the 27% that the original drives ship with.)

Flash is useless as working space since it does not behave like RAM, so every SSD needs some RAM for temporary storage of data. This COW approach seems nice, except that it would appear to inflate performance by only considering one specific magic block size and alignment. Other block sizes and alignments would require that existing data be read so that the new block content can be constructed. Also, the blazing fast write speed (which depends on plenty of already-erased blocks) would stop once the spare space in the SSD has been consumed.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
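For readers who want the copy-on-write idea above in concrete form, here is a toy flash-translation-layer sketch. It is illustrative only: the sizes and the mapping scheme are invented for the example, not the actual SandForce design. Random logical writes land sequentially in the currently open erase block and the old copy is simply invalidated, which is also why the fast path ends once the spare blocks are gone and erases/garbage collection would have to start - exactly the limitation Bob points out.

# Toy log-structured / copy-on-write FTL model (illustrative only).
import random

PAGES_PER_ERASE_BLOCK = 8      # unrealistically small, to keep the demo short

class ToyFTL:
    def __init__(self, data_blocks, spare_blocks):
        self.map = {}                                  # logical page -> (block, page)
        self.free_blocks = list(range(data_blocks + spare_blocks))
        self.open_block = self.free_blocks.pop(0)      # block currently being filled
        self.fill = 0
        self.page_writes = 0

    def write(self, logical_page):
        # Copy-on-write: the new version goes to the next sequential page of
        # the open erase block; the old copy is merely invalidated, so no
        # read/erase/rewrite cycle is needed while fresh blocks remain.
        if self.fill == PAGES_PER_ERASE_BLOCK:
            if not self.free_blocks:
                raise RuntimeError("spare area exhausted - garbage collection "
                                   "and erases would have to start here")
            self.open_block = self.free_blocks.pop(0)
            self.fill = 0
        self.map[logical_page] = (self.open_block, self.fill)
        self.fill += 1
        self.page_writes += 1

ftl = ToyFTL(data_blocks=4, spare_blocks=2)            # 48 physical pages total
for lp in random.choices(range(16), k=60):             # 60 random overwrites
    try:
        ftl.write(lp)
    except RuntimeError as err:
        print(err)                                     # the fast path ends here
        break
print("physical page writes issued:", ftl.page_writes)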
Bob Friesenhahn
2010-May-22 15:00 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, Don wrote:

> You know- it would probably be sufficient to provide the SSD with _just_ a big capacitor bank. If the host lost power it would stop writing, and if the SSD still had power it would probably use the idle time to flush its buffers. Then there would be world peace!

This makes the assumption that an SSD will want to flush its write cache as soon as possible rather than just letting it sit there waiting for more data. That is probably not a good assumption. If the OS sends 512 bytes of data but the SSD block size is 4K, it is reasonable for the SSD to wait for 3584 more contiguous bytes of data before it bothers to write anything. Writes increase the wear on the flash, and writes require a slow erase cycle, so it is reasonable for SSDs to buffer as much data in their write cache as possible before writing anything. An advanced SSD could write non-contiguous sectors into an SSD page and then use a sort of lookup table to know where the sectors actually are. Regardless, under slow write conditions it is definitely valuable to buffer the data for a while in the hope that more related data will appear, or that the data might even be overwritten.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
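A minimal sketch of the buffering behavior Bob describes: sub-page writes sit in the drive's RAM until a full page accumulates or an explicit flush forces them out. The 4 KB page, 512-byte sector and the policy itself are assumptions for illustration, not any particular drive's firmware.

# Sketch of "hold partial pages in RAM and wait for more data" (assumed policy).
PAGE_SECTORS = 4096 // 512     # 8 sectors per flash page

class WriteCache:
    def __init__(self):
        self.pending = {}          # page number -> set of sector slots received
        self.pages_programmed = 0

    def write_sector(self, lba):
        page, slot = divmod(lba, PAGE_SECTORS)
        slots = self.pending.setdefault(page, set())
        slots.add(slot)                      # overwriting the same sector costs nothing
        if len(slots) == PAGE_SECTORS:       # a complete page is worth writing now
            self._program(page)

    def flush(self):
        # Explicit cache flush: even partial pages must go to flash, which on
        # real media would mean a read-modify-write of the surrounding block.
        for page in list(self.pending):
            self._program(page)

    def _program(self, page):
        self.pending.pop(page, None)
        self.pages_programmed += 1

cache = WriteCache()
for lba in range(8):                 # eight 512 B writes = one full 4 KB page
    cache.write_sector(lba)
cache.write_sector(100)              # a lone 512 B write just sits in RAM...
print("pages programmed so far:", cache.pages_programmed)        # 1
cache.flush()                        # ...until a flush forces it out
print("pages programmed after flush:", cache.pages_programmed)   # 2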
On Fri, 21 May 2010, Don wrote:

> You could literally split a SATA cable and add in some capacitors for just the cost of the caps themselves. The issue there is whether the caps would present too large a current drain on initial charge-up. If they do, then you need to add in charge controllers and you've got the same problems as with a LiPo battery, although without the shorter service life.

Electricity does run both directions down a wire, and the capacitor would look like a short circuit to the supply when it is first turned on. You would need some circuitry which delays applying power to the drive until the capacitor is sufficiently charged, and some circuitry which stops energy flowing back into the power supply when the power supply shuts off (which could be a silicon diode, if you don't mind the 0.7 V drop).

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
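Rough numbers for the inrush and diode-drop points, assuming an ideal 5 V supply and that the capacitor's ESR is the only series resistance (real supplies, cabling and any fuse add more): with the 30-ohm Elna part the inrush is actually tiny but charging takes minutes, while a low-ESR supercap charges in a fraction of a second but briefly looks like a dead short; and a plain silicon diode costs the drive about 0.7 V of its 5 V rail, which is why the MOSFET "ideal diode" suggested in the following message is attractive.

# Inrush current, charge time and diode drop, under the assumptions above.
V = 5.0
C = 1.0            # farads
DIODE_DROP = 0.7   # series silicon diode, as suggested above

for name, esr in (("Elna DK, 30 ohm ESR    ", 30.0),
                  ("low-ESR supercap, 0.03 ", 0.03)):
    inrush = V / esr          # current at the instant the supply comes up
    t_charge = 5 * esr * C    # ~5 RC time constants to reach ~99% of 5 V
    print(f"{name}: inrush ~{inrush:7.1f} A, ~{t_charge:6.1f} s to charge")

print(f"rail the SSD sees behind a plain silicon diode: {V - DIODE_DROP:.1f} V")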
Bob Friesenhahn wrote:

> Electricity does run both directions down a wire, and the capacitor would look like a short circuit to the supply when it is first turned on. You would need some circuitry which delays applying power to the drive until the capacitor is sufficiently charged, and some circuitry which stops energy flowing back into the power supply when the power supply shuts off (which could be a silicon diode, if you don't mind the 0.7 V drop).

You can also use an appropriately wired field effect transistor (FET/MOSFET) of sufficient current-carrying capacity as a one-way valve (diode) with minimal voltage drop. More:
http://electronicdesign.com/article/power/fet-supplies-low-voltage-reverse-polarity-protecti.aspx
http://www.electro-tech-online.com/general-electronics-chat/32118-using-mosfet-diode-replacement.html

In regard to how long you need to continue supplying power: that comes down to how long the SSD waits before flushing its cache to flash. If you can identify the maximum write cache flush interval, and size the battery or capacitor to exceed that maximum interval, you should be okay. The maximum write cache flush interval is determined by a timer that says something like "okay, we've waited 5 seconds for additional data to arrive to be written. None has arrived in the last 5 seconds, so we're going to write what we already have, to better ensure data integrity, even though it is suboptimal from an absolute performance perspective." In the conventional terms of filling city buses: the bus leaves when it is full of people, or when 15 minutes have passed since the last bus left.

Does anyone know if there is a way to directly or indirectly measure the write cache flush interval? I know cache sizes can be found via performance testing, but what about write intervals?
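One possible indirect measurement is sketched below, with heavy caveats - all of these are assumptions, not established facts: DEV is a placeholder for a scratch device whose contents you are willing to destroy; writes to the raw device really land in the drive's volatile write cache (write cache enabled, no RAID controller cache in between); and os.fsync() on that descriptor is turned by the OS into a real SATA FLUSH CACHE, which is precisely the behavior the thread is unsure these drives honor. The idea: park a little data in the drive cache, idle for a varying interval, then time an explicit flush; if the drive destages on its own after some idle period, flushes issued after that point should return almost instantly, and the knee in the curve hints at the flush timer.

# Indirect probe of the drive's flush-on-idle behavior (see caveats above).
import os
import time

DEV = "/dev/rdsk/c9t9d9s0"       # placeholder - point this at a scratch disk!
CHUNK = b"\0" * 8192             # sector-aligned scribble

fd = os.open(DEV, os.O_WRONLY)
try:
    for idle in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
        os.lseek(fd, 1024 * 1024, os.SEEK_SET)   # stay clear of the label
        os.write(fd, CHUNK)                      # should sit in the drive cache
        time.sleep(idle)                         # let the drive destage on its own
        t0 = time.time()
        os.fsync(fd)                             # hopefully a real cache flush
        print(f"idle {idle:4.1f} s -> flush took {(time.time() - t0) * 1000:7.2f} ms")
finally:
    os.close(fd)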
>>>>> "d" == Don <don at blacksun.org> writes:
>>>>> "hk" == Haudy Kazemi <kaze0010 at umn.edu> writes:

    d> You could literally split a sata cable and add in some
    d> capacitors for just the cost of the caps themselves.

No, this is no good. The energy only flows in and out of the capacitor when the voltage across it changes. In this respect they are different from batteries. It's normal to use (non-super) capacitors as you describe as filters next to things drawing power in a high-frequency, noisy way, but to use them for energy storage across several seconds you need a switching supply to drain the energy out of them. The step-down and voltage-pump kinds of switchers are non-isolated and might do fine, and are cheaper than full-fledged DC-DC converters that are isolated (meaning the input and output can float with respect to each other). You can charge from 12V and supply 5V if that's cheaper. :) Hope it works.

    hk> "okay, we've waited 5 seconds for additional data to arrive to
    hk> be written. None has arrived in the last 5 seconds, so we're
    hk> going to write what we already have to better ensure data
    hk> integrity,

Yeah, I am worried about corner cases like this. For example: input power to the SSD becomes scratchy or sags, but power to the host and controller remains fine. Writes arrive continuously. The SSD sees nothing wrong with its power and continues to accept and acknowledge writes. Meanwhile you burn through your stored power hiding the sagging supply until you can't, then the SSD loses power suddenly and drops a bunch of writes on the floor. That is why I drew that complicated state diagram in which the pod disables and holds down the SATA connection once it's running on reserve power. Probably y'all don't give a fuck about such corners though, nor do many of the manufacturers selling this stuff, so, whatever.
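To put numbers on the energy-storage point: the energy you can actually extract from a capacitor is only what is released as its voltage falls from full charge down to the minimum your load (or an assumed lossless step-up converter in front of it) tolerates, i.e. 0.5*C*(V1^2 - V2^2). Using the 2.4 W X25-E figure from earlier in the thread and some assumed minimum voltages, a 1 F cap buys on the order of one to a few seconds of holdup, consistent with the "few seconds of power" characterization, and this still ignores the ESR problem discussed above.

# Holdup time from usable capacitor energy (load figure from the thread,
# minimum voltages are assumptions).
def holdup_seconds(c_farads, v_full, v_min, p_load_watts):
    usable_joules = 0.5 * c_farads * (v_full ** 2 - v_min ** 2)
    return usable_joules / p_load_watts

P_LOAD = 2.4   # watts
cases = [
    ("1 F, no converter, load quits below 4.5 V", 1.0, 5.0, 4.5),   # ~1.0 s
    ("1 F + ideal boost converter down to 2.5 V", 1.0, 5.0, 2.5),   # ~3.9 s
    ("2 x 1 F in parallel, same converter      ", 2.0, 5.0, 2.5),   # ~7.8 s
]
for label, c, v_full, v_min in cases:
    print(f"{label}: ~{holdup_seconds(c, v_full, v_min, P_LOAD):.1f} s at {P_LOAD} W")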
This thread has grown giant, so apologies for screwing up threading with an out-of-place reply. :)

So, as far as SF-1500 based SSDs go, the only ones currently in existence are the Vertex 2 LE and Vertex 2 EX, correct (I understand the Vertex 2 Pro was never mass produced)?

Both of these are based on MLC and not SLC -- why isn't that an issue for longevity?

Any other SF-1500 options out there?

We continue to use UPS-backed Intel X25-E's for ZIL.

Ray
On Mon, May 24, 2010 at 11:30:20AM -0700, Ray Van Dolson wrote:

> So, as far as SF-1500 based SSDs go, the only ones currently in existence are the Vertex 2 LE and Vertex 2 EX, correct (I understand the Vertex 2 Pro was never mass produced)?
> [...]
> Any other SF-1500 options out there?

From earlier in the thread, it sounds like none of the SF-1500 based drives even have a supercap, so it doesn't seem that they'd necessarily be a better choice than the SLC-based X25-E at this point unless you need more write IOPS...

Ray
> From earlier in the thread, it sounds like none of the SF-1500 based drives even have a supercap, so it doesn't seem that they'd necessarily be a better choice than the SLC-based X25-E at this point unless you need more write IOPS...
>
> Ray

I think the upcoming OCZ Vertex 2 Pro will have a supercap. I just bought an OCZ Vertex LE; it doesn't have a supercap, but it DOES have some awesome specs otherwise.
Hi,

1) Is it possible to do this at all, i.e. put a disk's LED into blink mode with "luxadm led_blink"?
2) What is the backplane hardware requirement for "luxadm led_blink" to work, i.e. to bring the disk LED into blink mode?

Thanks.

Fred