I'm looking for alternative SSD options to the Intel X25-E and the ZEUS IOPS. The ZEUS IOPS would probably cost as much as my entire current disk system (80 15k SAS drives)- and that's just silly. The Intel is much less expensive, and while fast- pales in comparison to the ZEUS. I've allocated 4 disk slots in my array for ZIL SSDs and I'm trying to find the best performance for my dollar.

With that in mind- Is anyone using the new OCZ Vertex 2 SSDs as a ZIL?
http://www.ocztechnology.com/products/solid-state-drives/2-5--sata-ii/performance-enterprise-solid-state-drives/ocz-vertex-2-sata-ii-2-5--ssd.html

They're claiming 50k IOPS (4k Write- Aligned), 2 million hour MTBF, TRIM support, etc. That's more write IOPS than the ZEUS (40k IOPS, $$$$$) but at half the price of an Intel X25-E (3.3k IOPS, $400).

Needless to say I'd love to know if anyone has evaluated these drives to see if they make sense as a ZIL- for example- do they honor cache flush requests? Are those sustained IOPS numbers?
-- This message posted from opensolaris.org
40k IOPS sounds like "best in case, you''ll never see it in the real world" marketing to me. There are a few benchmarks if you google and they all seem to indicate the performance is probably +/- 10% of an intel x25-e. I would personally trust intel over one of these drives. Is it even possible to buy a zeus iops anywhere? I haven''t been able to find one. I get the impression they mostly sell to other vendors like sun? I''d be curious what the price is on a 9GB zeus iops is these days? -- This message posted from opensolaris.org
Don wrote:
> With that in mind- Is anyone using the new OCZ Vertex 2 SSDs as a ZIL?
>
> They're claiming 50k IOPS (4k Write- Aligned), 2 million hour MTBF, TRIM support, etc. That's more write IOPS than the ZEUS (40k IOPS, $$$$$) but at half the price of an Intel X25-E (3.3k IOPS, $400).
>
> Needless to say I'd love to know if anyone has evaluated these drives to see if they make sense as a ZIL- for example- do they honor cache flush requests? Are those sustained IOPS numbers?

In my understanding nearly the only relevant number is the number of cache flushes a drive can handle per second, as this determines my single thread performance. Has anyone an idea what numbers I can expect from an Intel X25-E or an OCZ Vertex 2?

-Arne
On Tue, May 18, 2010 at 4:28 PM, Don <don at blacksun.org> wrote:
> With that in mind- Is anyone using the new OCZ Vertex 2 SSDs as a ZIL?

The current SandForce drives out don't have an ultra-capacitor on them, so they could lose data if the system crashed. Enterprise-class drives based on the chipset that do have an ultra-cap are supposed to be released "any day now".

> Needless to say I'd love to know if anyone has evaluated these drives to see if they make sense as a ZIL- for example- do they honor cache flush requests? Are those sustained IOPS numbers?

I don't think they do; the chipset was designed to use an ultra-cap to avoid having to honor flushes. Then again, the X25-E has the same problem.

-B
-- Brandon High : bhigh at freaks.com
On 2010-05-19 08.32, sensille wrote:
> Don wrote:
>> With that in mind- Is anyone using the new OCZ Vertex 2 SSDs as a ZIL?
>>
>> They're claiming 50k IOPS (4k Write- Aligned), 2 million hour MTBF, TRIM support, etc. That's more write IOPS than the ZEUS (40k IOPS, $$$$$) but at half the price of an Intel X25-E (3.3k IOPS, $400).
>>
>> Needless to say I'd love to know if anyone has evaluated these drives to see if they make sense as a ZIL- for example- do they honor cache flush requests? Are those sustained IOPS numbers?
>
> In my understanding nearly the only relevant number is the number of cache flushes a drive can handle per second, as this determines my single thread performance.
> Has anyone an idea what numbers I can expect from an Intel X25-E or an OCZ Vertex 2?

I don't know about the OCZ Vertex 2, but the Intel X25-E roughly halves its IOPS number when you disable its write cache (IIRC, it was in the range of 1300-1600 writes/s or so). Since it ignores Cache Flush command and it doesn't have any persistent buffer storage, disabling the write cache is the best you can do.

Note that there were reports of the Intel X25-E losing a write even though you had the write cache disabled! Since they still haven't fixed this, after more than a year on the market, I believe it rather qualifies for the "hardly usable toy" class.

I am very disappointed; I had hopes for a new class of cheap but usable flash drives. Maybe some day...

/ragge
Well- 40k IOPS is the current claim from ZEUS- and they're the benchmark. They used to be 17k IOPS. How real any of these numbers are from any manufacturer is a guess. Given the Intel's refusal to honor a cache flush, and their performance problems with the cache disabled- I don't trust them any more than anyone else right now.

As for the Vertex drives- if they are within +-10% of the Intel they're still doing it for half of what the Intel drive costs- so it's an option- not a great option- but still an option.
-- This message posted from opensolaris.org
> As for the Vertex drives- if they are within +-10% of the Intel they're still doing it for half of what the Intel drive costs- so it's an option- not a great option- but still an option.

Yes, but Intel is SLC. Much more endurance.
On Wed, May 19, 2010 02:09, thomas wrote:
> Is it even possible to buy a zeus iops anywhere? I haven't been able to find one. I get the impression they mostly sell to other vendors like sun? I'd be curious what the price on a 9GB zeus iops is these days?

Correct, their Zeus products are only available to OEMs.
Well, the larger size of the Vertex, coupled with their smaller claimed write amplification, should result in sufficient service life for my needs. Their claimed MTBF also matches the Intel X25-E's.
-- This message posted from opensolaris.org
"Since it ignores Cache Flush command and it doesn''t have any persistant buffer storage, disabling the write cache is the best you can do." This actually brings up another question I had: What is the risk, beyond a few seconds of lost writes, if I lose power, there is no capacitor and the cache is not disabled? My ZFS system is shared storage for a large VMWare based QA farm. If I lose power then a few seconds of writes are the least of my concerns. All of the QA tests will need to be restarted and all of the file systems will need to be checked. A few seconds of writes won''t make any difference unless it has the potential to affect the integrity of the pool itself. Considering the performance trade-off, I''d happily give up a few seconds worth of writes for significantly improved IOPS. -- This message posted from opensolaris.org
On Wed, May 19, 2010 at 02:29:24PM -0700, Don wrote:
> "Since it ignores Cache Flush command and it doesn't have any persistent buffer storage, disabling the write cache is the best you can do."
>
> This actually brings up another question I had: What is the risk, beyond a few seconds of lost writes, if I lose power, there is no capacitor and the cache is not disabled?

You can lose all writes from the last committed transaction (i.e., the one before the currently open transaction). (You also lose writes from the currently open transaction, but that's unavoidable in any system.)

Nowadays the system will let you know at boot time that the last transaction was not committed properly and you'll have a chance to go back to the previous transaction.

For me, getting much-better-than-disk performance out of an SSD with cache disabled is enough to make that SSD worthwhile, provided the price is right of course.

Nico
--
On May 19, 2010, at 2:29 PM, Don wrote:
> "Since it ignores Cache Flush command and it doesn't have any persistent buffer storage, disabling the write cache is the best you can do."
>
> This actually brings up another question I had: What is the risk, beyond a few seconds of lost writes, if I lose power, there is no capacitor and the cache is not disabled?

The data risk is a few moments of data loss. However, if the order of the uberblock updates is not preserved (which is why the caches are flushed) then recovery from a reboot may require manual intervention. The amount of manual intervention could be significant for builds prior to b128.

> My ZFS system is shared storage for a large VMware-based QA farm. If I lose power then a few seconds of writes are the least of my concerns. All of the QA tests will need to be restarted and all of the file systems will need to be checked. A few seconds of writes won't make any difference unless it has the potential to affect the integrity of the pool itself.
>
> Considering the performance trade-off, I'd happily give up a few seconds' worth of writes for significantly improved IOPS.

Space, dependability, performance: pick two :-)
-- richard

--
Richard Elling
richard at nexenta.com  +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
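(For reference, the post-b128 recovery Richard mentions is exposed through zpool import's rewind option. A minimal sketch, assuming a pool named "tank" that refuses to import after the crash - the pool name and the dry-run-first workflow are illustrative, not something specified in this thread:

    # ask whether the pool could be made importable by discarding the
    # last few transactions, without actually doing it
    zpool import -F -n tank

    # do the rewind: roll back to the newest consistent transaction group
    zpool import -F tank

The point is that on b128 and later the "manual intervention" is a supported rewind command rather than hand surgery on uberblocks; the cost is still the last few seconds of writes.)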
"You can lose all writes from the last committed transaction (i.e., the one before the currently open transaction)." And I don''t think that bothers me. As long as the array itself doesn''t go belly up- then a few seconds of lost transactions are largely irrelevant- all of the QA virtual machines are going to have to be rolled back to their initial states anyway. -- This message posted from opensolaris.org
"You can lose all writes from the last committed transaction (i.e., the one before the currently open transaction)." I''ll pick one- performance :) Honestly- I wish I had a better grasp on the real world performance of these drives. 50k IOPS is nice- and considering the incredible likelihood of data duplication in my environment- the SandForce controller seems like a win. That said- does anyone have a good set of real world performance numbers for these drives that you can link to? -- This message posted from opensolaris.org
On 20 maj 2010, at 00.20, Don wrote:
> "You can lose all writes from the last committed transaction (i.e., the one before the currently open transaction)."
>
> And I don't think that bothers me. As long as the array itself doesn't go belly up- then a few seconds of lost transactions are largely irrelevant- all of the QA virtual machines are going to have to be rolled back to their initial states anyway.

Ok - then you are in the dream situation, and your solution could be free of charge, a one-liner command, and perform better than any SSD on the market: Disable the ZIL.

You will lose up to 30 seconds of the most recently written data, and if you use it as an NFS server your clients may get confused after a crash since the server is not in the state it should be in. You could also turn down the ZFS transaction timeout to lose less than 30 seconds if you want. Your pool will always be in a consistent shape on disk (if you have hardware that behaves).

Remember to NEVER use this pool for anything that actually wants better data persistency - this is a pool tuned specifically for a very special case.

In very recent opensolaris there is a zpool property for this, earlier you had to set a kernel flag when mounting the pool (and have it unset when mounting other pools, if you want them to have the ZIL enabled).

/ragge
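(For anyone wanting to try this, a minimal sketch of the one-liner - the dataset name is made up, and which knob exists depends on your build, so treat this as an illustration rather than gospel:

    # recent builds: per-dataset property (if memory serves it is a zfs
    # dataset property rather than a pool property)
    zfs set sync=disabled tank/qa-vmstore

    # older builds: global tunable in /etc/system; affects every pool on
    # the host and only filesystems mounted after it takes effect
    set zfs:zil_disable = 1

The per-dataset property is the nicer option where available, since the /etc/system tunable cannot be scoped to one pool.)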
> On May 19, 2010, at 2:29 PM, Don wrote:
> The data risk is a few moments of data loss. However, if the order of the uberblock updates is not preserved (which is why the caches are flushed) then recovery from a reboot may require manual intervention. The amount of manual intervention could be significant for builds prior to b128.

This risk is mostly mitigated by UPS backup and auto-shutdown when the UPS detects power loss, correct? Outside of pulling the plug that should solve power related problems. Kernel panics should only be caused by hardware issues, which might corrupt the disk data anyway. Obviously software can and does fail, but the biggest problem I hear about with ZIL devices is behavior in a sudden power loss situation.

It seems to me that UPS backup along with starting a shutdown cycle before complete power failure should prevent most issues. Seems like that should help with issues like the X25-E not honoring cache flush as well; the UPS would give it time to finish the writes. Again, barring a firmware issue in the drive itself. Should be about the same as a supercap anyway.
-- This message posted from opensolaris.org
On Thu, May 20, 2010 14:12, Travis Tabbal wrote:
>> On May 19, 2010, at 2:29 PM, Don wrote:
>
>> The data risk is a few moments of data loss. However, if the order of the uberblock updates is not preserved (which is why the caches are flushed) then recovery from a reboot may require manual intervention. The amount of manual intervention could be significant for builds prior to b128.
>
> This risk is mostly mitigated by UPS backup and auto-shutdown when the UPS detects power loss, correct?

Unless you have a contractor working in the server room that bumps into the UPS and causes a power glitch which causes a whole bunch of equipment to cycle.

Happened at $WORK (in another office) just two weeks ago.

It all depends on your level of paranoia.
On 20 maj 2010, at 20.35, David Magda wrote:
> On Thu, May 20, 2010 14:12, Travis Tabbal wrote:
>>> On May 19, 2010, at 2:29 PM, Don wrote:
>>
>>> The data risk is a few moments of data loss. However, if the order of the uberblock updates is not preserved (which is why the caches are flushed) then recovery from a reboot may require manual intervention. The amount of manual intervention could be significant for builds prior to b128.
>>
>> This risk is mostly mitigated by UPS backup and auto-shutdown when the UPS detects power loss, correct?
>
> Unless you have a contractor working in the server room that bumps into the UPS and causes a power glitch which causes a whole bunch of equipment to cycle.
>
> Happened at $WORK (in another office) just two weeks ago.

Or any of a zillion other failure modes with that setup, from problems with the UPS, to the auto-shutdown communication signaling system, to a problem with the computer system, the electrical distribution, or anything else.

Building complex solutions to solve critical issues is IMHO seldom a very good solution. If you care about data integrity, buy stuff that does what it is supposed to do, and keep everything simple. Redundancy is often good, but keep the switchover mechanisms as simple and as few as possible. Choose mechanisms that can and will be tested regularly - and don't use systems that are almost never used and/or tested. Complex systems tend to fail, especially after some time when things have changed a bit, or even cause more outages themselves. They are hard to test, maintain and understand, and they are often costly to buy too. KISS, you know.

In the Intel X25 case - bug them until they release new firmware - they have sold you a defective product that they still haven't fixed. If they don't fix it and you need it, get another drive.

> It all depends on your level of paranoia.

Either that, or you may have some kind of protocol, policy, contract, SLA or similar that you have to follow. (In any case it is often really hard to even guess how much a certain change gives or takes in availability numbers.)

Just my 5 öre.

/ragge
>>>>> "d" == Don <don at blacksun.org> writes:d> "Since it ignores Cache Flush command and it doesn''t have any d> persistant buffer storage, disabling the write cache is the d> best you can do." This actually brings up another question I d> had: What is the risk, beyond a few seconds of lost writes, if d> I lose power, there is no capacitor and the cache is not d> disabled? why use a slog at all if it''s not durable? You should disable the ZIL instead. Compared to a slog that ignores cache flush, disabling the ZIL will provide the same guarantees to the application w.r.t. write ordering preserved, and the same problems with NFS server reboots, replicated databases, mail servers. It''ll be faster than the fake-slog. It''ll be less risk of losing the pool because the slog went bad and then you accidentally exported the pool while trying to fix things. The only case where you are ahead with the fake-slog, is the host''s going down because of kernel panics rather than power loss. I don''t know, though, what to do about these reports of devices that almost respect cache flushes but seem to lose exactly one transaction. AFAICT this should be a works/doesntwork situation, not a continuum. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100520/7a5ffa8b/attachment.bin>
On 05/20/10 12:26, Miles Nordin wrote:
> I don't know, though, what to do about these reports of devices that almost respect cache flushes but seem to lose exactly one transaction. AFAICT this should be a works/doesntwork situation, not a continuum.

But there's so much brokenness out there. I've seen similar "tail drop" behavior before -- the last write or two before a hardware reset goes into the bit bucket, but the ones before that are durable.

So, IMHO, a cheap consumer ssd used as a zil may still be worth it (for some use cases) to narrow the window of data loss from ~30 seconds to a sub-second value.

- Bill

Miles Nordin - 2010-May-20 20:35 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

>>>>> "rsk" == Roy Sigurd Karlsbakk <roy at karlsbakk.net> writes: >>>>> "dm" == David Magda <dmagda at ee.ryerson.ca> writes: >>>>> "tt" == Travis Tabbal <travis at tabbal.net> writes:rsk> Disabling ZIL is, according to ZFS best practice, NOT rsk> recommended. dm> As mentioned, you do NOT want to run with this in production, dm> but it is a quick way to check. REPEAT: I disagree. Once you associate the disasterizing and dire warnings from the developer''s advice-wiki with the specific problems that ZIL-disabling causes for real sysadmins rather than abstract notions of ``POSIX'''' or ``the application'''', a lot more people end up wanting to disable their ZIL''s. In fact, most of the SSD''s sold seem to be relying on exactly the trick disabled-ZIL ZFS does for much of their high performance, if not their feasibility within their price bracket period: provide a guarantee of write ordering without durability, and many applications are just, poof, happy. If the SSD''s arrange that no writes are reordered across a SYNC CACHE, but don''t bother actually providing durability, end uzarZ will ``OMG windows fast and no corruption.'''' --> ssd sales. The ``do-not-disable-buy-SSD!!!1!'''' advice thus translates to ``buy one of these broken SSD''s, and you will be basically happy. Almost everyone is. When you aren''t, we can blame the SSD instead of ZFS.'''' all that bottlenecked SATA traffic host<->SSD is just CYA and of no real value (except for kernel panics). Now, if someone would make a Battery FOB, that gives broken SSD 60 seconds of power, then we could use the consumer crap SSD''s in servers again with real value instead of CYA value. FOB should work like this: == RUNNING = battery ,-------> SATA port: pass -----. recharged? / power to SSD: on \ input / \ power ( . lost | | . input ,---\ v power / v restored / =power lost =power restored= . =hold-down =hold down = -- SATA port: block power to SSD: off power to SSD: on ^ | | | . . 60 seconds input \ / elapsed power . =power off= , restored -------- power to SSD: off <- The device must know when its battery has gone bad and stick itself in ``power restored hold down'''' state. Knowing when the battery is bad may require more states to test the battery, but this is the general idea. I think it would be much cheaper to build an SSD with supercap, and simpler because you can assume the supercap is good forever instead of testing it. However because of ``market forces'''' the FOB approach might sell for cheaper because the FOB cannot be tied to the SSD and used as a way to segment the market. If there are 2 companies making only FOB''s and not making SSD''s, only then competition will work like people want it to. Otherwise FOBs will be $1000 or something because only ``enterprise'''' users are smart/dumb enough to demand them. Normally I would have a problem that the FOB and SSD are separable, but see, the FOB and SSD can be put together with double-sided tape: the tape only has to hold for 60 seconds after $event, and there''s no way to separate the two by tripping over a cord. You can safely move SSD+FOB from one chassis to another without fearing all is lost if you jiggle the connection. I think it''s okay overall. tt> This risk is mostly mitigated by UPS backup and auto-shutdown tt> when the UPS detects power loss, correct? no no it''s about cutting off a class of failure cases and constraining ourselves to relatively sane forms of failure. We are not haggling about NO FAILURES EVAR yet. 
First, for STEP 1 we isolate the insane kinds of failure that cost us days or months of data rather than just a few seconds, the kinds that call for crazy unplannable ad-hoc recovery methods like ``Viktor plz help me'' and ``is anyone here a Postgres data recovery expert?'' and ``is there a way I can invalidate the batch of billing auth requests I uploaded yesterday so I can rerun it without double-billing anyone?'' For STEP 1 we make the insane failures almost impossible through clever software and planning. A UPS never, never, ever qualifies as ``almost impossible''.

Then, once that's done, we come back for STEP 2 where we try to minimize the sane failures also, and for STEP 2 things like a UPS might be useful. For STEP 2 it makes sense to talk about percent availability, probability of failure, length of time to recover from Scenario X. But in STEP 1 all the failures are insane ones, so you cannot measure any of these things.

A UPS is not about how ``paranoid'' you are or how far you want to take STEP 1. You take STEP 1 all the way to completion before worrying about STEP 2.

For NFS, the STEP 1 risk on the table is ``server reboots, client does not.'' It is okay if both reboot at once. It is okay if neither reboots. But if you disable the ZIL OR have a broken SSD like the X25, AND the NFS server reboots and the client doesn't, then you have a STEP 1 insane failure case that can cause corrupted database files or virtual disk images on the NFS clients.

For example, if you fail to complete STEP 1, and then you plug the NFS clients into a more expensive UPS with proper transfer switches for maintenance and A/B power, and the server into a rather ordinary UPS, then you will be at greater risk of this particular NFS problem than if you used no UPS at all. That's not intuitive! But it's true! This comes from putting STEP 2 before STEP 1. You must do them in order if you want to stay sane.

If you do not care about this NFS problem (or the others) then maybe you can just disable the ZIL. It is a matter of working through STEP 1. Working through STEP 1 might be ``doesn't affect us. Disable ZIL.'' Or it might be ``get slog with supercap''. STEP 1 will never be ``plug in OCZ Vertex cheaposlog that ignores cacheflush'' if you are doing it right. And STEP 2 has nothing to do with anything yet until we finish STEP 1 and the insane failure cases.

Miika Vesti - 2010-May-20 21:23 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

> If you do not care about this NFS problem (or the others) then maybe you can just disable the ZIL. It is a matter of working through STEP 1. Working through STEP 1 might be ``doesn't affect us. Disable ZIL.'' Or it might be ``get slog with supercap''. STEP 1 will never be ``plug in OCZ Vertex cheaposlog that ignores cacheflush'' if you are doing it right. And STEP 2 has nothing to do with anything yet until we finish STEP 1 and the insane failure cases.

AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant.

There has been previous discussion about this:
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702

"I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."

Also:
http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html

"Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity."

"So if I interpret them correctly, what they chose to do with the current incarnation of the architecture is actually reserve some of the primary memory capacity for I/O transaction management."

"In plain English, if the system gets interrupted either by power or by a crash, when it initializes the next time, it can read from its transaction space and "resume" where it left off. This makes it durable."

So, OCZ Vertex 2 seems to be a good choice for ZIL.
> use a slog at all if it's not durable? You should
> disable the ZIL
> instead.

This is basically where I was going. There only seems to be one SSD that is considered "working", the Zeus IOPS. Even if I had the money, I can't buy it. As my application is a home server, not a datacenter, things like NFS breaking if I don't reboot the clients is a non-issue. As long as the on-disk data is consistent so I don't have to worry about the entire pool going belly-up, I'm happy enough. I might lose 30 seconds of data, worst case, as a result of running without ZIL. Considering that I can't buy a proper ZIL at a cost I can afford, and an improper ZIL is not worth much, I don't see a reason to bother with ZIL at all. I'll just get a cheap large SSD for L2ARC, disable ZIL, and call it a day.

For my use, I'd want a device in the $200 range to even consider an slog device. As nothing even remotely close to that price range exists that will work properly at all, let alone with decent performance, I see no point in ZIL for my application. The performance hit is just too severe to continue using it without an slog, and there's no slog device I can afford that works properly, even if I ignore performance.
-- This message posted from opensolaris.org
On May 20, 2010, at 6:25 PM, Travis Tabbal <travis at tabbal.net> wrote:
>> use a slog at all if it's not durable? You should
>> disable the ZIL
>> instead.
>
> This is basically where I was going. There only seems to be one SSD that is considered "working", the Zeus IOPS. Even if I had the money, I can't buy it. As my application is a home server, not a datacenter, things like NFS breaking if I don't reboot the clients is a non-issue. As long as the on-disk data is consistent so I don't have to worry about the entire pool going belly-up, I'm happy enough. I might lose 30 seconds of data, worst case, as a result of running without ZIL. Considering that I can't buy a proper ZIL at a cost I can afford, and an improper ZIL is not worth much, I don't see a reason to bother with ZIL at all. I'll just get a cheap large SSD for L2ARC, disable ZIL, and call it a day.
>
> For my use, I'd want a device in the $200 range to even consider an slog device. As nothing even remotely close to that price range exists that will work properly at all, let alone with decent performance, I see no point in ZIL for my application. The performance hit is just too severe to continue using it without an slog, and there's no slog device I can afford that works properly, even if I ignore performance.

Just buy a caching RAID controller and run it in JBOD mode and have the ZIL integrated with the pool.

A 512MB-1024MB card with battery backup should do the trick. It might not have the capacity of an SSD, but in my experience it works well in the 1TB data moderately loaded range.

Have more data/activity, then try more cards and more pools; otherwise pony up the $$$$ for a capacitor backed SSD.

-Ross
On 21 maj 2010, at 00.53, Ross Walker wrote:
> On May 20, 2010, at 6:25 PM, Travis Tabbal <travis at tabbal.net> wrote:
>>> use a slog at all if it's not durable? You should
>>> disable the ZIL
>>> instead.
>>
>> This is basically where I was going. There only seems to be one SSD that is considered "working", the Zeus IOPS. Even if I had the money, I can't buy it. As my application is a home server, not a datacenter, things like NFS breaking if I don't reboot the clients is a non-issue. As long as the on-disk data is consistent so I don't have to worry about the entire pool going belly-up, I'm happy enough. I might lose 30 seconds of data, worst case, as a result of running without ZIL. Considering that I can't buy a proper ZIL at a cost I can afford, and an improper ZIL is not worth much, I don't see a reason to bother with ZIL at all. I'll just get a cheap large SSD for L2ARC, disable ZIL, and call it a day.
>>
>> For my use, I'd want a device in the $200 range to even consider an slog device. As nothing even remotely close to that price range exists that will work properly at all, let alone with decent performance, I see no point in ZIL for my application. The performance hit is just too severe to continue using it without an slog, and there's no slog device I can afford that works properly, even if I ignore performance.
>
> Just buy a caching RAID controller and run it in JBOD mode and have the ZIL integrated with the pool.
>
> A 512MB-1024MB card with battery backup should do the trick. It might not have the capacity of an SSD, but in my experience it works well in the 1TB data moderately loaded range.
>
> Have more data/activity, then try more cards and more pools; otherwise pony up the $$$$ for a capacitor backed SSD.

It - again - depends on what problem you are trying to solve.

If the RAID controller goes bad on you so that you lose the data in the write cache, your file system could be in pretty bad shape. Most RAID controllers can't be mirrored. That would hardly make a good replacement for a mirrored ZIL.

As far as I know, there is no single silver bullet to this issue.

/ragge
On May 20, 2010, at 1:12 PM, Bill Sommerfeld wrote:
> On 05/20/10 12:26, Miles Nordin wrote:
>> I don't know, though, what to do about these reports of devices that almost respect cache flushes but seem to lose exactly one transaction. AFAICT this should be a works/doesntwork situation, not a continuum.
>
> But there's so much brokenness out there. I've seen similar "tail drop" behavior before -- the last write or two before a hardware reset goes into the bit bucket, but the ones before that are durable.
>
> So, IMHO, a cheap consumer ssd used as a zil may still be worth it (for some use cases) to narrow the window of data loss from ~30 seconds to a sub-second value.

+1
-- richard

--
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
> So, IMHO, a cheap consumer ssd used as a zil may still be worth it (for some use cases) to narrow the window of data loss from ~30 seconds to a sub-second value.

There are lots of reasons to enable the ZIL now- I can throw four very inexpensive SSDs in there now in a pair of mirrors, and then when a better drive comes along I can replace each half of the mirror without bringing anything down. My slots are already allocated and it would be nice to save a few extra seconds of writes- just in case. It's not a great solution- but nothing is.

I don't have access to a ZEUS- and even if I did- I wouldn't pay that kind of money for what amounts to a Vertex 2 Pro but with SLC flash. I'm kind of flabbergasted that no one has simply stuck a capacitor on a more reasonable drive. I guess the market just isn't big enough- but I find that hard to believe. Right now it seems like the options are all or nothing. There's just no %^$#^ middle ground.
-- This message posted from opensolaris.org
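(For anyone following along, the "replace each half of the mirror" workflow Don describes maps onto ordinary zpool commands. A rough sketch with made-up pool and device names - the exact c#t#d# names are illustrative only:

    # add a mirrored log vdev built from two cheap SSDs
    zpool add tank log mirror c4t0d0 c4t1d0

    # later, swap one side of the log mirror for a better drive,
    # without taking the pool down
    zpool replace tank c4t0d0 c5t0d0

Both operations are online; the pool keeps servicing synchronous writes while the replacement side resilvers.)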
On the PCIe side, I noticed there's a new card coming from LSI that claims 150,000 4k random writes. Unfortunately this might end up being an OEM-only card. I also notice on the ddrdrive site that they now have an opensolaris driver and are offering it in a beta program.
-- This message posted from opensolaris.org
On May 20, 2010, at 7:17 PM, Ragnar Sundblad <ragge at csc.kth.se> wrote:
> On 21 maj 2010, at 00.53, Ross Walker wrote:
>> On May 20, 2010, at 6:25 PM, Travis Tabbal <travis at tabbal.net> wrote:
>>>> use a slog at all if it's not durable? You should
>>>> disable the ZIL
>>>> instead.
>>>
>>> This is basically where I was going. There only seems to be one SSD that is considered "working", the Zeus IOPS. Even if I had the money, I can't buy it. As my application is a home server, not a datacenter, things like NFS breaking if I don't reboot the clients is a non-issue. As long as the on-disk data is consistent so I don't have to worry about the entire pool going belly-up, I'm happy enough. I might lose 30 seconds of data, worst case, as a result of running without ZIL. Considering that I can't buy a proper ZIL at a cost I can afford, and an improper ZIL is not worth much, I don't see a reason to bother with ZIL at all. I'll just get a cheap large SSD for L2ARC, disable ZIL, and call it a day.
>>>
>>> For my use, I'd want a device in the $200 range to even consider an slog device. As nothing even remotely close to that price range exists that will work properly at all, let alone with decent performance, I see no point in ZIL for my application. The performance hit is just too severe to continue using it without an slog, and there's no slog device I can afford that works properly, even if I ignore performance.
>>
>> Just buy a caching RAID controller and run it in JBOD mode and have the ZIL integrated with the pool.
>>
>> A 512MB-1024MB card with battery backup should do the trick. It might not have the capacity of an SSD, but in my experience it works well in the 1TB data moderately loaded range.
>>
>> Have more data/activity, then try more cards and more pools; otherwise pony up the $$$$ for a capacitor backed SSD.
>
> It - again - depends on what problem you are trying to solve.
>
> If the RAID controller goes bad on you so that you lose the data in the write cache, your file system could be in pretty bad shape. Most RAID controllers can't be mirrored. That would hardly make a good replacement for a mirrored ZIL.
>
> As far as I know, there is no single silver bullet to this issue.

That is true, and there are finite budgets as well, and as with all things in life one must make a trade-off somewhere.

If you have 2 mirrored SSDs that don't support cache flush and your power goes out, your file system will be in the same bad shape. Difference is in the first place you paid a lot less to have your data hosed.

-Ross

Attila Mravik - 2010-May-21 14:09 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant.
>
> There has been previous discussion about this:
> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>
> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."
>
> Also:
> http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html
>
> "Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity."
>
> "So if I interpret them correctly, what they chose to do with the current incarnation of the architecture is actually reserve some of the primary memory capacity for I/O transaction management."
>
> "In plain English, if the system gets interrupted either by power or by a crash, when it initializes the next time, it can read from its transaction space and "resume" where it left off. This makes it durable."

Here is a detailed explanation of the SandForce controllers:
http://www.anandtech.com/show/3661/understanding-sandforces-sf1200-sf1500-not-all-drives-are-equal

So the SF-1500 is enterprise class and relies on a supercap, while the SF-1200 is consumer class and does not rely on a supercap.

"The SF-1200 firmware on the other hand doesn't assume the presence of a large capacitor to keep the controller/NAND powered long enough to complete all writes in the event of a power failure. As such it does more frequent check pointing and doesn't guarantee the write in progress will complete before it's acknowledged."

As I understand it, the SF-1200 will ack the sync write only after it is written to flash, thus reducing write performance.

There is an interesting part about firmwares: OCZ has an exclusive firmware in the Vertex 2 series which is based on the SF-1200 but whose random write IOPS is not capped at 10K (while other vendors and other SSDs from OCZ using the SF-1200 are capped, unless they sell the drive with the RC firmware, which is for OEM evaluation and not production ready but does not contain the IOPS cap).

Miika Vesti - 2010-May-21 15:14 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

This is interesting. I thought all Vertex 2 SSDs are good choices for ZIL, but this does not seem to be the case.

According to http://www.legitreviews.com/article/1208/1/ the Vertex 2 LE, Vertex 2 Pro and Vertex 2 EX are SF-1500 based but the Vertex 2 (without any suffix) is SF-1200 based. Here is the table:

  Model          Controller   Max Read   Max Write   IOPS
  Vertex 2       SF-1200      270MB/s    260MB/s      9500
  Vertex 2 LE    SF-1500      270MB/s    250MB/s         ?
  Vertex 2 Pro   SF-1500      280MB/s    270MB/s     19000
  Vertex 2 EX    SF-1500      280MB/s    270MB/s     25000

21.05.2010 17:09, Attila Mravik kirjoitti:
>> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant.
>>
>> There has been previous discussion about this:
>> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>>
>> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."
>>
>> Also:
>> http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html
>>
>> "Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity."
>>
>> "So if I interpret them correctly, what they chose to do with the current incarnation of the architecture is actually reserve some of the primary memory capacity for I/O transaction management."
>>
>> "In plain English, if the system gets interrupted either by power or by a crash, when it initializes the next time, it can read from its transaction space and "resume" where it left off. This makes it durable."
>
> Here is a detailed explanation of the SandForce controllers:
> http://www.anandtech.com/show/3661/understanding-sandforces-sf1200-sf1500-not-all-drives-are-equal
>
> So the SF-1500 is enterprise class and relies on a supercap, while the SF-1200 is consumer class and does not rely on a supercap.
>
> "The SF-1200 firmware on the other hand doesn't assume the presence of a large capacitor to keep the controller/NAND powered long enough to complete all writes in the event of a power failure. As such it does more frequent check pointing and doesn't guarantee the write in progress will complete before it's acknowledged."
>
> As I understand it, the SF-1200 will ack the sync write only after it is written to flash, thus reducing write performance.
>
> There is an interesting part about firmwares: OCZ has an exclusive firmware in the Vertex 2 series which is based on the SF-1200 but whose random write IOPS is not capped at 10K (while other vendors and other SSDs from OCZ using the SF-1200 are capped, unless they sell the drive with the RC firmware, which is for OEM evaluation and not production ready but does not contain the IOPS cap).

Bob Friesenhahn - 2010-May-21 15:19 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

On Fri, 21 May 2010, Miika Vesti wrote:
> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant.
>
> There has been previous discussion about this:
> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>
> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."
>
> So, OCZ Vertex 2 seems to be a good choice for ZIL.

There seem to be quite a lot of blind assumptions in the above. The only good choice for ZIL is when you know for a certainty, not assumptions based on 3rd party articles and blog postings. Otherwise it is like assuming that if you jump through an open window there will be firemen down below to catch you.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

David Dyer-Bennet - 2010-May-21 16:45 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

On Fri, May 21, 2010 10:19, Bob Friesenhahn wrote:
> On Fri, 21 May 2010, Miika Vesti wrote:
>> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND grid. Whether it respects or ignores the cache flush seems irrelevant.
>>
>> There has been previous discussion about this:
>> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702
>>
>> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."
>>
>> So, OCZ Vertex 2 seems to be a good choice for ZIL.
>
> There seem to be quite a lot of blind assumptions in the above. The only good choice for ZIL is when you know for a certainty, not assumptions based on 3rd party articles and blog postings. Otherwise it is like assuming that if you jump through an open window there will be firemen down below to catch you.

Just how DOES one know something for a certainty, anyway? I've seen LOTS of people mess up performance testing in ways that gave them very wrong answers; relying solely on your own testing is as foolish as relying on a couple of random blog posts.

To be comfortable (I don't ask for "know for a certainty"; I'm not sure that exists outside of "faith"), I want a claim by the manufacturer and multiple outside tests in "significant" journals -- which could be the blog of somebody I trusted, as well as actual magazines and such. Ideally, certainly if it's important, I'd then verify the tests myself. There aren't enough hours in the day, so I often get by with less.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

Brandon High - 2010-May-21 18:29 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

On Thu, May 20, 2010 at 2:23 PM, Miika Vesti <miika.vesti at trivore.com> wrote:
> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their cache, but take a hunk of flash to use as scratch space instead. Which means that they'll be OK for ZIL use."

I've read conflicting reports that the controller contains a small DRAM cache. So while it doesn't rely on an external DRAM cache, it does have one:

http://www.legitreviews.com/article/1299/2/
"As we noted, the Vertex 2 doesn't have any cache chips on it as that is because the SandForce controller itself is said to carry a small cache inside that is a number of megabytes in size."

> "Another benefit of SandForce's architecture is that the SSD keeps information on the NAND grid and removes the need for a separate cache buffer DRAM module. The result is a faster transaction, albeit at the expense of total storage capacity."

Again, conflicting reports indicate otherwise.

http://www.legitreviews.com/article/1299/2/
"That adds up to 128GB of storage space, but only 93.1GB of it will be usable space! The 'hidden' capacity is used for wear leveling, which is crucial to keeping SSDs running as long as possible."

My understanding is that the controller contains enough cache to buffer enough data to write a complete erase block, eliminating the read / erase / write that a partial block write entails. It's reported to do copy-on-write, so it doesn't need to read existing blocks when making changes, which gives it such high iops - even random writes are turned into sequential writes (much like how ZFS works) of entire erase blocks. The excessive spare area is used to ensure that there are always full pages free to write to. (Some vendors are releasing consumer drives with 60/120/240 GB, using 7% reserved space rather than the 27% that the original drives ship with.)

With an unexpected power loss, you could still lose any data that's cached in the controller, or any uncommitted changes that have been partially written to the NAND.

I hate having to rely on sites like Legit Reviews and Anandtech for technical data, but there don't seem to be non-fanboy sites doing comprehensive reviews of the drives ...

-B
-- Brandon High : bhigh at freaks.com

Miles Nordin - 2010-May-21 18:36 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

>>>>> "dd" == David Dyer-Bennet <dd-b at dd-b.net> writes:dd> Just how DOES one know something for a certainty, anyway? science. Do a test like Lutz did on X25M G2. see list archives 2010-01-10. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100521/7575baeb/attachment.bin>

Don - 2010-May-21 18:48 UTC - [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

> Now, if someone would make a Battery FOB, that gives broken SSD 60
> seconds of power, then we could use the consumer **** SSDs in servers
> again with real value instead of CYA value.

You know- it would probably be sufficient to provide the SSD with _just_ a big capacitor bank. If the host lost power it would stop writing, and if the SSD still had power it would probably use the idle time to flush its buffers. Then there would be world peace!

Yeah- got a little carried away there. Still, this seems like an experiment I'm going to have to try on my home server out of curiosity more than anything else :)
-- This message posted from opensolaris.org
On Thu, May 20, 2010 at 8:46 PM, Don <don at blacksun.org> wrote:
> I'm kind of flabbergasted that no one has simply stuck a capacitor on a more reasonable drive. I guess the market just isn't big enough- but I find that hard to believe.

I just spoke with a co-worker about doing something about it.

He says he can design a small in-line "UPS" that will deliver 20-30 seconds of 3.3V, 5V, and 12V to the SATA power connector for about $50 in parts. It would be even less if only one voltage was needed. That should be enough for most any SSD to finish any pending writes.

Any design that we come up with will be made publicly available under a Creative Commons or other similar license.

-B
-- Brandon High : bhigh at freaks.com
> I just spoke with a co-worker about doing something about it.
>
> He says he can design a small in-line "UPS" that will deliver 20-30 seconds of 3.3V, 5V, and 12V to the SATA power connector for about $50 in parts. It would be even less if only one voltage was needed. That should be enough for most any SSD to finish any pending writes.

Oh, I wasn't kidding when I said I was going to have to try this with my home server. I actually do some circuit board design and this would be an amusing project. All you probably need is 5V- I'll look into it.
-- This message posted from opensolaris.org
On 05/22/10 12:31 PM, Don wrote:
>> I just spoke with a co-worker about doing something about it.
>>
>> He says he can design a small in-line "UPS" that will deliver 20-30 seconds of 3.3V, 5V, and 12V to the SATA power connector for about $50 in parts. It would be even less if only one voltage was needed. That should be enough for most any SSD to finish any pending writes.
>
> Oh, I wasn't kidding when I said I was going to have to try this with my home server. I actually do some circuit board design and this would be an amusing project. All you probably need is 5V- I'll look into it.

Two supercaps should do the trick. Drive connectors only have 5 and 12V.

--
Ian.
On Fri, May 21, 2010 at 5:31 PM, Don <don at blacksun.org> wrote:
> Oh, I wasn't kidding when I said I was going to have to try this with my home server. I actually do some circuit board design and this would be an amusing project. All you probably need is 5V- I'll look into it.

The SATA power connector supplies 3.3, 5 and 12V. A "complete" solution will have all three. Most drives use just the 5V, so you can probably ignore 3.3V and 12V.

You'll need to use a step-up DC-DC converter and be able to supply ~100mA at 5V. (I can't find any specific numbers on power consumption. Intel claims 75mW - 150mW for the X25-M. USB is rated at 500mA at 5V, and all drives that I've seen can run in an un-powered USB case.)

It's actually easier/cheaper to use a LiPoly battery & charger and get a few minutes of power than to use an ultracap for a few seconds of power. Most ultracaps are ~2.5V and LiPoly is 3.7V, so you'll need a step-up converter in either case. If you're supplying more than one voltage, you should use a microcontroller to shut off all the charge pumps at once when the battery / ultracap runs low. If you're only supplying 5V, it doesn't matter.

Cost for a 5V-only system should be $30 - $35 in one-off prototype-ready components with a 1100mAh battery (using prices from Sparkfun.com), plus the cost for an enclosure, etc. A larger buy, a custom PCB, and a smaller battery would probably reduce the cost 20-50%.

-B
-- Brandon High : bhigh at freaks.com
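(A rough sanity check on runtime, with assumed numbers: drawing 5V at 100mA through a step-up converter from a 3.7V cell at, say, 85% efficiency pulls about 5 x 0.1 / (3.7 x 0.85) = ~0.16A from the battery, so an 1100mAh pack gives on the order of 1100 / 160 = ~7 hours of hold-up. The 85% efficiency figure is a guess, but even at 70% you still get several hours - vastly more than the few seconds the SSD needs to quiesce.)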
> The SATA power connector supplies 3.3, 5 and 12v. A "complete"
> solution will have all three. Most drives use just the 5v, so you can
> probably ignore 3.3v and 12v.

I'm not interested in building something that's going to work for every possible drive config- just my config :) Both the Intel X25-E and the OCZ only use the 5V rail.

> You'll need to use a step up DC-DC converter and be able to supply ~100mA at 5v.
> It's actually easier/cheaper to use a LiPoly battery & charger and get a few minutes of power than to use an ultracap for a few seconds of power. Most ultracaps are ~2.5v and LiPoly is 3.7v, so you'll need a step up converter in either case.

Ultracapacitors are available in voltage ratings beyond 12 volts so there is no reason to use a boost converter with them. That eliminates high frequency switching transients right next to our SSD, which is always helpful.

In this case we have lots of room. We have a 3.5" x 1" drive bay, but a 2.5" x 1/4" hard drive. There is ample room for several of the 6.3V ELNA 1F capacitors (and our SATA power rail is a 5V regulated rail so they should suffice)- either in series or parallel (depending on voltage or runtime requirements).
http://www.elna.co.jp/en/capacitor/double_layer/catalog/pdf/dk_e.pdf

You could put 2 caps in series for better voltage tolerance or in parallel for longer runtimes. Either way you probably don't need a charge controller, a boost or buck converter, or in fact any ICs at all. It's just a small board with some caps on it.

> Cost for a 5v only system should be $30 - $35 in one-off prototype-ready components with a 1100mAH battery (using prices from Sparkfun.com),

You could literally split a SATA cable and add in some capacitors for just the cost of the caps themselves. The issue there is whether the caps would present too large a current drain on initial charge up- If they do then you need to add in charge controllers and you've got the same problems as with a LiPo battery- although without the shorter service life.

At the end of the day the real problem is whether we believe the drives themselves will actually use the quiet period on the now dead bus to write out their caches. This is something we should ask the manufacturers, and test for ourselves.
-- This message posted from opensolaris.org
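(For the series/parallel trade-off with two of those 1F / 6.3V parts, the standard capacitor-combination arithmetic applies: in series you get 0.5F rated for roughly 12.6V, in parallel 2F still rated 6.3V. On a regulated 5V rail the parallel arrangement doubles the stored charge, while the series one mainly buys voltage headroom you don't need. These figures follow from the arithmetic, not from anything in the datasheet.)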
On 22 maj 2010, at 07.40, Don wrote:
>> The SATA power connector supplies 3.3, 5 and 12v. A "complete"
>> solution will have all three. Most drives use just the 5v, so you can
>> probably ignore 3.3v and 12v.
>
> I'm not interested in building something that's going to work for every possible drive config- just my config :) Both the Intel X25-E and the OCZ only use the 5V rail.
>
>> You'll need to use a step up DC-DC converter and be able to supply ~100mA at 5v.
>> It's actually easier/cheaper to use a LiPoly battery & charger and get a few minutes of power than to use an ultracap for a few seconds of power. Most ultracaps are ~2.5v and LiPoly is 3.7v, so you'll need a step up converter in either case.
>
> Ultracapacitors are available in voltage ratings beyond 12 volts so there is no reason to use a boost converter with them. That eliminates high frequency switching transients right next to our SSD, which is always helpful.
>
> In this case we have lots of room. We have a 3.5" x 1" drive bay, but a 2.5" x 1/4" hard drive. There is ample room for several of the 6.3V ELNA 1F capacitors (and our SATA power rail is a 5V regulated rail so they should suffice)- either in series or parallel (depending on voltage or runtime requirements).
> http://www.elna.co.jp/en/capacitor/double_layer/catalog/pdf/dk_e.pdf
>
> You could put 2 caps in series for better voltage tolerance or in parallel for longer runtimes. Either way you probably don't need a charge controller, a boost or buck converter, or in fact any ICs at all. It's just a small board with some caps on it.

I know they have a certain internal resistance, but I am not familiar with the characteristics; is it high enough so you don't need to limit the inrush current, and is it low enough so that you don't need a voltage booster for output?

>> Cost for a 5v only system should be $30 - $35 in one-off prototype-ready components with a 1100mAH battery (using prices from Sparkfun.com),
>
> You could literally split a SATA cable and add in some capacitors for just the cost of the caps themselves. The issue there is whether the caps would present too large a current drain on initial charge up- If they do then you need to add in charge controllers and you've got the same problems as with a LiPo battery- although without the shorter service life.
>
> At the end of the day the real problem is whether we believe the drives themselves will actually use the quiet period on the now dead bus to write out their caches. This is something we should ask the manufacturers, and test for ourselves.

Indeed!

/ragge
Basic electronics, go! The linked capacitor from Elna ( http://www.elna.co.jp/en/capacitor/double_layer/catalog/pdf/dk_e.pdf) has an internal resistance of 30 ohms. Intel rate their 32GB X25-E at 2.4W active (we aren''t interested in idle power usage, if its idle, we don''t need the capacitor in the first place) on the +5V rail, thats 0.48A. (P=VI) V=IR, supply is 5V, current through load is 480mA, hence R=10.4 ohms. The resistance of the X25-E under load is 10.4 ohms. Now if you have a capacitor discharge circuit with the charged Elna DK-6R3D105T - the largest and most suitable from that datasheet - you have 40.4 ohms around the loop (cap and load). +5V over 40.4 ohms. The maximum current you can pull from that is I=V/R = 124mA. Around a quarter what the X25-E wants in order to write. The setup won''t work. I''d suggest something more along the lines of: http://www.cap-xx.com/products/products.htm Which have an ESR around 3 orders of magnitude lower. t On 22 May 2010 18:58, Ragnar Sundblad <ragge at csc.kth.se> wrote:> > On 22 maj 2010, at 07.40, Don wrote: > > >> The SATA power connector supplies 3.3, 5 and 12v. A "complete" > >> solution will have all three. Most drives use just the 5v, so you can > >> probably ignore 3.3v and 12v. > > I''m not interested in building something that''s going to work for every > possible drive config- just my config :) Both the Intel X25-e and the OCZ > only uses the 5V rail. > > > >> You''ll need to use a step up DC-DC converter and be able to supply ~ > >> 100mA at 5v. > >> It''s actually easier/cheaper to use a LiPoly battery & charger and get a > >> few minutes of power than to use an ultracap for a few seconds of > >> power. Most ultracaps are ~ 2.5v and LiPoly is 3.7v, so you''ll need a > >> step up converter in either case. > > Ultracapacitors are available in voltage ratings beyond 12volts so there > is no reason to use a boost converter with them. That eliminates high > frequency switching transients right next to our SSD which is always > helpful. > > > > In this case- we have lots of room. We have a 3.5" x 1" drive bay, but a > 2.5" x 1/4" hard drive. There is ample room for several of the 6.3V ELNA 1F > capacitors (and our SATA power rail is a 5V regulated rail so they should > suffice)- either in series or parallel (Depending on voltage or runtime > requirements). > > http://www.elna.co.jp/en/capacitor/double_layer/catalog/pdf/dk_e.pdf > > > > You could 2 caps in series for better voltage tolerance or in parallel > for longer runtimes. Either way you probably don''t need a charge controller, > a boost or buck converter, or in fact any IC''s at all. It''s just a small > board with some caps on it. > > I know they have a certain internal resistance, but I am not familiar > with the characteristics; is it high enough so you don''t need to > limit the inrush current, and is it low enough so that you don''t need > a voltage booster for output? > > >> Cost for a 5v only system should be $30 - $35 in one-off > >> prototype-ready components with a 1100mAH battery (using prices from > >> Sparkfun.com), > > You could literally split a sata cable and add in some capacitors for > just the cost of the caps themselves. The issue there is whether the caps > would present too large a current drain on initial charge up- If they do > then you need to add in charge controllers and you''ve got the same problems > as with a LiPo battery- although without the shorter service life. 
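As a quick sanity check of the arithmetic above, the small Python sketch below reproduces the numbers. The assumptions are mine, not from the thread or any datasheet beyond the figures already quoted: the X25-E is modeled as a fixed 2.4 W load on the 5 V rail, the supercap as an ideal capacitor in series with its ESR, and 0.03 ohm stands in for the "three orders of magnitude lower" ESR claim.

# Back-of-the-envelope check of the cap-vs-load numbers above.
V_RAIL = 5.0      # volts, SATA 5 V rail
P_DRIVE = 2.4     # watts, Intel's "active" figure for the 32 GB X25-E
R_LOAD = V_RAIL ** 2 / P_DRIVE          # ~10.4 ohm equivalent load resistance

def max_current(esr_ohms):
    """Current the charged cap can push through the drive: 5 V across the
    series loop of cap ESR plus load resistance."""
    return V_RAIL / (esr_ohms + R_LOAD)

print(f"drive wants            : {P_DRIVE / V_RAIL * 1000:.0f} mA")   # ~480 mA
print(f"Elna DK (30 ohm ESR)   : {max_current(30.0) * 1000:.0f} mA")  # ~124 mA
print(f"low-ESR part (0.03 ohm): {max_current(0.03) * 1000:.0f} mA")  # ~479 mA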
Bob Friesenhahn
2010-May-22 14:41 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, David Dyer-Bennet wrote:

> To be comfortable (I don't ask for "know for a certainty"; I'm not sure that exists outside of "faith"), I want a claim by the manufacturer and multiple outside tests in "significant" journals -- which could be the blog of somebody I trusted, as well as actual magazines and such. Ideally, certainly if it's important, I'd then verify the tests myself.

For me, "know for a certainty" means that the feature is clearly specified in the formal specification sheet for the product, and the vendor has historically published reliable specification sheets. This may not be the same as money in the bank, but it is better than relying on thoughts from some blog posting.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bob Friesenhahn
2010-May-22 14:50 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, Brandon High wrote:

> My understanding is that the controller contains enough cache to buffer a complete erase block's worth of data, eliminating the read / erase / write cycle that a partial block write entails. It's reported to do copy-on-write, so it doesn't need to read existing blocks when making changes, which gives it such high IOPS - even random writes are turned into sequential writes of entire erase blocks (much like how ZFS works). The excessive spare area is used to ensure that there are always full pages free to write to. (Some vendors are releasing consumer drives with 60/120/240 GB, using 7% reserved space rather than the 27% that the original drives ship with.)

Flash is useless as working space since it does not behave like RAM, so every SSD needs some RAM for temporary storage of data. This COW approach seems nice, except that it would appear to inflate performance by only considering one specific magic block size and alignment. Other block sizes and alignments would require that existing data be read so that the new block content can be constructed. Also, the blazing fast write speed (which depends on plenty of already-erased blocks) would stop once the spare space in the SSD has been consumed.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
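For readers who want the copy-on-write idea above in concrete form, here is a toy flash-translation-layer sketch. It is illustrative only: the sizes and the mapping scheme are invented for the example, not the actual SandForce design. Random logical writes land sequentially in the currently open erase block and the old copy is simply invalidated, which is also why the fast path ends once the spare blocks are gone and erases/garbage collection would have to start - exactly the limitation Bob points out.

# Toy log-structured / copy-on-write FTL model (illustrative only).
import random

PAGES_PER_ERASE_BLOCK = 8      # unrealistically small, to keep the demo short

class ToyFTL:
    def __init__(self, data_blocks, spare_blocks):
        self.map = {}                                  # logical page -> (block, page)
        self.free_blocks = list(range(data_blocks + spare_blocks))
        self.open_block = self.free_blocks.pop(0)      # block currently being filled
        self.fill = 0
        self.page_writes = 0

    def write(self, logical_page):
        # Copy-on-write: the new version goes to the next sequential page of
        # the open erase block; the old copy is merely invalidated, so no
        # read/erase/rewrite cycle is needed while fresh blocks remain.
        if self.fill == PAGES_PER_ERASE_BLOCK:
            if not self.free_blocks:
                raise RuntimeError("spare area exhausted - garbage collection "
                                   "and erases would have to start here")
            self.open_block = self.free_blocks.pop(0)
            self.fill = 0
        self.map[logical_page] = (self.open_block, self.fill)
        self.fill += 1
        self.page_writes += 1

ftl = ToyFTL(data_blocks=4, spare_blocks=2)            # 48 physical pages total
for lp in random.choices(range(16), k=60):             # 60 random overwrites
    try:
        ftl.write(lp)
    except RuntimeError as err:
        print(err)                                     # the fast path ends here
        break
print("physical page writes issued:", ftl.page_writes)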
Bob Friesenhahn
2010-May-22 15:00 UTC
[zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, 21 May 2010, Don wrote:

> You know- it would probably be sufficient to provide the SSD with _just_ a big capacitor bank. If the host lost power it would stop writing, and if the SSD still had power it would probably use the idle time to flush its buffers. Then there would be world peace!

This makes the assumption that an SSD will want to flush its write cache as soon as possible rather than just letting it sit there waiting for more data. That is probably not a good assumption. If the OS sends 512 bytes of data but the SSD block size is 4K, it is reasonable for the SSD to wait for 3584 more contiguous bytes of data before it bothers to write anything. Writes increase the wear on the flash, and writes require a slow erase cycle, so it is reasonable for SSDs to buffer as much data in their write cache as possible before writing anything. An advanced SSD could write non-contiguous sectors into an SSD page and then use a sort of lookup table to know where the sectors actually are. Regardless, under slow write conditions it is definitely valuable to buffer the data for a while in the hope that more related data will appear, or that the data might even be overwritten.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
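A minimal sketch of the buffering behavior Bob describes: sub-page writes sit in the drive's RAM until a full page accumulates or an explicit flush forces them out. The 4 KB page, 512-byte sector and the policy itself are assumptions for illustration, not any particular drive's firmware.

# Sketch of "hold partial pages in RAM and wait for more data" (assumed policy).
PAGE_SECTORS = 4096 // 512     # 8 sectors per flash page

class WriteCache:
    def __init__(self):
        self.pending = {}          # page number -> set of sector slots received
        self.pages_programmed = 0

    def write_sector(self, lba):
        page, slot = divmod(lba, PAGE_SECTORS)
        slots = self.pending.setdefault(page, set())
        slots.add(slot)                      # overwriting the same sector costs nothing
        if len(slots) == PAGE_SECTORS:       # a complete page is worth writing now
            self._program(page)

    def flush(self):
        # Explicit cache flush: even partial pages must go to flash, which on
        # real media would mean a read-modify-write of the surrounding block.
        for page in list(self.pending):
            self._program(page)

    def _program(self, page):
        self.pending.pop(page, None)
        self.pages_programmed += 1

cache = WriteCache()
for lba in range(8):                 # eight 512 B writes = one full 4 KB page
    cache.write_sector(lba)
cache.write_sector(100)              # a lone 512 B write just sits in RAM...
print("pages programmed so far:", cache.pages_programmed)        # 1
cache.flush()                        # ...until a flush forces it out
print("pages programmed after flush:", cache.pages_programmed)   # 2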
On Fri, 21 May 2010, Don wrote:

> You could literally split a SATA cable and add in some capacitors for just the cost of the caps themselves. The issue there is whether the caps would present too large a current drain on initial charge-up. If they do, then you need to add in charge controllers and you've got the same problems as with a LiPo battery, although without the shorter service life.

Electricity does run both directions down a wire, and the capacitor would look like a short circuit to the supply when it is first turned on. You would need some circuitry which delays applying power to the drive until the capacitor is sufficiently charged, and some circuitry which stops energy flowing back into the power supply when the power supply shuts off (which could be a silicon diode, if you don't mind the 0.7 V drop).

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
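Rough numbers for the inrush and diode-drop points, assuming an ideal 5 V supply and that the capacitor's ESR is the only series resistance (real supplies, cabling and any fuse add more): with the 30-ohm Elna part the inrush is actually tiny but charging takes minutes, while a low-ESR supercap charges in a fraction of a second but briefly looks like a dead short; and a plain silicon diode costs the drive about 0.7 V of its 5 V rail, which is why the MOSFET "ideal diode" suggested in the following message is attractive.

# Inrush current, charge time and diode drop, under the assumptions above.
V = 5.0
C = 1.0            # farads
DIODE_DROP = 0.7   # series silicon diode, as suggested above

for name, esr in (("Elna DK, 30 ohm ESR    ", 30.0),
                  ("low-ESR supercap, 0.03 ", 0.03)):
    inrush = V / esr          # current at the instant the supply comes up
    t_charge = 5 * esr * C    # ~5 RC time constants to reach ~99% of 5 V
    print(f"{name}: inrush ~{inrush:7.1f} A, ~{t_charge:6.1f} s to charge")

print(f"rail the SSD sees behind a plain silicon diode: {V - DIODE_DROP:.1f} V")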
Bob Friesenhahn wrote:

> Electricity does run both directions down a wire, and the capacitor would look like a short circuit to the supply when it is first turned on. You would need some circuitry which delays applying power to the drive until the capacitor is sufficiently charged, and some circuitry which stops energy flowing back into the power supply when the power supply shuts off (which could be a silicon diode, if you don't mind the 0.7 V drop).

You can also use an appropriately wired field effect transistor (FET/MOSFET) of sufficient current-carrying capacity as a one-way valve (diode) with minimal voltage drop. More:
http://electronicdesign.com/article/power/fet-supplies-low-voltage-reverse-polarity-protecti.aspx
http://www.electro-tech-online.com/general-electronics-chat/32118-using-mosfet-diode-replacement.html

In regard to how long you need to continue supplying power: that comes down to how long the SSD waits before flushing its cache to flash. If you can identify the maximum write cache flush interval, and size the battery or capacitor to exceed that maximum interval, you should be okay. The maximum write cache flush interval is determined by a timer that says something like "okay, we've waited 5 seconds for additional data to arrive to be written. None has arrived in the last 5 seconds, so we're going to write what we already have, to better ensure data integrity, even though it is suboptimal from an absolute performance perspective." In the conventional terms of filling city buses: the bus leaves when it is full of people, or when 15 minutes have passed since the last bus left.

Does anyone know if there is a way to directly or indirectly measure the write cache flush interval? I know cache sizes can be found via performance testing, but what about write intervals?
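One possible indirect measurement is sketched below, with heavy caveats - all of these are assumptions, not established facts: DEV is a placeholder for a scratch device whose contents you are willing to destroy; writes to the raw device really land in the drive's volatile write cache (write cache enabled, no RAID controller cache in between); and os.fsync() on that descriptor is turned by the OS into a real SATA FLUSH CACHE, which is precisely the behavior the thread is unsure these drives honor. The idea: park a little data in the drive cache, idle for a varying interval, then time an explicit flush; if the drive destages on its own after some idle period, flushes issued after that point should return almost instantly, and the knee in the curve hints at the flush timer.

# Indirect probe of the drive's flush-on-idle behavior (see caveats above).
import os
import time

DEV = "/dev/rdsk/c9t9d9s0"       # placeholder - point this at a scratch disk!
CHUNK = b"\0" * 8192             # sector-aligned scribble

fd = os.open(DEV, os.O_WRONLY)
try:
    for idle in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
        os.lseek(fd, 1024 * 1024, os.SEEK_SET)   # stay clear of the label
        os.write(fd, CHUNK)                      # should sit in the drive cache
        time.sleep(idle)                         # let the drive destage on its own
        t0 = time.time()
        os.fsync(fd)                             # hopefully a real cache flush
        print(f"idle {idle:4.1f} s -> flush took {(time.time() - t0) * 1000:7.2f} ms")
finally:
    os.close(fd)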
>>>>> "d" == Don <don at blacksun.org> writes:
>>>>> "hk" == Haudy Kazemi <kaze0010 at umn.edu> writes:

    d> You could literally split a sata cable and add in some
    d> capacitors for just the cost of the caps themselves.

No, this is no good. The energy only flows in and out of the capacitor when the voltage across it changes. In this respect they are different from batteries. It's normal to use (non-super) capacitors as you describe as filters next to things drawing power in a high-frequency, noisy way, but to use them for energy storage across several seconds you need a switching supply to drain the energy out of them. The step-down and voltage-pump kinds of switchers are non-isolated and might do fine, and are cheaper than full-fledged DC-DC converters that are isolated (meaning the input and output can float with respect to each other). You can charge from 12V and supply 5V if that's cheaper. :) Hope it works.

    hk> "okay, we've waited 5 seconds for additional data to arrive to
    hk> be written. None has arrived in the last 5 seconds, so we're
    hk> going to write what we already have to better ensure data
    hk> integrity,

Yeah, I am worried about corner cases like this. For example: input power to the SSD becomes scratchy or sags, but power to the host and controller remains fine. Writes arrive continuously. The SSD sees nothing wrong with its power and continues to accept and acknowledge writes. Meanwhile you burn through your stored power hiding the sagging supply until you can't, then the SSD loses power suddenly and drops a bunch of writes on the floor. That is why I drew that complicated state diagram in which the pod disables and holds down the SATA connection once it's running on reserve power. Probably y'all don't give a fuck about such corners though, nor do many of the manufacturers selling this stuff, so, whatever.
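To put numbers on the energy-storage point: the energy you can actually extract from a capacitor is only what is released as its voltage falls from full charge down to the minimum your load (or an assumed lossless step-up converter in front of it) tolerates, i.e. 0.5*C*(V1^2 - V2^2). Using the 2.4 W X25-E figure from earlier in the thread and some assumed minimum voltages, a 1 F cap buys on the order of one to a few seconds of holdup, consistent with the "few seconds of power" characterization, and this still ignores the ESR problem discussed above.

# Holdup time from usable capacitor energy (load figure from the thread,
# minimum voltages are assumptions).
def holdup_seconds(c_farads, v_full, v_min, p_load_watts):
    usable_joules = 0.5 * c_farads * (v_full ** 2 - v_min ** 2)
    return usable_joules / p_load_watts

P_LOAD = 2.4   # watts
cases = [
    ("1 F, no converter, load quits below 4.5 V", 1.0, 5.0, 4.5),   # ~1.0 s
    ("1 F + ideal boost converter down to 2.5 V", 1.0, 5.0, 2.5),   # ~3.9 s
    ("2 x 1 F in parallel, same converter      ", 2.0, 5.0, 2.5),   # ~7.8 s
]
for label, c, v_full, v_min in cases:
    print(f"{label}: ~{holdup_seconds(c, v_full, v_min, P_LOAD):.1f} s at {P_LOAD} W")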
This thread has grown giant, so apologies for screwing up threading with an out-of-place reply. :)

So, as far as SF-1500 based SSDs go, the only ones currently in existence are the Vertex 2 LE and Vertex 2 EX, correct (I understand the Vertex 2 Pro was never mass produced)?

Both of these are based on MLC and not SLC -- why isn't that an issue for longevity?

Any other SF-1500 options out there?

We continue to use UPS-backed Intel X25-E's for ZIL.

Ray
On Mon, May 24, 2010 at 11:30:20AM -0700, Ray Van Dolson wrote:

> So, as far as SF-1500 based SSDs go, the only ones currently in existence are the Vertex 2 LE and Vertex 2 EX, correct (I understand the Vertex 2 Pro was never mass produced)?
> [...]
> Any other SF-1500 options out there?

From earlier in the thread, it sounds like none of the SF-1500 based drives even have a supercap, so it doesn't seem that they'd necessarily be a better choice than the SLC-based X25-E at this point unless you need more write IOPS...

Ray
> From earlier in the thread, it sounds like none of the SF-1500 based drives even have a supercap, so it doesn't seem that they'd necessarily be a better choice than the SLC-based X25-E at this point unless you need more write IOPS...
>
> Ray

I think the upcoming OCZ Vertex 2 Pro will have a supercap. I just bought an OCZ Vertex LE; it doesn't have a supercap, but it DOES have some awesome specs otherwise.
Hi,

1) Is it possible to do this at all, i.e. put a disk's LED into blink mode with "luxadm led_blink"?
2) What is the backplane hardware requirement for "luxadm led_blink" to work, i.e. to bring the disk LED into blink mode?

Thanks.

Fred