With all the recent discussion of SSDs that lack suitable power-failure cache protection, surely there's an opportunity for a separate modular solution?

I know there used to be (years and years ago) small internal UPSes that fit in a few 5.25" drive bays. They were designed to power the motherboard and peripherals, with the advantage of simplicity and efficiency that comes from being behind the PC PSU and working entirely on DC.

Something similar in a smaller form factor, like the drive bay sleds that mount one or two 2.5" disks in a 3.5" (or even 5.25") bay, with a small and simple power storage element and circuit, would be great. Alternatively, something that took up a drive bay and provided power for multiple disks in other bays, though that might be messier for cabling. It wouldn't need to hold power long.

We could then use any SSD selected on other design, performance, and price criteria.

Does anyone know of such a device being made and sold? Feel like designing and marketing one, or publishing the design?

--
Dan.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 194 bytes
Desc: not available
URL: <mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100112/1ead9193/attachment.bin>
On 11-Jan-10, at 5:59 PM, Daniel Carosone wrote:

> With all the recent discussion of SSDs that lack suitable
> power-failure cache protection, surely there's an opportunity for a
> separate modular solution?
>
> I know there used to be (years and years ago) small internal UPSes
> that fit in a few 5.25" drive bays. They were designed to power the
> motherboard and peripherals, with the advantage of simplicity and
> efficiency that comes from being behind the PC PSU and working
> entirely on DC.
> ...
> Does anyone know of such a device being made and sold? Feel like
> designing and marketing one, or publishing the design?

FWIW, I think Google's server farm uses something like this.

--Toby

>
> --
> Dan.
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> mail.opensolaris.org/mailman/listinfo/zfs-discuss
On Jan 11, 2010, at 19:00, Toby Thain wrote:

> On 11-Jan-10, at 5:59 PM, Daniel Carosone wrote:
>
>> Does anyone know of such a device being made and sold? Feel like
>> designing and marketing one, or publishing the design?
>
> FWIW, I think Google's server farm uses something like this.

It looks slightly "ghetto", but it seems to work for them:

blogs.sun.com/geekism/entry/holy_battery_backup_batman
tinyurl.com/cpt4yq
arstechnica.com/hardware/news/2009/04/the-beast-unveiled-inside-a-google-server.ars
> [google server with batteries]

These are cool, and a clever rethink of the typical data centre power supply paradigm. They keep the server running until either a generator is started or a graceful shutdown can be done.

Just to be clear, I'm talking about something much smaller, which provides power only to the drives, for a few moments after the host powers down (for whatever reason), to let the drives sync their caches safely. Basically, it just wraps the drive with the supercap (or equivalent) that the manufacturer didn't include, plus whatever minimal power supply circuitry is needed (to avoid big inrush recharge currents on startup, to avoid sending power back out into the rest of the case, etc.).

Because there's no integration for an emergency "sync now!" signal, we have to rely on timeouts and wait "long enough" for the cache to be synced. The storage might need to be larger and hold up longer than an on-board supercap, but not very long in absolute terms.

There seems to be lots of room for a comfortable niche in the gap between common commodity hardware (that would be plenty good enough otherwise) and the $5k F20s and LogZillas and similar.

--
Dan.
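To put rough numbers on "not very long in absolute terms", here is a back-of-envelope sizing sketch. It uses the standard capacitor energy relation, where the usable energy between a starting voltage and the minimum voltage the drive tolerates is E = 1/2 * C * (V_start^2 - V_min^2), and hold-up time is that energy divided by the load. The function names, the converter-efficiency parameter, and every figure in the usage note are illustrative assumptions, not measurements of any real drive or supercap.

```python
def holdup_seconds(capacitance_f, v_start, v_min, load_watts, efficiency=1.0):
    """Estimate how long a capacitor can carry a constant-power load.

    capacitance_f: capacitance in farads
    v_start:       capacitor voltage when external power is lost
    v_min:         lowest voltage at which the load still operates
    load_watts:    assumed constant power draw of the drive(s)
    efficiency:    assumed conversion efficiency of any regulator (0..1]
    """
    if capacitance_f <= 0 or load_watts <= 0:
        raise ValueError("capacitance and load must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if v_min < 0 or v_min >= v_start:
        raise ValueError("need 0 <= v_min < v_start")
    # Only the energy between v_start and v_min is usable.
    usable_joules = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return usable_joules * efficiency / load_watts


def capacitance_needed(holdup_s, v_start, v_min, load_watts, efficiency=1.0):
    """Invert holdup_seconds(): farads required for a target hold-up time."""
    if holdup_s <= 0 or load_watts <= 0:
        raise ValueError("hold-up time and load must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if v_min < 0 or v_min >= v_start:
        raise ValueError("need 0 <= v_min < v_start")
    return 2.0 * holdup_s * load_watts / (efficiency * (v_start ** 2 - v_min ** 2))
```

For example, with a hypothetical 5 F capacitor on a 5 V rail, a drive that keeps working down to 4 V, and an assumed 2 W draw, holdup_seconds(5.0, 5.0, 4.0, 2.0) gives 11.25 seconds. Whether that counts as "long enough" depends on how long a given drive actually takes to flush its cache, which this sketch does not model.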
Actually, for the ZIL you can use the a-card (memory-based SATA disk + BBU + CompactFlash write-out). For the data disks there is no solution yet; it would be nice to have one.

However, I prefer the "supercapacitor on disk" method. Why? Because the recharge logic is challenging. There needs to be communication between the disk and the power supply. The interesting cases are "fluctuating power" (see below) and battery maintenance. If you are charged you can run fine, but the corner cases are tricky.

Imagine the following scenario:

1) Operations: normal
2) Power outage: 1 hour
3) UPS fails after 30 minutes
4) Power comes back
5) ALL servers power on at the same time (e.g. misconfiguration)
6) Peak load -> power goes down again

At 3) your batteries are empty. At 6) your batteries are not fully charged; however, because the device does not know the "status" of the local UPS, the write cache is still enabled. Thus a simple design does not solve the problem well (enough).

Another issue is battery maintenance. You have to check whether your battery still works (charge cycle). You have to raise an alarm if it does not (monitoring). You then have to replace it online. So in general, batteries are bad if your server lives longer than 3 years :)

For Google it works fine, maybe because the server will live < 3 years anyhow and because they can "just replace" the server thanks to their internal redundancy options (Google's backend technology is designed to handle failure well). For a storage system I don't see that.

The BBU / capacitor needs to implement the same logic a RAID BBU implements:

    if (not_working_fully(BBU)) {
        disable_write_cache();
    } else {
        enable_write_cache();
    }

Or better (explicit state whitelisting, which guarantees data integrity for unexpected states as well):

    if (working_fully(BBU)) {
        enable_write_cache();
    } else {
        disable_write_cache();
    }

p.s. While writing this I'm wondering whether the a-card handles this case well ... maybe not.

--
This message posted from opensolaris.org
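The whitelisting idea above can be sketched as a small fail-safe policy. This is only an illustration of the principle (enable the write cache in explicitly approved states, disable it in every other state, including ones nobody anticipated). The state names, the string status values, and the helper functions are hypothetical and do not correspond to any real BBU, drive, or controller interface.

```python
from enum import Enum


class BackupState(Enum):
    """Hypothetical states a backup power unit might report."""
    FULLY_WORKING = "fully_working"
    CHARGING = "charging"
    DEPLETED = "depleted"
    FAILED = "failed"
    UNKNOWN = "unknown"


# Whitelist: the only states in which cached writes are considered safe.
SAFE_STATES = frozenset({BackupState.FULLY_WORKING})


def parse_state(raw):
    """Map a raw status value to a BackupState; anything unrecognised is UNKNOWN."""
    try:
        return BackupState(raw)
    except ValueError:
        return BackupState.UNKNOWN


def write_cache_allowed(state):
    """Allow the write cache only for whitelisted states (default deny)."""
    return state in SAFE_STATES


def replay(timeline):
    """Return (label, cache_enabled) pairs for a sequence of (label, raw_status)."""
    return [(label, write_cache_allowed(parse_state(raw))) for label, raw in timeline]
```

Replaying the outage scenario with assumed status values, for instance replay([("normal", "fully_working"), ("UPS exhausted", "depleted"), ("power back", "charging"), ("second drop", "charging")]), leaves the cache enabled only in the first step. A status string the policy has never seen, such as "calibrating", also maps to UNKNOWN and keeps the cache off, which is the property the whitelist form is meant to provide.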
On Mon, Jan 11, 2010 at 10:10:37PM -0800, Lutz Schumann wrote:

> p.s. While writing this I'm wondering whether the a-card handles this case well ... maybe not.

Apart from the fact that they seem to be hard to source, this is a big question about this interesting device for me too. I hope so, since it should be simpler to get right (and test) than all the complexity of a block-remapping flash device with (effectively) an internal filesystem.

Really, this all comes down to the fact that there's still apparently no good answer to the very frequently asked question, "what's a good slog device?" Of course, "good" varies by circumstance[*], but having a range of solutions means more chance of fitting a range of circumstances. Several circumstances currently lack a good solution, it seems.

--
Dan.

[*] I have slog and l2arc on partitions of generic 5400rpm SATA disks, together with rpool, to great effect, because in that case the raidz data pool is 4 disks behind a single USB port.