Ellis, Mike
2007-May-30 03:29 UTC
[zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.
Hey Richard, thanks for sparking the conversation... This is a very interesting topic... (especially if you take it out of the HPC "we need 1000 servers to have this minimal boot image" space and into general-purpose/enterprise computing)

--

Based on your earlier note, it appears you're not planning to use cheapo "free after rebate" CF cards :-) (The cheap ones would probably be perfect for ZFS a la cheap-o JBOD.) Having boot disks mirrored across controllers has helped sys-admins sleep better over the years (especially in FC-loop cases with both drives on the same loop... sigh...). If the USB bus one might hang these fancy CF cards on is robust enough, then perhaps a single "battle hardened" CF card will suffice (although ZFS ditto blocks or some other form of protection might still be considered a good thing?)... Having two cards would certainly make the "unlikely replacement" of a card a LOT more straightforward than a single-card failure... Much of this depends on the quality of these CF cards and how they hold up under load/stress/time...

--

If we're going down this CF-boot path, many of us are going to have to re-think our boot environment quite a bit. We've been "spoiled" with 36+ GB mirrored boot drives for some time now... (If you do a lot of PATCHING, you'll find that even those can get tight... but that's a discussion for a different day.) I don't think most enterprise "boot disk layouts" are going to fit (even unmirrored) onto a single 4GB CF card. So we'll have to play some games where we start splitting off /opt and /var (which is fairly read-write intensive when you have process accounting etc. running) onto some "other" non-CF filesystem... likely a SAN of some variety. At some point the hackery a 4GB CF card forces on us becomes more complex than just biting the bullet, doing a full multipathed SAN boot, and calling it a day (or perhaps some future iSCSI/NFS boot for the SAN-averse). Seriously though... if (say in some HPC/grid space?) you can stick your ENTIRE boot environment onto a 4GB CF card, why not just do the SAN or NFS/iSCSI boot thing instead? (Whatever happened to: http://blogs.sun.com/dweibel/entry/sprint_snw_2006#comments )

--

But let's explore the CF thing some more... There is something there, although I think Sun might have to provide some best practices/suggestions as to how customers that don't run a minimal-config, no-local-apps Solaris environment (pacct, monitoring, etc.) can best use something like this. Use it as a pivot boot onto the real root image? That would relegate the CF card to little more than a "rescue/utility" image... Kinda cool, but not earth-shattering I would think... (especially for those already utilizing wanboot for such purposes)

--

Splitting off /var and friends from the boot environment (and still packing the boot env, say, onto a ditto-blocked 4GB CF card) is still going to leave a pretty tight boot env. Obviously you want to be able to do some fancy live-upgrade stuff in this space too, and all of a sudden a single 4GB flash card "don't look so big" anymore... Two of them, with some ZFS (and compression?) or even SDS mirroring between them (rough sketch below), would possibly go a long way toward making replacement easier, giving you redundancy (zfs/sds mirrors), some wiggle room for live-upgrade scenarios, and who knows what else. Still tight though...

--

If it's a choice between 1 CF or NONE, we'll take 1 CF I guess...
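
(A rough sketch of that two-CF-card mirror idea -- purely hypothetical device names, and a ZFS root pool on CF assumes the zfs-boot work is further along than it is today; SDS mirroring would be the spelled-differently equivalent:)

  # zpool create cfpool mirror c1t1d0s0 c2t1d0s0   # one CF card per controller
  # zfs set compression=on cfpool                   # stretch the 4GB a little further
  # zpool status cfpool                             # both sides ONLINE, or one DEGRADED but still running
  # zpool replace cfpool c2t1d0s0                   # card swap: resilver onto the new CF in place

Losing either card leaves the pool degraded but usable, and replacement becomes a resilver rather than a bare-metal rebuild.

--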
Fear of the unknown (and field data showing how these guys hold up over time) would really determine uptake I guess. ((As you said, real data regarding these specialized CF cards will be required... Is it going to vary greatly from vendor to vendor? Use case to use case? I'm not looking forward to blazing the trail here... Something doesn't seem right, especially without the safety net of a mirrored environment... But maybe that's just old-school sys-admin superstition... Let's get some data, set me straight...))

--

Right now we can stick 4x 4GB memory sticks into an x4200 (creating a cactus-looking device :-) A single built-in CF is obviously cleaner/safer, but also somewhat limiting in terms of redundancy or even just capacity. Has anyone considered taking, say, 2x 4GB CF cards and sticking them inside one of the little SAS-drive enclosures? Customers could purchase up to 4 of those for certain servers (t2000/x4200 etc.) and treat them as if they were really fast, lower-power/heat, (never fails, no need to mirror?) ~9GB drives. In the long run, is that "easier" and more flexible?

--

It would be really interesting to hear how others out there might try to use a CF-boot option in their environment. Good thread, let's bat this around some more.

 -- MikeE

-----Original Message-----
From: Richard.Elling at Sun.COM [mailto:Richard.Elling at Sun.COM]
Sent: Tuesday, May 29, 2007 9:48 PM
To: Ellis, Mike
Cc: Carson Gaspar; zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] Re: ZFS - Use h/w raid or not? Thoughts. Considerations.

Ellis, Mike wrote:
> Also the "unmirrored memory" for the rest of the system has ECC and
> ChipKill, which provides at least SOME protection against random
> bit-flips.

CF devices, at least the ones we'd be interested in, do have ECC as well as spare sectors and write verification. Note: flash memories do not suffer from the same radiation-based bit-flip mechanisms as DRAMs or SRAMs. The main failure mode we worry about is write endurance.

> Question: It appears that CF and friends would make a decent live-boot
> (but don't run on me like I'm a disk) type of boot media due to the
> limited write/re-write limitations of flash media. (at least the
> non-exotic type of flash media)

Where we see current use is for boot devices, which have the expectation of read-mostly workloads. The devices also implement wear leveling.

> Would something like future zfs-booting on a pair of CF devices
> reduce/lift that limitation? (does the COW nature of ZFS automatically
> spread WRITES across the entire CF device?) [[ is tmpfs/swap going to
> remain a problem till zfs-swap adds some COW leveling to the swap area? ]]

The belief is that COW file systems which implement checksums and data redundancy (e.g., ZFS and the ZFS copies option) will be redundant over CF's ECC and wear leveling *at the block level.* We believe ZFS will excel in this area, but has limited bootability today. This will become more interesting over time, especially when ZFS boot is ubiquitous.

As for swap, it is a good idea if you are sized such that you don't need to physically use swap. Most servers today are in this category. Actually, most servers today have much more memory than would fit in a reasonably priced CF, so it might be a good idea to swap elsewhere.

In other words, it is more difficult to build the (technical) case for redundant CFs for boot than it is for disk drives. Real data would be greatly appreciated.
 -- richard
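
(A sketch of the single-CF ditto-block angle described above -- hypothetical pool/device names; copies=2 duplicates data blocks on the one device, on top of the extra copies ZFS already keeps for metadata:)

  # zpool create cfroot c1t0d0s0          # single CF card, no mirror available
  # zfs set copies=2 cfroot               # two copies of every data block (ditto blocks)
  # zfs set compression=on cfroot         # claw back some of the space copies=2 costs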