Jeff Bacon
2012-Apr-09 13:23 UTC
[zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA
> Out of curiosity, are there any third-party hardware vendors
> that make server/storage chassis (Supermicro et al) who make
> SATA backplanes with the SAS interposers soldered on?

There doesn't seem to be much out there, though I haven't looked.

> Would that make sense, or be cheaper/more reliable than
> having extra junk between the disk and backplane connectors?
> (if I correctly understand what the talk is about? ;)

Honestly, probably not. Either the expander chip properly handles SATA tunneling, or it doesn't. Shoving an interposer in just throws a band-aid on the problem, IMO.

> ZFS was very attractive at first because of the claim that
> "it returns Inexpensive into raId" and can do miracles
> with SATA disks. Reality has shown to many of us that
> many SATA implementations existing in the wild should
> be avoided... so we're back to good vendors' higher end
> expensive SATAs or better yet SAS drives. Not inexpensive
> anymore again :(

Many != all. Not that I've tried a whole bunch of them, mind you.

However, I've found all of the SuperMicro SAS1 backplanes to be somewhat problematic with SATA drives, especially if you use the 1068-based controllers. It was horrible with the really old single-digit-Phase firmware. I find it... acceptable... with 2008-based controllers. I've finally settled on having one box based on 1068s (3081s), and I think it's up to 4 or 5 expanders of 16 1TB drives. Basically, the box hangs when certain drives die in certain ways - it eventually gets over it, mostly, but it can hang for a bit. I might see the occasional "hang until you yank the bad disk", but drives don't die THAT often - even 3-year-old Seagate 'cudas. Granted, almost all of the 333ASes have finally been updated to the CC1H firmware. I might note that that box represents most of the collection of 1068+SAS1-based expanders that I have. It's an archival system that doesn't do much at all (well, the CPU pounds like hell running rtgpoll, but that's a different matter having nothing to do with the ZFS pools). I also have a small pile of leftover 3081s and 3041s if anyone's interested. :)

Now, I suspect that there is improved LSI firmware available for the SAS1 expander chips. I could go chasing after it - SMC doesn't have it public, but LSI probably has it somewhere, and I know an expert I could ask to go through and tweak my controllers. However, I hadn't met him 3 years ago, and now it just isn't worth my time (or worth paying him to do it).

On the other hand, I have two hands' worth of CSE847-E26-RJBOD1s stuffed with 'cuda 2T and 3T SATA drives, connected to 9211-8es running the Phase 10 firmware. One box is up to 170TB worth. It's fine. Nary an issue. Granted, I'm not beating the arrays to death - again, that's not what it's for; it's there to hang onto a bunch of data. But it does get used, and I'm writing 200-300GB/day to it. I have another such JBOD attached to a box with a pile of Constellations, and it causes no issues.

Frankly, I would say that yes, ZFS _does_ do miracles with inexpensive disks. I can trust 100s of TB to it and not worry about losing any of it. But it wasn't written by Jesus; it can't turn water into wine or turn all of the terrible variations of Crap out there into enterprise-level replacements for your EMC arrays. Nor can it cope with having Any Old Random Crap you have laying around thrown at it - but it does surprisingly well, IMO.
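(Not from the original thread, just a rough sketch of the sort of babysitting I mean: given smartmontools and a list of device paths, something like this flags drives that aren't on the firmware revision you expect - CC1H here only because that's the one mentioned above - and prints whatever "zpool status -x" is unhappy about. Treat the device list and the expected-revision value as placeholders for your own environment.)

#!/usr/bin/env python
# Rough sketch, not production code: audit drive firmware and pool health.
# Assumes smartmontools is installed and device paths are given as arguments.
import subprocess
import sys

EXPECTED_FIRMWARE = "CC1H"  # example revision from this thread; adjust to taste

def drive_firmware(dev):
    # smartctl -i prints a "Firmware Version:" line for most drives
    out = subprocess.check_output(["smartctl", "-i", dev],
                                  universal_newlines=True)
    for line in out.splitlines():
        if line.startswith("Firmware Version:"):
            return line.split(":", 1)[1].strip()
    return "unknown"

def main(devices):
    for dev in devices:
        fw = drive_firmware(dev)
        if fw != EXPECTED_FIRMWARE:
            print("%s: firmware %s (expected %s)" % (dev, fw, EXPECTED_FIRMWARE))
    # "zpool status -x" prints "all pools are healthy" when there's nothing to report
    status = subprocess.check_output(["zpool", "status", "-x"],
                                     universal_newlines=True).strip()
    if status != "all pools are healthy":
        print(status)

if __name__ == "__main__":
    main(sys.argv[1:])

Run it from cron against whatever device paths your OS uses for the disks and you'll at least hear about the stragglers before one of them hangs the box.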
My home box is actually just a bunch of random drives tied onto 3 3041 controllers on an old overclocked Q6600 on an ASUS board, and it's never had a problem, not ever.

> So, is there really a fundamental requirement to avoid
> cheap hardware, and are there no good ways to work around
> its inherently higher instability and lack of dependability?
>
> Or is it just a harder goal (indefinitely far away on the
> project roadmap)?

ZFS is no replacement for doing your research, homework, and testing. God only knows I've gone through some crap - I bought a 20-pack of WD 2TB Blacks that turned out to work for $*%$&. I suppose with enough patience and effort I could have made them work, but Seagate's firmware has simply been more reliable, and the contents of that box have filtered their way into desktops. (Some of which are in that home machine mentioned above - attached directly to the controller, no problems at all.)

If you're going to do BYO for your enterprise needs, be prepared to fork over the additional cash you'll need to spend on test kit - defined both as kit to test on, and kit you buy, test, and pitch because "it don't work like Vendor said" or "A don't play nice with B". That's sometimes much cheaper than fighting with Vendor A and Vendor B about why they can't work together. Not to mention the R&D time.

I don't avoid cheap hardware. Any number of my colleagues would say I'm insane for running, on a handful of Solaris 10 fileservers built from raw SuperMicro boxes, workloads they'd insist belong on EMC or NetApp. But we can beat the living * out of these boxes and they work just fine.

But there's cheap, and there's Cheap. ZFS can perform miracles, but only so many - at least not without work. And unfortunately, the work required to make it cope with Really Really Cheap S*** is probably somewhat orthogonal to the desires of paying users, for the perhaps obvious reason that it's often cheaper to cough up the $ for the slightly-better hardware than to pay for the people-time to write the solution.

Further, I might suggest that the solution already _has_ been written - in the form of meta-layers such as Ceph or GlusterFS, where the fundamental instabilities are simply compensated for through duplication.

-bacon