On Jun 28, 2019, at 8:46 AM, Blake Hudson <blake at ispn.net> wrote:
>
> Linux software RAID has only decreased availability for me. This has been due to a combination of hardware and software issues that are generally handled well by HW RAID controllers, but are often handled poorly or unpredictably by desktop-oriented hardware and Linux software.

Would you care to be more specific? I have little experience with software RAID, other than ZFS, so I don't know what these "issues" might be.

I do have a lot of experience with hardware RAID, and the grass isn't very green on that side of the fence, either. Some of this will repeat others' points, but it's worth repeating, since it means they're not alone in their pain:

0. Hardware RAID is a product of the time it was produced. My old parallel IDE and SCSI RAID cards are useless because you can't get disks with that port type any more; my oldest SATA and SAS RAID cards can't talk to disks bigger than 2 TB; and of those older hardware RAID cards that still do work, they won't accept a RAID created by a controller of another type, even if it's from the same company. (Try attaching a 3ware 8000-series RAID to a 3ware 9000-series card, for example.)

Typical software RAID never drops backwards compatibility. You can always attach an old array to new hardware, or even a new array to old hardware, within the limitations of the hardware, and those limitations aren't the software RAID's fault.

1. Hardware RAID requires hardware-specific utilities. Many hardware RAID systems don't work under Linux at all, and of those that do, not all provide sufficiently useful Linux-side utilities. If you have to reboot into the RAID BIOS to fix anything, that's bad for availability.

2. The number of hardware RAID options is going down over time. Adaptec's almost out of the game, 3ware was bought by LSI and then had their products all but discontinued, and most of the other options you list are rebadged LSI or Adaptec. Eventually it's going to be LSI or software RAID, and then LSI will probably get out of the game, too. This market segment is dying because software RAID no longer has any practical limitations that hardware can fix.

3. When you do get good-enough Linux-side utilities, they're often not well designed. I don't know anyone who likes the megaraid or megacli64 utilities. I have more experience with 3ware's tw_cli, and I never developed facility with it beyond pidgin, so that to do anything even slightly uncommon, I have to go back to the manual to piece the command together, else risk roaching the still-working disks.

By contrast, I find the zfs and zpool commands well-designed and easy to use. There's no mystery why that should be so: hardware RAID companies have their expertise in hardware, not software. Also, "man zpool" doesn't suck. :)

That coin does have an obverse face, which is that young software RAID systems go through a phase where they have to re-learn just how false, untrustworthy, unreliable, duplicitous, and mendacious the underlying hardware can be. But that expertise builds up over time, so that a mature software RAID system copes quite well with the underlying hardware's failings.

The inverse expertise in software design doesn't build up on the hardware RAID side. I assume this is because they fire the software teams once they've produced a minimum viable product, then re-hire a new team when their old utilities and monitoring software get so creaky that they have to be rebuilt from scratch.
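To make the earlier claim about zpool's usability concrete, here is roughly what routine administration looks like on the ZFS side (a minimal sketch; the pool, dataset, and device names are invented):

    zpool create tank mirror sdb sdc     # build a mirrored pool
    zpool status tank                    # health and per-device error counters
    zpool scrub tank                     # verify every block against its checksum
    zfs create tank/home                 # carve out a filesystem
    zfs set compression=lz4 tank/home    # tune per-dataset properties

Compare that with piecing together the equivalent tw_cli or megacli64 incantations from the manual.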
And when the vendor does rebuild its tooling from scratch, you get a *new* bag of ugliness in the world. Software RAID systems, by contrast, evolve continuously, and so usually tend towards perfection.

The same problem *can* come up in the software RAID world: witness how much wheel reinvention is going on in the Stratis project! The same amount of effort put into ZFS would have been a better use of everyone's time. That option doesn't even exist on the hardware RAID side, though. Every hardware RAID provider must develop their command line utilities and monitoring software de novo, because even if the Other Company open-sourced its software, that other software can't work with their proprietary hardware.

4. Because hardware RAID is abstracted below the OS layer, the OS and filesystem have no way to interact intelligently with it. ZFS is at the pinnacle of this technology here, but CentOS is finally starting to get this through Stratis and the extensions Stratis has required to XFS and LVM. I assume btrfs also provides some of these benefits, though that's on track to becoming off-topic here.

ZFS can tell you which file is affected by a block that's bad across enough disks that redundancy can't fix it. This gives you a new, efficient recovery option: restore that file from backup or delete it, allowing the underlying filesystem to rewrite the bad block on all disks. With hardware RAID, fixing this requires picking one disk as the "real" copy and telling the RAID card to blindly rewrite all the other copies.

Another example is resilvering: because a hardware RAID has no knowledge of the filesystem, a resilver during disk replacement requires rewriting the entire disk, which takes 8-12 hours these days. If the volume has a lot of free space, a filesystem-aware software RAID resilver can copy only the blocks containing user data, greatly reducing recovery time.

Anecdotally, I can tell you that the ECCs involved in NAS-grade SATA hardware aren't good enough on their own. We had a ZFS server that would detect about 4-10 kB of bad data on one disk in the pool during every weekend scrub. We never figured out whether the problem was in the disk, its drive cage slot, or its cabling, but it was utterly repeatable. It was also utterly unimportant to diagnose, because ZFS kept fixing the problem for us, automatically!

The thing is, we'd have never known about this underlying hardware fault if ZFS's block checksums weren't able to reduce the chances of undetected error to practically-impossible levels. Since ZFS knows, by those same checksums, which copy of the data is uncorrupted, it fixed the problem for us automatically, each time, for years on end. I doubt any hardware RAID system you favor would have fared as well. *That's* uptime. :)

5. Hardware RAID made sense back when a PC motherboard rarely had more than two hard disk ports, and those shared a single IDE channel. In those days, CPUs were slow enough that calculating parity was really costly, and hard drives were small enough that 8+ disk arrays were often required just to get enough space. Now that you can get 10+ SATA ports on a mobo, parity calculation costs only a tiny slice of a single core in your multicore CPU, and a mirrored pair of multi-terabyte disks is often plenty of space, hardware RAID is increasingly being pushed to the margins of the server world.

Software RAID doesn't have port count limits at all. With hardware RAID, I don't buy a 4-port card when a 2-port card will do, because that costs me $100-200 more.
With software RAID, I can usually find another place to plug in a drive temporarily, and that port was "free" because it came with the PC. This matters when I have to replace a disk in my hardware RAID mirror, because now I'm out of ports. I have to choose one of the disks to drop out of the array, losing all redundancy before the recovery even starts, because I need to free up one of the two hardware connectors for the new disk. That's fine when the disk I'm replacing is dead, dead, dead, but that isn't usually the case in my experience. Instead, the disk I'm replacing is merely *dying*, and I'm hoping to get it replaced before it finally dies.

What that means in practice is that with software RAID, I can have an internal mirror, then temporarily connect a replacement drive in a USB or Thunderbolt disk enclosure. Now the resilver operation proceeds with both original disks available, so that if we find that the "good" disk in the original mirror has a bad sector, too, the software RAID system might find that it can pull a good copy from the "bad" disk, saving the whole operation.

Only once the resilver is complete do I have to choose which disk to drop out of the array in a software RAID system. If I choose incorrectly, the software RAID stops work and lets me choose again. With hardware RAID, if I choose incorrectly, it's on the front end of the operation instead, so I'll end up spending 8-12 hours to create a redundant copy of "Wrong!"

Bottom line: I will not shed a tear when my last hardware RAID goes away.
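For reference, the temporary-third-disk dance described above is only a few commands on the ZFS side (a sketch; the pool and device names are invented):

    # tank is a 2-way mirror of sda3 and sdb3; sdb3 is the dying disk,
    # and sdc3 is the replacement in the temporary USB enclosure.
    zpool attach tank sda3 sdc3    # grow the 2-way mirror into a 3-way mirror
    zpool status tank              # watch the resilver; both originals stay in service
    zpool detach tank sdb3         # only after the resilver completes, drop the dying disk

Redundancy never falls below what the original mirror provided at any point in that sequence.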
> IMHO, Hardware raid primarily exists because of Microsoft Windows and
> VMware esxi, neither of which have good native storage management.
>
> Because of this, it's fairly hard to order a major brand (HP, Dell, etc)
> server without raid cards.
>
> Raid cards do have the performance boost of nonvolatile write back cache.
> Newer/better cards use supercap flash for this, so battery life is no
> longer an issue.

The supercaps may be more stable than batteries, but they can also fail. Since I had to replace the supercap of an HP server, I know they do fail. That's why they are also built as a module connected to the controller. :-)

As for the write-back cache, good SSDs do the same with integrated cache and supercaps, so you really don't need the RAID controller to do it anymore.

> That said, make my Unix boxes zfs or mdraid+xfs on jbod for all the
> reasons previously given.

Same here: after long years of all kinds of RAID hardware, I'm happy to run everything on mdraid+xfs. Software RAID on directly attached U.2 NVMe disks is all we use for new servers. It's fast, stable, and, also important, still KISS.

Regards,
Simon
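A minimal sketch of the mdraid+xfs setup Simon describes, with invented device names:

    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/nvme0n1 /dev/nvme1n1             # mirror the two NVMe drives
    mkfs.xfs /dev/md0                           # XFS directly on top of the md device
    mdadm --detail --scan >> /etc/mdadm.conf    # record the array so it assembles at boot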
Warren Young wrote on 6/28/2019 6:53 PM:
> On Jun 28, 2019, at 8:46 AM, Blake Hudson <blake at ispn.net> wrote:
>> Linux software RAID has only decreased availability for me. This has been due to a combination of hardware and software issues that are generally handled well by HW RAID controllers, but are often handled poorly or unpredictably by desktop-oriented hardware and Linux software.
> Would you care to be more specific? I have little experience with software RAID, other than ZFS, so I don't know what these "issues" might be.

I've never used ZFS, as its Linux support has been historically poor. My comments are limited to mdadm. I've experienced three faults when using Linux software RAID (mdadm) on RH/RHEL/CentOS, and I believe all of them resulted in more downtime than would have been experienced without the RAID:

    1) A single drive failure in a RAID4 or 5 array (desktop IDE) caused the entire system to stop responding. The result was a degraded (from the dead drive) and dirty (from the crash) array that could not be rebuilt (either of the former conditions would have been fine, but not both, due to buggy Linux software).
    2) A single drive failure in a RAID1 array (Supermicro SCSI) caused the system to be unbootable. We had to update the BIOS to boot from the working drive, and possibly grub had to be repaired or reinstalled, as I recall (it's been a long time).
    3) A single drive failure in a RAID4 or 5 array (desktop IDE) was not clearly identified and required a bit of troubleshooting to pinpoint which drive had failed.

Unfortunately, I've never had an experience where a drive just failed cleanly, was marked bad by Linux software RAID, and could then be replaced without fanfare. This is in contrast to my HW RAID experiences, where a single drive failure is almost always handled in a reliable and predictable manner with zero downtime.

Your points about having to use a clunky BIOS setup or CLI tools may be true for some controllers, as are your points about needing to maintain a spare of your RAID controller, ongoing driver support, etc. I've found the LSI brand cards have good Linux driver support, CLI tools, an easy-to-navigate BIOS, and are backwards compatible with RAID sets taken from older cards, so I have no problem recommending them. LSI cards, by default, also regularly test all drives to predict failures (avoiding rebuild errors or double failures).
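For anyone hitting problem 3 today, pinpointing the failed member usually comes down to a few commands (a sketch; the array and device names are invented):

    cat /proc/mdstat                   # a [U_] pattern shows which slot has dropped out
    mdadm --detail /dev/md0            # per-member state: active, faulty, removed
    smartctl -i /dev/sdb               # map the kernel name to a drive model and serial number
    ls -l /dev/disk/by-id/ | grep sdb  # the same mapping without smartmontools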
On July 1, 2019 8:56:35 AM CDT, Blake Hudson <blake at ispn.net> wrote:
> I've never used ZFS, as its Linux support has been historically poor. My
> comments are limited to mdadm. I've experienced three faults when using
> Linux software raid (mdadm) on RH/RHEL/CentOS and I believe all of them
> resulted in more downtime than would have been experienced without the
> RAID: [...]

+1 in favor of hardware RAID.

My usual argument is: in the case of hardware RAID, a dedicated piece of hardware runs a single task, the RAID function, which boils down to a simple, short, easy-to-debug program. In the case of software RAID there is no dedicated hardware, and if the kernel (big and buggy code) panics, the current RAID operation will never be finished, which leaves a mess. One does not need a computer science degree to follow this simple logic.

Valeri

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++
On Jul 1, 2019, at 7:56 AM, Blake Hudson <blake at ispn.net> wrote:
>
> I've never used ZFS, as its Linux support has been historically poor.

When was the last time you checked? The ZFS-on-Linux (ZoL) code has been stable for years. In recent months, the BSDs have rebased their offerings from Illumos to ZoL. The macOS port, called O3X, is also mostly based on ZoL. That leaves Solaris as the only major OS with a ZFS implementation not based on ZoL.

> 1) A single drive failure in a RAID4 or 5 array (desktop IDE)

Can I take by "IDE" that you mean "before SATA", so you're giving a data point something like twenty years old?

> 2) A single drive failure in a RAID1 array (Supermicro SCSI)

Another dated tech reference, if by "SCSI" you mean parallel SCSI, not SAS. I don't mind old tech per se, but at some point the clock on bugs must reset.

> We had to update the BIOS to boot from the working drive

That doesn't sound like a problem with the Linux MD RAID feature. It sounds like the system BIOS had a strange limitation about which drives it was willing to consider bootable.

> and possibly grub had to be repaired or reinstalled as I recall

That sounds like you didn't put GRUB on all disks in the array, which in turn means you probably set up the RAID manually, rather than through the OS installer, which should take care of details like that for you.

> 3) A single drive failure in a RAID 4 or 5 array (desktop IDE) was not clearly identified and required a bit of troubleshooting to pinpoint which drive had failed.

I don't know about Linux MD RAID, but with ZFS, you can make it tell you the drive's serial number when it's pointing out a faulted disk.

Software RAID also does something that I haven't seen in typical PC-style hardware RAID: marry GPT partition drive labels to array status reports, so that instead of seeing something that's only of indirect value like "port 4 subunit 3", you can make it say "left cage, 3rd drive down".
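One way to get that behavior on Linux is to build the pool from labeled GPT partitions instead of bare sdX names (a sketch; the labels and devices are invented):

    sgdisk --change-name=1:left-cage-3rd-drive /dev/sdb     # give partition 1 a descriptive GPT name
    zpool create tank mirror \
          /dev/disk/by-partlabel/left-cage-3rd-drive \
          /dev/disk/by-partlabel/right-cage-3rd-drive       # build the mirror from those labels
    zpool status tank                                       # faults now show the location label, not sdX

Building the pool from /dev/disk/by-id/... paths instead gets you the model-and-serial-number reporting.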