On Jul 1, 2019, at 7:56 AM, Blake Hudson <blake at ispn.net> wrote:
>
> I've never used ZFS, as its Linux support has been historically poor.

When was the last time you checked?

The ZFS-on-Linux (ZoL) code has been stable for years. In recent months, the BSDs have rebased their offerings from Illumos to ZoL. The macOS port, called O3X, is also mostly based on ZoL.

That leaves Solaris as the only major OS with a ZFS implementation not based on ZoL.

> 1) A single drive failure in a RAID4 or 5 array (desktop IDE)

Can I take by "IDE" that you mean "before SATA", so you're giving a data point something like twenty years old?

> 2) A single drive failure in a RAID1 array (Supermicro SCSI)

Another dated tech reference, if by "SCSI" you mean parallel SCSI, not SAS. I don't mind old tech per se, but at some point the clock on bugs must reset.

> We had to update the BIOS to boot from the working drive

That doesn't sound like a problem with the Linux MD RAID feature. It sounds like the system BIOS had a strange limitation about which drives it was willing to consider bootable.

> and possibly grub had to be repaired or reinstalled as I recall

That sounds like you didn't put GRUB on all disks in the array, which in turn means you probably set up the RAID manually, rather than through the OS installer, which should take care of details like that for you.

> 3) A single drive failure in a RAID 4 or 5 array (desktop IDE) was not clearly identified and required a bit of troubleshooting to pinpoint which drive had failed.

I don't know about Linux MD RAID, but with ZFS, you can make it tell you the drive's serial number when it's pointing out a faulted disk. Software RAID also does something that I haven't seen in typical PC-style hardware RAID: marry GPT partition drive labels to array status reports, so that instead of seeing something that's only of indirect value like "port 4 subunit 3", you can make it say "left cage, 3rd drive down".
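A minimal sketch of how that kind of labeling can be set up under ZoL; the pool name, device paths, and label strings below are placeholders, not anything from this thread:

    # Give each member partition a human-readable GPT name (hypothetical label):
    sgdisk --change-name=1:"left-cage-3" /dev/sdc

    # Build the pool from the stable by-partlabel paths, so status output
    # reports the physical location instead of an sdX name:
    zpool create tank mirror \
        /dev/disk/by-partlabel/left-cage-1 \
        /dev/disk/by-partlabel/left-cage-2

    # A faulted member then shows up in "zpool status tank" under its label.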
I haven't been following this thread closely, but some of the posts have left me puzzled.

1. Hardware RAID: other than Rocket RAID, who don't seem to support a card for more than about 3 years (I used to have to update and rebuild the drivers), anything LSI-based, which includes Dell PERC, has been pretty good. The newer models do even better at doing the right thing.

2. ZFS seems to be ok, though we were testing it with an Ubuntu system just a month or so ago. Note: ZFS with a zpoolZ2 - the equivalent of RAID 6, which we set up using the LSI card set to JBOD - took about 3 days and 8 hours for backing up a large project, while the same o/s, but with xfs on an LSI hardware RAID 6, took about 10 hours less. Hardware RAID is faster.

3. Being in the middle of going through three days of hourly logs and the loghost reports, and other stuff, from the weekend (> 600 emails), I noted that we have something like 50 mdraids, and we've had very little trouble with them. Almost all are either RAID 1 or RAID 6 (we may have a RAID 5 left), except for the system that had a h/d fail, and another starting to throw errors (I suspect the server itself...). The biggest issue for me is that when one fails, "identify" rarely works, which means using smartctl or MegaCli64 (or the LSI script) to find the s/n of the drive, then guessing... and if that doesn't work, bringing the system down to find the right bloody bad drive (a sketch of that lookup follows below). But... they rebuild, no problems.

Oh, and I have my own workstation at home on a mdraid 1.

mark
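For what it's worth, a hedged sketch of that serial-number lookup; the device name and adapter number are placeholders:

    # Serial number of the suspect drive as the kernel sees it:
    smartctl -i /dev/sdh | grep -i serial

    # Match it against what the LSI controller reports per enclosure/slot
    # (adapter 0 assumed; output labels can vary by MegaCli version):
    MegaCli64 -PDList -a0 | egrep -i 'enclosure device id|slot number|inquiry data'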
Speaking of ZFS, got a weird one: we were testing ZFS (ok, it was on Ubuntu, but that shouldn't make a difference, I would think), and I've got a zpool z2. I pulled one drive, to simulate a drive failure, and it rebuilt with the hot spare. Then I pushed the drive I'd pulled back in... and it does not look like I've got a hot spare. zpool status shows:

config:

        NAME         STATE     READ WRITE CKSUM
        export1      ONLINE       0     0     0
          raidz2-0   ONLINE       0     0     0
            sda      ONLINE       0     0     0
            spare-1  ONLINE       0     0     0
              sdb    ONLINE       0     0     0
              sdl    ONLINE       0     0     0
            sdc      ONLINE       0     0     0
            sdd      ONLINE       0     0     0
            sde      ONLINE       0     0     0
            sdf      ONLINE       0     0     0
            sdg      ONLINE       0     0     0
            sdh      ONLINE       0     0     0
            sdi      ONLINE       0     0     0
            sdj      ONLINE       0     0     0
            sdk      ONLINE       0     0     0
        spares
          sdl        INUSE     currently in use

Does anyone know what I need to do to make the spare sdl go back to being just a hot spare?

mark
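A hedged sketch of the usual fix for that state, assuming the returned disk (sdb) has already finished resilvering: detaching the in-use spare returns it to the spares list.

    # Confirm the resilver has completed:
    zpool status export1

    # Detach the spare from the spare-1 vdev; it should go back to AVAIL
    # under "spares" (verify against zpool(8) for your ZoL version):
    zpool detach export1 sdl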
On 2019-07-01 10:10, mark wrote:
> I haven't been following this thread closely, but some of the posts have
> left me puzzled.
>
> 1. Hardware RAID: other than Rocket RAID, who don't seem to support a card
> for more than about 3 years (I used to have to update and rebuild the
> drivers), anything LSI-based, which includes Dell PERC, has been pretty
> good. The newer models do even better at doing the right thing.
>
> 2. ZFS seems to be ok, though we were testing it with an Ubuntu system
> just a month or so ago. Note: ZFS with a zpoolZ2 - the equivalent of RAID
> 6, which we set up using the LSI card set to JBOD - took about 3 days and
> 8 hours for backing up a large project, while the same o/s, but with xfs
> on an LSI hardware RAID 6, took about 10 hours less. Hardware RAID is
> faster.
>
> 3. Being in the middle of going through three days of hourly logs and the
> loghost reports, and other stuff, from the weekend (> 600 emails), I noted
> that we have something like 50 mdraids, and we've had very little trouble
> with them. Almost all are either RAID 1 or RAID 6 (we may have a RAID 5
> left), except for the system that had a h/d fail, and another starting to
> throw errors (I suspect the server itself...). The biggest issue for me
> is that when one fails, "identify" rarely works, which means using smartctl
> or MegaCli64 (or the LSI script) to find the s/n of the drive, then
> guessing... and if that doesn't work, bringing the system down to find the
> right bloody bad drive.

In my case I spend a bit of time before I roll out the system, so I know which physical drive (or which tray) the controller numbers with which number. They stay the same over the life of the system; those are just physical connections. Then when the controller tells me drive number "N" failed, I know which tray to pull.

Valeri

> But... they rebuild, no problems.
>
> Oh, and I have my own workstation at home on a mdraid 1.
>
> mark

-- 
++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++
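One way to capture that mapping at rollout time, sketched with the LSI tool already mentioned in this thread (adapter number and file path are placeholders):

    # Record enclosure, slot, and drive serial for every disk once, while
    # you still know (or can verify) which tray is which:
    MegaCli64 -PDList -a0 \
        | egrep -i 'enclosure device id|slot number|inquiry data' \
        > /root/tray-map.txt

    # When the controller later reports slot "N" failed, the file tells you
    # which tray to pull and which serial to expect on the pulled drive.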
Warren Young wrote on 7/1/2019 9:48 AM:
> On Jul 1, 2019, at 7:56 AM, Blake Hudson <blake at ispn.net> wrote:
>> I've never used ZFS, as its Linux support has been historically poor.
>
> When was the last time you checked?
>
> The ZFS-on-Linux (ZoL) code has been stable for years. In recent months, the BSDs
> have rebased their offerings from Illumos to ZoL. The macOS port, called O3X, is
> also mostly based on ZoL.
>
> That leaves Solaris as the only major OS with a ZFS implementation not based on ZoL.
>
>> 1) A single drive failure in a RAID4 or 5 array (desktop IDE)
>
> Can I take by "IDE" that you mean "before SATA", so you're giving a data point
> something like twenty years old?
>
>> 2) A single drive failure in a RAID1 array (Supermicro SCSI)
>
> Another dated tech reference, if by "SCSI" you mean parallel SCSI, not SAS.
>
> I don't mind old tech per se, but at some point the clock on bugs must reset.

Yes, this experience spans decades and a variety of hardware. I'm all for giving things another try, and would love to try ZFS again now that it's been ported to Linux. As far as mdadm goes, I'm happy with LSI hardware RAID controllers and have no desire to retry mdadm at this time. I have enough enterprise-class drives fail on a regular basis (I manage a reasonable volume) that the predictability gained by standardizing on one vendor for HW RAID cards is worth a lot. I have no problem recommending LSI cards to folks who feel the improved availability outweighs the cost (~$500). This would assume those folks have already covered other aspects of availability and redundancy first (power, PSUs, cooling, backups, etc.).
On Jul 1, 2019, at 9:10 AM, mark <m.roth at 5-cent.us> wrote:
>
> ZFS with a zpoolZ2

You mean raidz2.

> which we set up using the LSI card set to JBOD

Some LSI cards require a complete firmware re-flash to get them into "IT mode," which completely does away with the RAID logic and turns them into dumb SATA controllers. Consequently, you usually do this on the lowest-end models, since there's no point paying for the expensive RAID features on the higher-end cards when you do this.

I point this out because there's another path, which is to put each disk into a single-target "JBOD," which is less efficient, since it means each disk is addressed indirectly via the RAID chipset, rather than as just a plain SATA disk.

You took the first path, I hope?

We gave up on IT-mode LSI cards when motherboards with two SFF-8087 connectors became readily available, giving easy 8-drive arrays. No need for the extra board any more.

> took about 3 days and 8 hours for backing up a large project, while the same o/s,
> but with xfs on an LSI hardware RAID 6, took about 10 hours less. Hardware RAID
> is faster.

I doubt the speed difference is due to hardware vs software. The real difference you tested there is ZFS vs XFS, and you should absolutely expect to pay some performance cost with ZFS. You're getting a lot of features in trade.

I wouldn't expect the difference to be quite that wide, by the way. That brings me back to my guess about IT mode vs RAID JBOD mode on your card.

Anyway, one of those compensating benefits is snapshot-based backups. Before starting the first backup, set a ZFS snapshot. Do the backup with a "zfs send" of the snapshot, rather than whatever file-level backup tool you were using before. When that completes, create another snapshot and send *that* snapshot. This will complete much faster, because ZFS uses the two snapshots to compute the set of changed blocks between them and sends only the changed blocks (sketched below).

This is a sub-file-level backup, so that if a 1 kB header changes in a 2 GB data file, you send only one block's worth of data to the backup server, since you'll be using a block size bigger than 1 kB, and that header, being a *header*, won't straddle two blocks. This is excellent for filesystems with large files that change in small areas, like databases.

You might say, "I can do that with rsync already," but with rsync, you have to compute this delta on each backup, which means reading all of the blocks on *both* sides of the backup. ZFS snapshots keep that information continuously as the filesystem runs, so there is nothing to compute at the beginning of the backup.

rsync's delta compression primarily saves time only when the link between the two machines is much slower than the disks on either side, so that the delta computation overhead gets swamped by the bottleneck's delays. With ZFS, the inter-snapshot delta computation is so fast that you can use it even when you've got two servers sitting side by side with a high-bandwidth link between them.

Once you've got a scheme like this rolling, you can do backups very quickly, possibly even sub-minute.

And you don't have to script all of this yourself. There are numerous pre-built tools to automate this.
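A bare-bones sketch of that snapshot-and-send cycle; the pool, dataset, and host names (tank/data, backup/data, backuphost) are hypothetical stand-ins:

    # Initial full replication of the first snapshot:
    zfs snapshot tank/data@backup-20190701
    zfs send tank/data@backup-20190701 | ssh backuphost zfs receive backup/data

    # Subsequent runs send only the blocks changed between two snapshots:
    zfs snapshot tank/data@backup-20190702
    zfs send -i tank/data@backup-20190701 tank/data@backup-20190702 \
        | ssh backuphost zfs receive backup/data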
We've been happy users of Sanoid, which does both the automatic snapshot and automatic replication parts:

    https://github.com/jimsalterjrs/sanoid

Another nice thing about snapshot-based backups is that they're always consistent: just as you can reboot a ZFS-based system at any time and have it reboot into a consistent state, you can take a snapshot and send it to another machine, and it will be just as consistent.

Contrast something like rsync, which is making its decisions about what to send on a per-file basis, so that it simply cannot be consistent unless you stop all of the apps that can write to the data store you're backing up. Snapshot-based backups can occur while the system is under a heavy workload. A ZFS snapshot is nearly free to create, and once set, it freezes the data blocks in a consistent state. This benefit falls out nearly for free with a copy-on-write filesystem.

Now that you're doing snapshot-based backups, you're immune to crypto malware, as long as you keep your snapshots long enough to cover your maximum detection window. Someone just encrypted all your stuff? Fine, roll it back; see the sketch below. You don't even have to go to the backup server.

> when one fails, "identify" rarely works, which means using smartctl
> or MegaCli64 (or the LSI script) to find the s/n of the drive, then
> guessing...

It's really nice when you get a disk status report and the missing disk is clear from the labels:

    left-1:  OK
    left-2:  OK
    left-4:  OK
    right-1: OK
    right-2: OK
    right-3: OK
    right-4: OK

Hmmm, which disk died, I wonder? Gotta be left-3!

No need to guess; the system just told you in human terms, rather than in abstract hardware terms.
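On the crypto-malware point above, recovery is a rollback per dataset; a hedged sketch, reusing the hypothetical names from the earlier sketch:

    # See which snapshots are still on hand for the affected dataset:
    zfs list -t snapshot -r tank/data

    # Roll back to the last snapshot taken before the infection; -r also
    # destroys any snapshots newer than the rollback target:
    zfs rollback -r tank/data@backup-20190701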