Hi there, I had a bit of a nasty experience this week with ataraid(4). I thought I would summarize the issues I ran into so that others can hopefully benefit from it, should they experience catastrophic failure of a pseudo-RAID. I was surprised that I was unable to find much in the way of similar experience in forums. Luckily, I didn't lose any data, thanks to ports/sysutils/scan_ffs.

Bog-simple setup: Gigabyte VM900M motherboard, Intel Core Duo T4300. JMicron JMB363 controller, two SATA ports, RAID mode. Two Seagate 160GB drives.

I'll skip the strange loop I got into with having to recreate the MBR and disklabels after they got trashed -- suffice to say, BACK THEM UP... BACK THEM UP...

---

1. atacontrol rebuild.

There are a few issues here. I'm partly to blame for not reading the documentation in the Handbook about the rebuild process -- however -- hilarity ensued.

Following the rebuild procedure in the Handbook, if you try to run "atacontrol rebuild" from the FreeBSD 7.1 LiveFS, it'll break. I ran it thinking that it had some kind of magic in it which I couldn't achieve using dd alone, which is partly true, but also partly not true.

It has a hardcoded path to /usr/bin/nice, which it runs using the system() libc call, and unfortunately, the LiveFS is rooted at /mnt2. It does this after it issues an ioctl() to tell the ATA driver to copy and rewrite the metadata to the new "spare" drive. Oops. At this point the state of the array is inconsistent: "atacontrol status" will report the array as REBUILDING, despite the fact that the dd job never kicked off. Because the metadata has now been rewritten, and the ataraid driver has entered a REBUILDING state, you can't stop it, and it won't rebuild.

I also found that the default dd block size it uses, 1m, didn't work with my drives -- I had to dd manually with a 64KB block size to get things to work, otherwise I got lots and lots of ATA read/write errors related to trying to write beyond the last part of the disk. The drives themselves are fine, though. (A rough sketch of the manual copy is below.)

HOMEWORK: "atacontrol rebuild" needs to be taught to play nice when run in a catastrophic recovery situation; the path stuff needs to be fixed, and perhaps some magic should be added to allow the metadata to be zapped when Really Bad Stuff happens.
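For anyone in the same boat, this is roughly the manual copy I ended up doing from the LiveFS. It's reconstructed from memory, so treat it as a sketch, not gospel: the device names (ad4 as the surviving disk, ad6 as the replacement) are examples, and you should double-check which is which before running anything.

    # First, identify the disks hanging off the controller.
    atacontrol list

    # Copy the surviving disk onto the replacement, 64KB at a time.
    # conv=noerror,sync keeps going past soft read errors instead of
    # aborting the whole copy.
    dd if=/dev/ad4 of=/dev/ad6 bs=64k conv=noerror,sync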
---

2. RAID metadata, and drive sizes.

OK, the tradeoff with ataraid is that it is pseudo-RAID. That's understood; however, it's easy for the metadata to end up downright out of sync with the actual state of the drives.

After my bad experience with "atacontrol rebuild" from the LiveFS, to trick FreeBSD back into understanding that the array was in fact degraded, I had to read the ataraid driver code to figure out which LBA it was expecting to see the metadata at, and then wipe that out with dd. It doesn't help that the drives themselves are of different sizes. So, imagine the hilarity when I just swap the drives and try to rebuild the array. Oops.

HOMEWORK: Is there a way to use the system partition stuff, e.g. ATA SET MAX ADDRESS, to get around this? Obviously it would mean losing a few sectors at the end of the drive, but that's a small price to pay for sanity with pseudo-RAID.

---

3. RAID BIOS.

I have been using a JMicron 36x PCI-e controller. Unfortunately, when stuff like the MBR is broken, it says nothing informative -- it just skips to the next INT 19h handler. This is more something which should be thrown at the BIOS vendors -- I don't believe there isn't enough space in there to print a message which says "The drive geometry is invalid".

HOMEWORK: Someone needs to throw a wobbly at the vendors.

---

4. fdisk and drive geometry.

The root cause of my boot failure, it turned out, was that the drive geometry was "wrong" for the JMicron SATA RAID BIOS. It turns out sysinstall's guess at the drive geometry is "right" for LBA mode (C/H/S n/255/63), and thus "right" for my SATA controller BIOS. fdisk, however, wants to use a C/H/S of n/16/63 by default. Profanity ensues. (See the P.P.S. below for the workaround I used.)

HOMEWORK: Why does fdisk still assume 16 heads...? Perhaps we should have a switch to tell it to use the LBA-style C/H/S converted geometry?

---

Redux

I understand why organisations pay good money for hardware RAID controllers, but given that time has moved on, this is kinda effed up. In an ideal world, I shouldn't have to bust out hours of ninja admin moves just to get a single RAID-ed home server back up and running.

I also now understand that I can't rely on RAID alone to keep the integrity of my own data -- there is no substitute for backups. I just wish there were realistic backup solutions for individuals trying to do things with technology right now, without paying over the odds, or being ripped off. A "good enough" cheap solution is what individuals often end up using, to get things going in life with minimal resources or wastage.

I hope others benefit from my experience.

cheers
BMS

P.S. Big thanks to Doug White for helping me to get /etc recreated after bits of it got trashed.
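P.P.S. For anyone hitting the same geometry mismatch: fdisk(8) can take its values from a config file via -f, including a "g" line for the BIOS geometry. Something along these lines is what I mean -- but note the cylinder count and slice length below are purely illustrative for a 160GB disk; compute your own from the media size (cylinders = total sectors / (255 * 63)) before trusting any of it:

    # geom.conf -- force LBA-style geometry (255 heads, 63 sectors/track)
    g c19457 h255 s63
    # slice 1: type 165 (FreeBSD), starting at sector 63; the length
    # here is illustrative -- use the real size of your slice
    p 1 165 63 312576642
    a 1

    fdisk -f geom.conf ar0

(You may want -i as well if sector 0 needs reinitialising from scratch.)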
Bruce M Simpson wrote:
> Following the rebuild procedure in the Handbook, if you try to run
> "atacontrol rebuild" from the FreeBSD 7.1 LiveFS, it'll break. I ran it
> thinking that it had some kind of magic in it which I couldn't achieve
> using dd alone, which is partly true, but also partly not true.
>
> It has a hardcoded path to /usr/bin/nice, which it runs using the
> system() libc call, and unfortunately, the LiveFS is rooted at /mnt2. It
> does this after it issues an ioctl() to tell the ATA driver to copy and
> rewrite the metadata to the new "spare" drive.

I always use the LiveFS cdrom in the following way:

    chroot /mnt2
    set -o emacs
    mount -t devfs devfs /dev
    export PAGER=more

so that I am exactly in the same situation as on a real machine.

> HOMEWORK: Why does fdisk still assume 16 heads...? Perhaps we should
> have a switch to tell it to use the LBA-style C/H/S converted geometry?

FreeBSD fdisk is a calamity.

> I also now understand that I can't rely on RAID alone to keep the
> integrity of my own data -- there is no substitute for backups. I just
> wish there were realistic backup solutions for individuals trying to do
> things with technology right now, without paying over the odds, or
> being ripped off.

A solution to keep backups without "paying over the odds" is to back up your data to another hard disk. This doesn't mean using a RAID mirror; it means using rsync or similar to copy data regularly.
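For example, something as simple as this run from cron -- assuming, purely for illustration, that the backup disk is mounted at /backup:

    # Mirror /home onto the backup disk; -a preserves permissions,
    # ownership and timestamps. Deliberately no --delete: files you
    # remove by mistake survive on the backup until you prune it.
    rsync -a /home/ /backup/home/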
It is preferable that this occurs on another machine, and even better in another geographic location. But even if the backup disk is on the same machine, this protects against inadvertent deletions of a file or RAID misbehaviour. The remaining risk is that some hardware problem simultaneously corrupts the main storage and the backup. Modern features such as UFS snapshots, or better, ZFS snapshots, make it possible to produce better backups.

--
Michel TALON

Bruce M Simpson wrote:
> [...]
> I also now understand that I can't rely on RAID alone to keep the
> integrity of my own data -- there is no substitute for backups,

That's 100% true. RAID -- even true hardware RAID -- is *never* a substitute for backup. Consider:

 - Fire.
 - Theft.
 - Lightning strike or power surge.
 - The cleaning woman knocks the tower over.
 - ...

In all of those cases, chances are that both disks in the RAID mirror die. Also, as you mentioned, it doesn't protect against human errors ("rm *" in the wrong directory and similar things). Backups should always be made to media that can be taken offline and stored in a safe place: tape, optical disks, hard disks in swappable drive trays, or external drives.

> I just wish there were realistic backup solutions for individuals
> trying to do things with technology right now, without paying over
> the odds, or being ripped off.

I think the best solution is to use standard hard disks in external enclosures (USB, FireWire, eSATA). They're fairly cheap these days, easy to handle, and quite fast. For a very simple backup solution I recommend buying at least two USB disks and using them in alternation, so you still have a good backup on the shelf if something catastrophic happens while the other one is connected to your computer.

As for the backup software, I simply use "cpdup" (from ports/sysutils/cpdup) to duplicate the file systems. It's fast (copies only changed files), and you can easily restore files after an "rm *" by simply copying them back with cp. (The only gotcha is that cpdup doesn't preserve "holes" in sparse files, but cases where you need that are rare.)
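In practice it boils down to a one-liner per file system. The paths here are just an example, assuming the external disk is mounted at /backup:

    # Mirror /home onto the external disk; cpdup only copies files
    # that have changed since the last run. -i0 turns off the
    # interactive confirmation prompts so it can run from a script.
    cpdup -i0 /home /backup/home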
Of course you can also treat the disks like tapes and dump(8) to the raw device (or tar, cpio, whatever), if you prefer. YMMV, of course.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsführung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"We will perhaps eventually be writing only small modules which are identi-
fied by name as they are used to build larger ones, so that devices like
indentation, rather than delimiters, might become feasible for expressing
local structure in the source language."
        -- Donald E. Knuth, 1974

Bruce Simpson
2009-Jul-23 18:57 UTC
ataraid's revenge! (Was: Re: A nasty ataraid experience.)
Six months on, ataraid(4) did it again. This time I was lucky -- I caught it in time, but the damage to the filesystem meant having to use fsdb to NULL out the affected inodes; mounting read-only, tarring, and untarring across the network after a newfs let me save the affected partition. (The rough sequence is sketched below.)

All I was doing at the time was srm'ing a few sensitive files; all the processes wedged in WCHAN getblk. It seems ataraid(4) is not robust against temporary drive/controller problems. The SMART logs on the affected array drives all check out just fine, and there are no bad block remaps.

So, time to either buy a hardware RAID controller, or move to ZFS...
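For the archives, the salvage dance went roughly like this -- reconstructed from memory, with example names throughout (ar0s1a as the damaged partition, "otherbox" as the machine with spare space), so adjust before use:

    # On the damaged machine: mount read-only, stream the data out.
    mount -o ro /dev/ar0s1a /mnt
    tar -cf - -C /mnt . | ssh otherbox 'cat > /spare/partition.tar'
    umount /mnt

    # Recreate the filesystem, then pull everything back.
    newfs /dev/ar0s1a
    mount /dev/ar0s1a /mnt
    ssh otherbox 'cat /spare/partition.tar' | tar -xpf - -C /mnt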